[BioPython] error on insert new sequences from GenBank: no annotations saved in BioSQL database

Eric Gibert ericgibert at yahoo.fr
Sat Nov 10 06:16:40 EST 2007


Dear Peter,

 

My problem is not that I have no entries in the tables, but that the
features are not interpreted.

 

Example:

In the tutorial and in BioJava, 'source' is an annotation:

 

      # from the Biopython Tutorial and Cookbook

      print "from: %s" % seq_record.annotations['source']

 

This returns a "KeyError: 'source'"

 

On the other hand, after some tinkering, I found that I can get a feature
from the list seq_record.features with type='source', which contains a
"qualifiers['organism']"... Quite cumbersome.

 

But maybe there is another, more straightforward way that I did not find.
Can you tell me?

 

 

--------  For your information, I went through the tables of my BioSQL database:

 

Here are my findings with BioPython insertion in BioSQL using
"myDataBase.load(list_of_seq)":

 

(note: one test sequence was fetched by GenBank.download_many() and the other
using GenBank.NCBIDictionary)

 

1) table bioentry: all columns populated except for 'taxon_id' which is NULL
(maybe I need an extra call for populating the 'taxon' table before?)

 

(FYI BioJava sequences are not filling all columns correctly)

 

2) table bioentry_dbxref: no data inserted (always empty, even with BioJava)

 

3) table bioentry_qualifier_value:

One entry only, for the 'term_id' = 149, rank = 1, and value = '07-JUL-2005'
or other 'DD-MMM-YYYY' dates (see my remarks below)

 

4) table bioentry_reference: two records per sequence with reference_id
correctly mapping to the 'reference' table; rank, start_pos and end_pos are
also correctly filled

 

5) table bioentry_relationships: no entry found (always empty, even with
BioJava)

 

6) table biosequence: one entry per seq, the 'seq' field is correct. Note:
the 'version' is set to 0 whereas it should be 1... (length is correct and
we have "dna" in lower case :-) )

 

7) table comment: no entry found (always empty, even with BioJava)

 

8) table dbxref: some records are generated, for dbname 'PUBMED' and 'Taxon'
with the correct value

(FYI: I think that my BioJava is not managing this table...)

 

9) table dbxref_qualifier_value: (always empty, even with BioJava)

 

10) table location: all locations loaded correctly, note that 'term_id' and
'dbxref_id' remain NULL for these seq but I have value for other seq.

 

11) table location_qualifier_value: always empty, even with BioJava

 

12) table ontology: some rows but not related to the sequences

 

13) Table reference: entries correct, note 'dbxref_id' remains NULL for
these seq but I have value for other seq.

 

14) table seqfeature: entries are there (same as in table 'location').
FYI: 'display_name' is always NULL.

 

15) table seqfeature_dbxref: always empty, even with BioJava

 

16) table seqfeature_qualifier_value: filled correctly

 

17) table seqfeature_relationship: always empty, even with BioJava

 

18) table taxon: always empty, even with BioJava

 

19) table taxon_name: I have one but not from this test (I tried to tinker a
little bit with taxon but stopped)

 

20) table term: always empty, even with BioJava

 

21) table term_dbxref: always empty, even with BioJava

 

22) table term_relationship_term: have some entries

 

23) table term_synonym: always empty, even with BioJava

 

------------
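
A table-by-table survey like the one above can be automated by counting rows
in each table. The sketch below is only an illustration under stated
assumptions: it uses Python's built-in sqlite3 module with a two-table toy
schema (not the real BioSQL schema) so that it runs anywhere, and the helper
name count_rows is hypothetical. Against a real BioSQL database the
connection would come from whichever DB-API driver matches the server.

```python
import sqlite3


def count_rows(conn, tables):
    """Return a {table_name: row_count} dictionary for the given tables."""
    counts = {}
    for table in tables:
        # Table names come from a fixed list in this sketch, never from
        # user input, so building the statement by formatting is acceptable.
        cursor = conn.execute('SELECT COUNT(*) FROM "%s"' % table)
        counts[table] = cursor.fetchone()[0]
    return counts


# Toy stand-in schema (assumption): just enough columns to mimic the
# situation where 'bioentry' has rows but 'taxon' is empty.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, taxon_id INTEGER)"
)
conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO bioentry (taxon_id) VALUES (NULL)")

counts = count_rows(conn, ["bioentry", "taxon"])
for name in sorted(counts):
    print("%-10s %d" % (name, counts[name]))
```

Running the same counts before and after a load makes it easier to see which
tables a given loader actually touches.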

 

Thank you

 

Eric

 

 

 

 

-----Original Message-----
From: Peter [mailto:biopython at maubp.freeserve.co.uk] 
Sent: Friday, November 09, 2007 3:19 AM
To: Eric Gibert
Cc: biopython at lists.open-bio.org
Subject: Re: [BioPython] error on insert new sequences from GenBank: no
annotations saved in BioSQL database

 

Eric Gibert wrote:

> Once I look in the table bioentry_qualifier_value

> 

> * 20 records for a Sequence imported by BioJava

> * 1 only for a Sequence inserted by BioPython: the date which should be
> inserted by "_load_bioentry_date" in BioSQL/Loader.py

> 

> Quite a few annotations missing, no? 

> 

> Any idea?

 

So Biopython is recording nothing in table bioentry_qualifier_value 

(apart from the date), but is recording other essential things (in other 

tables) like the sequence itself?

 

Could you double check your schema, as from the issue I filed as bug 

2394 based on your earlier email, your schema doesn't seem to be up to date:

 

http://bugzilla.open-bio.org/show_bug.cgi?id=2394

 

Peter


