[BioPython] error on insert new sequences from GenBank: no annotations saved in BioSQL database
Eric Gibert
ericgibert at yahoo.fr
Sat Nov 10 06:16:40 EST 2007
Dear Peter,
My problem is not that the tables have no entries, but that no
interpretation of the features is performed.
Example:
In the tutorial and in BioJava, 'source' is an annotation:
# from the Biopython Tutorial and Cookbook
print "from: %s" % seq_record.annotations['source']
This raises "KeyError: 'source'".
On the other hand, after some tinkering, I found that I can get a feature
from the list seq_record.features with type='source', which contains
"qualifiers['organism']"... Quite cumbersome.
But maybe there is another, more straightforward way that I did not find.
Can you tell me?
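
For reference, the workaround I describe above can be sketched roughly as
follows. This is only an illustration under assumptions: the helper name
organism_of is mine (hypothetical, not part of Biopython), the record is a
stand-in object built with SimpleNamespace rather than a real record loaded
from BioSQL, and the organism name is a placeholder value. It relies only on
the 'annotations', 'features', 'type' and 'qualifiers' attributes mentioned
above.

```python
from types import SimpleNamespace


def organism_of(record):
    """Return a source/organism string for a record, or None if not found.

    Tries record.annotations['source'] first, then falls back to the
    'organism' qualifier of the first feature whose type is 'source'.
    """
    annotations = getattr(record, "annotations", None) or {}
    if "source" in annotations:
        return annotations["source"]
    for feature in getattr(record, "features", []):
        if feature.type != "source":
            continue
        value = feature.qualifiers.get("organism")
        if not value:
            continue
        # Qualifier values may be a list of strings or a single string.
        return value[0] if isinstance(value, (list, tuple)) else value
    return None


# Stand-in record (placeholder data): no 'source' annotation, but a
# 'source' feature carrying an 'organism' qualifier.
demo_record = SimpleNamespace(
    annotations={},
    features=[
        SimpleNamespace(type="CDS", qualifiers={"gene": ["placeholder"]}),
        SimpleNamespace(type="source", qualifiers={"organism": ["Homo sapiens"]}),
    ],
)
print("from: %s" % organism_of(demo_record))
```

The fallback keeps the calling code the same whether the value ends up in
the annotations or only in the 'source' feature.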
-------- For your information, I went through the tables of my BioSQL database:
Here are my findings with BioPython insertion in BioSQL using
"myDataBase.load(list_of_seq)":
(note: one test sequence was fetched with GenBank.download_many() and the other
using GenBank.NCBIDictionary)
1) table bioentry: all columns populated except for 'taxon_id', which is NULL
(maybe I need an extra call to populate the 'taxon' table first?)
(FYI BioJava sequences are not filling all columns correctly)
2) table bioentry_dbxref: no data inserted (always empty, even with BioJava)
3) table bioentry_qualifier_value:
One entry only, for the 'term_id' = 149, rank = 1, and value = '07-JUL-2005'
or other 'DD-MMM-YYYY' dates (see my remarks below)
4) table bioentry_reference: two records per sequence, with reference_id
correctly mapping to the 'reference' table; rank, start_pos and end_pos are
also correctly filled
5) table bioentry_relationship: no entry found (always empty, even with
BioJava)
6) table biosequence: one entry per seq, the 'seq' field is correct. Note:
the 'version' is set to 0 whereas it should be 1... (length is correct and
we have "dna" in lower case :-) )
7) table comment: no entry found (always empty, even with BioJava)
8) table dbxref: some records are generated, for dbname 'PUBMED' and 'Taxon'
with the correct value
(FYI: I think that my BioJava is not managing this table...)
9) table dbxref_qualifier_value: (always empty, even with BioJava)
10) table location: all locations loaded correctly; note that 'term_id' and
'dbxref_id' remain NULL for these seqs, but I have values for other seqs.
11) table location_qualifier_value: always empty, even with BioJava
12) table ontology: some rows but not related to the sequences
13) table reference: entries correct; note that 'dbxref_id' remains NULL for
these seqs, but I have values for other seqs.
14) table seqfeature: entries are there (same as in table 'location').
FYI: 'display_name' is always NULL.
15) table seqfeature_dbxref: always empty, even with BioJava
16) table seqfeature_qualifier_value: filled correctly
17) table seqfeature_relationship: always empty, even with BioJava
18) table taxon: always empty, even with BioJava
19) table taxon_name: I have one but not from this test (I tried to tinker a
little bit with taxon but stopped)
20) table term: always empty, even with BioJava
21) table term_dbxref: always empty, even with BioJava
22) table term_relationship_term: have some entries
23) table term_synonym: always empty, even with BioJava
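
A survey like the one above could be repeated with a small row-count
script. The following is only a sketch under assumptions: count_rows is a
hypothetical helper of mine, and the in-memory sqlite3 database with
one-column tables is a toy stand-in, not the real BioSQL schema or my actual
database. With a real server, the connection would come from the
corresponding DB-API driver instead.

```python
import sqlite3


def count_rows(connection, table_names):
    """Return {table_name: row_count} for each table in table_names."""
    counts = {}
    cursor = connection.cursor()
    for name in table_names:
        # Table names cannot be bound as query parameters, so accept only
        # plain identifiers before placing them into the SQL text.
        if not name.replace("_", "").isalnum():
            raise ValueError("unexpected table name: %r" % name)
        cursor.execute("SELECT COUNT(*) FROM %s" % name)
        counts[name] = cursor.fetchone()[0]
    return counts


# Toy stand-in database (NOT the real BioSQL schema).
demo = sqlite3.connect(":memory:")
for table in ("bioentry", "taxon", "term"):
    demo.execute("CREATE TABLE %s (id INTEGER)" % table)
demo.execute("INSERT INTO bioentry VALUES (1)")
demo.execute("INSERT INTO bioentry VALUES (2)")

table_counts = count_rows(demo, ["bioentry", "taxon", "term"])
for table, n in sorted(table_counts.items()):
    print("%-10s %d" % (table, n))
```

Running the same function before and after a load would show which tables
an insertion actually touches.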
------------
Thank you
Eric
-----Original Message-----
From: Peter [mailto:biopython at maubp.freeserve.co.uk]
Sent: Friday, November 09, 2007 3:19 AM
To: Eric Gibert
Cc: biopython at lists.open-bio.org
Subject: Re: [BioPython] error on insert new sequences from GenBank: no
annotations saved in BioSQL database
Eric Gibert wrote:
> Once I look in the table bioentry_qualifier_value
>
> * 20 records for a Sequence imported by BioJava
> * 1 only for a Sequence inserted by BioPython: the date which should be
> inserted by "_load_bioentry_date" in BioSQL/Loader.py
>
> Quite a few annotations missing, no?
>
> Any idea?
So Biopython is recording nothing in table bioentry_qualifier_value
(apart from the date), but is recording other essential things (in other
tables) like the sequence itself?
Could you double check your schema, as from the issue I filed as bug
2394 based on your earlier email, your schema doesn't seem to be up to date:
http://bugzilla.open-bio.org/show_bug.cgi?id=2394
Peter