[BioPython] error on insert new sequences from GenBank: no annotations saved in BioSQL database

Eric Gibert ericgibert at yahoo.fr
Fri Nov 9 08:35:12 EST 2007

Dear Hilmar,

Thank you for this reply. Now I would like to know where BioPython has
stored "SOURCE" or "ORGANISM" in BioSQL? I cannot find them.

Then, supposing they are somewhere, how can I get them back?

Thank you
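A minimal sketch of the retrieval asked about above, following the mapping in Hilmar's reply below: ORGANISM/SOURCE go to tables taxon and taxon_name, reached through the taxon_id foreign key in bioentry. It assumes a pared-down, in-memory SQLite copy of just those three tables. The column subset, the sample rows, and the helper name organism_for_accession are illustrative assumptions, not BioPython API; against a real BioSQL database the same SELECT should apply, adjusted to that installation's driver and SQL dialect.

```python
import sqlite3

# Assumption: a pared-down subset of the BioSQL tables involved in the
# ORGANISM/SOURCE mapping; real tables carry more columns and constraints.
SCHEMA = """
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER REFERENCES taxon(taxon_id),
    name       TEXT,
    name_class TEXT
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER REFERENCES taxon(taxon_id),
    name        TEXT,
    accession   TEXT
);
"""


def organism_for_accession(conn, accession):
    """Return the scientific name linked to a bioentry, or None.

    Hypothetical helper: follows bioentry.taxon_id into taxon_name and keeps
    the row whose name_class is 'scientific name'. None means the entry has
    no taxon link, or no scientific name was stored for its taxon.
    """
    row = conn.execute(
        """
        SELECT tn.name
        FROM bioentry AS b
        JOIN taxon_name AS tn ON tn.taxon_id = b.taxon_id
        WHERE b.accession = ? AND tn.name_class = 'scientific name'
        """,
        (accession,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical sample rows, not real GenBank records.
    conn.execute("INSERT INTO taxon VALUES (1, 12345)")
    conn.executemany(
        "INSERT INTO taxon_name VALUES (?, ?, ?)",
        [(1, "Genus species", "scientific name"),
         (1, "example organism", "genbank common name")],
    )
    conn.executemany(
        "INSERT INTO bioentry VALUES (?, ?, ?, ?)",
        [(1, 1, "SEQ00001", "SEQ00001"),      # linked to a taxon
         (2, None, "SEQ00002", "SEQ00002")],  # no taxon link
    )
    print(organism_for_accession(conn, "SEQ00001"))  # Genus species
    print(organism_for_accession(conn, "SEQ00002"))  # None
```

An empty result for a given entry would suggest that the loader never linked that bioentry to a taxon row, which is one concrete thing to check when comparing what different toolkits wrote.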


-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net] 
Sent: Friday, November 09, 2007 4:28 AM
To: Eric Gibert
Cc: biopython at lists.open-bio.org; BioJava
Subject: Re: [BioPython] error on insert new sequences from GenBank: no
annotations saved in BioSQL database

Maybe we need to hold some mini-hackathon to make the different  
toolkits compatible in how they map annotation to the schema.  
Obviously I don't know whether you have the latest Biojava setup  
here, but I'll just comment how BioPerl/Bioperl-db would map this:

'ORIGIN' - if I'm not mistaken this is only a token that introduces  
the actual sequence. I'm not sure what Biojava is storing as value here.

'DIVISION' - this maps to column division in table bioentry (though I
agree that, strictly following the weak typing principle, this
should be a tag/value association; at present it's still an actual
column)

'genbank_accessions' - secondary accession numbers indeed go into the  
qualifier value table. The primary accession maps to column accession  
in table bioentry

'TITLE' - this is part of a publication reference, and should map to  
column title in table reference (which it does in bioperl-db)

'cross_references' - not sure where these would be coming from in  
GenBank format; for EMBL this will map to the dbxref table

'data_file_division' - not sure what this is (same as DIVISION?)

'VERSION' - in BioPerl we parse this apart into a version for the  
accession (which is column version in table bioentry) and the GI  
number, which maps to column identifier in table bioentry

'references' - these map to table reference (and bioentry_reference  
for association with the bioentry)

'KEYWORDS' - indeed these map to bioentry_qualifier_value

'GI' - maps to column identifier in table bioentry

'SIZE' - not sure what size that is. If it is the length of the  
sequence, it should (and in BioPerl/bioperl-db does) map to column  
length in table biosequence

'DEFINITION' - maps to column description in table bioentry

'REFERENCE' - should be the same as for 'references'

'MDAT' - not sure what this is

'ORGANISM' - this is the organism and maps to the table taxon (and  
taxon_name), with a foreign key in bioentry pointing to the taxon

'JOURNAL' - this is part of a reference, see 'references'

'ACCESSION' - the primary accession, maps to column accession in  
table bioentry

'LOCUS' - in the file itself this is an entire line consisting of  
multiple fields; BioPerl/bioperl-db maps the locus name (the first  
token after the literal token LOCUS) to column name in table bioentry

'SOURCE' - this is the organism, see 'ORGANISM'

'PUBMED' - this is part of a literature reference, and maps to a
foreign key in the reference table (reference.dbxref_id) to a dbxref
entry with PUBMED or PMID as the database and the PubMed ID as the
accession

'AUTHORS' - part of a literature reference, maps to column authors in  
table reference

'TYPE' - not sure what this is. If it's the alphabet, it maps to  
table biosequence, column alphabet

'CIRCULAR' - this at present indeed maps to bioentry_qualifier_value,  
though there have been plans to make it a column in table biosequence.

Note that this could in fact be the way Biojava stores it too, but  
upon retrieval represents it in the way you are seeing it.
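The notes on 'LOCUS', 'VERSION' and 'GI' above describe how BioPerl/bioperl-db splits those header lines into bioentry columns (name, version, identifier). A minimal Python sketch of that splitting, assuming the older GenBank VERSION layout in which a GI number follows the versioned accession. The header lines are made up, the function names are hypothetical, and taking the accession from the VERSION line is a simplification (per the 'ACCESSION' note, the primary accession normally comes from the ACCESSION line). This is not the actual parser of BioPython or bioperl-db.

```python
from typing import NamedTuple, Optional, Tuple


class BioentryColumns(NamedTuple):
    """The bioentry columns mentioned for these header lines."""
    name: str                  # locus name, first token after LOCUS
    accession: str             # accession without its ".N" suffix
    version: Optional[int]     # the ".N" suffix, if present
    identifier: Optional[str]  # GI number, if the VERSION line has one


def parse_locus_name(locus_line: str) -> str:
    """Return the locus name: the first token after the literal LOCUS."""
    tokens = locus_line.split()
    if len(tokens) < 2 or tokens[0] != "LOCUS":
        raise ValueError(f"not a LOCUS line: {locus_line!r}")
    return tokens[1]


def parse_version(version_line: str) -> Tuple[str, Optional[int], Optional[str]]:
    """Split a VERSION line into (accession, version, GI number)."""
    tokens = version_line.split()
    if len(tokens) < 2 or tokens[0] != "VERSION":
        raise ValueError(f"not a VERSION line: {version_line!r}")
    accession, _, version = tokens[1].partition(".")
    gi = None
    for token in tokens[2:]:
        if token.startswith("GI:"):
            gi = token[len("GI:"):]
    return accession, (int(version) if version else None), gi


def bioentry_columns(locus_line: str, version_line: str) -> BioentryColumns:
    """Combine both header lines into the bioentry column values."""
    accession, version, gi = parse_version(version_line)
    return BioentryColumns(parse_locus_name(locus_line), accession, version, gi)


if __name__ == "__main__":
    # Hypothetical header lines, not taken from a real record.
    locus = "LOCUS       SEQ00001    1200 bp    DNA     linear   PLN 01-JAN-2000"
    version = "VERSION     SEQ00001.1  GI:123456789"
    print(bioentry_columns(locus, version))
    # BioentryColumns(name='SEQ00001', accession='SEQ00001', version=1, identifier='123456789')
```

Keeping the version as an integer and the GI number as a string mirrors the idea that the version qualifies the accession while the GI is a separate identifier; a real loader would also have to cope with VERSION lines that carry no GI at all, which parse_version returns as None.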


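The reference-related entries in the list above ('TITLE', 'AUTHORS', 'PUBMED', 'references'/'REFERENCE') describe one path through the schema: bioentry to bioentry_reference to reference, with reference.dbxref_id pointing at a dbxref row whose database is PUBMED or PMID. A minimal sketch of reading that path back, again assuming a pared-down, in-memory SQLite subset of the tables; the column subset, the sample rows, and the helper name references_for_accession are hypothetical.

```python
import sqlite3

# Assumption: a pared-down subset of the BioSQL reference-related tables.
SCHEMA = """
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT
);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    dbname    TEXT,
    accession TEXT
);
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    dbxref_id    INTEGER REFERENCES dbxref(dbxref_id),
    title        TEXT,
    authors      TEXT
);
CREATE TABLE bioentry_reference (
    bioentry_id  INTEGER REFERENCES bioentry(bioentry_id),
    reference_id INTEGER REFERENCES reference(reference_id),
    rank         INTEGER
);
"""


def references_for_accession(conn, accession):
    """Return (rank, title, authors, pubmed_id) tuples, ordered by rank.

    Hypothetical helper. The LEFT JOIN keeps references without a PubMed
    cross-reference; for those, pubmed_id is None.
    """
    return conn.execute(
        """
        SELECT br.rank, r.title, r.authors, x.accession
        FROM bioentry AS b
        JOIN bioentry_reference AS br ON br.bioentry_id = b.bioentry_id
        JOIN reference AS r ON r.reference_id = br.reference_id
        LEFT JOIN dbxref AS x
               ON x.dbxref_id = r.dbxref_id AND x.dbname IN ('PUBMED', 'PMID')
        WHERE b.accession = ?
        ORDER BY br.rank
        """,
        (accession,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical sample rows.
    conn.execute("INSERT INTO bioentry VALUES (1, 'SEQ00001')")
    conn.execute("INSERT INTO dbxref VALUES (10, 'PUBMED', '1234567')")
    conn.executemany(
        "INSERT INTO reference VALUES (?, ?, ?, ?)",
        [(100, 10, "A hypothetical paper", "Doe,J."),
         (101, None, "Direct Submission", "Doe,J.")],
    )
    conn.executemany(
        "INSERT INTO bioentry_reference VALUES (?, ?, ?)",
        [(1, 100, 1), (1, 101, 2)],
    )
    for row in references_for_accession(conn, "SEQ00001"):
        print(row)
```

Putting the dbname filter in the ON clause rather than in WHERE is deliberate: in WHERE it would silently drop references that have no PubMed link.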

On Nov 8, 2007, at 12:50 PM, Eric Gibert wrote:

> Dear all,
> When I retrieve a BioSQL.BioSeq.DBSeqRecord which was inserted  
> previously by my BioJava application, I have:
> print "Debug on Seq:", Seq.id, "=", Seq.annotations.keys()
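> Debug on Seq: AJ459190.1 = ['ORIGIN', 'DIVISION',  
> 'genbank_accessions', 'TITLE', 'cross_references',  
> 'data_file_division', 'VERSION', 'references', 'KEYWORDS', 'GI',  
> 'SIZE', 'DEFINITION', 'REFERENCE', 'MDAT', 'ORGANISM', 'JOURNAL',  
> 'ACCESSION', 'LOCUS', 'SOURCE', 'PUBMED', 'AUTHORS', 'TYPE',  
> 'CIRCULAR']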
> but a BioSeq freshly inserted by BioPython 1.44 only gives me:
> Debug on Seq: EF631597.1 =  ['cross_references', 'dates',  
> 'references', 'gi', 'data_file_division']
> When I look in the table bioentry_qualifier_value, I find:
> * 20 records for a sequence imported by BioJava
> * only 1 for a sequence inserted by BioPython: the date, which  
> should be inserted by "_load_bioentry_date" in BioSQL/Loader.py
> Quite a few annotations missing, no?
> Any idea?
> Eric
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
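Eric's comparison above (20 bioentry_qualifier_value rows for the BioJava-loaded entry versus 1 for the BioPython-loaded one) can be made more informative by listing which tags each entry actually has, via the term table. A minimal sketch, assuming a pared-down, in-memory SQLite subset of bioentry, term and bioentry_qualifier_value; the helper name qualifier_terms_by_accession, the sample rows and the term names are hypothetical.

```python
import sqlite3

# Assumption: a pared-down subset of the BioSQL tables behind
# bioentry_qualifier_value; real tables carry more columns.
SCHEMA = """
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER REFERENCES bioentry(bioentry_id),
    term_id     INTEGER REFERENCES term(term_id),
    value       TEXT,
    rank        INTEGER
);
"""


def qualifier_terms_by_accession(conn):
    """Map each accession to the sorted term names of its qualifier rows.

    Hypothetical helper. The LEFT JOINs keep bioentries that have no
    qualifier rows at all; they map to an empty list.
    """
    result = {}
    rows = conn.execute(
        """
        SELECT b.accession, t.name
        FROM bioentry AS b
        LEFT JOIN bioentry_qualifier_value AS q ON q.bioentry_id = b.bioentry_id
        LEFT JOIN term AS t ON t.term_id = q.term_id
        ORDER BY b.accession, t.name
        """
    )
    for accession, term_name in rows:
        names = result.setdefault(accession, [])
        if term_name is not None:
            names.append(term_name)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical sample rows and term names.
    conn.executemany("INSERT INTO bioentry VALUES (?, ?)",
                     [(1, "SEQ00001"), (2, "SEQ00002")])
    conn.executemany("INSERT INTO term VALUES (?, ?)",
                     [(1, "keyword"), (2, "date_changed")])
    conn.executemany(
        "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
        [(1, 1, "example keyword", 1), (1, 2, "01-JAN-2000", 1),
         (2, 2, "01-JAN-2000", 1)],
    )
    for accession, names in qualifier_terms_by_accession(conn).items():
        print(accession, len(names), names)
```

Pointed at a real BioSQL database holding one BioJava-loaded and one BioPython-loaded record, the same query (adapted to that database's driver and placeholder style, e.g. %s instead of ?) should show which tags one loader writes and the other does not.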

: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
