[BioSQL-l] BioSQL : GenBank db_xref names in dbxref table

Peter biopython at maubp.freeserve.co.uk
Tue Nov 20 19:36:34 UTC 2007


Dear all,

I'm one of the Biopython developers.  I've recently got going with
BioSQL and have been getting to grips with the Biopython BioSQL
interface.  I'm aware that we need to try and be consistent with
BioPerl and BioJava, so I'd like to pose my first question related to
that.

When loading GenBank records, many features have db_xref qualifiers,
e.g. from a random CDS feature in E. coli K12:

                     /db_xref="ASAP:1309"
                     /db_xref="GI:16128366"
                     /db_xref="ECOCYC:EG10213"
                     /db_xref="GeneID:945313"

Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC" and
"GeneID" before recording these entries in the seqfeature_dbxref
and dbxref tables.  For example, "GI" becomes "GeneIndex".
Biopython's current mapping is as follows:

# Dictionary of database types, keyed by GenBank db_xref abbreviation
db_dict = {'GeneID': 'Entrez',
           'GI': 'GeneIndex',
           'COG': 'COG',
           'CDD': 'CDD',
           'DDBJ': 'DNA Databank of Japan',
           'Entrez': 'Entrez',
           'GeneIndex': 'GeneIndex',
           'PUBMED': 'PubMed',
           'taxon': 'Taxon',
           'ATCC': 'ATCC',
           'ISFinder': 'ISFinder',
           'GOA': 'Gene Ontology Annotation',
           'ASAP': 'ASAP',
           'PSEUDO': 'PSEUDO',
           'InterPro': 'InterPro',
           'GEO': 'Gene Expression Omnibus',
           'EMBL': 'EMBL',
           'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
           'ECOCYC': 'EcoCyc',
           'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
           }

In my testing, I've found several GenBank db_xref abbreviations for
which we don't have a mapping defined, such as "LocusID", "dbSNP",
"MGD", "MIM", or from an EMBL file, "REMTREMBL".
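
To make the translation step concrete, here is a minimal sketch of the
idea.  This is not the actual Biopython code: the function name
translate_db_xref is hypothetical, only a subset of the mapping is
repeated, the "MIM" accession is made up, and falling back to the raw
abbreviation for unmapped databases is an assumption for illustration.

```python
# Hypothetical helper, not part of Biopython: illustrates the translation step.
# Only a subset of the mapping above is repeated to keep the example short.
db_dict = {'GeneID': 'Entrez',
           'GI': 'GeneIndex',
           'ASAP': 'ASAP',
           'ECOCYC': 'EcoCyc'}


def translate_db_xref(value, mapping=db_dict):
    """Split a db_xref value such as "GI:16128366" into (dbname, accession).

    The abbreviation is translated using the mapping.  Falling back to the
    untranslated abbreviation for unknown databases is an assumption made
    for this sketch.
    """
    # Split on the first colon only, in case the accession contains colons
    abbreviation, accession = value.split(":", 1)
    return mapping.get(abbreviation, abbreviation), accession


# "MIM:123456" uses a made-up accession to show the unmapped case
for xref in ["ASAP:1309", "GI:16128366", "ECOCYC:EG10213",
             "GeneID:945313", "MIM:123456"]:
    print(translate_db_xref(xref))
```

With this fallback, an unmapped abbreviation such as "MIM" would be
stored unchanged, which is exactly the case where the Bio* projects
could end up using different names for the same database.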

I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
similar mapping in their BioSQL code (or GenBank parser), so that
Biopython can follow your example.

Thank you,

Peter

P.S. See also Biopython bug 2405
http://bugzilla.open-bio.org/show_bug.cgi?id=2405


