[Biopython-dev] [Bug 2826] SeqRecord dbxrefs not written to GenBank by SeqIO

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Mon May 11 20:29:02 UTC 2009


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
            Summary|when creating a de-novo     |SeqRecord dbxrefs not
                   |SeqRecord, the dbxrefs are  |written to GenBank by SeqIO
                   |not written by SeqIO.write  |

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-05-11 16:29 EST -------
Hi David,

Thank you for another interesting bug report. See here for what the NCBI uses
in a GenPept file for this example protein, NP_418483.1

The ASAP and GeneID numbers are not recorded at the sequence level - there is
nowhere in the GenBank file format to put them.  They are however recorded
within a CDS feature on the link above.  So, if you want these recorded, you'd
have to create a SeqFeature with the information (you can't use the SeqRecord's
dbxrefs list).
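
As a minimal sketch of that workaround, assuming Biopython 1.78 or later (where
the "molecule_type" annotation is needed for GenBank output), the cross
references can be attached to a CDS feature through its db_xref qualifier. The
sequence, description and the ASAP/GeneID values below are placeholders for
illustration, not the real entries for NP_418483.1:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Placeholder sequence and description, for illustration only.
record = SeqRecord(
    Seq("MKTAYIAKQR"),
    id="NP_418483.1",
    name="NP_418483",
    description="hypothetical example protein",
)
# Assumption: a recent Biopython, which needs this to write GenBank output.
record.annotations["molecule_type"] = "protein"

# Feature-level cross references live in the db_xref qualifier.
# Both identifiers here are made-up placeholder values.
cds = SeqFeature(
    FeatureLocation(0, len(record.seq)),
    type="CDS",
    qualifiers={"db_xref": ["ASAP:ABE-0000000", "GeneID:0000000"]},
)
record.features.append(cds)

handle = StringIO()
SeqIO.write(record, handle, "genbank")
genbank_text = handle.getvalue()
print(genbank_text)
```

The printed feature table should then carry one /db_xref line per entry under
the CDS feature.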

The GI number would get written, but due to an anomaly in the GenBank parser
this is currently stored in the annotations dictionary under the key "gi", so
this is where the GenBank writer looks for it.  We should probably switch to
recording this in the dbxrefs as "gi:12345" as well as (or instead of) the
annotations entry, and have the writer look for the GI number there too.
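
The proposed lookup order can be sketched as follows. This is not the Biopython
writer's actual code: find_gi is a hypothetical helper name, and the stand-in
objects below only mimic the annotations and dbxrefs attributes of a SeqRecord
so the example runs without any GenBank file:

```python
from types import SimpleNamespace


def find_gi(record):
    """Return the GI number as a string, or None if the record has none.

    Hypothetical helper: checks annotations["gi"] first (where the parser
    currently stores it), then falls back to a "gi:..." entry in dbxrefs.
    """
    gi = record.annotations.get("gi")
    if gi:
        return str(gi)
    for xref in record.dbxrefs:
        database, _, identifier = xref.partition(":")
        if database.lower() == "gi" and identifier:
            return identifier
    return None


# Stand-ins for a parsed record and a de-novo record (placeholder values).
parsed = SimpleNamespace(annotations={"gi": "12345"}, dbxrefs=[])
de_novo = SimpleNamespace(annotations={}, dbxrefs=["Project:1", "gi:12345"])

print(find_gi(parsed))
print(find_gi(de_novo))
```

Checking the annotations first keeps the behaviour for parsed files unchanged,
while the fallback picks up a GI supplied only through dbxrefs.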

Currently, when parsing GenBank files, the only thing stored in the SeqRecord's
dbxrefs list is a PROJECT line cross reference (see Bug 2225).  Looking at the
code, we don't currently record that - we should.

