[Biopython-dev] [Bug 2591] New: GenBank files misparsed for long organism names
bugzilla-daemon at portal.open-bio.org
Fri Sep 19 18:26:17 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2591
Summary: GenBank files misparsed for long organism names
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: joelb at lanl.gov
I've noticed that Biopython 1.47 mis-parses the organism and lineage in
GenBank files from certain bacteria. All of the problem organisms have
names longer than 61 characters, so a line wrap is introduced into the SOURCE
and ORGANISM records, and the wrapped line causes the mis-parsing.
My reading of the GenBank file docs is that these lines should be of variable
length rather than being split, so this appears to be GenBank's problem
rather than Biopython's. I have just sent e-mail to info at ncbi.nlm.nih.gov
about the issue. GenBank doesn't seem to have a bug tracker, though, so I'm
recording the issue here to document it for other people. The issue exists for
a number of organisms (more than six, though I haven't done an exact count).
One example may be found at
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Salmonella_enterica_serovar_Paratyphi_A_AKU_12601/NC_011147.gbk
or
http://tinyurl.com/47yg5g
When parsing this file, the taxonomy list returned begins with the wrapped
tail of the organism name fused onto the first lineage entry:
["AKU_12601 Bacteria","Proteobacteria"...
Some of the other examples have propagated to web sites that display the
mis-parsed data, e.g. SUPERFAMILY
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/cgi-bin/gen_list.cgi?genome=x6
which shows the error for Salmonella enterica subsp. enterica serovar
Choleraesuis str. SC-B67.
I'll append the response from GenBank to this bug if and when I get one. If I
don't get one, then I'll try to come up with a workaround.
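As a rough idea of what such a workaround might look like, here is a minimal
sketch in plain Python. It does not use or patch any Biopython code. The
function name split_organism_block is hypothetical, and the sample lines only
approximate the wrapped layout described above; they are not copied from the
NCBI file. The heuristic is an assumption rather than anything taken from the
GenBank documentation: the lineage is taken to start at the first continuation
line that contains a semicolon, and any continuation line before it is treated
as part of the organism name.

```python
def split_organism_block(lines):
    """Split the lines of an ORGANISM block into (name, taxonomy list).

    Heuristic (an assumption, not GenBank's specification): the lineage
    starts at the first continuation line that contains a semicolon.
    Earlier continuation lines are treated as a wrapped organism name.
    """
    # Remove the ORGANISM keyword and surrounding whitespace from line one.
    first = lines[0].strip()
    if first.startswith("ORGANISM"):
        first = first[len("ORGANISM"):].strip()
    rest = [line.strip() for line in lines[1:]]

    name_parts = [first]
    lineage_start = len(rest)  # default: no lineage lines found
    for i, text in enumerate(rest):
        if ";" in text:
            lineage_start = i
            break
        name_parts.append(text)  # wrapped continuation of the name

    # Join the lineage lines, drop the trailing period, split on ";".
    lineage_text = " ".join(rest[lineage_start:]).rstrip(".")
    taxonomy = [t.strip() for t in lineage_text.split(";") if t.strip()]
    return " ".join(name_parts), taxonomy


# Illustrative input approximating a wrapped ORGANISM record.
sample_lines = [
    "  ORGANISM  Salmonella enterica subsp. enterica serovar Paratyphi A str.",
    "            AKU_12601",
    "            Bacteria; Proteobacteria; Gammaproteobacteria;",
    "            Enterobacteriales; Enterobacteriaceae; Salmonella.",
]

name, taxonomy = split_organism_block(sample_lines)
print(name)
print(taxonomy)
```

The heuristic would fail for an organism name that itself contains a
semicolon, and a record whose lineage has only a single entry (so no semicolon
at all) would have that entry folded into the name, so this is only a
starting point.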
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.