[Biopython-dev] [Bug 1942] New: GenBank RecordParser fails on particular qualifier structure
bugzilla-daemon at portal.open-bio.org
Fri Feb 3 04:44:54 EST 2006
http://bugzilla.open-bio.org/show_bug.cgi?id=1942
Summary: GenBank RecordParser fails on particular qualifier structure
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: lpritc at scri.sari.ac.uk
When parsing some GenBank record files, the GenBank.RecordParser throws an
error at a (poorly-formatted) qualifier entry:
Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_002758.gbk'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 240, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 1533, in feed
    assert line[0:1]=='/', \
AssertionError: Expected start of new qualifier, not:
similar to bacteriophage terminase small subunit"
This problem has been observed in several GenBank .gbk files, including
NC_002758 (above) and NC_002929. It appears to be caused by qualifiers
structured like /note in the following example:
     CDS             878043..878612
                     /locus_tag="SAV0800"
                     /note="
                     similar to bacteriophage terminase small subunit"
                     /codon_start=1
                     /transl_table=11
                     /product="similar to bacteriophage terminase small
                     subunit"
                     /protein_id="NP_371324.1"
                     /db_xref="GI:15923790"
                     /db_xref="GeneID:1120775"
                     /translation="MSELTAKQARFVNEYIRTLNVTQSAIKAGYSANSAHVTGCRLLK
                     KPHIKQYIQEQKDKIIDENVLTAKELLHVLTNAAVGDETETKEVVVKRGEYKENPQSG
                     KVQLVYNEHVELIEVPIKPSDRLKARDMLGKYHKLFTDKHDINGNVPIFINIGEWDGD
                     DEELDKTVKDVSNANPNHTVIVDDIPLED"
where the opening double quote of the qualifier value is immediately followed
by a newline ('\n'), so the text of the value starts on the next line. Editing
the source .gbk file to remove this line break resolves the problem.
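For anyone patching a qualifier scanner, here is a minimal sketch of grouping qualifier lines so that a value whose opening quote ends the line (the /note=" case above) is still read as unfinished. It assumes, rather than confirms from Biopython's scanner source, that a quoted value is open while it contains an odd number of double quotes; this relies on the feature-table convention that a literal quote inside a value is written as two quotes (""). The names group_qualifiers, _is_open and _finish are hypothetical and are not part of Biopython. Joining /translation lines without spaces and other text with one space is also an assumption of the sketch.

```python
import re

# Hypothetical helper for illustration only; it is NOT Biopython's scanner.
# A line starting with "/" begins a new qualifier, e.g. /note="... or /pseudo.
QUALIFIER_START = re.compile(r'^/([A-Za-z0-9_]+)(?:=(.*))?$')


def _is_open(parts):
    """True while a quoted value has an odd number of double quotes so far."""
    text = ''.join(parts)
    return text.startswith('"') and text.count('"') % 2 == 1


def _finish(key, parts):
    """Join a qualifier's lines and strip the surrounding quotes."""
    if not parts:
        return None  # flag qualifier with no value, such as /pseudo
    # Assumption: protein sequences are joined without spaces, text with one space.
    sep = '' if key == 'translation' else ' '
    value = sep.join(parts)
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        # Drop the outer quotes, trim, and undo "" escaping of literal quotes.
        value = value[1:-1].strip().replace('""', '"')
    return value


def group_qualifiers(lines):
    """Group the qualifier lines of one feature into (key, value) tuples."""
    qualifiers = []
    key, parts = None, []
    for raw in lines:
        line = raw.strip()
        if key is not None and _is_open(parts):
            # Still inside a quoted value, even if the opening quote was the
            # last character on the qualifier's first line (the /note=" case).
            parts.append(line)
            continue
        match = QUALIFIER_START.match(line)
        if match is None:
            raise ValueError('Expected start of new qualifier, not: %s' % line)
        if key is not None:
            qualifiers.append((key, _finish(key, parts)))
        key = match.group(1)
        parts = [] if match.group(2) is None else [match.group(2)]
    if key is not None:
        qualifiers.append((key, _finish(key, parts)))
    return qualifiers


if __name__ == '__main__':
    feature = [
        '/locus_tag="SAV0800"',
        '/note="',
        'similar to bacteriophage terminase small subunit"',
        '/codon_start=1',
    ]
    print(group_qualifiers(feature))
    # [('locus_tag', 'SAV0800'), ('note', 'similar to bacteriophage terminase small subunit'), ('codon_start', '1')]
```

Counting quotes, rather than checking whether a value both starts and ends with a quote, is the design choice that matters here: a lone `"` starts and ends with a quote yet holds only one quote character, so a parity test keeps it open until the closing quote arrives.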