[Biopython-dev] [Biopython - Bug #3318] (New) Bio.GenBank.LocationParserError on RefSeq files
redmine at redmine.open-bio.org
Thu Jan 19 18:11:10 UTC 2012
Issue #3318 has been reported by John Eppley.
----------------------------------------
Bug #3318: Bio.GenBank.LocationParserError on RefSeq files
https://redmine.open-bio.org/issues/3318
Author: John Eppley
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:
I get the following error trying to parse a file from NCBI RefSeq:
<pre>
Traceback (most recent call last):
  File "/Users/jmeppley/work/delong/projects/scripts/getSequencesFromGbk.py", line 62, in main
    translateStream(instream,options.formatIn,outstream, options.formatOut, options.cds, options.translate)
  File "/Users/jmeppley/work/delong/projects/scripts/getSequencesFromGbk.py", line 87, in translateStream
    for record in records:
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/SeqIO/__init__.py", line 532, in parse
    for r in i:
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 440, in parse_records
    record = self.parse(handle, do_features)
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 423, in parse
    if self.feed(handle, consumer, do_features):
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 395, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 347, in _feed_feature_table
    consumer.location(location_string)
  File "/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
LocationParserError: join(complement(149815..150200),complement(293787..295573),NC_016402.1:6618..6676,181647..181905)
</pre>
The file can be found here (release 51):
The relevant section is:
<pre>
     gene            join(complement(149815..150200),
                     complement(293787..295573),NC_016402.1:6618..6676,
                     181647..181905)
                     /gene="nad1"
                     /trans_splicing
                     /note="exons 1, 2, 3, and 5 on chromosome 1 are
                     trans-spliced with exon 4 on chromosome 3 to form the
                     complete coding region"
                     /db_xref="GeneID:11447159"
</pre>
I think the underscore in the sequence name (NC_016402.1) is confounding the regular expressions in Bio/GenBank/__init__.py. I was able to stop the error by making the following change (the first line is mine, the second is the original):
<pre>
99c99
< _complex_location = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)" \
---
> _complex_location = r"([a-zA-z][a-zA-Z0-9]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)" \
</pre>
I haven't really tested it to see if the output is correct, though. It does pass the doctests.
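The suspected cause can be checked in isolation with the standard `re` module. The sketch below copies only the accession-prefix part of the two patterns from the diff above. The `SIMPLE_RANGE` sub-pattern is a hypothetical stand-in for the `(%s|%s|%s|%s|%s)` alternatives, which are not shown in this report, so this illustrates the prefix behaviour only and is not a test of the real Biopython parser.

```python
import re

# Accession prefix as quoted in the diff: original vs. patched (adds "_").
ORIGINAL_PREFIX = r"([a-zA-z][a-zA-Z0-9]*(\.[a-zA-Z0-9]+)?\:)?"
PATCHED_PREFIX = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?"

# ASSUMPTION: simplified stand-in for the real location alternatives,
# covering only a plain "start..end" range.
SIMPLE_RANGE = r"\d+\.\.\d+"

original = re.compile("^" + ORIGINAL_PREFIX + SIMPLE_RANGE + "$")
patched = re.compile("^" + PATCHED_PREFIX + SIMPLE_RANGE + "$")

remote_part = "NC_016402.1:6618..6676"  # the part with a RefSeq accession
local_part = "181647..181905"           # a part with no accession prefix

for name, pattern in (("original", original), ("patched", patched)):
    print(name, "remote:", bool(pattern.match(remote_part)),
          "local:", bool(pattern.match(local_part)))
```

With these simplified patterns, the original prefix rejects the remote part because matching stops at the underscore, while the patched prefix accepts it. Both accept the local part. Separately, the `A-z` range in the first character class also spans a few punctuation characters (including `_`), which looks unintended, but it only affects the first character and so does not help here.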
----------------------------------------