[Biopython-dev] [Biopython - Bug #3318] (New) Bio.GenBank.LocationParserError on RefSeq files

redmine at redmine.open-bio.org
Thu Jan 19 18:11:10 UTC 2012


Issue #3318 has been reported by John Eppley.

----------------------------------------
Bug #3318: Bio.GenBank.LocationParserError on RefSeq files
https://redmine.open-bio.org/issues/3318

Author: John Eppley
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


I get the following error when trying to parse a file from NCBI RefSeq:
<pre>
Traceback (most recent call last):
  File "/Users/jmeppley/work/delong/projects/scripts/getSequencesFromGbk.py", line 62, in main
    translateStream(instream,options.formatIn,outstream, options.formatOut, options.cds, options.translate)
  File "/Users/jmeppley/work/delong/projects/scripts/getSequencesFromGbk.py", line 87, in translateStream
    for record in records:
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/SeqIO/__init__.py", line 532, in parse
    for r in i:
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 440, in parse_records
    record = self.parse(handle, do_features)
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 423, in parse
    if self.feed(handle, consumer, do_features):
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 395, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 347, in _feed_feature_table
    consumer.location(location_string)
  File "/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
LocationParserError: join(complement(149815..150200),complement(293787..295573),NC_016402.1:6618..6676,181647..181905)
</pre>

The file can be found here (release 51):

The relevant section is:
<pre>
     gene            join(complement(149815..150200),
                     complement(293787..295573),NC_016402.1:6618..6676,
                     181647..181905)
                     /gene="nad1"
                     /trans_splicing
                     /note="exons 1, 2, 3, and 5 on chromosome 1 are
                     trans-spliced with exon 4 on chromosome 3 to form the
                     complete coding region"
                     /db_xref="GeneID:11447159"
</pre>
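
As a rough check that the wrapped location in this excerpt is the same string reported in the traceback, the continuation lines can be joined by hand. This is only an illustrative sketch: `flatten_location` is a hypothetical helper, not part of Biopython, and the naive comma split only works because this particular location has no nested commas.

```python
# Feature lines copied from the excerpt above (qualifiers omitted).
feature_lines = [
    "     gene            join(complement(149815..150200),",
    "                     complement(293787..295573),NC_016402.1:6618..6676,",
    "                     181647..181905)",
]


def flatten_location(lines):
    """Drop the feature key from the first line and join the wrapped pieces.

    Hypothetical helper for illustration; Biopython's scanner does its own
    (more careful) handling of wrapped feature locations.
    """
    first = lines[0].split(None, 1)[1]  # text after the "gene" key
    rest = [line.strip() for line in lines[1:]]
    return "".join([first.strip()] + rest)


location = flatten_location(feature_lines)

# Naive split of the join(...) arguments; fine here because none of the
# parts contain a comma. Parts with a colon refer to another sequence.
remote_parts = [p for p in location[len("join("):-1].split(",") if ":" in p]

print(location)
print(remote_parts)
```

The only part that refers to another record is the one with the `NC_016402.1:` prefix, which is the accession containing the underscore.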

I think the underscore in the sequence name (NC_016402.1) is tripping up the regular expressions in Bio/GenBank/__init__.py: the optional accession prefix in _complex_location does not allow underscores. Adding an underscore to that character class stopped the error (first line is mine, second is the original):
<pre>
99c99
< _complex_location = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)" \
---
> _complex_location = r"([a-zA-z][a-zA-Z0-9]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)" \
</pre>

I have not checked whether the parsed output is correct, but the change does pass the doctests.
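
The effect of allowing an underscore in the accession prefix can be checked in isolation with a cut-down pattern. The sketch below is an assumption-laden stand-in, not the Biopython source: `PREFIX_ORIGINAL`, `PREFIX_PATCHED`, `SIMPLE_RANGE`, and `matches_remote_range` are names made up for this example, and the `%s` alternatives from the real `_complex_location` are replaced by a plain `start..end` range. The stand-ins also use `[a-zA-Z]` for the first character, whereas the quoted line has `[a-zA-z]`, which additionally matches a few punctuation characters; that difference does not affect this example.

```python
import re

# Simplified stand-ins for the optional "accession.version:" prefix shown in
# the diff above. These are NOT the full Biopython patterns.
PREFIX_ORIGINAL = r"[a-zA-Z][a-zA-Z0-9]*(\.[a-zA-Z0-9]+)?:"
PREFIX_PATCHED = r"[a-zA-Z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?:"

# Hypothetical plain range such as "6618..6676", standing in for the
# location alternatives that the real pattern substitutes via %s.
SIMPLE_RANGE = r"\d+\.\.\d+"


def matches_remote_range(prefix, text):
    """Return True if text is exactly '[<prefix>]<start>..<end>'."""
    pattern = re.compile(r"^(%s)?%s$" % (prefix, SIMPLE_RANGE))
    return pattern.match(text) is not None


remote = "NC_016402.1:6618..6676"  # part of the failing location
local = "181647..181905"           # part without an accession prefix

print(matches_remote_range(PREFIX_ORIGINAL, remote))  # False
print(matches_remote_range(PREFIX_PATCHED, remote))   # True
print(matches_remote_range(PREFIX_ORIGINAL, local))   # True
print(matches_remote_range(PREFIX_PATCHED, local))    # True
```

This only shows that the prefix now matches; it says nothing about whether the resulting feature location object is built correctly for a reference to another record.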

