[Biopython-dev] [Biopython - Bug #3318] (New) Bio.GenBank.LocationParserError on RefSeq files

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Thu Jan 19 18:11:10 UTC 2012

Issue #3318 has been reported by John Eppley.

Bug #3318: Bio.GenBank.LocationParserError on RefSeq files

Author: John Eppley
Status: New
Priority: Normal
Target version: 

I get the following error when parsing a file from NCBI RefSeq with Biopython 1.57:
Traceback (most recent call last):
  File "/Users/jmeppley/work/delong/projects/scripts/getSequencesFromGbk.py", line 62, in main
    translateStream(instream,options.formatIn,outstream, options.formatOut, options.cds, options.translate)
  File "/Users/jmeppley/work/delong/projects/scripts/getSequencesFromGbk.py", line 87, in translateStream
    for record in records:
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/SeqIO/__init__.py", line 532, in parse
    for r in i:
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 440, in parse_records
    record = self.parse(handle, do_features)
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 423, in parse
    if self.feed(handle, consumer, do_features):
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 395, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/nfs/Isilon/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/Scanner.py", line 347, in _feed_feature_table
  File "/common/lib/python/2.6/biopython-1.57-py2.6-macosx-10.6-universal.egg/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
LocationParserError: join(complement(149815..150200),complement(293787..295573),NC_016402.1:6618..6676,181647..181905)

The file can be found here (release 51):

The relevant section is:
     gene            join(complement(149815..150200),
                     /note="exons 1, 2, 3, and 5 on chromosome 1 are
                     trans-spliced with exon 4 on chromosome 3 to form the
                     complete coding region"

I think the underscore in the sequence name (NC_016402.1) is confounding the regular expressions in Bio/GenBank/__init__.py. I was able to stop the error by adding an underscore to the second character class of the accession prefix (first line is mine, second is the original):
< _complex_location = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)" \
> _complex_location = r"([a-zA-z][a-zA-Z0-9]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)" \

I haven't tested whether the resulting output is correct, but the change does pass the doctests.
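
The effect of the change can be checked in isolation with a reduced pattern. This is only a sketch: the two prefix strings are copied from the lines above, but `_simple_range` is a hypothetical stand-in for the five sub-patterns that the real `_complex_location` substitutes in, and the anchoring with `^` and `$` is my own assumption, not necessarily how Bio/GenBank/__init__.py uses the expression.

```python
import re

# Hypothetical stand-in for the position part of a location ("6618..6676").
# The real expression substitutes five sub-patterns via %s here.
_simple_range = r"\d+\.\.\d+"

# Accession prefix as in the original line (no underscore in the second class).
original_prefix = r"([a-zA-z][a-zA-Z0-9]*(\.[a-zA-Z0-9]+)?\:)?"
# Accession prefix with the proposed change (underscore added).
patched_prefix = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?"

original_re = re.compile("^" + original_prefix + "(" + _simple_range + ")$")
patched_re = re.compile("^" + patched_prefix + "(" + _simple_range + ")$")

# The remote reference from the failing location, and a local one.
remote_ref = "NC_016402.1:6618..6676"
local_ref = "181647..181905"

for name, pattern in (("original", original_re), ("patched", patched_re)):
    for text in (remote_ref, local_ref):
        # Report whether each pattern accepts each location string.
        print(name, text, bool(pattern.match(text)))
```

In this reduced form only the original pattern rejects the NC_016402.1 reference; both patterns accept the location without an accession prefix. It says nothing about whether the parsed feature location is then interpreted correctly.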
