[Biopython-dev] [Bug 1968] New: GenBank parsing fails if REFERENCE (bases ...) line is split

bugzilla-daemon at portal.open-bio.org
Tue Mar 7 14:36:26 EST 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=1968

           Summary: GenBank parsing fails if REFERENCE (bases ...) line is
                    split
           Product: Biopython
           Version: Not Applicable
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: kael.fischer at gmail.com


Using Bio/GenBank/__init__.py from the CVS HEAD, parsing the record J01917.1
(GI:209811) fails.

The file pointer is at the end of the record and the traceback is:

Traceback (most recent call last):
  File "gb2fasta.py", line 21, in ?
    for gbr in GenBank.Iterator(f,parser=gbParser):
  File "/usr/local/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 146, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/local/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 212, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1518, in feed
    line = self._feed_header(handle, consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1386, in _feed_header
    assert line[0:GENBANK_INDENT] <> GENBANK_SPACER, \
AssertionError: Unexpected continuation of an entry:
            28259)


The _feed_header method does not handle a REFERENCE ... (bases ...) entry
that is split across lines.

This diff fixes it (the form is word-wrapping it):
***************
*** 1425,1431 ****
--- 1425,1437 ----
                  #Need to call consumer.reference_num() and consumer.reference_bases()
                  #e.g.
                  # REFERENCE   1  (bases 1 to 86436)
+ 
                  data = data.strip()
+ 
+                 #check for closing paren
+                 while data.find(')') == -1:
+                     data=data+handle.readline().strip()
+                 
                  while data.find('  ')<>-1:
                      data = data.replace('  ',' ')
                  if data.find(' ')==-1 :
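
The idea behind the diff (keep reading lines until the "(bases ...)" group is
closed) can be sketched as a standalone function. This is only an
illustration, not the code in Bio/GenBank/__init__.py: the name
read_reference_data and the sample input are hypothetical. It also makes
three assumptions that go beyond the diff: it only looks for a closing
parenthesis when one was opened (so a REFERENCE line with no bases does not
swallow the following lines), it stops at end of input instead of looping
forever, and it joins the pieces with a space so that "to" and the next
number do not run together.

```python
import io


def read_reference_data(data, handle):
    """Return the REFERENCE data with any continuation lines joined.

    data   -- text following the REFERENCE keyword on the first line
    handle -- file-like object positioned just after that line
    """
    data = data.strip()
    # Only read further if a '(' was opened and not yet closed.
    while "(" in data and ")" not in data:
        line = handle.readline()
        if not line:
            # End of input reached without a closing parenthesis.
            raise ValueError("Unterminated REFERENCE line: %r" % data)
        # Join with a space so the two pieces do not fuse together.
        data = data + " " + line.strip()
    # Collapse runs of whitespace into single spaces.
    return " ".join(data.split())


# Hypothetical split entry, modelled on the continuation line in the traceback.
rest_of_record = io.StringIO("            28259)\n  AUTHORS   ...\n")
print(read_reference_data("1  (bases 1 to", rest_of_record))
# prints: 1 (bases 1 to 28259)
```

After the call, the handle is positioned at the line following the
continuation, so the rest of the header can be read as before.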





