[Biopython-dev] [Bug 3071] New: EMBL parser does not parse RP lines correctly.

bugzilla-daemon at portal.open-bio.org
Fri Apr 30 04:52:45 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3071

           Summary: EMBL parser does not parse RP lines correctly.
           Product: Biopython
           Version: 1.54b
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: laserson at mit.edu
                CC: laserson at mit.edu


The EMBL parser asserts, incorrectly, that an RP line contains exactly one
range (line 679 of Bio/GenBank/Scanner.py):

elif line_type == 'RP':
    # Reformat reference numbers for the GenBank based consumer
    # e.g. '1-4639675' becomes '(bases 1 to 4639675)'
    assert data.count("-")==1
    consumer.reference_bases("(bases " + data.replace("-", " to ") + ")")

The EMBL specification states that an RP line can contain multiple ranges:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_10_3

The assertion fails on at least one IMGT record (which will be attached shortly).
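
One possible fix, sketched here rather than proposed as the final patch, is
to split the RP data into individual ranges before reformatting. The helper
name format_rp_ranges is hypothetical, and two things are assumed: that
multiple ranges are comma-separated (e.g. '1-100, 200-300'), and that the
GenBank-based consumer accepts ranges joined with '; '.

```python
def format_rp_ranges(data):
    """Turn EMBL RP data into GenBank-style reference text.

    Assumption: multiple ranges are comma-separated, e.g. '1-100, 200-300'.
    Assumption: the consumer accepts ranges joined with '; '.
    """
    ranges = []
    for part in data.split(","):
        part = part.strip()
        if not part:
            # Tolerate a trailing comma or doubled separator.
            continue
        start, sep, end = part.partition("-")
        start, end = start.strip(), end.strip()
        if not sep or not start.isdigit() or not end.isdigit():
            raise ValueError("Unexpected RP range: %r" % part)
        ranges.append("%s to %s" % (start, end))
    if not ranges:
        raise ValueError("No ranges found in RP data: %r" % data)
    return "(bases " + "; ".join(ranges) + ")"
```

In the RP branch above, the assert and the replace() call could then become
consumer.reference_bases(format_rp_ranges(data)). Single-range lines keep
their current output, e.g. '1-4639675' still becomes '(bases 1 to 4639675)'.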




