[Biopython-dev] [Biopython - Bug #3309] (New) GenBank Scanner expects sequence lines to start at position 9

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Thu Oct 27 14:03:42 UTC 2011


Issue #3309 has been reported by Liam Childs.

----------------------------------------
Bug #3309: GenBank Scanner expects sequence lines to start at position 9
https://redmine.open-bio.org/issues/3309

Author: Liam Childs
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 1.57
URL: 


Some programs (e.g. Vector NTI and Lasergene) produce GenBank files in which the sequence lines start at an index other than 9. I don't know how tightly the GenBank file format is defined, but if the indent before the sequence is allowed to vary, there seems to be a simple fix.

Current version (Bio/GenBank/Scanner.py:904):
line = self.line
... 15 lines
if len(line) > 9 and line[9:10] != ' ':
    raise ValueError("Sequence line mal-formed, '%s'" % line)
seq_lines.append(line[10:])  # remove spaces later

Simple fix 1 (variable per file):
line = self.line
idx = line.find('1') + 1  # index just after the first '1' in the line
... 15 lines
if len(line) > idx and line[idx:idx + 1] != ' ':
    raise ValueError("Sequence line mal-formed, '%s'" % line)
seq_lines.append(line[idx + 1:])  # remove spaces later

The index can be obtained in any number of ways; this was the simplest I could think of off the top of my head. It relies on the first sequence line being numbered '1'. If the numbering is allowed to start at some other value, then a regular expression should probably be used instead.
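
A minimal sketch of the regular-expression idea, for illustration only. The pattern and the helper name parse_sequence_line are hypothetical and not part of Bio/GenBank/Scanner.py. It assumes a sequence line is an optional indent, a numeric counter, whitespace, then blocks of letters separated by spaces:

```python
import re

# Hypothetical pattern (an assumption, not taken from Biopython):
# optional indent, a position counter, whitespace, then residue letters
# that may be split into space-separated blocks.
SEQ_LINE = re.compile(r"^\s*(\d+)\s+([A-Za-z ]+?)\s*$")


def parse_sequence_line(line):
    """Return (start_position, residues) for one sequence line.

    Raises ValueError if the line does not look like a sequence line.
    """
    match = SEQ_LINE.match(line)
    if match is None:
        raise ValueError("Sequence line mal-formed, '%s'" % line)
    start = int(match.group(1))
    # Drop the spaces between the blocks of residues.
    residues = match.group(2).replace(" ", "")
    return start, residues


# The indent and the starting number no longer matter:
print(parse_sequence_line("        1 gatcctccat atacaacggt"))
print(parse_sequence_line("  61 acgt acgt"))
```

Because the counter is matched as digits rather than located by searching for '1', this approach would not depend on either the column or the first position number, at the cost of running a regular expression on every sequence line.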

