[Biopython] GenBank.Scanner use of os.linesep

Reece Hart reece at berkeley.edu
Fri Apr 9 04:46:36 UTC 2010


Hi All-

I recently discovered that the GenBank parser doesn't work on Google App 
Engine because os.linesep is undefined (GenBank/Scanner.py:746):

    745    #            if self.line[-1] == "\n" : self.line = self.line[:-1]
    746                self.line = self.line.rstrip(os.linesep)
    747                misc_lines.append(self.line)

Defining os.linesep is sufficient to fix the problem (thanks to Brad 
Chapman).
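
For anyone hitting the same thing, a minimal sketch of that workaround, 
assuming the attribute is simply absent from the os module in the 
sandbox. The helper name ensure_linesep is hypothetical and not part of 
Biopython; it would need to run before the parser is used:

```python
import os


def ensure_linesep(default="\n"):
    """Define os.linesep if the runtime lacks it (hypothetical helper)."""
    if not hasattr(os, "linesep"):
        # Assumption: "\n" is a reasonable fallback for the sandbox.
        os.linesep = default
    return os.linesep


ensure_linesep()
```

On a normal interpreter this is a no-op, since os.linesep already exists.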

It seems to me that this use of os.linesep is probably mistaken. If the
file comes from efetch, the line separator will be \n regardless of
platform [1], and that is what should be passed to rstrip. It's also
possible that the file comes from a dog-forsaken CRLF platform and
therefore ends its lines with \r\n.

So, I humbly propose that 746 be changed to either rstrip('\n') or, 
perhaps, rstrip('\n\r'). Although the need for the latter is probably 
rare, I don't see that it costs anything to cover that case by adding \r.
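
To illustrate the difference, here is a small sketch (not the actual 
Scanner code). The helper strip_eol is a hypothetical name, and the 
sample lines are made up from the LOCUS record in [1]. Note that rstrip 
treats its argument as a set of characters, so '\n\r' removes any 
trailing mix of LF and CR:

```python
def strip_eol(line):
    """Remove trailing LF/CR characters, whatever platform wrote the file."""
    return line.rstrip("\n\r")


unix_line = "LOCUS       NM_004006\n"
dos_line = "LOCUS       NM_004006\r\n"

print(repr(unix_line.rstrip("\n")))  # 'LOCUS       NM_004006'
print(repr(dos_line.rstrip("\n")))   # 'LOCUS       NM_004006\r'  (stray CR left behind)
print(repr(strip_eol(dos_line)))     # 'LOCUS       NM_004006'
```

Neither call depends on os.linesep, so the result is the same on every 
platform.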

I'm new to this community, so I don't know whether we should now have a
ferocious debate about the merits of line terminators or whether I
should simply submit a lame one-liner patch against the git HEAD.

Thanks for Biopython.

Cheers,
Reece



[1] For reference, here's a web request that should be equivalent to the
efetch. On line 5 of the output, the final byte 0a is LF, i.e. \n.
apt12j$ curl -s 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=238018044&rettype=gb' | hexdump -C | head
00000000  4c 4f 43 55 53 20 20 20  20 20 20 20 4e 4d 5f 30  |LOCUS       NM_0|
00000010  30 34 30 30 36 20 20 20  20 20 20 20 20 20 20 20  |04006           |
00000020  20 20 20 31 33 39 39 33  20 62 70 20 20 20 20 6d  |   13993 bp    m|
00000030  52 4e 41 20 20 20 20 6c  69 6e 65 61 72 20 20 20  |RNA    linear   |
00000040  50 52 49 20 32 35 2d 4d  41 52 2d 32 30 31 30 0a  |PRI 25-MAR-2010.|
00000050  44 45 46 49 4e 49 54 49  4f 4e 20 20 48 6f 6d 6f  |DEFINITION  Homo|

-- 
Reece Hart, Ph.D.
Chief Scientist, Genome Commons                http://genomecommons.org/
Center for Computational Biology               324G Stanley Hall
UC Berkeley / QB3                              Berkeley, CA 94720



