[Biopython] GenBank.Scanner use of os.linesep
Reece Hart
reece at berkeley.edu
Fri Apr 9 04:46:36 UTC 2010
Hi All-
I recently discovered that the GenBank parser doesn't work on Google App
Engine because os.linesep is undefined (GenBank/Scanner.py:746):
745     # if self.line[-1] == "\n" : self.line = self.line[:-1]
746     self.line = self.line.rstrip(os.linesep)
747     misc_lines.append(self.line)
Defining os.linesep is sufficient to fix the problem (thanks to Brad
Chapman).
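For the record, a minimal sketch of that workaround. It assumes the
snippet runs before the GenBank parser is used, and the fallback value
"\n" is my own choice rather than anything the environment dictates:

```python
import os

# Assumption: on a restricted runtime the os module may lack linesep.
# Define a fallback so that later calls to rstrip(os.linesep) still work.
if not hasattr(os, "linesep"):
    os.linesep = "\n"
```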
It seems to me that this use of os.linesep is probably mistaken here. If
the file comes from efetch, the line separator will be \n regardless of
platform [1] and that is what should be used in rstrip. It's possible
that the file might come from a dog-forsaken CRLF platform and
therefore contain that line separator.
So, I humbly propose that 746 be changed to either rstrip('\n') or,
perhaps, rstrip('\n\r'). Although the need for the latter is probably
rare, I don't see that it costs anything to cover that case by adding \r.
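To illustrate the difference, here is a small sketch. The helper name
strip_eol and the sample lines are hypothetical, made up for this
example, and are not part of Biopython:

```python
def strip_eol(line):
    """Remove trailing LF and/or CR characters, whatever the platform."""
    # str.rstrip treats its argument as a set of characters, so this
    # removes any trailing run of "\n" and "\r", in any order.
    return line.rstrip("\n\r")


# Hypothetical sample lines with Unix and CRLF terminators.
lf_line = "LOCUS       NM_004006\n"
crlf_line = "LOCUS       NM_004006\r\n"

print(repr(strip_eol(lf_line)))
print(repr(strip_eol(crlf_line)))
# Stripping only "\n" leaves the "\r" behind on a CRLF line.
print(repr(crlf_line.rstrip("\n")))
```

Neither call depends on os.linesep, so the result is the same on every
platform.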
I'm new to this community, so I don't know whether we now have a ferocious
debate about the merits of line terminators or whether I simply submit a
lame one-liner patch against the git HEAD.
Thanks for Biopython.
Cheers,
Reece
[1] For reference, here's a web request that should be equivalent to the
efetch. On line 5 of the output, the final byte 0a is LF, i.e. \n.
apt12j$ curl -s 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=238018044&rettype=gb' | hexdump -C | head
00000000  4c 4f 43 55 53 20 20 20  20 20 20 20 4e 4d 5f 30  |LOCUS       NM_0|
00000010  30 34 30 30 36 20 20 20  20 20 20 20 20 20 20 20  |04006           |
00000020  20 20 20 31 33 39 39 33  20 62 70 20 20 20 20 6d  |   13993 bp    m|
00000030  52 4e 41 20 20 20 20 6c  69 6e 65 61 72 20 20 20  |RNA    linear   |
00000040  50 52 49 20 32 35 2d 4d  41 52 2d 32 30 31 30 0a  |PRI 25-MAR-2010.|
00000050  44 45 46 49 4e 49 54 49  4f 4e 20 20 48 6f 6d 6f  |DEFINITION  Homo|
--
Reece Hart, Ph.D.
Chief Scientist, Genome Commons http://genomecommons.org/
Center for Computational Biology 324G Stanley Hall
UC Berkeley / QB3 Berkeley, CA 94720