[Biopython] GenBank.Scanner use of os.linesep

Peter biopython at maubp.freeserve.co.uk
Fri Apr 9 08:54:53 UTC 2010


On Fri, Apr 9, 2010 at 5:46 AM, Reece Hart <reece at berkeley.edu> wrote:
> Hi All-
>
> I recently discovered that the GenBank parser doesn't work on Google App
> Engine because os.linesep is undefined (GenBank/Scanner.py:746):
>
>   745    #            if self.line[-1] == "\n" : self.line = self.line[:-1]
>   746                self.line = self.line.rstrip(os.linesep)
>   747                misc_lines.append(self.line)
>
> Defining os.linesep is sufficient to fix the problem (thanks to Brad
> Chapman).
>
> It seems to me that this use of os.linesep is probably mistaken here.

I agree.

> If the
> file comes from efetch, the line separator will be \n regardless of platform
> [1] and that is what should be used in rstrip. It's possible that the file
> might come from a dog-foresaken CRLF platform and therefore contain that
> line separator.

I think it would break in a more common setting - passing a file on
Windows with CRLF, since Python will turn that into just \n.

> So, I humbly propose that 746 be changed to either rstrip('\n') or, perhaps,
> rstrip('\n\r'). Although the need for the latter is probably rare, I don't
> see that it costs anything to cover that case by adding \r.

A plain rstrip() would also work and get rid of any trailing whitespace.
I've checked that in.

> I'm new to this community, so I don't know whether we now have ferocious
> debate about the merits of line terminators or, rather, I submit a lame
> one-liner patch against the git HEAD.

For something this trivial, your verbal patch is fine. Would you like
to be added to the NEWS and CONTRIB file?

Peter




More information about the Biopython mailing list