[Biopython-dev] Martel-0.3 available
Andrew Dalke
dalke at acm.org
Mon Oct 9 20:15:34 EDT 2000
Brad:
> One thing that had me concerned about your parser, though, was the
> use of Str("\n") to detect end of new lines. I was using this with
> lots o' luck with all of my unix formatted files, but it didn't seem
> to work right for me when I was using it on the Windows formatted (I
> think) files in the Fasta test directory. I ended up having to use
> Martel.MaxRepeat(Martel.Re("[\s]"), 0, 2) to detect end o' lines,
> which seems to work properly, but it pretty ugly looking.
Yeah, I'm worried about that as well, but I haven't really looked at
the problem. Dug around for a bit now. Under MS Windows, reading a
native file (which "od -c" shows as having "\r\n"), open("test.dat").read()
only shows "\n", so it's been translated as I expect. Using
open("test.dat", "rb").read() shows the "\r\n".
So as long as the file is read in text mode and is used on an OS with the
same line endings, it will be fine. However, it does mean my byte
counts will be off, depending on your viewpoint :(
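A minimal sketch of the text-mode versus binary-mode difference described
above. It assumes a modern Python 3 interpreter, where text mode translates
"\r\n" to "\n" on every platform by default, not only under MS Windows as in
the test above. The temporary file and its contents are made up for
illustration.

```python
import os
import tempfile

# Write a small file with DOS-style line endings, byte for byte.
fd, path = tempfile.mkstemp(suffix=".dat")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"line one\r\nline two\r\n")

# Binary mode returns the bytes exactly as stored, "\r" included.
with open(path, "rb") as handle:
    raw = handle.read()

# Text mode (default newline handling in Python 3) translates "\r\n" to "\n".
with open(path, "r", encoding="ascii") as handle:
    text = handle.read()

os.remove(path)

print(repr(raw))
print(repr(text))
# One character is lost per translated line ending, which is why
# positions counted on the text differ from byte offsets in the file.
print(len(raw) - len(text))
```

The last line prints how far the two counts drift apart for this file, one
per line ending.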
There might be a problem with interoperability between different OSes.
That could be addressed in one of several ways:
1) require the input to be converted to the local line ending and provide
no support for doing so
2) supply some adapters ("FromMac", "FromUnix", "FromDos") but don't apply
them automatically, leaving the decision up to the client code
3) provide a tool which autodetects endings and uses the right adapter
4) http://members.nbci.com/_XOOM/meowing/python/index.html
5) define an EOL = Re(r"\n|\r\n?")
I prefer 2-4, but would like to stick with 1 for now. I don't like 5
because people will forget to use it.
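To make options 2, 3 and 5 concrete, here is a rough sketch using Python's
standard re module rather than Martel itself. The pattern is the one from
option 5. The names normalize_newlines and detect_newline are hypothetical
helpers invented for this example, not part of Martel, and the code assumes
the data was read without newline translation.

```python
import re

# The pattern from option 5: matches "\n", "\r\n", or a lone "\r".
EOL = re.compile(r"\n|\r\n?")

def normalize_newlines(text):
    # Hypothetical adapter (option 2): rewrite every line ending as "\n".
    return EOL.sub("\n", text)

def detect_newline(text):
    # Hypothetical autodetector (option 3): report the first line ending
    # found in the text, or None if there is none.
    match = EOL.search(text)
    return match.group(0) if match else None

mixed = "unix\ndos\r\nmac\rend"
print(EOL.split(mixed))
print(repr(normalize_newlines(mixed)))
print(repr(detect_newline("dos\r\nunix\n")))
```

Because "\r\n?" is greedy, a DOS ending is consumed as a single match
instead of being split into two line breaks.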
> I don't really know anything at all about line-break madness.
I've been a unix weenie for too long, and agree with you.
> I didn't even see DocumentHandler in 2.0 -- I think that
> ContentHandler is DocumentHandler (at least in 2.0), but I'm not
> positive. Hard to follow all of the changes in that stuff...
According to my XML book, it's DocumentHandler, and it works with DOM
and the other XML tools, so it's likely correct.
Andrew