[Biopython-dev] Martel-0.4 available
Brad Chapman
chapmanb at arches.uga.edu
Wed Dec 6 03:50:55 EST 2000
Hi Andrew;
Sorry I haven't had a chance to comment on new Martel features yet
-- I have a bit of feedback in the areas you mentioned based on
working with it for writing the GenBank parser.
> New regexp syntax - \R
> \R means "\n|\r\n?"
> [\R] means "[\n\r]"
>
> New Expression Node - AnyEOL
> implements the \R test
In general, the \R syntax worked great for me. I'm not a regexp purist
or anything, so I have no issues with adding this. The new feature of
being able to handle any kind of line feed is very nice. One thing
that I ended up doing was not using the AnyEOL test at all, and
instead only using the \R syntax. As I started using it I realized
why it was so nice to be able to embed the \R inside of any regular
expression, so I ended up only using \R to be consistent (so I used
Martel.Re("\R") to detect ends of lines). Just thought I would mention
it in case it is helpful to you. But in general, \R seems great to me.
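
To make the embedding point concrete, here is a minimal sketch using Python's standard re module rather than Martel itself. As far as I know, re has no \R escape, so the expansion "\n|\r\n?" quoted above is spelled out by hand. The names ANY_EOL, locus_line and locus_names are made up for this illustration and are not part of Martel.

```python
import re

# Plain-Python stand-in for Martel's \R, using the expansion "\n|\r\n?"
# quoted from the release notes. "\r\n?" lets a DOS "\r\n" match as one
# unit and an old-Mac bare "\r" match on its own.
ANY_EOL = r"(?:\n|\r\n?)"

# The end-of-line pattern is embedded inside a larger expression, the
# same way \R can be embedded inside any Martel regular expression.
# (Hypothetical pattern, for illustration only.)
locus_line = re.compile(r"LOCUS +(?P<name>\S+)[^\r\n]*" + ANY_EOL)


def locus_names(text):
    """Return the name on every LOCUS line, whatever the line ending."""
    return [m.group("name") for m in locus_line.finditer(text)]
```

The same pattern then handles Unix, DOS and old-Mac line endings without a separate end-of-line node.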
I also thought it would be nice if the RecordReader would accept \R as
a newline as well, so you could do something like
RecordReader.EndsWith(handle, "//\R"). Even further along these
lines, it would have been nice to be able to set the end with an
arbitrary regular expression. For GenBank, I would have wanted
"//[\R]+" (okay, I would have to escape those //'s, but I'm not sure
how many /s that would leave me with :-), so that the end would be
// plus an arbitrary number of newlines. I ran into problems with
files like the biojava genbank test file, where there are a bunch of
linefeeds at the end of the file, but this could also be a problem
with a file of cut'n'pasted records that had differing amounts of
linebreaks. I was able to get around this for GenBank by using
StartsWith(handle, "LOCUS"), but just thought I would mention the thought.
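
Here is a rough sketch of the "// plus an arbitrary number of newlines" idea, written with Python's re module. It is not the RecordReader API, and RECORD_END and split_records are names invented for the example. In Python's re the "/" character is not special, so no escaping is needed there.

```python
import re

# "//" at the start of a line, optional trailing blanks, then one or
# more line breaks of any kind. [\r\n]+ plays the role of "[\R]+".
# The lookbehind is used instead of "^" so that a bare "\r" also counts
# as the end of the previous line.
RECORD_END = re.compile(r"(?:\A|(?<=[\r\n]))//[ \t]*[\r\n]+")


def split_records(text):
    """Split text into records, each ending with its '//' terminator
    and any blank lines that follow it."""
    records = []
    start = 0
    for match in RECORD_END.finditer(text):
        records.append(text[start:match.end()])
        start = match.end()
    # Whatever is left has no terminator; keep it only if it is not blank.
    tail = text[start:]
    if tail.strip():
        records.append(tail)
    return records
```

Because the terminator swallows every line break after the "//", extra blank lines at the end of a file or between pasted records do not turn into empty records.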
> RecordReaders rewritten to use mxTextTools to find record
> begin and end characters rather than using readline/readlines.
I have a quick question about mxTextTools importing -- you are now
importing with:
from mx import TextTools
When did it get an mx meta-directory? Is this a new version or anything
fancy? It was no big deal, I was just curious.
> - how to make an iterator (would like a bit more feedback)
(pausing to read your other mails right now... thanks for the
feedback!)
One thing that I didn't use is a Martel based iterator -- I just stuck
with the type of iterator that Jeff uses in other Biopython parsers
but used the RecordReader to implement it. I'm not sure if it could be
done in a better way with a Martel iterator...
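
For reference, the general shape of a record-at-a-time iterator keyed on a start marker looks something like the sketch below. This is plain Python with a generator, not the Martel RecordReader or the existing Biopython iterator classes, and iter_records is a hypothetical name.

```python
import io


def iter_records(handle, start="LOCUS"):
    """Yield one record (a string) at a time. A record begins at a line
    that starts with `start` and runs until the next such line or EOF."""
    lines = []
    for line in handle:
        if line.startswith(start) and lines:
            # A new record is starting, so hand back the finished one.
            yield "".join(lines)
            lines = []
        # Skip anything before the first start line (e.g. a file header).
        if lines or line.startswith(start):
            lines.append(line)
    if lines:
        yield "".join(lines)


# Example use on an in-memory handle:
sample = io.StringIO("LOCUS A\nfoo\n//\n\n\nLOCUS B\nbar\n//\n")
names = [record.split()[1] for record in iter_records(sample)]
```

Splitting on the start marker means trailing blank lines simply ride along with the preceding record, which is why it sidesteps the varying-linebreak problem.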
BTW, the debug_level = 2 option on the parser is incredibly nice. It
really helps get at why a parse is failing and makes it much easier to
correct the problem. I probably would still be pulling my hair out
trying to get the regexps right without this. Thanks!
Brad