[Biopython-dev] Martel changes
Andrew Dalke
adalke at mindspring.com
Fri Dec 14 07:22:18 EST 2001
Jeff:
>Oops, I just looked over the code. I'm in fact not using the
>iterator, but the RecordReader. Sorry about the confusion!
No problem, and fewer changes for you!
Me:
>> When do you use Unprintable? When do you use Punctuation?
>I use them both for matching things in English text. Sometimes the
>text contains unprintable characters from foreign character sets.
Okay, if you say it's useful, I'll add it. What do you
define as punctuation?
>> My 'Float' isn't very powerful, as it only understands
>> numbers of the form (with optional +/-)
>It gets pretty complicated, e.g.
>1.315E2.24
That's not a valid floating point number -- the exponent must
be an integer.
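
To illustrate the point about the exponent, here is a rough sketch
using Python's standard re module. It is not Martel's 'Float'
definition; the names FLOAT_RE and is_float are made up for this
example. It accepts an optional sign, digits with an optional
fractional part, and an optional exponent that must be an integer.

```python
import re

# Hypothetical pattern, not Martel's Float: optional sign, digits with an
# optional fractional part (or a bare fraction), optional integer exponent.
FLOAT_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def is_float(text):
    """Return True if the whole string is a float with an integer exponent."""
    return FLOAT_RE.fullmatch(text) is not None
```

With this pattern "1.315E2" is accepted, while "1.315E2.24" is
rejected because of the trailing ".24" after the exponent.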
BTW, I'm working on a 'Time' submodule, which should make it
easier to parse time and date data structures. The language
I used is based on strptime, plus some experimental extensions
to make it easier for me to use.
The idea is to make it easier to parse something like
1970-08-22
using a pattern like
%(4-year)-%m-%d
than having to write
(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
all the time.
(Plus, the patterns I use are stricter, in that you can't
use a day like "43".)
For example, (with judicious newlines for clarity)
>>> from Martel import Time
>>> print Time.make_pattern("%m/%d/%Y")
(?P<month?type=numeric>(0[0-9]|1[012]))/
(?P<day?type=numeric>(0[1-9]|[12][0-9]|3[01]))/
(?P<year?type=long>\d{4})
>>>
>>> parser = Time.make_expression("%(Jan) %(year)\n").make_parser()
>>> from xml.sax import saxutils
>>> parser.setContentHandler(saxutils.XMLGenerator())
>>> parser.parseString("Dec 2001\n")
<?xml version="1.0" encoding="iso-8859-1"?>
<month type="short">Dec</month> <year type="any">2001</year>
>>>
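
The general idea of expanding a strptime-style format into a
named-group regular expression can be sketched with the standard re
module alone. This is only an illustration under assumptions, not the
Time.py code: the CODES table, this standalone make_pattern function,
and the plain group names (without Martel's "?type=..." attributes) are
all invented for the example, and only %m, %d and %Y are handled.

```python
import re

# Hypothetical mapping from a few strptime-style codes to named groups.
# The month and day sub-patterns are range-checked rather than plain \d{2}.
CODES = {
    "m": r"(?P<month>0[1-9]|1[012])",
    "d": r"(?P<day>0[1-9]|[12][0-9]|3[01])",
    "Y": r"(?P<year>\d{4})",
}


def make_pattern(fmt):
    """Expand %m, %d and %Y in fmt; escape every other character literally."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            parts.append(CODES[fmt[i + 1]])  # raises KeyError on unknown codes
            i += 2
        else:
            parts.append(re.escape(fmt[i]))
            i += 1
    return "".join(parts)


# Compile a pattern for ISO-style dates such as 1970-08-22.
date_re = re.compile(make_pattern("%Y-%m-%d"))
```

Matching "1970-08-22" against date_re gives the groups year, month
and day, while a string with a day of "43" does not match at all.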
It's nearly done - only about an hour of work left. Then
to add the useful patterns, and the SimpleFields (or whatever
I decide to call it). I should be able to finish it by
Friday .. today.
The code is temporarily at
http://www.biopython.org/~dalke/Time.py
but it uses a new 'NullOp' Expression not yet in CVS for
doing the 'make_expression' function.
Andrew
dalke at dalkescientific.com