[Biopython-dev] Martel changes

Andrew Dalke adalke at mindspring.com
Wed Dec 12 15:05:55 EST 2001


Me:
>> Is anyone using the iterator facility in Martel?

Jeff:
>Yes.  I'm using it in Bio/Medline/NLMMedlineXML to parse the
>XML-formatted PubMed records.  Each XML file contains about 30000
>records and is too big to keep in memory at once.

Do you pass in just the constructor (no args) or do you need
to create a factory function instance which knows how to
pass in the args?

Can the handler object you use be reinitialized via calling
'startDocument'?
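
To make the question concrete, here is a minimal sketch of a handler
that can be reused because all of its per-document state is set up in
'startDocument'.  It uses the standard xml.sax module rather than
Martel itself, and the names RecordCounter, count_records and the
"record" element are made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RecordCounter(ContentHandler):
    """Hypothetical handler: counts <record> elements in one document."""

    def startDocument(self):
        # All per-document state lives here, so each new parse
        # starts from a clean slate without building a new handler.
        self.count = 0

    def startElement(self, name, attrs):
        if name == "record":
            self.count += 1


def count_records(handler, xml_text):
    # parseString fires startDocument before any element events.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


handler = RecordCounter()
first = count_records(handler, "<records><record/><record/></records>")
second = count_records(handler, "<records><record/></records>")
```

If the second count did not start again from zero, the handler would
need a fresh instance (or a factory) for each record instead.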

>Sure.  Let me know if you do it, so that I can update my files
>accordingly.  I don't think it'll be hard to handle what you describe.

It shouldn't be.  I'm remembering the reasons I didn't do it that
way the first time, and I want to see if my concerns (mentioned
above) are true or not.

>Looking through my code, other ones I use are Digits
>(more general name for Integer), Punctuation,
>and Unprintable(AnyBut(string.printable)).
>
>Actually, could you make more general equivalents of some of the
>names?  For example, presumably Digits and Integer would match the
>same things, but a lot of times you want to match some numerical
>characters and calling it an integer might be a tad confusing...

Ah! Yes, 'Digits' is better than 'Integer'.  It also lets
me replace 'SignedInteger' with 'Integer'.

When do you use Unprintable?  When do you use Punctuation?

My 'Float' isn't very powerful, as it only understands
numbers of the form (with optional +/-)
  1
  1.
  1.2
  .2

It doesn't handle things like 1E-3, or IEEE values
like NaN or +Inf.  I could (and probably should) support
the first of these.  I'm not sure if I should support the second.
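
As a rough illustration only, the forms listed above can be written as
a plain Python regular expression, with the exponent case as an
optional suffix.  This is not Martel's actual definition of 'Float';
the names SIMPLE_FLOAT, EXP_FLOAT and matches are assumptions made up
for this sketch.

```python
import re

# Optional sign, then either digits with an optional fractional part
# ("1", "1.", "1.2") or a bare fraction (".2").
SIMPLE_FLOAT = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")

# Hypothetical extension: the same forms plus an optional exponent,
# so that values like "1E-3" are accepted as well.
EXP_FLOAT = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None
```

Neither pattern accepts NaN or +Inf; those would need separate
alternatives if they were to be supported.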

                    Andrew
                    dalke at dalkescientific.com




