[Biopython-dev] 0.90-d04 coming soon...
Andrew Dalke
dalke at acm.org
Sun Nov 12 04:56:21 EST 2000
Brad:
>I think the saxlib errors are coming from Martel, which is not 2.0
>friendly, yet. I attached a patch to Martel-0.3/Martel/Parser.py which
>should make Martel work with only the 2.0 libraries (ie. no need to
>install the PyXML package). I believe this should also work with 1.5.2
>with PyXML 0.6.1 installed, but I haven't verified this.
There are saxlib errors with Martel-0.3 when using Python 2.0. Several
things changed between the old PyXML package and the new builtin module.
They include:
  o switch from SAX 1.0 to 2.0 support
      - different methods (eg, 'setContentHandler' instead of
        'setDocumentHandler')
      - different method arguments (eg, 'characters(content)' instead of
        'characters(text, start, size)')
  o removal or renaming of several classes
      - DocumentHandler -> ContentHandler
      - no XML Canonicalization class
      - no 'BaseHandler'
      - no ErrorRaiser class - functionality merged into ErrorHandler,
        and ErrorHandler now needs its __init__ to be called.
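
To make the method changes concrete, here is a minimal sketch of a SAX 2.0 style handler using the builtin xml.sax module. It is written for a present-day Python, not the 2.0 release discussed here, and the TextCollector class and collect() function are hypothetical names for illustration, not part of Martel.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def characters(self, content):
        # SAX 2.0 passes one string, not (text, start, size) as in SAX 1.0.
        self.chunks.append(content)


def collect(xml_bytes):
    """Parse xml_bytes and return (element names, concatenated text)."""
    parser = xml.sax.make_parser()
    handler = TextCollector()
    # SAX 2.0 name; the SAX 1.0 equivalent was setDocumentHandler.
    parser.setContentHandler(handler)
    # The stock ErrorHandler raises on errors and fatal errors.
    parser.setErrorHandler(ErrorHandler())
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names, "".join(handler.chunks)
```

The text is joined at the end because a parser is free to deliver character data in several characters() calls.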
Brad's patch doesn't catch all of the problems. This evening I finally
switched all my code over to use Python 2.0 - at least enough that my
regression tests work :)
These changes should probably be included in the upcoming version. However,
they are *not* backwards compatible with either the Martel 0.3 API or
Python 1.5.2. How does that affect the 0.90-d04 release? How does a
dependency on 2.0 affect a 1.0 release?
(Actually, I should say it's dependent on the PyXML package and not 1.5.2
per se. It's still tricky because of the API changes between SAX 1.0 and
SAX 2.0 and because I've started using Python 2.0 syntax, like "import
X as Y".)
I've also finished off the iterator support Brad wanted, except for
some documentation. It works, but it's built on top of the callback
method, so it will always be slower than the SAX-like interface - until
someone spends the time needed to rewrite the code to talk to mxTextTools
directly.
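
As a rough illustration of what "built on top of the callback method" means, here is a sketch of an iterator layered over SAX callbacks. This is not Martel's implementation; it uses the builtin xml.sax parser and present-day Python generators, and RecordCollector and iter_records are hypothetical names. The extra buffering between the callbacks and the caller is where the overhead comes from.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class RecordCollector(ContentHandler):
    """Hypothetical handler: buffers the text of each <tag> element."""

    def __init__(self, tag):
        ContentHandler.__init__(self)
        self.tag = tag
        self.records = []   # completed records waiting to be yielded
        self._buf = None    # text of the record currently being read

    def startElement(self, name, attrs):
        if name == self.tag:
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == self.tag and self._buf is not None:
            self.records.append("".join(self._buf))
            self._buf = None


def iter_records(stream, tag, chunk_size=64):
    """Yield the text of each <tag> element, feeding the parser in chunks."""
    parser = xml.sax.make_parser()
    handler = RecordCollector(tag)
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)          # callbacks fire during feed()
        while handler.records:
            yield handler.records.pop(0)
    parser.close()
    while handler.records:
        yield handler.records.pop(0)
```

Usage would look like "for rec in iter_records(open(name, 'rb'), 'rec'): ...", with the caller never seeing the callbacks.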
Here's my to-do list for Martel, not all of which will be done for a
hypothetical 1.0:
  o resolve the newline issue
  o interface for version detection
      - only need to read part of a file to determine the format/version
      - support categories? (Eg, "a PDB format" or "a sequence format")
  o cache tag tables for faster parser creation
  o attribute lists and XML namespaces
      - could be useful for version labels (eg, <swissprot version="38">
        instead of <swissprot38>)
      - how to store in a regular expression pattern string
      - I just don't know enough about namespaces to know if I'm doing
        this one correctly. Any offers to help?
  o better debugging support
      - somehow identify the last character the parser attempted to parse
        (perhaps with a specialized tag table? Or modify mxTextTools?)
      - SAX Locator support
  o more formats, examples, testing, documentation, etc.
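
For the version-label item above, this is a small sketch of how a consumer could read the version from an attribute instead of from the tag name. The element names come from the example in the list; the VersionSniffer class and sniff() function are hypothetical, and the code uses the builtin xml.sax module rather than anything in Martel.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class VersionSniffer(ContentHandler):
    """Hypothetical handler: notes the root element and its version."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.format = None
        self.version = None

    def startElement(self, name, attrs):
        # Only the first (root) element is of interest.
        if self.format is None:
            self.format = name
            self.version = attrs.get("version")  # None when absent


def sniff(xml_bytes):
    """Return (root element name, value of its 'version' attribute)."""
    handler = VersionSniffer()
    xml.sax.parseString(xml_bytes, handler)
    return handler.format, handler.version
```

With the attribute form, a handler can match on one tag name and branch on the version, instead of knowing every versioned tag name in advance.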
However, I think the core API is now stable, which means people can
start writing parsers based on it without having things change
underneath them.
So Jeff, how would you like things to be scheduled?
Andrew