[BioPython] Martel-0.1

Andrew Dalke dalke@acm.org
Mon, 14 Aug 2000 04:10:44 -0600


Okay, I've got the first external release of Martel, my parser generator
for regular languages, up and available.  It currently supports
swissprot38, PDB 2.1 and Blast 2.0.10 (all only lightly tested).
The code is available at
  http://biopython.org/~dalke/Martel-0.1.tar.gz
(it unpacks to 'Martel/').  The README is
  http://biopython.org/~dalke/Martel.README
Allow me to add that it's way cool.  :)

It uses regular expressions to describe a file format.  That regex
is used to create a state table for mxTextTools.  Named groups
(?P<like>this) are used for tag names.  After a string is parsed,
the tag lists are traversed and used to generate the events needed
by SAX.
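
As a rough sketch of that idea (not Martel's actual code): the example
below uses the standard 're' module in place of an mxTextTools state
table, and the ID-line pattern, the group names and the 'emit_events'
helper are all made up for illustration.  Each named group becomes one
SAX element, and the untagged text between groups is passed through as
character data.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator

# Hypothetical one-line format, loosely modelled on a SWISS-PROT "ID" line.
# This is an illustration only, not Martel's swissprot38 definition.
ID_LINE = re.compile(r"ID   (?P<entry_name>\w+) +(?P<data_class>\w+);")

def emit_events(pattern, text, handler):
    """Match text and send one SAX element per named group to handler."""
    match = pattern.match(text)
    if match is None:
        raise ValueError("text does not match the format")
    handler.startDocument()
    handler.startElement("record", {})
    pos = match.start()
    # Walk the named groups in the order they appear in the text.
    for name in sorted(pattern.groupindex, key=match.start):
        start, end = match.span(name)
        handler.characters(text[pos:start])   # untagged text between groups
        handler.startElement(name, {})
        handler.characters(text[start:end])   # the tagged text itself
        handler.endElement(name)
        pos = end
    handler.characters(text[pos:match.end()])
    handler.endElement("record")
    handler.endDocument()

out = io.StringIO()
emit_events(ID_LINE, "ID   100K_RAT   STANDARD;", XMLGenerator(out, "utf-8"))
xml_text = out.getvalue()
print(xml_text)
```

Any SAX ContentHandler could stand in for XMLGenerator here, which is
the point of going through SAX events instead of writing XML directly.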

That means you can convert, for example, swissprot records into
XML, or turn them into a DOM.  The output is well-formed only, but
I've an idea for how to generate DTDs if given the data type
information for the tags... and knowledge of how DTDs are
supposed to work. :)

Unlike most other parsers, the output from Martel also
preserves the physical information, so you can use it to convert
a record to HTML without changing how the file looks.
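
To illustrate why keeping every character matters, here is a small
sketch using only the standard library.  It is not Martel's API; the
pattern, the group names and the 'to_html' helper are hypothetical.
Tagged fields are wrapped in <span> elements, everything else is copied
through untouched, and stripping the tags gives back the original line.

```python
import html
import re

# Hypothetical pattern; the group names are illustrative only.
ID_LINE = re.compile(r"ID   (?P<entry_name>\w+) +(?P<data_class>\w+);")

def to_html(pattern, text):
    """Wrap each named group in a <span>, leaving all other text as-is."""
    match = pattern.match(text)
    if match is None:
        raise ValueError("text does not match the format")
    parts = []
    pos = 0
    for name in sorted(pattern.groupindex, key=match.start):
        start, end = match.span(name)
        parts.append(html.escape(text[pos:start]))       # spacing, labels
        parts.append('<span class="%s">%s</span>'
                     % (name, html.escape(text[start:end])))
        pos = end
    parts.append(html.escape(text[pos:]))                # trailing text
    return "<pre>" + "".join(parts) + "</pre>"

line = "ID   100K_RAT   STANDARD;"
marked_up = to_html(ID_LINE, line)
# Removing the markup recovers the input, spacing included.
plain = html.unescape(re.sub(r"<[^>]+>", "", marked_up))
print(marked_up)
```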

To use it you'll need Python, mxTextTools and the xml module for
Python (I used v0.5.2; Python 1.6 has newer and incompatible code.)

There aren't any real examples, but try the test code for one of
the formats by doing something akin to:

from Martel.test import test_swissprot38
test_swissprot38.dump()

That produces an XML-like display of all the events for each of
the test cases.  There's also a function in Martel.test.support
called 'test_file' which works like:

from Martel.test import support
from Martel.formats import PDB_2_1
support.test_file(PDB_2_1.format, open("pdb1plm.ent"))

(Assuming you have pdb1plm handy.)

I'm presenting a poster about this at BOSC on Friday, so I would
appreciate feedback.

                    Andrew
                    dalke@acm.org