[Biopython-dev] MetaTool and Martel

Andrew Dalke adalke at mindspring.com
Wed Aug 15 10:17:37 EDT 2001

>   Does Martel handle embedded size fields?

Yes!  I needed it to support the MDL file format.

Suppose you have something like

2 1 this and that
3 1 but not the other
1 3 this is a test

which should be turned into

record 1 == ("this and", "that")
record 2 == ("but not the", "other")
record 3 == ("this", "is a test")

Then you can use something like

>>> from Martel import Integer, Str, RepN, Group, AnyEol, Re, Rep
>>> word = Group("word", Re(r"[^ \R]+"))
>>> record = Integer("n1") + Str(" ") + Integer("n2") + \
...     Group("group1", RepN(Str(" ") + word, "n1")) + \
...     Group("group2", RepN(Str(" ") + word, "n2")) + \
...     AnyEol()
>>> from xml.sax import saxutils
>>> format = Rep(record)
>>> parser = format.make_parser()
>>> parser.setContentHandler(saxutils.XMLGenerator())
>>> parser.parseString("""\
... 2 1 this and that
... 3 1 but not the other
... 1 3 this is a test
... """)
<?xml version="1.0" encoding="iso-8859-1"?>
<n1>2</n1> <n2>1</n2><group1> <word>this</word> <word>and</word></group1><group2> <word>that</word></group2>
<n1>3</n1> <n2>1</n2><group1> <word>but</word> <word>not</word> <word>the</word></group1><group2> <word>other</word></group2>
<n1>1</n1> <n2>3</n2><group1> <word>this</word></group1><group2> <word>is</word> <word>a</word> <word>test</word></group2>
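For comparison, a hand-written version of the same count-driven split in plain Python might look like the sketch below. `parse_sized_line` is a hypothetical helper, not part of Martel. It assumes words are separated by spaces and treats a line whose counts don't add up to its number of words as an error, which is roughly what the Martel grammar above does when RepN or AnyEol fails to match.

```python
def parse_sized_line(line):
    """Split 'n1 n2 w1 w2 ...' into a group of n1 words and one of n2 words.

    Hypothetical helper for illustration only; it is not part of Martel.
    """
    fields = line.split()
    n1, n2 = int(fields[0]), int(fields[1])
    words = fields[2:]
    # The embedded size fields must account for every remaining word.
    if n1 + n2 != len(words):
        raise ValueError("size fields %d+%d do not match %d words"
                         % (n1, n2, len(words)))
    return (" ".join(words[:n1]), " ".join(words[n1:]))


text = """2 1 this and that
3 1 but not the other
1 3 this is a test
"""
for i, line in enumerate(text.splitlines(), 1):
    print("record %d == %r" % (i, parse_sized_line(line)))
```

The difference is that the Martel version is declarative and produces SAX events, while this one hard-codes the record layout in the function.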

A couple more details are at:

This only works if the number in the field is exactly the repeat
count.  For example, if a count value of N means N-1 repeats, it
can't be supported.  (N+1 is doable as a repeat of N followed by a
repeat of 1.)

But I've not come across that case.  Yet.
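To see why an extra repeat is easy and a missing one is not, here is a toy sketch of count-driven repetition in plain Python. The combinators (`regex`, `integer`, `seq`, `rep_n`, `matches`) are hypothetical and only mimic the idea behind RepN reading a count captured earlier in the record; they are not Martel's implementation.

```python
import re

# Toy parser combinators (hypothetical, NOT Martel's API). Each parser takes
# (text, pos, env) and returns (new_pos, new_env) on success or None on
# failure; env holds integer values captured earlier in the record.


def regex(pattern):
    """Match a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos, env):
        m = compiled.match(text, pos)
        return (m.end(), env) if m else None
    return parse


def integer(name):
    """Match a run of digits and remember its value as env[name]."""
    digits = re.compile(r"\d+")

    def parse(text, pos, env):
        m = digits.match(text, pos)
        if m is None:
            return None
        new_env = dict(env)
        new_env[name] = int(m.group())
        return m.end(), new_env
    return parse


def seq(*parsers):
    """Run parsers one after another, threading pos and env through."""
    def parse(text, pos, env):
        for p in parsers:
            result = p(text, pos, env)
            if result is None:
                return None
            pos, env = result
        return pos, env
    return parse


def rep_n(parser, name):
    """Repeat parser exactly env[name] times, like a count-driven RepN."""
    def parse(text, pos, env):
        for _ in range(env[name]):
            result = parser(text, pos, env)
            if result is None:
                return None
            pos, env = result
        return pos, env
    return parse


def matches(parser, text):
    """True if parser consumes the whole text."""
    result = parser(text, 0, {})
    return result is not None and result[0] == len(text)


word = regex(r" [^ \n]+")
eol = regex(r"\n")

# Count N means exactly N words.
exact = seq(integer("n"), rep_n(word, "n"), eol)
# Count N means N+1 words: a repeat of N followed by a repeat of 1.
plus_one = seq(integer("n"), rep_n(word, "n"), word, eol)
# Composing these pieces can only add repeats, never remove one, so
# "N means N-1" would need a new primitive that adjusts the count.

print(matches(exact, "2 a b\n"))       # True
print(matches(plus_one, "2 a b c\n"))  # True
print(matches(exact, "2 a b c\n"))     # False
```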

>   It's not strictly necessary but without it Martel would accept matrixes
>that were not consistent with the size fields.

There are other formats (such as the MDL mol format) where the counts
are required; otherwise the parser gets out of sync.
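A simplified sketch of why the counts matter for a molfile-style layout: a counts line says how many atom lines and bond lines follow, and the reader consumes exactly that many of each. The layout here is deliberately simplified (whitespace-separated counts rather than the real fixed-width V2000 columns, and made-up atom and bond lines), and `read_blocks` is a hypothetical name, not an existing parser.

```python
def read_blocks(lines):
    """Read a simplified counts-driven record (NOT the full MDL spec).

    Assumed layout: a counts line "n_atoms n_bonds", then n_atoms atom
    lines, then n_bonds bond lines, then an "M  END" terminator line.
    """
    counts = lines[0].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atoms = lines[1:1 + n_atoms]
    bonds = lines[1 + n_atoms:1 + n_atoms + n_bonds]
    end_index = 1 + n_atoms + n_bonds
    # If the counts disagree with the data, we land somewhere other than
    # the terminator, i.e. the reader is out of sync with the record.
    if end_index >= len(lines) or lines[end_index].strip() != "M  END":
        raise ValueError("counts out of sync with the data")
    return atoms, bonds


record = [
    "2 1",
    "C 0.0 0.0 0.0",
    "O 1.2 0.0 0.0",
    "1 2 2",
    "M  END",
]
atoms, bonds = read_blocks(record)
print(len(atoms), len(bonds))
```

With a wrong atom count such as `["3 1"] + record[1:]`, the bond line is swallowed into the atom block and the terminator is read as a bond, so the check at the end raises instead of silently returning a garbled molecule.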


