[Biopython-dev] MetaTool and Martel
Andrew Dalke
adalke at mindspring.com
Wed Aug 15 10:17:37 EDT 2001
> Does Martel handle embedded size fields?
Yes! I needed it to support the MDL file format.
Suppose you have something like
2 1 this and that
3 1 but not the other
1 3 this is a test
which should be turned into
record 1 == ("this and", "that")
record 2 == ("but not the", "other")
record 3 == ("this", "is a test")
Then you can use something like
>>> from Martel import Integer, Str, RepN, Group, AnyEol, Re, Rep
>>> word = Group("word", Re("[^ \R]+"))
>>>
>>> # RepN repeats its pattern as many times as the integer captured under the given name.
>>> record = Integer("n1") + Str(" ") + Integer("n2") + \
...     Group("group1", RepN(Str(" ") + word, "n1")) + \
...     Group("group2", RepN(Str(" ") + word, "n2")) + \
...     AnyEol()
>>>
>>> from xml.sax import saxutils
>>> format = Rep(record)
>>> parser = format.make_parser()
>>> parser.setContentHandler(saxutils.XMLGenerator())
>>> parser.parseString("""\
... 2 1 this and that
... 3 1 but not the other
... 1 3 this is a test
... """)
<?xml version="1.0" encoding="iso-8859-1"?>
<n1>2</n1> <n2>1</n2><group1> <word>this</word>
<word>and</word></group1><group2> <word>that</word></group2>
<n1>3</n1> <n2>1</n2><group1> <word>but</word> <word>not</word>
<word>the</word></group1><group2> <word>other</word></group2>
<n1>1</n1> <n2>3</n2><group1> <word>this</word></group1><group2>
<word>is</word> <word>a</word> <word>test</word></group2>
>>>
A couple more details are at:
http://www.dalkescientific.com/Martel/ebi-talk/img35.htm
This only works when the value in the size field is exactly the
repeat count. E.g., if a count value of N means N-1 repeats, then it
isn't possible to support it. (N+1 is doable as a repeat of N
followed by a repeat of 1.)
But I've not come across that case. Yet.
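To make the intended semantics concrete, here is a rough plain-Python sketch of the same count-prefixed split, written by hand rather than with Martel. The names split_counted and offset are made up for this illustration and are not part of Martel. The offset argument shows the "N means N-1 repeats" case, which is simple arithmetic in hand-written code even though it can't be expressed with RepN.

```python
def split_counted(line, offset=0):
    """Split 'n1 n2 w1 w2 ...' into two space-joined groups of words.

    offset is added to each count before use, so offset=-1 models a
    format where a field value of N means N-1 repeats (hypothetical).
    """
    fields = line.split()
    n1 = int(fields[0]) + offset
    n2 = int(fields[1]) + offset
    words = fields[2:]
    # The counts must account for every word, otherwise the record
    # is inconsistent with its own size fields.
    if n1 < 0 or n2 < 0 or len(words) != n1 + n2:
        raise ValueError("size fields do not match record: %r" % (line,))
    return " ".join(words[:n1]), " ".join(words[n1:])


if __name__ == "__main__":
    for text in ("2 1 this and that",
                 "3 1 but not the other",
                 "1 3 this is a test"):
        print(split_counted(text))
```

Unlike the Martel version, this produces tuples directly instead of SAX events, so it is only a check on what the records should look like, not a replacement for the parser.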
> It's not strictly necessary but without it Martel would accept matrixes
>that were not consistent with the size fields.
There are other formats (e.g. the MDL mol format) where the counts are
required, or else the parse gets out of sync.
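As an illustration of why the counts matter there, here is a simplified plain-Python sketch (again not Martel). It assumes a V2000-style mol file with three header lines followed by a counts line whose first two 3-character columns hold the atom and bond counts. The sample record and the name split_mol_blocks are made up for this example. Nothing separates the atom block from the bond block, so the counts are the only way to tell where one ends and the other begins.

```python
# A tiny hand-written record, for illustration only.
SAMPLE = "\n".join([
    "example",          # molecule name
    "  hand-written",   # program / timestamp line
    "",                 # comment line
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])


def split_mol_blocks(text):
    """Return (atom_lines, bond_lines) using the embedded counts."""
    lines = text.splitlines()
    counts = lines[3]               # fourth line is the counts line
    n_atoms = int(counts[0:3])      # fixed-width field, columns 1-3
    n_bonds = int(counts[3:6])      # fixed-width field, columns 4-6
    start = 4
    atoms = lines[start:start + n_atoms]
    bonds = lines[start + n_atoms:start + n_atoms + n_bonds]
    return atoms, bonds


if __name__ == "__main__":
    atoms, bonds = split_mol_blocks(SAMPLE)
    print(len(atoms), "atom lines;", len(bonds), "bond lines")
```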
Andrew