[Biopython-dev] Effective ways to use Martel

Andrew Dalke adalke at mindspring.com
Tue Jan 22 20:53:32 EST 2002


Cayte:
>My format used embedded tags and EventGenerator does not
>support them.  Andrew recommended his new Dispatch module
>as an alternative.

The new Dispatch module: an example.

Start by reading the previous email.

Here's a simple format definition for a FASTA file.

from Martel import Str, Group, UntilEol, AssertNot, Rep, AnyEol

header = (Str(">") +
          Group("description", UntilEol()) + 
          AnyEol())
seqline = (AssertNot(Str(">")) + 
           Group("sequence", UntilEol()) +
           AnyEol())

record = Group("record", header + Rep(seqline))

format = Rep(record)

Suppose you want to print the sequence length and the
header description for each record.  Here's how to do it
with the Dispatcher.

from Martel import Dispatch

class SeqLength(Dispatch.Dispatcher):
  def start_record(self, tag, attrs):
    self.seqlen = 0

  def start_description(self, tag, attrs):
    self.save_characters()
  def end_description(self, tag):
    self.description = self.get_characters()

  def start_sequence(self, tag, attrs):
    self.save_characters()
  def end_sequence(self, tag):
    self.seqlen += len(self.get_characters())

  def end_record(self, tag):
    print self.seqlen, "in", self.description

This Dispatcher is a regular ContentHandler, so it is used
like this, assuming that "test.fasta" contains a FASTA
file.

p = format.make_parser()
p.setContentHandler(SeqLength())
p.parse(open("test.fasta"))

On my test data set, the output looks like this:

378 in AK1H_ECOLI/114-431
389 in AKH_HAEIN/114-431
389 in AKH1_MAIZE/117-440
378 in AK2H_ECOLI/112-431
381 in AK1_BACSU/66-374
411 in AK2_BACST/63-370
411 in AK2_BACSU/63-373
411 in AKAB_CORFL/63-379
411 in AKAB_MYCSM/63-379
377 in AK3_ECOLI/106-407
391 in AK_YEAST/134-472
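
To make the start_<tag>/end_<tag> naming convention concrete,
here is a minimal sketch of how a handler can route SAX events
to methods by tag name.  It is written against the standard
xml.sax module rather than Martel, it uses Python 3 syntax, and
TagDispatcher, RecordCounter and count_records are made-up names
for illustration.  The real Dispatch.Dispatcher may well work
differently.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TagDispatcher(ContentHandler):
    """Hypothetical stand-in for a dispatcher: routes events by tag name."""

    def startElement(self, name, attrs):
        # Call start_<name>(name, attrs) if the subclass defines it;
        # tags without a matching method are silently ignored.
        method = getattr(self, "start_" + name, None)
        if method is not None:
            method(name, attrs)

    def endElement(self, name):
        # Same lookup for end_<name>(name).
        method = getattr(self, "end_" + name, None)
        if method is not None:
            method(name)


class RecordCounter(TagDispatcher):
    """Counts <record> elements; every other tag is ignored."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def start_record(self, tag, attrs):
        self.count += 1


def count_records(xml_text):
    # Parse an XML string and return the number of <record> elements.
    handler = RecordCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count
```

Only the tags a subclass cares about need methods, which is why
SeqLength above can leave every other element unhandled.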


The new things in this example are "save_characters()"
and "get_characters()".  This is a stack-based approach
for getting all the characters between a start-tag and
an end-tag.  So long as the calls are balanced,
many different elements can capture characters without
stepping on each other's toes.

Hmmm, need an example which shows this support for
overlaps.
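
In the meantime, here is a minimal sketch of the stack idea,
with one element capturing text inside another.  It is a
simplified stand-in, not the code in Dispatch, and the
CharacterSaver class is a made-up name; it only assumes that
each save_characters() is matched by one get_characters().
It uses Python 3 syntax.

```python
class CharacterSaver:
    """Hypothetical sketch of stack-based character capture."""

    def __init__(self):
        # One list of text pieces per pending save_characters() call.
        self._stack = []

    def save_characters(self):
        # Start a new capture on top of any captures already running.
        self._stack.append([])

    def characters(self, text):
        # Every pending capture sees the text, so nested captures
        # do not interfere with each other.
        for pieces in self._stack:
            pieces.append(text)

    def get_characters(self):
        # Finish the most recent capture and return its text.
        return "".join(self._stack.pop())


saver = CharacterSaver()
saver.save_characters()                  # outer element starts
saver.characters(">")
saver.save_characters()                  # inner element starts
saver.characters("AK1H_ECOLI/114-431")
description = saver.get_characters()     # inner element ends
saver.characters("\n")
record_text = saver.get_characters()     # outer element ends
```

Here the inner capture gets only the description text, while
the outer capture gets everything between its own start and
end, including the text the inner element saw.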

>Should this be documented?  Andrew says in the future Dispatch
>will be the preferred tool.  But without documentation what
>keeps users from using the
>old technique and running into the same issue?

Yes, it should be documented.  How it is documented depends on
whether the work I've been doing has gotten to the point where
we can start thinking about deprecating the existing code.
If it has, the documentation is easy: at the top of
the module say "DEPRECATED - SEE XXX.py"

                    Andrew
                    dalke at dalkescientific.com




