[Biopython-dev] Martel-0.5
Andrew Dalke
dalke at acm.org
Tue Jan 9 00:02:32 EST 2001
Cayte <katel at worldpath.net>:
> Does the current version of Martel support backtracking?
Sadly, no more than it ever did. There is no backtracking
with the "*" operator. I haven't been clever enough to work
out how to use mxTextTools to support it. But so far there
have been ways around it.
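To make concrete what "no backtracking" means, here is a small sketch in
plain Python. It is not how Martel or mxTextTools work internally:
greedy_star_then_literal is a hypothetical helper that only mimics a "*"
which never gives back characters, and it is compared against the standard
re module, which does backtrack.

```python
import re

def greedy_star_then_literal(text, star_chars, literal):
    """Match star_chars* followed by literal, with no backtracking.

    Hypothetical helper for illustration only; not part of Martel.
    """
    pos = 0
    # The "*" part: consume as many characters as possible and never
    # give any of them back.
    while pos < len(text) and text[pos] in star_chars:
        pos += 1
    # The literal must match at the position where the "*" stopped.
    return text.startswith(literal, pos)

# With backtracking, "[ab]*b" matches "aab": the star gives back the last "b".
backtracking = re.match(r"[ab]*b", "aab") is not None

# Without backtracking, the star swallows all of "aab" and the "b" fails.
no_backtracking = greedy_star_then_literal("aab", "ab", "b")

# A possible workaround: keep the literal's characters out of the star.
workaround = greedy_star_then_literal("aab", "a", "b")
```

The workaround on the last line is a guess at the kind of rewrite that
avoids the problem. It is not necessarily what was done in Martel.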
> The parser gets stuck on this line:
>UniGene Cluster Hs.222015
>
> The expression is:
>unigene_title = Martel.Group( "unigene_title", Martel.Str(
> "UniGene Cluster " ) +
> Martel.Re( "[A-Z]" ) + Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) +
> Martel.AnyEol() )
>
> After this it goes into a loop until it runs out of characters.
I can't see why it would do that there. Every operation must
consume at least one character, so it can't be stuck in an infinite
loop. The only operator that consumes newlines is AnyEol, so
at most it should read up to the end of the line.
Have you tried using the make_parser(debug_level = 2) option to
see which operation is consuming characters?
Also, you can merge the Re operations into one, as in
Martel.Re(r"[A-Z][a-z]\.\d+") + Martel.AnyEol()
or even use \R at the end of the pattern to replace the AnyEol.
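As a quick sanity check of the merged pattern, the same regular expression
can be tried with Python's standard re module. This is only a stand-in for
Martel: re does not understand \R, so the sketch assumes an explicit
(?:\r\n|\r|\n) for the end of line, and it reuses the group name
unigene_title from the expression above.

```python
import re

# Assumption: spell out the end-of-line alternatives, since the standard
# re module has no \R escape.
EOL = r"(?:\r\n|\r|\n)"

# The three single-character Re() pieces merged into one pattern, wrapped
# in a named group that plays the role of Martel.Group("unigene_title", ...).
pattern = re.compile(
    r"(?P<unigene_title>UniGene Cluster [A-Z][a-z]\.\d+" + EOL + r")"
)

m = pattern.match("UniGene Cluster Hs.222015\n")
title = m.group("unigene_title") if m else None
```

If title comes back as None for a real input line, the line differs from
the pattern somewhere, for example in whitespace or in the cluster prefix.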
I just tested your expression out and it seems to work fine for
me. Here's what I did:
>>> import Martel
>>> unigene_title = Martel.Group( "unigene_title",
...     Martel.Str( "UniGene Cluster ") + Martel.Re( "[A-Z]" ) +
...     Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) + Martel.AnyEol())
>>> parser = unigene_title.make_parser()
>>> from Martel.test import support
>>> parser.setContentHandler(support.Dump())
>>> parser.parseString("UniGene Cluster Hs.222015\n")
-------> Start
<unigene_title>UniGene Cluster Hs.222015
</unigene_title>
-------> End
If you still can't get it working, email me what you have and
I'll take a closer look at it.
Andrew