[Biopython-dev] Re: Gobase
Jeffrey Chang
jchang at SMI.Stanford.EDU
Mon Aug 28 19:44:58 EDT 2000
At BOSC, Andrew Dalke gave a really nice presentation on Martel, his
parser generator. Basically, it takes a regular language description of a
file format and creates an event-oriented parser for the format.
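To make the idea concrete, here is a minimal sketch of a regular expression description driving SAX-style events. It is not Martel's actual API: the two-field record format (ID and SQ lines), the RecordingHandler class, and the parse and emit functions are all hypothetical names made up for illustration.

```python
import re

# Hypothetical format: one "ID <name>" line followed by one or more
# "SQ <sequence>" lines. This is an assumption, not a real database format.
RECORD = re.compile(
    r"ID (?P<id>\S+)\n"
    r"(?P<seq_block>(?:SQ [A-Z]+\n)+)"
)
SEQ_LINE = re.compile(r"SQ (?P<seq>[A-Z]+)\n")


class RecordingHandler:
    """SAX-style handler that records every event it receives."""

    def __init__(self):
        self.events = []

    def start_element(self, name):
        self.events.append(("start", name))

    def characters(self, text):
        self.events.append(("chars", text))

    def end_element(self, name):
        self.events.append(("end", name))


def emit(handler, name, text):
    """Fire the start/characters/end triple for one field."""
    handler.start_element(name)
    handler.characters(text)
    handler.end_element(name)


def parse(text, handler):
    """Match records against the format description and fire events."""
    pos = 0
    while pos < len(text):
        match = RECORD.match(text, pos)
        if match is None:
            # An error in the format: report where matching stopped.
            raise ValueError("format error at offset %d" % pos)
        handler.start_element("record")
        emit(handler, "id", match.group("id"))
        for seq in SEQ_LINE.finditer(match.group("seq_block")):
            emit(handler, "seq", seq.group("seq"))
        handler.end_element("record")
        pos = match.end()


handler = RecordingHandler()
parse("ID abc\nSQ ACGT\nSQ TT\n", handler)
```

The handler only sees named events, so the same format description could in principle feed different consumers without changes to the matching code.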
We're currently looking into using Martel to create parsers for biopython.
While it's not certain that we'll adopt Martel, it currently appears
likely.
I think Martel has these advantages:
- more optimizable
- SAX-like parser more familiar than scanner/consumer
- syntax descriptions can be cross language
Disadvantages:
- regular expressions hard to debug (alleviated by good help messages?)
- currently slower for swissprot tests (but that may change)
- unclear how to handle exceptional cases (e.g. errors in format)
- not yet stable
Unclear:
- which is easier to maintain?
- which is easier to create?
So to answer your question, you may want to try to create a Gobase parser
in Martel, and then let us know what you think. It would be a good test
case, and probably helpful to Andrew to know whether it can handle the
format.
Jeff
On Mon, 21 Aug 2000, Cayte wrote:
> I've been looking into Gobase, a mitochondrial database, and
> wondering whether to use a line oriented or a streaming approach.
> The Gobase pages don't use as much formatting as Rebase, so the
> ParserSupport routines would work. But the streaming lets the utility
> strip off all the HTML, so the user doesn't have to delete the
> preamble. The streaming is also less brittle if the format should
> change. On the other hand, it's more bug prone because it removes
> linefeeds before they can be used as delimiters.
>
>
> Cayte
>
>
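One possible middle ground for the tradeoff described above, sketched under assumptions: strip the HTML with a streaming parser but keep the linefeeds in the text, so they are still available as delimiters afterwards. This uses the html.parser module from the Python standard library; the TextExtractor class, the strip_html function, and the sample page with its field names are hypothetical and not taken from the real Gobase pages.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, leaving linefeeds untouched."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Tags never reach this method, only the text between them.
        self.chunks.append(data)


def strip_html(page):
    """Return the non-blank text lines of an HTML page."""
    extractor = TextExtractor()
    extractor.feed(page)
    extractor.close()
    text = "".join(extractor.chunks)
    # The linefeeds survived the stripping, so they still delimit lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Made-up sample page, for illustration only.
page = (
    "<html><body><h1>Gobase</h1>\n"
    "<pre>Gene: cox1\nSpecies: Homo sapiens\n</pre></body></html>"
)
lines = strip_html(page)
```

The resulting lines could then go to a line-oriented parser, although a format that relies on markup rather than linefeeds to separate fields would still need the tags themselves.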