[Biopython-dev] Re: Gobase

Jeffrey Chang jchang at SMI.Stanford.EDU
Mon Aug 28 19:44:58 EDT 2000


At BOSC, Andrew Dalke gave a really nice presentation on Martel, his
parser generator.  Basically, it takes a regular language description of a
file format and creates an event-oriented parser for the format.
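To make the idea concrete, here is a minimal sketch of "regular description in, events out" using only Python's standard `re` module. It does not use Martel's actual API. The two-line record format, the `CollectingHandler` class, and the `parse` function are all hypothetical names invented for illustration.

```python
import re

# Hypothetical two-line record format (an assumption, not a real database format):
#   ID   <entry name>
#   SQ   <sequence>
# Each named group acts like an element name in the event stream.
RECORD = re.compile(
    r"ID\s+(?P<entry_name>\S+)\n"
    r"SQ\s+(?P<sequence>[A-Z]+)\n"
)


class CollectingHandler:
    """SAX-like handler (hypothetical) that simply records the events it receives."""

    def __init__(self):
        self.events = []

    def start_element(self, name):
        self.events.append(("start", name))

    def characters(self, text):
        self.events.append(("chars", text))

    def end_element(self, name):
        self.events.append(("end", name))


def parse(text, handler):
    """Match records one after another and fire events for each named group."""
    # groupindex maps group names to group numbers; sort to keep pattern order.
    names = sorted(RECORD.groupindex, key=RECORD.groupindex.get)
    pos = 0
    while pos < len(text):
        match = RECORD.match(text, pos)
        if match is None:
            raise ValueError("format error at offset %d" % pos)
        handler.start_element("record")
        for name in names:
            handler.start_element(name)
            handler.characters(match.group(name))
            handler.end_element(name)
        handler.end_element("record")
        pos = match.end()


handler = CollectingHandler()
parse("ID   COX1_HUMAN\nSQ   MFADRW\n", handler)
for event in handler.events:
    print(event)
```

The format description lives entirely in the pattern, so the handler never has to know how lines are laid out. Input that does not match the description stops the parse with a `ValueError`.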

We're currently looking into using Martel to create parsers for biopython.  
While it's not certain that we'll adopt Martel, it currently appears
likely.

I think the advantages of Martel are:
  - more optimizable
  - SAX-like parser more familiar than scanner/consumer
  - syntax descriptions can be cross language

Disadvantages:
  - regular expressions hard to debug  (alleviated by good help messages?)
  - currently slower for swissprot tests (but that may change)
  - unclear how to handle exceptional cases (e.g. errors in format)
  - not yet stable

Unclear:
  - which is easier to maintain?
  - which is easier to create?

So to answer your question, you may want to try to create a Gobase parser
in Martel, and then let us know what you think.  It would be a good test
case, and probably helpful to Andrew to know whether it can handle the
format.

Jeff



On Mon, 21 Aug 2000, Cayte wrote:

>    I've been looking into Gobase, a mitochondrial database, and
> wondering whether to use a line oriented or a streaming approach.  
> The Gobase pages don't use as much formatting as Rebase, so the
> ParserSupport routines would work.  But the streaming lets the utility
> strip off all the HTML, so the user doesn't have to delete the
> preamble.  The streaming is also less brittle if the format should
> change.  On the other hand, it's more bug prone because it removes
> linefeeds before they can be used as delimiters.
> 
>
> Cayte
> 
> 
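
One possible middle ground for the trade-off described above is to strip the HTML in a streaming pass while keeping the linefeeds, so they remain available as delimiters for a line-oriented step afterwards. The sketch below uses the standard library's `html.parser`. The page content and the `Gene:`/`Product:` field names are made up for illustration and are not the real Gobase layout; `TextExtractor` and `strip_html` are hypothetical helpers.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text between tags, leaving linefeeds in the text untouched."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(page):
    """Return the text content of an HTML page with all tags removed."""
    extractor = TextExtractor()
    extractor.feed(page)
    extractor.close()
    return "".join(extractor.chunks)


# Hypothetical page; the field names are assumptions, not the Gobase format.
page = (
    "<html><body><h1>Entry</h1>\n"
    "<pre>Gene: cox1\n"
    "Product: cytochrome oxidase subunit 1\n"
    "</pre></body></html>"
)

text = strip_html(page)

# Line-oriented step: linefeeds survived the stripping, so they still delimit fields.
fields = dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)
print(fields)
```

Lines without a `": "` separator, such as the heading, are skipped here, so the user does not have to delete the preamble by hand.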



