[Biopython-dev] Martel timings

Brad Chapman chapmanb at arches.uga.edu
Thu Oct 12 20:51:45 EDT 2000


Andrew wrote:
>  I'm starting to compare Martel parsing with the existing biopython
>  code. I wrote a Martel document handler called SwissProtBuilder.py
>  (attached) which creates Bio.SwissProt.Record objects.
>  
>  The biopython code is about 8%
>  slower than the Martel code.  The Martel code takes about 25 minutes to
>  parse sprot38.

Cool! It is great to hear they are comparable in speed. I'm
definitely not a speed freak myself, but it is very nice to get a slight
speed improvement (or at least not a slowdown) when switching over
to Martel-based parsers.

>  Because of the new RecordReader, it only needed about 4MB of memory.
>  I assume the biopython code is at least that good.

Hmmm, one side not about RecordReader. I really like the way you can
interface with the parsers in multiple ways in the current Biopython
parsing. I think it is really useful to be able to iterate over a record
and get the record back, instead of automatically having to parse it (I
find this useful for pulling a "bad" record out of a big file of
records). 

Do you think there is a way to make the RecordReader act like the
Iterators in this regard? Right now, the fact that it reads things
one record at a time is kind of hidden inside the parser, and I'm not
sure how you can make the record reader just return the raw
text making up the record being parsed.
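To make the "give me the raw record" idea concrete, here is a minimal sketch in plain Python. It assumes records end with a SwissProt-style "//" line; `iter_raw_records`, `find_bad_records` and `toy_parse` are hypothetical helpers for illustration, not part of Martel's RecordReader or the Biopython API.

```python
import io
from typing import Callable, Iterator, List, TextIO


def iter_raw_records(handle: TextIO, end_marker: str = "//") -> Iterator[str]:
    """Yield the raw text of each record, ending a record at a line that
    starts with end_marker (SwissProt records end with a "//" line).

    Hypothetical helper for illustration; not part of Martel or Biopython.
    """
    lines: List[str] = []
    for line in handle:
        lines.append(line)
        if line.startswith(end_marker):
            yield "".join(lines)
            lines = []
    # Text after the last terminator is returned too (unless it is only
    # whitespace), so a truncated final record is not silently dropped.
    leftover = "".join(lines)
    if leftover.strip():
        yield leftover


def find_bad_records(handle: TextIO, parse: Callable[[str], object]) -> List[str]:
    """Run parse() on each raw record and collect the ones that fail."""
    bad: List[str] = []
    for raw in iter_raw_records(handle):
        try:
            parse(raw)
        except Exception:
            bad.append(raw)
    return bad


def toy_parse(raw: str) -> str:
    """Stand-in parser: accepts records whose first line is an ID line."""
    if not raw.startswith("ID   "):
        raise ValueError("record does not start with an ID line")
    return raw.split()[1]


data = "ID   GOOD1\n//\nXX   broken\n//\nID   GOOD2\n//\n"
print(find_bad_records(io.StringIO(data), toy_parse))  # prints ['XX   broken\n//\n']
```

Keeping the splitting separate from the parsing is what makes this useful: the same generator can feed a real parser, or just hand back the text of a record that made the parser choke.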

BTW, I like the StartsWith and EndsWith options in the new RecordReader! When I was
doing the FASTA stuff I couldn't figure out any way to recognize new
records with only the EndsWith behavior :-).
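The reason a "starts with" rule matters for FASTA is that a record has no terminator line: you only know a record is finished when the next ">" header (or end of file) shows up. A minimal sketch of that splitting logic, assuming ">" headers; `iter_records_starting_with` is a hypothetical helper, not Martel's StartsWith class itself.

```python
import io
from typing import Iterator, List, TextIO


def iter_records_starting_with(handle: TextIO, start_marker: str = ">") -> Iterator[str]:
    """Split a FASTA-style file into raw records.

    A record has no end line, so it is only known to be finished when the
    next start_marker line (or end of file) arrives.  Any text before the
    first marker is yielded as its own chunk.  Hypothetical helper, not the
    Martel StartsWith class itself.
    """
    current: List[str] = []
    for line in handle:
        # A new header flushes whatever we have collected so far.
        if line.startswith(start_marker) and current:
            yield "".join(current)
            current = []
        current.append(line)
    # End of file closes the last record.
    if current:
        yield "".join(current)


fasta = ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n"
for raw in iter_records_starting_with(io.StringIO(fasta)):
    print(repr(raw))
```

With only an end-of-record rule there is nothing to split on here, which is the problem described above.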
  
>  One of the reasons for the good performance on the Martel side is that
>  I'm pruning the expression tree to get rid of events which aren't
>  handled by the callback object.  That eliminates a lot of function call
>  overhead.

Very cool idea to reduce the size of the XML generated and returned.
Nifty stuff!
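Here is how I picture the pruning, as a toy sketch: it uses a made-up `Node` tree and hypothetical group names rather than Martel's real expression classes. Groups whose names the callback object doesn't handle keep their children (so the same text still matches) but lose their name, so no start/end callbacks fire for them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Toy expression node: a named group (name is set) or an anonymous one."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, handled: Set[str]) -> Node:
    """Return a copy of the tree where groups the handler ignores lose their
    name.  The children are kept, so the same text is still matched; only the
    start/end callbacks for unhandled names disappear."""
    name = node.name if node.name in handled else None
    return Node(name, [prune(child, handled) for child in node.children])


def events(node: Node) -> List[str]:
    """List the start/end callbacks a parse through this tree would fire."""
    out: List[str] = []
    if node.name is not None:
        out.append("start " + node.name)
    for child in node.children:
        out.extend(events(child))
    if node.name is not None:
        out.append("end " + node.name)
    return out


# Hypothetical format tree; the names are made up for illustration.
tree = Node("record", [
    Node("ID_line", [Node("entry_name")]),
    Node("AC_line", [Node("ac_number")]),
])

handled = {"entry_name", "ac_number"}  # names the callback object handles
print(len(events(tree)))                  # 10 callbacks before pruning
print(len(events(prune(tree, handled))))  # 4 callbacks after pruning
```

Each dropped event is a method call on the handler that never happens, which is presumably where the saved function call overhead comes from. A fuller version might also collapse the now-anonymous groups, but that is beyond this sketch.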

Brad
