[BioPython] performance problem in ParserSupport.EventGenerator._get_set_flags

Andreas Kuntzagk andreas.kuntzagk at mdc-berlin.de
Mon Jun 16 12:32:17 EDT 2003


> Let me know if this works for you (and still provides the
> performance enhancements).

Works and gives the performance.

> > But there is still potential in it :-)
> > At the moment (after my optimization), about 90% of my performance test
> > goes into Parser._do_callback. Of this, 60% is spent in endElement, 5%
> > in startElement and 7% in characters. The remaining time is spent in
> > _do_callback itself and in the recursion. So to get faster we could:
> > 1. make _do_callback itself faster (don't see how)
> > 2. make endElement faster.
> > 3. reduce recursion somehow(?); function calls are expensive in Python.
> > 4. invent some clever algorithm

This is still true.
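
For anyone who wants to reproduce this kind of breakdown, here is a minimal
sketch using cProfile and pstats from the standard library of current Python.
The do_callback function and the (tag, subtags) structure are made up for
illustration; they are not the real Parser._do_callback, only a toy recursive
walk of the same shape.

```python
import cProfile
import io
import pstats


def do_callback(taglist, events):
    """Toy recursive walk over (tag, subtags) pairs (hypothetical, not Parser code)."""
    for tag, subtags in taglist:
        events.append(("start", tag))
        if subtags:
            # One extra Python function call per nested level.
            do_callback(subtags, events)
        events.append(("end", tag))


def profile_walk(taglist):
    """Run the walk under the profiler and return (events, text report)."""
    events = []
    profiler = cProfile.Profile()
    profiler.enable()
    do_callback(taglist, events)
    profiler.disable()

    # Render the top entries, sorted by cumulative time, into a string.
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)
    return events, report.getvalue()


if __name__ == "__main__":
    tree = [("record", [("name", []), ("sequence", [])])]
    _, text = profile_walk(tree)
    print(text)
```

The report lists call counts and cumulative time per function, which is where
percentages like the ones above come from.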

> I did some clean up in endElement (getting rid of the _get_set_flags
> function, mainly) so this might provide some speed-ups for that
> problem. I don't have any genius ideas for the other points right
> now, but maybe these simple clean-up changes will improve
> performance decently.

It improves performance against the original version (slow
_get_set_flags) but not against my optimized version.
But getting rid of these flags is a Good Thing(tm): clearer code makes for
better understanding of the algorithm, which (hopefully) makes for better
algorithms.
 
> Hopefully this code is a little cleaner (damn, it was ugly before).
> If you want to send me your diffs on top of this I'm happy to
> commit 'em.

At the moment I have some ideas, but no ready diff. For a long taglist,
Parser._do_callback could maybe be optimized by storing characters,
startElement and endElement in local variables, so they don't need to be
looked up on every loop iteration.
I tried this, but could not measure any speed difference, so this is
probably only a theoretical improvement.
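
A minimal sketch of the local-variable idea, to show what I mean. The Handler
class and the (name, text) tag list are hypothetical stand-ins, not the real
Parser code or its taglist format; only the method names characters,
startElement and endElement are taken from the discussion above.

```python
class Handler:
    """Hypothetical stand-in for a SAX-style content handler."""

    def __init__(self):
        self.events = []

    def startElement(self, name):
        self.events.append(("start", name))

    def characters(self, text):
        self.events.append(("chars", text))

    def endElement(self, name):
        self.events.append(("end", name))


def emit_with_lookups(handler, tags):
    # Each call looks the method up on the handler again,
    # so there are three attribute lookups per iteration.
    for name, text in tags:
        handler.startElement(name)
        handler.characters(text)
        handler.endElement(name)


def emit_with_locals(handler, tags):
    # Bind the methods once, before the loop; inside the loop
    # only fast local-variable access remains.
    start = handler.startElement
    chars = handler.characters
    end = handler.endElement
    for name, text in tags:
        start(name)
        chars(text)
        end(name)
```

Both functions produce the same events; any gain would come only from the
saved lookups, which is small next to the cost of the calls themselves.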

Another question: in _do_callback you have two tests for whether a tag is a
special tag, with a loop over sub_tags in between. Can a special tag have
sub_tags? If not, you can reduce it to one test. This improves only
readability, but still ...

Andreas


