[BioPython] performance problem in ParserSupport.EventGenerator._get_set_flags
Andreas Kuntzagk
andreas.kuntzagk at mdc-berlin.de
Mon Jun 16 12:32:17 EDT 2003
> Let me know if this works for you (and still provides the
> performance enhancements).
Works and gives the performance.
> > But there is still potential in it :-)
> > At the moment (after my optimization), about 90% of my performance test
> > goes into Parser._do_callback. Of this, 60% is spent in endElement, 5%
> > in startElement and 7% in characters. The remaining time is spent in
> > _do_callback itself and for the recursion. So to get faster we could:
> > 1. make _do_callback itself faster (Don't see how)
> > 2. make endElement faster.
> > 3. reduce recursion somehow(?); function calls are expensive in Python.
> > 4. Invent some clever algorithm
This is still true.
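
A per-function breakdown like the one quoted above can be obtained with the standard-library profiler. The following is only a minimal sketch: do_callback and end_element are hypothetical stand-ins, not the actual Bio.ParserSupport code, and the recursion depth and repeat count are arbitrary.

```python
import cProfile
import io
import pstats


def end_element(acc, name):
    # Hypothetical stand-in for the handler's endElement.
    acc.append(name)


def do_callback(acc, depth):
    # Hypothetical recursive callback, standing in for Parser._do_callback.
    if depth == 0:
        return
    end_element(acc, "tag%d" % depth)
    do_callback(acc, depth - 1)


profiler = cProfile.Profile()
acc = []
profiler.enable()
for _ in range(200):
    do_callback(acc, 50)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```

The cumulative-time column shows how much of the run ends up inside each function, including the functions it calls, which is the kind of number quoted in the list above.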
> I did some clean up in endElement (getting rid of the _get_set_flags
> function, mainly) so this might provide some speed-ups for that
> problem. I don't have any genius ideas for the other points right
> now, but maybe these simple clean-up changes will improve
> performance decently.
It improves performance compared to the original version (with the slow
_get_set_flags) but not compared to my optimized version.
But getting rid of these flags is a Good Thing(tm): clearer code makes for
a better understanding of the algorithm, which (hopefully) makes for better
algorithms.
> Hopefully this code is a little cleaner (damn, it was ugly before).
> If you want to send me your diffs on top of this I'm happy to
> commit 'em.
At the moment I have some ideas, but no ready diff. For a long tag list,
Parser._do_callback could maybe be optimized by storing characters,
startElement and endElement in local variables, so they don't need to be
looked up on every loop iteration.
I tried this, but could not measure any speed difference, so this is
probably only a theoretical improvement.
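
The local-variable idea can be sketched as follows. This is an illustration under assumptions, not the real Parser._do_callback: the Handler class, the two do_callback_* functions and the flat (name, text) tag list are all hypothetical.

```python
class Handler:
    """Hypothetical stand-in for a SAX-style content handler."""

    def __init__(self):
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, text):
        self.events.append(("chars", text))

    def endElement(self, name):
        self.events.append(("end", name))


def do_callback_plain(handler, tags):
    # Each iteration looks up three attributes on handler.
    for name, text in tags:
        handler.startElement(name, {})
        handler.characters(text)
        handler.endElement(name)


def do_callback_cached(handler, tags):
    # Bind the methods to local names once, before the loop starts.
    start = handler.startElement
    chars = handler.characters
    end = handler.endElement
    for name, text in tags:
        start(name, {})
        chars(text)
        end(name)


tags = [("tag%d" % i, "value") for i in range(1000)]
plain, cached = Handler(), Handler()
do_callback_plain(plain, tags)
do_callback_cached(cached, tags)
```

Both versions produce the same events; the cached one only saves the attribute lookups, which may well be too small a saving to measure when the callbacks themselves dominate the run time.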
Another question: in _do_callback you have two tests for whether a tag is a
special tag, with a loop over sub_tags in between. Can a special tag have
sub_tags? If not, you can reduce them to one test. This only improves
readability, but still ...
Andreas