[BioPython] performance problem in ParserSupport.EventGenerator._get_set_flags

Andreas Kuntzagk andreas.kuntzagk at mdc-berlin.de
Thu Jun 12 14:36:34 EDT 2003


> Well, I'm glad you're working on it. Two non-experts working on
> something is better than one :-).
> 
> > I just applied the patch against 1.10 (don't use cvs at the moment)
> > Ran the setup.py test and it failed the same tests as before.
> 
> Okay, so that means it didn't break anything new, right? That's a
> good thing.

Probably. But what confuses me is that one of the failing tests is
GenBank (ValueError: Unexpected function name: CONTIGjoin).
It's also failing for the cvs version.

But if the new code were wrong, I think some other tests would fail too,
because ParserSupport is used all over the place.
 
> Cool, so it's twice as fast? Sweet. I'll look at the code again over
> the weekend, see if I can do anything else, and then test it out.
> Getting rid of my crap code to speed it up is definitely a good
> thing.
> 
> > If I have some time, I will do some more profiling. It can't be that the
> > bioperl parser for GenBank my colleague is using is much faster than
> > biopython.

I have some ideas for this. First I want to know if I can make the
following assumptions:

1. the values in ParserSupport.EventGenerator.flags are always 0 or 1
2. the keys are always the same as
ParserSupport.EventGenerator.interest_tags
3. nobody outside ParserSupport.EventGenerator uses these flags
(probably the hardest to confirm)

Then we could get rid of _get_set_flags and just put only the set flags
into the dict.

This would give another 10% performance boost for my test prog. 
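
A minimal sketch of the idea, written in present-day Python and not taken
from the actual ParserSupport code. The class names ScanningFlags and
SparseFlags and their methods are hypothetical stand-ins; only the "flags"
dict and the interest tags come from the discussion above. The first class
keeps every interest tag as a key with value 0 or 1 and scans the dict to
find the set ones; the second stores only the tags that are currently set,
so no scan is needed.

```python
class ScanningFlags:
    """Hypothetical stand-in: every interest tag is a key, value is 0 or 1."""

    def __init__(self, interest_tags):
        self.flags = {tag: 0 for tag in interest_tags}

    def start(self, tag):
        if tag in self.flags:
            self.flags[tag] = 1

    def end(self, tag):
        if tag in self.flags:
            self.flags[tag] = 0

    def get_set_flags(self):
        # Has to look at every interest tag on each call.
        return [tag for tag, value in self.flags.items() if value == 1]


class SparseFlags:
    """Hypothetical alternative: only the currently set tags are stored."""

    def __init__(self, interest_tags):
        self.interest_tags = frozenset(interest_tags)
        self.flags = {}

    def start(self, tag):
        if tag in self.interest_tags:
            self.flags[tag] = 1

    def end(self, tag):
        # Dropping the key replaces setting the value back to 0.
        self.flags.pop(tag, None)

    def get_set_flags(self):
        # The keys are exactly the set flags, so no filtering is needed.
        return list(self.flags)


if __name__ == "__main__":
    tags = ["locus", "feature", "sequence"]  # made-up tag names
    for tracker in (ScanningFlags(tags), SparseFlags(tags)):
        tracker.start("locus")
        tracker.start("sequence")
        tracker.end("locus")
        print(type(tracker).__name__, sorted(tracker.get_set_flags()))
```

This only works if the three assumptions hold. In particular, any outside
code that reads flags[tag] and expects a 0 for an unset tag would get a
KeyError with the sparse version.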
 
> Python is always better. It has to be :-)

Always better? No question. Always faster? We're working on it :-)

Andreas

PS: Returning this discussion to the list. Maybe somebody else has
something to say about this.


