[Biojava-dev] blast parsing slowness

Michael L. Heuer heuermh@acm.org
Mon, 2 Dec 2002 15:56:36 -0500 (EST)


Sounds like a bug to me.  At a minimum, those ALWAYS_VETO listeners should
be optimized away.
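
A rough sketch of one way to optimize them away, not a patch against the
real code: the names below (Listener, VetoCountingSupport, firePreChange)
are made up for illustration and are not the org.biojava.utils classes.
It assumes ALWAYS_VETO is a single shared instance, so it can be recognized
by identity and counted instead of stored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified listener; not the BioJava ChangeListener interface. */
interface Listener {
    /** Throws IllegalStateException to veto the proposed change. */
    void preChange(String change);
}

/** Hypothetical listener registry that counts ALWAYS_VETO instead of storing it. */
final class VetoCountingSupport {
    /** Shared sentinel listener that vetoes every change. */
    static final Listener ALWAYS_VETO = change -> {
        throw new IllegalStateException("change vetoed: " + change);
    };

    private final List<Listener> listeners = new ArrayList<>();
    private int vetoCount = 0;

    void addListener(Listener l) {
        if (l == ALWAYS_VETO) {
            vetoCount++;          // no storage needed, just remember how many
        } else {
            listeners.add(l);
        }
    }

    void removeListener(Listener l) {
        if (l == ALWAYS_VETO) {
            if (vetoCount > 0) {
                vetoCount--;
            }
        } else {
            listeners.remove(l);
        }
    }

    /** Asks every listener whether the change may go ahead. */
    void firePreChange(String change) {
        if (vetoCount > 0) {
            ALWAYS_VETO.preChange(change);   // a single veto is enough
        }
        for (Listener l : listeners) {
            l.preChange(change);
        }
    }

    int storedListenerCount() { return listeners.size(); }

    int vetoCount() { return vetoCount; }
}
```

With something like this, registering the veto listener thousands of times
costs one counter increment each time, and the listener storage never has
to grow.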

I'm of the opinion that, rather than the current design, it would be
better to have truly immutable implementations of the interfaces alongside
the mutable ones with ChangeEvent support.
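
To make the idea concrete, here is a minimal sketch under stated
assumptions. Seq, ImmutableSeq and MutableSeq are hypothetical names, not
existing BioJava types, and a plain Runnable stands in for a change
listener. The point is only that an immutable implementation can accept
listener registrations and discard them, since it will never fire an event.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared interface; indices are 0-based in this sketch. */
interface Seq {
    int length();
    char symbolAt(int index);
    void addChangeListener(Runnable listener);
}

/** Immutable implementation: nothing can change, so listeners are discarded. */
final class ImmutableSeq implements Seq {
    private final String symbols;

    ImmutableSeq(String symbols) { this.symbols = symbols; }

    public int length() { return symbols.length(); }

    public char symbolAt(int index) { return symbols.charAt(index); }

    public void addChangeListener(Runnable listener) {
        // Intentionally empty: no event will ever be fired.
    }
}

/** Mutable implementation: keeps its listeners and notifies them on edits. */
final class MutableSeq implements Seq {
    private final StringBuilder symbols;
    private final List<Runnable> listeners = new ArrayList<>();

    MutableSeq(String symbols) { this.symbols = new StringBuilder(symbols); }

    public int length() { return symbols.length(); }

    public char symbolAt(int index) { return symbols.charAt(index); }

    public void addChangeListener(Runnable listener) { listeners.add(listener); }

    void setSymbolAt(int index, char symbol) {
        symbols.setCharAt(index, symbol);
        for (Runnable l : listeners) {
            l.run();   // tell each listener that something changed
        }
    }

    int listenerCount() { return listeners.size(); }
}
```

Code that only reads sequences would work against Seq and never pay for
listener bookkeeping when handed the immutable variant.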

What do others think?

   michael


On Mon, 2 Dec 2002, Doug Rusch wrote:

> This is a good topic for consideration with BioJava2.
>
> The circumstances are these: I have my BLAST parser working in my personal experimental BioJava package. The BLAST data I am parsing was generated by blasting 1 Mb human genomic chunks against small sequences (basically ESTs), so one query and many different subjects. I compared the Java code against a home-brewed Perl BLAST parser. The BioJava parser was much slower (at least an order of magnitude) than the Perl code. This isn't quite a fair test because the designs of the two parsers are completely different, but if anything I would still expect Java to be faster than Perl.
>
> I profiled the code and found that the vast majority of the processing time was being spent in org.biojava.utils.ChangeSupport.growIfNecessary. Every time it creates an alignment (org.biojava.bio.program.ssbind.BlastLikeSearchBuilder.makeSubHit), it adds a change listener to the generic alphabet (org.biojava.bio.symbol.SimpleSymbolList.addListener) it is using for alignments. So it is adding many thousands of change listeners to the alphabet and, to add insult to injury, the listeners are all ALWAYS_VETO. This poor alphabet has thousands of listeners telling it not to change.
>
> Is this really what was intended? I get the impression that the ALWAYS_VETO change listener is a special case. Perhaps ALWAYS_VETO listeners should just be tracked by a counter? Should alphabets be changeable at all? I do not know what use cases prompted this design, but is there any consensus on a fix?
>
> Doug Rusch
> drusch@tcag.org