[Biojava-dev] blast parsing slowness

Schreiber, Mark mark.schreiber@agresearch.co.nz
Tue, 3 Dec 2002 09:57:17 +1300


Hmm, methinks there should only be one listener. I think this should be
fixed before BJ1.3.

Following your description I wasn't able to find the exact point where
all these listeners get added. Could you give me a line number or
something?

Profiling the commonly used methods in general could be a good thing to
do before BJ1.3 and BJ2. It has helped spot code that works but does
unnecessary work.

- Mark

> -----Original Message-----
> From: Doug Rusch [mailto:drusch@tcag.org] 
> Sent: Tuesday, 3 December 2002 9:48 a.m.
> To: biojava-dev@biojava.org
> Subject: RE: [Biojava-dev] blast parsing slowness
> 
> 
> This is a good topic for consideration with BioJava2.
> 
> The circumstances are these: I have my BLAST parser working in
> my personal experimental BioJava package. The BLAST data I am
> parsing was generated by blasting 1 Mb human genomic chunks
> against small sequences (basically ESTs), so one query, many
> different subjects. I compared the Java code against a
> home-brewed Perl BLAST parser. The BioJava parser was much
> slower (at least an order of magnitude) than the Perl code.
> This isn't quite a fair test, because the designs of the two
> parsers are completely different, but if anything I would
> still expect Java to be faster than Perl.
> 
> I profiled the code and found that the vast majority of the
> processing time was being spent in
> org.biojava.utils.ChangeSupport.growIfNecessary. Every time it
> creates an alignment
> (org.biojava.bio.program.ssbind.BlastLikeSearchBuilder.makeSubHit),
> it adds a change listener to the generic alphabet
> (org.biojava.bio.symbol.SimpleSymbolList.addListener) it is
> using for alignments. So it ends up adding many thousands of
> change listeners to the alphabet and, to add insult to injury,
> the listeners are all ALWAYS_VETO. This poor alphabet has
> thousands of listeners telling it not to change.
> 
> Is this really what was intended? I get the impression that
> the ALWAYS_VETO change listener is a special case. Perhaps
> ALWAYS_VETO listeners should just be kept track of by a
> counter? Should alphabets be changeable at all? I do not know
> what use cases prompted this design, but is there any
> consensus on a fix?
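
To make the counter suggestion above concrete, here is a minimal sketch.
It is not BioJava code: VetoCountingSupport, its nested Listener
interface and its ALWAYS_VETO constant are hypothetical stand-ins written
for illustration only. The assumption is that every veto-only
registration passes the same shared sentinel instance, so it can be
recognised by identity and counted rather than stored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical listener registry (NOT the BioJava ChangeSupport class).
 * Registrations of the shared veto sentinel are counted, not stored.
 */
class VetoCountingSupport {

    /** Hypothetical stand-in for a change listener. */
    interface Listener {
        /** Returns true if the proposed change may go ahead. */
        boolean allows(String change);
    }

    /** Shared sentinel that rejects every change (assumed to be a singleton). */
    static final Listener ALWAYS_VETO = change -> false;

    private final List<Listener> others = new ArrayList<>();
    private int vetoCount = 0;

    void addListener(Listener l) {
        if (l == ALWAYS_VETO) {
            vetoCount++;        // constant time, nothing is stored or grown
        } else {
            others.add(l);
        }
    }

    void removeListener(Listener l) {
        if (l == ALWAYS_VETO) {
            if (vetoCount > 0) {
                vetoCount--;
            }
        } else {
            others.remove(l);
        }
    }

    /** A change is allowed only if no veto is registered and no other listener objects. */
    boolean isChangeAllowed(String change) {
        if (vetoCount > 0) {
            return false;       // one check covers every counted veto
        }
        for (Listener l : others) {
            if (!l.allows(change)) {
                return false;
            }
        }
        return true;
    }

    int vetoCount() {
        return vetoCount;
    }

    int storedListeners() {
        return others.size();
    }

    public static void main(String[] args) {
        VetoCountingSupport support = new VetoCountingSupport();
        // Simulate one veto registration per alignment created.
        for (int i = 0; i < 10000; i++) {
            support.addListener(ALWAYS_VETO);
        }
        System.out.println("vetoes counted:   " + support.vetoCount());
        System.out.println("listeners stored: " + support.storedListeners());
        System.out.println("change allowed:   " + support.isChangeAllowed("edit"));
    }
}
```

With this shape the cost of each veto registration no longer depends on
how many were registered before it. Whether the real listener storage
could be changed this way depends on details of the existing code that
this sketch does not model.
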
> 
> Doug Rusch
> drusch@tcag.org 