[Biojava-dev] blast parsing slowness

Doug Rusch drusch@tcag.org
Fri, 6 Dec 2002 16:20:38 -0500


So I have tested the fix and it looks good. Parsing is now much faster, and most of the time is spent handling the regexes.

Thanks a lot!
Doug

-----Original Message-----
From:	Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
Sent:	Wed 12/4/02 6:17 PM
To:	Doug Rusch
Cc:	biojava-dev@biojava.org
Subject:	Re: [Biojava-dev] blast parsing slowness
Doug,

Could you try again now? Thomas has committed a fix to the event 
meta-data. We'd kind of mucked some of the plumbing up.

Matthew

Doug Rusch wrote:
> This is a good topic for consideration with BioJava2.
> 
> The circumstances are these: I have my BLAST parser working in my personal experimental BioJava package. The BLAST data I am parsing was generated by blasting 1 Mb human genomic chunks against small sequences (basically ESTs), so one query and many different subjects. Anyway, I compared the Java code against a home-brewed Perl BLAST parser. The BioJava parser was much slower (at least an order of magnitude) than the Perl code. This isn't quite a fair test, because the designs of the two parsers are completely different, but if anything I would still expect Java to be faster than Perl.
> 
> I profiled the code and found that the vast majority of the processing time was being spent in org.biojava.utils.ChangeSupport.growIfNecessary. Every time the parser creates an alignment (org.biojava.bio.program.ssbind.BlastLikeSearchBuilder.makeSubHit), it adds a change listener to the generic alphabet it is using for alignments (org.biojava.bio.symbol.SimpleSymbolList.addListener). Obviously it is adding many thousands of change listeners to the alphabet, but to add insult to injury, the listeners are all ALWAYS_VETO. So this poor alphabet has thousands of listeners telling it not to change.
> 
> Is this really what was intended? I get the impression that the ALWAYS_VETO change listener is a special case. Perhaps ALWAYS_VETO listeners should just be tracked with a counter? Should alphabets be changeable at all? I do not know what use cases prompted this design, but is there any consensus on a fix?
> 
> Doug Rusch
> drusch@tcag.org 
> 
> 
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev@biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
> 
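
The counter idea raised in the quoted message can be sketched as follows. This is only an illustration under stated assumptions, not the fix that was committed and not BioJava code: VetoableListener and CountingChangeSupport are hypothetical stand-ins, and the sketch uses modern Java syntax. It shows registrations of a shared always-veto sentinel being counted rather than stored, so that adding thousands of them never grows a listener array.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a change listener; NOT the BioJava interface. */
interface VetoableListener {
    /** Returns false to veto the proposed change. */
    boolean allows(String change);
}

/** Hypothetical registry that counts always-veto registrations instead of storing them. */
final class CountingChangeSupport {
    /** Shared sentinel playing the role of an ALWAYS_VETO listener (assumed to be a singleton). */
    static final VetoableListener ALWAYS_VETO = change -> false;

    private final List<VetoableListener> listeners = new ArrayList<>();
    private int alwaysVetoCount = 0;

    void addListener(VetoableListener l) {
        if (l == ALWAYS_VETO) {
            alwaysVetoCount++;      // constant time, no storage growth
        } else {
            listeners.add(l);       // ordinary listeners are stored as usual
        }
    }

    void removeListener(VetoableListener l) {
        if (l == ALWAYS_VETO) {
            if (alwaysVetoCount > 0) {
                alwaysVetoCount--;
            }
        } else {
            listeners.remove(l);
        }
    }

    /** A change is allowed only if no always-veto is registered and no stored listener objects. */
    boolean isChangeAllowed(String change) {
        if (alwaysVetoCount > 0) {
            return false;           // a single veto is enough; no need to ask anyone else
        }
        for (VetoableListener l : listeners) {
            if (!l.allows(change)) {
                return false;
            }
        }
        return true;
    }

    int storedListenerCount() {
        return listeners.size();
    }

    int alwaysVetoCount() {
        return alwaysVetoCount;
    }
}
```

With this design, each alignment that registers the sentinel costs one integer increment, and the veto check short-circuits as soon as the count is non-zero. The trade-off is that the registry must recognise the sentinel by identity, which only works if every caller shares the same instance.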


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
