[Bioperl-l] SearchIO speed up

Chris Fields cjfields at uiuc.edu
Thu Aug 10 21:04:29 UTC 2006


...
> > You might use this same strategy and have the handler return simple hashes
> > instead of objects,
> 
> Yes, the main change I have made that provides the speed increase is to
> make the handler (SearchResultEventBuilder) return hashes instead of
> objects.
> 
> It's a transparent change when combined with the lazy instantiation.

I agree, and that may be the best way to proceed initially.  There are other ways
to optimize.  I personally like Aaron's 'chunk' idea using nested parsers,
which should fly; I could envision a way to take advantage of that with
Perl6's regex objects.
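
To make the hash-plus-lazy-instantiation idea concrete, here is a rough
sketch of why it can be transparent to the caller.  It is written in Python
only for brevity; every name in it (Hit, LazyResult, parse) is made up for
illustration and is not BioPerl API, and the tab-separated input is a toy
stand-in for a real report.

```python
# Illustrative only: hypothetical names, not BioPerl classes or methods.

class Hit:
    """Full object, built only when a caller actually asks for it."""
    def __init__(self, data):
        self.name = data["name"]
        self.score = float(data["score"])


class LazyResult:
    """Keeps the plain dicts from the parser; builds Hit objects on demand."""
    def __init__(self, hit_dicts):
        self._raw = list(hit_dicts)
        self._built = {}  # index -> Hit, filled in lazily

    def raw_hits(self):
        # Cheap path: no objects are created.
        return self._raw

    def hit(self, i):
        # Object path: construct once, then reuse the cached object.
        if i not in self._built:
            self._built[i] = Hit(self._raw[i])
        return self._built[i]

    def built_count(self):
        return len(self._built)


def parse(lines):
    """Toy parser: turns 'name<TAB>score' lines into plain dicts."""
    hits = []
    for line in lines:
        name, score = line.rstrip("\n").split("\t")
        hits.append({"name": name, "score": score})
    return LazyResult(hits)
```

Code that only needs the raw values never pays for object construction,
while code that calls hit() still gets an object back, which is the sense in
which the change stays transparent.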

> > Alternatively, create a new SearchIO class (call it fastblast; okay,
> > terrible name) that doesn't use a handler and just returns hashes.  I
> > think Jason pointed out previously that the handler isn't required.
> 
> But I didn't see any particular harm in keeping them. Not having a
> handler might shave a percent or two off run times, but you need to
> balance speed with power and flexibility. I don't know where that
> balance lies, hence my question to the community.

Depends on the person, hence flexibility is probably the best way to go.
I'm like you in that I prefer using the various objects.  

The cool thing about SearchIO is that you can design a module to your liking.
The tools are there (SearchIO module, Generic* Search objects, the
handlers), you just have to know how they work together and where to
optimize.  It's up to the user.

If someone wants a streamlined BLAST parser, they can build a specialized
SearchIO module that returns hashes straight out with no handler and no
internal caching (my fastblast suggestion).  Or use a specialized handler to
dole out hashes (your method).  Or use full-blown interleaved objects
(current implementation).  
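
Those three options can be sketched roughly as one parser with a swappable
handler.  Again, this is Python used purely as shorthand, and the names
(parse_hits, hash_handler, object_handler, Hit) are hypothetical, not part
of SearchIO; the input format is a toy one.

```python
# Illustrative only: hypothetical names, not the SearchIO interface.
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    score: float


def hash_handler(record):
    """Specialized handler: still a plain dict, with the score converted."""
    return {"name": record["name"], "score": float(record["score"])}


def object_handler(record):
    """Full-object handler: builds a Hit for every record."""
    return Hit(record["name"], float(record["score"]))


def parse_hits(lines, handler=None):
    """Yield one item per 'name<TAB>score' line.

    With no handler the raw dict is yielded straight out; otherwise the
    handler decides what the caller receives.
    """
    for line in lines:
        name, score = line.rstrip("\n").split("\t")
        record = {"name": name, "score": score}
        yield record if handler is None else handler(record)
```

The parsing loop is the same in all three cases; only what gets handed back
changes, so the user picks the trade-off between speed and convenience.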

The learning curve is somewhat steep if, like me (the molecular microbiologist),
you don't have a strong computer science background.  You have to grok
how the system works, how the Handler works, the various Search* objects
that are returned, how they are implemented, etc.  But...

The system is flexible if you know how to use it.

Chris



