[Bioperl-l] SearchIO Performance

Sendu Bala bix at sendu.me.uk
Fri Mar 21 23:17:59 UTC 2008

Jason Stajich wrote:
> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:
>> Hi. I am pretty new to BioPerl, and have a question about performance 
>> with regard to Blast (nucleotide) file parsing.
>> What is substantially longer? Well, the existing code takes about 0.25 
>> seconds, and the BioPerl call takes about 4.5 seconds. I find that to 
>> be a dramatic difference, and that kind of time difference becomes 
>> significant when I have to parse 30 Blast files in a row. I understand 
>> that SearchIO is parsing the entire file and storing it all for easy 
>> retrieval later, and maybe this time penalty is what I have to pay for 
>> that convenience and organization.
> Sendu has written a pull parser that 
> doesn't require creation of all the objects until the user requests them.
> As I've said in the past, if someone wrote SearchIO event-listener that 
> created lightweight objects (or just hashes) instead this would also 
> provide a substantial speedup.

Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the 
format to 'blast_pull'. Depending on the circumstances, and with 
thoughtful usage, you can see orders-of-magnitude speedups.
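
Something along these lines should work. It is only a rough sketch: 
'report.blastn' is a placeholder file name, and it assumes the report 
is plain-text NCBI BLASTN or BLASTP output.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# 'report.blastn' is a placeholder; point this at your own report.
my $in = Bio::SearchIO->new(
    -format => 'blast_pull',
    -file   => 'report.blastn',
);

while ( my $result = $in->next_result ) {
    print "Query: ", $result->query_name, "\n";
    while ( my $hit = $result->next_hit ) {
        # Only the hit name is requested here, so the HSP data for
        # this hit never has to be turned into objects.
        print "  ", $hit->name, "\n";
    }
}
```

By "thoughtful usage" I mean asking only for what you need, as above. 
If you call $hit->next_hsp on every hit and read every field, you 
should expect less of a gain.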


The only disadvantage compared to the normal parser is that the pull 
parser currently only supports NCBI BLASTN and BLASTP.

