[Bioperl-l] SearchIO Performance

Sendu Bala bix at sendu.me.uk
Fri Mar 21 19:17:59 EDT 2008


Jason Stajich wrote:
> 
> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:
> 
>> Hi. I am pretty new to BioPerl, and have a question about performance 
>> with regard to Blast (nucleotide) file parsing.
[...]
>> What is substantially longer? Well, the existing code takes about 0.25 
>> seconds, and the BioPerl call takes about 4.5 seconds. I find that to 
>> be a dramatic difference, and that kind of time difference becomes 
>> significant when I have to parse 30 Blast files in a row. I understand 
>> that SearchIO is parsing the entire file and storing it all for easy 
>> retrieval later, and maybe this time penalty is what I have to pay for 
>> that convenience and organization.
[...]
> Sendu has written a pull parser that 
> doesn't require creation of all the objects until the user requests them.
> As I've said in the past, if someone wrote SearchIO event-listener that 
> created lightweight objects (or just hashes) instead this would also 
> provide a substantial speedup.

Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the 
format to 'blast_pull'. Depending on the circumstances, and with 
thoughtful usage, you can see orders-of-magnitude speedups.

http://doc.bioperl.org/bioperl-live/Bio/SearchIO/blast_pull.html

The only disadvantage compared to the normal parser is that the pull 
parser currently only supports NCBI BLASTN and BLASTP.
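
Switching over might look roughly like the sketch below. It assumes a 
BLASTN report saved as 'report.blastn' (a placeholder name, not from the 
original question) and that you only need the query name and the top hit 
of each result. Treat it as an illustration, not a drop-in replacement 
for your existing code.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# 'report.blastn' is a hypothetical file name used for illustration.
my $in = Bio::SearchIO->new(
    -format => 'blast_pull',    # pull parser instead of the default 'blast'
    -file   => 'report.blastn',
);

while ( my $result = $in->next_result ) {
    print "Query: ", $result->query_name, "\n";

    # Ask only for the first hit; nothing else is requested from
    # this result, which is where the pull parser saves time.
    my $hit = $result->next_hit;
    print "  Top hit: ", $hit->name, "\n" if $hit;
}
```

"Thoughtful usage" here means requesting only the fields you actually 
need: the fewer hits and HSPs you ask for, the less of the report the 
pull parser has to work through.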

