[Bioperl-l] dealing with large files

Sendu Bala bix at sendu.me.uk
Thu Dec 20 23:29:30 UTC 2007


Amir Karger wrote:
>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related  
>>>> parsers; maybe for the next dev release.
>>> Also, BLAST parsing. Blasting the proteome against the 
>>> genome makes for rather large result files.
>> This has already been done. Use Bio::SearchIO::blast_pull. In a 
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less 
>> than 5GB (~40% less).
> 
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.

blast_pull is only in CVS, and it needs a whole bunch of associated
modules to work, so it isn't a single-file drop-in. That said, 1.5.2
contains significant improvements to SearchIO generally, which should
make blast parsing with the normal Bio::SearchIO::blast noticeably faster.
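
For reference, a minimal sketch of the usual SearchIO loop, assuming a
BLAST report in a file called report.blast (a made-up name) and that the
parser is chosen with the -format argument. With 1.5.2 that would be
'blast'; 'blast_pull' should only work with the CVS modules installed.
The columns printed here are just an illustration, not a recommendation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical defaults: report path and parser name can be given as arguments.
my $file   = shift @ARGV || 'report.blast';
my $format = shift @ARGV || 'blast';    # 'blast_pull' assumes the CVS modules

my $searchio = Bio::SearchIO->new(-format => $format, -file => $file);

# Walk the report one result (query), hit and HSP at a time.
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # Print query name, hit name and HSP e-value, tab-separated.
            print join("\t", $result->query_name, $hit->name, $hsp->evalue), "\n";
        }
    }
}
```

The loop itself should not need to change between the two parsers, since
only the -format value differs; that is an assumption worth checking
against whichever version you end up installing.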


