[Bioperl-l] Proposed bioperl module for local running of the NCBI standalone blast package

David Block dblock@gene.pbi.nrc.ca
Mon, 16 Oct 2000 09:42:19 -0600 (CST)


On a related note, check http://bioinfo.pbi.nrc.ca/dblock/wiki for
ParallelBlast, a project we have been working on to dynamically split the
database and blast hits on a cluster (32 nodes right now).  Pretty much
linear speedup, and automatic parsing by the server.  It's amazing, and it
will be open source soon, as well.

I'll keep you posted.
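
To make the database-splitting idea concrete, here is a minimal sketch of one way a FASTA database could be divided into per-node chunks. It is not ParallelBlast's actual code, which is not shown in this thread; the helper name split_fasta and the round-robin strategy are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from ParallelBlast): distribute the
# records of a FASTA-formatted string round-robin over $n chunks, so
# that each cluster node could search one chunk.
sub split_fasta {
    my ($fasta_text, $n) = @_;
    my @chunks = ('') x $n;
    my $i = 0;
    # A record starts wherever a line begins with '>'.
    for my $record (split /^(?=>)/m, $fasta_text) {
        next unless $record =~ /\S/;       # skip blank leading text
        $chunks[$i++ % $n] .= $record;
    }
    return @chunks;
}

# Example: three toy records spread over two chunks.
my @parts = split_fasta(">a\nACGT\n>b\nTT\n>c\nGG\n", 2);
print "chunk $_:\n$parts[$_]" for 0 .. $#parts;
```

Note that BLAST E-values depend on the size of the database searched, so hits from separate chunks would need to be put on a common footing before they are merged.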

-Dave

On Mon, 16 Oct 2000, Hilmar Lapp wrote:

> Peter Schattner wrote:
> > 
> > So instead I propose to write a relatively "light weight"  bioperl
> > wrapper module for running the NCBI standalone blast package.  Its
> > format would be similar to that of the Clustalw.pm module. I believe its
> > approach would also be similar to that of Jeff Chang's biopython
> > NCBIStandalone.py module (thanks to Brad Chapman for bringing this
> > module to my attention).
> > 
> > The syntax of the proposed module would involve creating a local blast
> > "factory object". The constructor would be passed the name of the blast
> > method and database to be used, the desired method for parsing the blast
> > report (Blast or BPlite) and an optional array of (non-default)
> > parameters to be used by the factory, e.g.:
> > 
> > @params = ('method' => 'blastn', 'database' => 'ecoli.nt','outformat' => 'BPlite');
> > $factory = Bio::Tools::StandAloneBlast->new(@params);
> > 
> 
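
As a rough illustration of the factory idea quoted above, the following sketch shows how such an object might hold its parameters and assemble a command line. The package name Sketch::BlastFactory and the command method are hypothetical and are not the proposed Bio::Tools::StandAloneBlast interface. The blastall options used (-p program, -d database, -i query file) are those of the NCBI standalone package, but the sketch only builds the argument list and never runs BLAST.

```perl
use strict;
use warnings;

package Sketch::BlastFactory;   # hypothetical name, for illustration only

# Constructor: takes the same key/value list as in the proposal.
# The defaults here are assumptions, not part of the proposal.
sub new {
    my ($class, %params) = @_;
    my $self = {
        method    => 'blastn',
        outformat => 'BPlite',
        %params,                 # caller-supplied values override defaults
    };
    die "a 'database' parameter is required\n"
        unless defined $self->{database};
    return bless $self, $class;
}

# Build the argument list for a blastall run on one query file.
sub command {
    my ($self, $query_file) = @_;
    return ('blastall',
            '-p', $self->{method},
            '-d', $self->{database},
            '-i', $query_file);
}

package main;

my @params  = ('method' => 'blastn', 'database' => 'ecoli.nt', 'outformat' => 'BPlite');
my $factory = Sketch::BlastFactory->new(@params);
my @cmd     = $factory->command('query.fa');
print "@cmd\n";   # prints: blastall -p blastn -d ecoli.nt -i query.fa
```

Returning the command as a list rather than a single string means it could later be handed to system() without going through the shell.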
> Sounds good to me, and is certainly useful. We (and certainly a lot of
> others :) are already calling the stand-alone BLAST from within Perl as a
> system call, but your proposal is certainly much more transparent and
> re-usable, and I like the factory idea. 
> 
> Basically, I have one comment. It would be very helpful if such a module
> could also support running stand-alone BLASTs in parallel, e.g., if
> you've got a multi-processor machine. I know that the current NCBI BLAST
> supports multi-threading, but on well-equipped machines it often scales
> better to run multiple processes. So, the idea is then that I can pass an
> array of Seq objects and these will be run in parallel, returning an
> array of BPlite or Blast.pm objects. At the low level, there may be a
> memory issue if the array is a few hundred or thousand seqs long (which
> it is for us). So, instead of returning a full array of result objects,
> one may consider a callback invoked for each finished report.
> 
> 	Hilmar
> 
> 
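
The callback approach Hilmar describes could look roughly like the sketch below: a bounded pool of child processes, each writing its report to a temporary file, with the parent invoking a callback as each child finishes so that no full array of results is ever held in memory. The helper name run_parallel and its argument order are assumptions for illustration, and the worker here returns a dummy string instead of calling BLAST.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical helper: run $worker->($job) for every job in @$jobs using
# at most $max_procs child processes at a time. As each child exits, the
# parent calls $callback->($job, $report, $status), where $status is the
# child's wait status (0 on success).
sub run_parallel {
    my ($jobs, $max_procs, $worker, $callback) = @_;
    my @queue = @$jobs;
    my %running;                          # pid => [job, report file]

    while (@queue or %running) {
        # Start children until the process limit is reached.
        while (@queue and keys(%running) < $max_procs) {
            my $job = shift @queue;
            my ($fh, $file) = tempfile();
            close $fh;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: write the report to its file, then exit at once.
                my $ok = eval {
                    open my $out, '>', $file or die "open: $!";
                    print {$out} $worker->($job);
                    close $out or die "close: $!";
                    1;
                };
                POSIX::_exit($ok ? 0 : 1);
            }
            $running{$pid} = [$job, $file];
        }

        # Block until any child finishes, then hand over its report.
        my $pid = wait();
        my $status = $?;
        last if $pid == -1;
        my $entry = delete $running{$pid} or next;
        my ($job, $file) = @$entry;
        open my $in, '<', $file or die "cannot read $file: $!";
        my $report = do { local $/; <$in> };
        close $in;
        unlink $file;
        $callback->($job, defined $report ? $report : '', $status);
    }
    return;
}

# Example with a stand-in worker; a real one would run BLAST on $job.
run_parallel(
    [qw(seq1 seq2 seq3)], 2,
    sub { my $job = shift; "report for $job\n" },
    sub { my ($job, $report, $status) = @_; print "$job done: $report" },
);
```

Reports go through temporary files rather than pipes so that a child producing a large report cannot block while the parent is waiting for another child to exit.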

-- 
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan