[Bioperl-l] Standaloneblastplus: update

Fields, Christopher J cjfields at illinois.edu
Fri Sep 13 13:01:57 UTC 2013


If you want to include this as an edit to the code, you could simply fork the code on GitHub, make and commit the changes to the fork, then submit a pull request. We do request that you test the code or add new tests (e.g. using the BioPerl test suite) before submitting it, just to make sure everything works.

chris

On Sep 13, 2013, at 3:30 AM, dimitark at bii.a-star.edu.sg wrote:

> Hi guys,
> I managed to solve my problem by modifying StandAloneBlastPlus.pm and BlastMethods.pm.
> 
> In 'sub run()' in BlastMethods I added a TEMPDIR option:
> 
> sub run {
>    my $self = shift;
>    my @args = @_;
>    # DIMITAR: added $tempdir so I can pass a tempdir for each thread I create
>    my ($method, $query, $outfile, $outformat, $method_args, $tempdir) = $self->_rearrange( [qw(
>                                         METHOD
>                                         QUERY
>                                         OUTFILE
>                                         OUTFORMAT
>                                         METHOD_ARGS
>                                         TEMPDIR
>                                         )], @args);
> 
> Then at line 261 in BlastMethods, I pass the tempdir along:
> 
>    $blast_args{-query} = $self->_fastize($query,$tempdir);
> 
> Then in _fastize() in StandAloneBlastPlus:
> 
>   sub _fastize {
>    my $self = shift;
>    my $data = shift;
>    my $tempdir=shift; # <--- ADDED THIS
> 
> 
> And further down I changed this:
> 
>        my $fh = File::Temp->new(TEMPLATE => 'DBDXXXXXXXXXX',
>                                 UNLINK => 0,
>                                 DIR => $tempdir, # <--- CHANGED HERE
>                                 SUFFIX => '.fas');
> 
> 
> Well, it's quite a dirty workaround, but it works fine. Now I can do the following:
> In my script I can start several threads that share the same factory, and for each thread I create a separate TEMPDIR in which the temp .fas file (holding the query) is created. That way I can make better use of my CPU threads.
> For example: instead of running a single blast with 40 CPU threads over a fasta file with 250K seqs, I can now start 5 instances of blast processing 50K seqs each, with each instance using 8 CPU threads.
> 
> I did this because:
> 
> a) When I ran several instances of blast, they all created their temp files in the same directory. Even though the temp file names use a random mix of characters, some weird errors were still happening and some blast instances broke.
> 
> b) I noticed that when I process a large fasta file, blast starts well but gets slower over time, i.e. slower with each seq being blasted. The further down the fasta file, the slower the blast.
> 
> If someone else is interested in this kind of functionality, I suppose I can edit the files further so that the change is cleaner and consistent throughout. Also, right now the tempdir must be given explicitly; I could change that like this:
> 
> if(! $tempdir){
>   $tempdir=$self->db_dir;
> }
> 
> which would default it to DB_DIR, as before. Or some other way that achieves the same.
> 
> Cheers
> D.
> 

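The split described in the quoted message (one large input divided among several blast instances, each with its own temp directory) might be planned roughly as follows. This is only a sketch: split_into_chunks() and plan_jobs() are hypothetical helpers, not BioPerl functions, and the records are treated as opaque list items rather than Bio::Seq objects.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX qw(ceil);

# Hypothetical helper: split a list of records into at most $n chunks
# of nearly equal size, preserving the original order.
sub split_into_chunks {
    my ($records, $n) = @_;
    my $size = ceil(scalar(@$records) / $n) || 1;
    my @copy = @$records;    # work on a copy so the caller's list is untouched
    my @chunks;
    push @chunks, [ splice(@copy, 0, $size) ] while @copy;
    return @chunks;
}

# Hypothetical helper: pair every chunk with its own scratch directory,
# so that no two workers write temp files to the same place.
sub plan_jobs {
    my ($records, $n) = @_;
    return map { +{ seqs => $_, tempdir => tempdir(CLEANUP => 1) } }
        split_into_chunks($records, $n);
}
```

Each job's tempdir would then be handed to the patched run() from the quoted message through its TEMPDIR option; that option exists only in the modified copy of BlastMethods.pm described there.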

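The fallback proposed at the end of the quoted message (use an explicit tempdir when given, otherwise the database directory) could be isolated like this. A minimal sketch: pick_tempdir() and write_query() are hypothetical names, not part of StandAloneBlastPlus, and the File::Temp arguments simply mirror the ones quoted above.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: prefer an explicit per-thread directory and fall
# back to the database directory when none is given.
sub pick_tempdir {
    my ($tempdir, $db_dir) = @_;
    return (defined $tempdir && length $tempdir) ? $tempdir : $db_dir;
}

# Hypothetical helper: write FASTA text to a temp file in the chosen
# directory and return its path. The file is kept (UNLINK => 0), as in
# the quoted code, so the caller is responsible for removing it.
sub write_query {
    my ($fasta_text, $tempdir, $db_dir) = @_;
    my $fh = File::Temp->new(
        TEMPLATE => 'DBDXXXXXXXXXX',
        UNLINK   => 0,
        DIR      => pick_tempdir($tempdir, $db_dir),
        SUFFIX   => '.fas',
    );
    print {$fh} $fasta_text;
    close $fh or die "close failed: $!";
    return $fh->filename;
}
```

Checking defined-ness and length rather than plain truthiness means a directory literally named "0" is still honoured, which the `if(! $tempdir)` form would skip.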