[Bioperl-l] Remote blast fork errors / Process limit restrictions

Chris Fields cjfields at illinois.edu
Mon Dec 7 21:08:40 UTC 2009


Robert, 

If you use the relevant components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not.  All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline, you will need to use those tools directly.
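
For example, something along these lines (untested and off the top of my head, so treat it as a sketch and check the POD for the exact options; in particular I believe -retrievaltype => 'tempfile' makes the Bio::DB::WebDBSeqI-based modules fetch through a temporary file instead of forking a pipe, and $query_seq below just stands in for whatever query you are submitting):

  use Bio::DB::GenBank;
  use Bio::Tools::Run::RemoteBlast;

  # Fetch everything sequentially, in the current process.
  # -retrievaltype => 'tempfile' should avoid the forking 'pipeline'
  # retrieval that appears to be hitting your per-user process limit.
  my $gb = Bio::DB::GenBank->new( -retrievaltype => 'tempfile' );

  my $factory = Bio::Tools::Run::RemoteBlast->new(
      -prog       => 'blastn',
      -data       => 'nr',
      -expect     => '1e-10',
      -readmethod => 'SearchIO',
  );

  # $query_seq is a Bio::Seq object (a fasta file name also works)
  $factory->submit_blast($query_seq);

  while ( my @rids = $factory->each_rid ) {
      foreach my $rid (@rids) {
          my $rc = $factory->retrieve_blast($rid);
          if ( !ref($rc) ) {
              # not ready yet (0) or an error (< 0)
              $factory->remove_rid($rid) if $rc < 0;
              sleep 30;
          }
          else {
              while ( my $result = $rc->next_result ) {
                  while ( my $hit = $result->next_hit ) {
                      # one fetch at a time; no forking involved
                      my $seq = $gb->get_Seq_by_acc( $hit->accession );
                      # ... summarize $seq as needed ...
                  }
              }
              $factory->remove_rid($rid);
          }
      }
  }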

See the POD for those specific modules for more information.

chris

On Dec 7, 2009, at 2:41 PM, Robert Bradbury wrote:

> This comment could also have the subject line: "Why does Bioperl/get_sequence
> fork at all?  Why are not all operations sequential?"  And if this is a
> "default" mode that I'm unaware of, how do I ever write a reliable BioPerl
> script when I have little or no control over what the program does when it
> runs?  I may have days to spare, so I can bear the burden of relatively slow
> results (and so can use sequential processing rather than parallel).
> 
> I've got a Perl script that uses remote blast to blast a sequence against a
> subset of the NCBI sequences.  It "mostly" works, in that it returns a
> seemingly complete .bls result file, but when it attempts to look at the hit
> sequences (so it can summarize the information from the results more
> accurately than a standard blast report allows) it terminates prematurely
> with errors.
> 
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Couldn't fork: Resource temporarily unavailable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::WebDBSeqI::_open_pipe
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
> STACK: Bio::Perl::get_sequence
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
> STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
> STACK: /home/bradbury/Genomes/bin/RB.pl:155
> -----------------------------------------------------------
> 
> The precise line (in my code) which appears to be generating the error is:
>    $seq = get_sequence('GenBank', $accsn);
> 
> Now this could be a problem if NCBI/GenBank fails due to load conditions,
> but this specific failure (which is repeatable) is most likely due to
> hitting the per-user process limit.  The small blast results work fine; it
> is only when the Blast has returned several hundred hits that it runs into
> this problem.
> 
> Now what it sounds like to me is an attempt to do multiple asynchronous NCBI
> queries (to get a sequence) with complete disregard for the environment
> (process limits, NCBI limits, etc.).  But I do not know enough about how
> this works to point a finger at a specific function.  The result is that
> get_sequence results are accumulated, summarized, etc. without the wait()
> (or waitpid()) calls ever being issued to collect the former children.
> [This, IMO, would clearly be a bug.]
> 
> It could be addressed by allowing the BioPerl library to run in 3 modes:
> (1) completely synchronous, where if you fork you wait until the child is
> done and collect it, and where if any fork fails one either collects an
> outstanding process or switches to the non-conservative mode.
> 
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




