[Bioperl-l] Remote blast fork errors / Process limit restrictions
Jason Stajich
jason at bioperl.org
Mon Dec 7 16:24:54 EST 2009
Robert -
You seem to be mixing up the remote BLAST problem and the sequence
retrieval problem. These messages are related to the remote retrieval
of sequences.
It is hard to tell from your message which specific modules you are
using or how you are querying NCBI. There are several ways to do
this, either with the NCBI tools or with Bio::DB::GenBank.
If you are using Bio::DB::Query::GenBank, that allows for async
access and has built-in controls to adhere to the wait between
requests that NCBI asks for, but I don't think the Bio::DB::GenBank
get_Seq_by_acc method does anything of the sort (at least not when it
was originally written).
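As a rough illustration of spacing out remote requests yourself, here is a minimal sketch in core Perl. The helper name fetch_sequentially(), its delay and retries options, and the 3-second default are assumptions made for this example and are not part of BioPerl. The fetch step is passed in as a code ref, so the sketch makes no claim about what any BioPerl method does internally, and it does not address the fork failure itself.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # core module; allows fractional delays

# Fetch each accession one at a time through a caller-supplied code ref,
# pausing between requests and retrying a failed request a few times.
# fetch_sequentially() and its options are hypothetical names for this sketch.
sub fetch_sequentially {
    my ($accessions, $fetch, %opt) = @_;
    my $delay   = defined $opt{delay}   ? $opt{delay}   : 3;  # seconds between requests (assumed value)
    my $retries = defined $opt{retries} ? $opt{retries} : 2;  # extra attempts per accession

    my (%found, @failed);
    my $first = 1;
    for my $acc (@$accessions) {
        my $seq;
        for my $attempt (0 .. $retries) {
            sleep($delay) unless $first;      # pause before every request except the first
            $first = 0;
            $seq = eval { $fetch->($acc) };   # trap exceptions thrown by the fetch step
            last if defined $seq;
        }
        if (defined $seq) { $found{$acc} = $seq }
        else              { push @failed, $acc }
    }
    return (\%found, \@failed);
}

# Possible use with BioPerl (needs network access):
#   use Bio::DB::GenBank;
#   my $gb = Bio::DB::GenBank->new;
#   my ($found, $failed) =
#       fetch_sequentially(\@accessions, sub { $gb->get_Seq_by_acc($_[0]) });
```

Accessions that still fail after the retries are returned in the second list, so the script can report them instead of dying partway through several hundred hits.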
I always advocate that if you want highly available and reliable
access to sequences, you should download nr (or whichever DB you need)
and use the local indexing tools for retrieval. Once you start doing
hundreds of queries, I don't see any good reason to query NCBI
directly, given the unreliability of the web and of remote services.
Local databases are faster and more reliable for most people, so I
urge you to take advantage of the tools which provide local database
access with the same APIs.
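For example, a local lookup could be sketched with Bio::DB::Fasta, one of the BioPerl modules that indexes a FASTA file on disk. The helper name local_descriptions() is made up for this illustration, and the sketch assumes the IDs you look up match the first word of each FASTA header (for nr-style headers you would likely need the -makeid option to map headers to accessions).

```perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Look up IDs in a local FASTA file through Bio::DB::Fasta.
# local_descriptions() is a hypothetical helper name for this sketch.
sub local_descriptions {
    my ($fasta_path, @ids) = @_;

    # The first call builds an index file beside the FASTA file;
    # later calls reuse it, so lookups need no network access.
    my $db = Bio::DB::Fasta->new($fasta_path);

    my %desc;
    for my $id (@ids) {
        my $seq = $db->get_Seq_by_id($id);   # returns nothing when the ID is not indexed
        next unless defined $seq;
        $desc{$id} = {
            length => $seq->length,
            header => $db->header($id),      # header line: ID plus description
        };
    }
    return \%desc;
}
```

IDs that are missing from the local file are simply absent from the returned hash, so a script can fall back to a remote query for just those few.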
I would like to comment that the tone of your posts to the list is
not particularly helpful. I wonder if you are actually asking for
help or are just interested in complaining when things don't work as
you expect. This is a collaborative, volunteer-only project, built on
the principle of working together to make a useful toolkit. We
encourage you to build programs and applications from this base that
suit your needs, but not everything will be implemented directly in
the toolkit if it isn't generic enough (at least that is my
feeling; the other Core devs help with these decisions).
If there is a useful, generic, and reusable part, we would like that
to be part of the API. Otherwise, we suggest a new application that
fits the developer's vision. We encourage you to write (and publish)
that application separately, but we certainly also encourage bug
reports (and fixes) and code contributions for new features where
they can be seen as generally useful.
-jason
On Dec 7, 2009, at 12:41 PM, Robert Bradbury wrote:
> This comment could also have a subject line: "Why does
> Bioperl/get_sequence fork at all!" Why are not all operations
> sequential? And if this is a "default" mode that I'm unaware of --
> How do I ever write a reliable BioPerl script if I have little or no
> capability of what the program uses when it runs? I may have days so
> I can bear the burden of relatively slow results (and so can use
> sequential processing rather than parallel).
>
> I've got a perl script that uses remote blast to blast a sequence
> against a subset of the NCBI sequences. It "mostly" works, in that
> it returns a seemingly complete .bls result file, but when
> attempting to look at the sequences (so it can summarize the
> information from the results more accurately than a standard blast
> report allows) it terminates prematurely with errors.
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Couldn't fork: Resource temporarily unavailable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::WebDBSeqI::_open_pipe
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
> STACK: Bio::Perl::get_sequence
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
> STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
> STACK: /home/bradbury/Genomes/bin/RB.pl:155
> -----------------------------------------------------------
>
> The precise line (in my code) which appears to be generating the
> error is:
> $seq = get_sequence('GenBank', $accsn);
>
> Now this can be a problem if NCBI/Genbank fails due to load
> conditions -- but this specific failure (which is repeatable) is
> most likely due to hitting the user process limit restrictions --
> the small blast results work fine -- it's only if the Blast has
> returned several hundred hits that it runs into this problem.
>
> Now what it sounds like to me is an attempt to do multiple
> asynchronous NCBI queries (to get a sequence) with complete
> disregard of the environment (process limits, NCBI limits, etc.).
> But I do not know enough about how this works to point a finger at
> some specific function. As a result get_sequence process results
> are accumulated, summarized, etc. without ever having issued the
> respective "wait-variant()" calls to collect former children.
> [This IMO would clearly be a bug.]
>
> It could be adjusted by allowing the BioPerl library to run in 3
> modes.
> (1) completely synchronous -- if you fork you wait until it's done
> -- and you collect "it" and if any fork fails then one either
> collects the process or switches to the non-conservative mode.
>
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
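The "completely synchronous" mode described in the quoted message (fork, wait until the child is done, then collect it) can be illustrated in plain Perl roughly as follows. This is only a sketch of the general fork/pipe/waitpid pattern, not BioPerl's actual implementation; run_in_child() is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX ();

# Run a code ref in a child process, read everything it writes to a pipe,
# then reap the child so it no longer counts against the process limit.
# run_in_child() is a hypothetical helper, not a BioPerl function.
sub run_in_child {
    my ($code) = @_;

    pipe(my $reader, my $writer) or die "Couldn't open pipe: $!";
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;

    if ($pid == 0) {                  # child: write the result, then exit
        close $reader;
        print {$writer} $code->();
        close $writer;                # flushes the pipe before exiting
        POSIX::_exit(0);              # skip END blocks inherited from the parent
    }

    close $writer;                    # parent: keep only the read end
    my $output = do { local $/; <$reader> };   # read until the child closes the pipe
    close $reader;

    waitpid($pid, 0);                 # collect the child; otherwise it stays a zombie
    return ($output, $? >> 8);        # output plus the child's exit status
}
```

Because each call waits for and reaps its own child before returning, calling it several hundred times in a loop keeps at most one extra process alive at any moment.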
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org