[BioPython] Non blocking blast.
Frank Kauff
fkauff at duke.edu
Tue Apr 6 10:32:56 EDT 2004
Folks,
On Tue, 2004-04-06 at 01:00, Andrew Dalke wrote:
> pieter at kotnet.org:
> > Is there a way to use biopython to submit let say 100 jobs to blast.
> > Without waiting for them, storing the request ids. And than afterwards
> > reading the ids and checking which results are available?
>
> NCBI won't like it if you do 100 BLASTs at once, but let's suppose
> it's a hypothetical.
>
> Biopython's BLAST looks like a function call. That is, it hides
> that it's doing network I/O. The standard way to parallelize it
> is to use threads, and for this the standard idiom is boss/worker.
> One thread creates two Queue.Queue instances, one for job
> requests and the other for job results. It then starts up N
> other threads, each of which knows about the Queues. The boss
> thread submits the jobs (as a simple data structure) to the
> request queue. Each worker thread does a get on that queue to get
> the next job and makes the Biopython BLAST request. When done, the
> worker thread puts the information on the results Queue.
> While waiting, the boss thread can do whatever else is needed.
>
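The boss/worker idiom described above can be sketched as follows. This is a minimal illustration, not Biopython code: `run_blast` is a hypothetical stand-in that fakes a result so the sketch runs offline (a real script would call something like Biopython's `Bio.Blast.NCBIWWW.qblast` there), and `blast_all` and `worker` are illustrative names. It uses Python 3's `queue` module, which is the 2004-era `Queue` module renamed.

```python
import queue
import threading


def run_blast(job):
    # Hypothetical stand-in for the real network BLAST call; it just
    # builds a fake result string so the sketch runs without NCBI.
    name, sequence = job
    return name, "result for %s (%d residues)" % (name, len(sequence))


def worker(jobs, results):
    # Each worker pulls jobs from the request queue until it sees the
    # None sentinel, and puts each result on the results queue.
    while True:
        job = jobs.get()
        try:
            if job is None:
                break
            results.put(run_blast(job))
        finally:
            jobs.task_done()


def blast_all(sequences, n_workers=2):
    jobs = queue.Queue()      # job requests
    results = queue.Queue()   # job results
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    # The boss submits jobs as simple (name, sequence) tuples.
    for item in sequences.items():
        jobs.put(item)
    # One sentinel per worker tells each thread to stop.
    for _ in threads:
        jobs.put(None)
    # The boss could do other work here; this sketch simply waits.
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        name, result = results.get()
        collected[name] = result
    return collected


out = blast_all({"seq1": "ACGT", "seq2": "GGCCTTAA"})
print(out["seq1"])
```

Keeping `n_workers` small matters here: as the reply below points out, NCBI penalizes each additional queued request, so more threads do not translate into proportionally faster results.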
I have a little (crude) script that does exactly that, blasting a FASTA
file of sequences using threads. It can be useful for blasting a
96-well plate of sequences overnight.
But be careful - as Jeff mentioned, blast is a shared resource:
- For each additional request in the BLAST queue, you'll get a penalty
of 60 (or so) seconds from NCBI: 60 s for the second, 120 s for the
third, and so on. That makes too many threads quite unattractive...
- If you start too many BLASTs in a short time, after hitting some limit
the only response will be a nice page saying 'Access denied due to
possible misuse', and your IP will be blocked from further access to
NCBI BLAST... You'll then have to write them a nice email and beg for
grace. That happened to me while testing an automated BLAST feature :-)
But the limit seems to be several hundred requests in about 24 hours,
which is quite a lot.
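To see why many threads are unattractive, the penalty schedule in the first point can be tallied in a few lines. This sketch assumes the approximate 60-second step quoted above holds exactly (the request at position n waits about 60 × (n − 1) seconds); NCBI's actual policy may differ, and `queue_penalty` and `total_penalty` are hypothetical helper names.

```python
def queue_penalty(position, per_request=60):
    # Approximate extra wait, in seconds, for the request at 1-based
    # `position` in the queue, using the ~60 s step quoted in the post.
    return per_request * (position - 1)


def total_penalty(n_concurrent, per_request=60):
    # Sum of the penalties when n_concurrent requests are queued at once.
    return sum(queue_penalty(p, per_request)
               for p in range(1, n_concurrent + 1))


print(queue_penalty(3))   # 120, matching "120s for the third"
print(total_penalty(4))   # 0 + 60 + 120 + 180 = 360
```

Because the total grows quadratically with the number of simultaneous requests, a handful of worker threads is usually the sensible ceiling.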
If you're interested in the script, send me an email.
Frank
> Aahz wrote some documentation about this idiom ... probably
> http://starship.python.net/crew/aahz/OSCON2001/
>
> Andrew
> dalke at dalkescientific.com
--
Frank Kauff
Dept. of Biology
Duke University
Box 90338
Durham, NC 27708
USA
Phone 919-660-7382
Fax 919-660-7293