[BioPython] Non blocking blast.

Frank Kauff fkauff at duke.edu
Tue Apr 6 10:32:56 EDT 2004


Folks,

On Tue, 2004-04-06 at 01:00, Andrew Dalke wrote:
> pieter at kotnet.org:
> > Is there a way to use biopython to submit, let's say, 100 jobs to blast
> > without waiting for them, storing the request ids, and then afterwards
> > reading the ids and checking which results are available?
> 
> NCBI won't like it if you do 100 BLASTs at once, but let's suppose
> it's a hypothetical.
> 
> Biopython's BLAST looks like a function call.  That is, it hides
> that it's doing network I/O.  The standard way to parallelize it
> is to use threads, and for this the standard idiom is boss/worker.
> One thread creates two Queue.Queue instances, one for job
> requests and the other for job results.  It then starts up N
> other threads, each of which knows about the Queues.  The boss
> thread submits the jobs (as a simple data structure) to the
> queue.  Each worker thread does a get on the queue to get the
> next job and does the Biopython BLAST request.  When done, the
> worker thread returns the information in the results Queue.
> While waiting the boss thread can do whatever else is needed.
> 
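
A minimal sketch of the boss/worker idiom described above, written for
Python 3 (where the module is named queue rather than Queue). It is an
illustration only: fake_blast, worker and run_jobs are hypothetical names,
and fake_blast merely stands in for a real BLAST call (for example a
wrapper around Biopython's qblast) so that the example runs offline.

```python
import queue
import threading


def fake_blast(sequence):
    """Hypothetical stand-in for a real BLAST request; returns a dummy result."""
    return "result for %s" % sequence


def worker(jobs, results, blast_func):
    """Worker: take jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        name, sequence = job
        try:
            results.put((name, blast_func(sequence), None))
        except Exception as exc:
            # Report the failure to the boss instead of killing the thread.
            results.put((name, None, exc))


def run_jobs(records, blast_func=fake_blast, n_workers=2):
    """Boss: start the workers, queue the jobs, collect one result per job.

    records is a list of (name, sequence) tuples.
    """
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, blast_func))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for record in records:
        jobs.put(record)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so each one exits
    collected = {}
    for _ in records:
        name, output, error = results.get()
        collected[name] = (output, error)
    for thread in threads:
        thread.join()
    return collected


if __name__ == "__main__":
    demo = [("seq1", "ACGT"), ("seq2", "GGCC")]
    for name, (output, error) in sorted(run_jobs(demo).items()):
        print(name, output, error)
```

The boss here blocks while collecting results to keep the sketch short;
it could equally poll the results queue and do other work in between.
The number of workers is deliberately small, since the service being
queried is shared.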

I've a little (crude) script ready that does that, blasting a fasta file
of sequences using threads. It can be useful for blasting a 96-well plate
of sequences overnight.
But be careful - as Jeff mentioned, blast is a shared resource:
- For each additional request in the blast queue, you'll get a penalty
of 60 (or so) seconds from NCBI: 60s for the second, 120s for the third,
etc. That makes too many threads quite unattractive...
- If you start too many blasts in a short time, after hitting some limit
the only response will be a nice page saying 'Access denied due to
possible misuse', and your IP will be blocked from further access to
ncbi blast... You'll then have to write them a nice email and beg for
grace. Happened to me while testing some automated blast feature :-) But
the limit seems to be several hundred requests in about 24h, which is
quite a lot.
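
To put a number on the queue penalty, here is a rough back-of-the-envelope
sketch. It assumes the penalty grows by exactly 60 seconds per additional
request and that all requests are queued at once. Both are simplifications
of the "60 (or so)" figure above, not documented NCBI behaviour, and
total_penalty_seconds is a hypothetical helper name.

```python
def total_penalty_seconds(n_requests, step=60):
    """Sum of the assumed penalties 0, step, 2*step, ... over n_requests."""
    return step * n_requests * (n_requests - 1) // 2


if __name__ == "__main__":
    # 3 simultaneous requests: 0 + 60 + 120 = 180 seconds of added waiting.
    print(total_penalty_seconds(3))
    # 4 simultaneous requests: 0 + 60 + 120 + 180 = 360 seconds.
    print(total_penalty_seconds(4))
```

Under these assumptions the total grows quadratically with the number of
simultaneous requests, which is why a handful of threads is about as far
as it is worth going.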

If you're interested in the script, send me an email.

Frank

> Aahz wrote some documentation about this idiom ... probably
>    http://starship.python.net/crew/aahz/OSCON2001/
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> BioPython mailing list  -  BioPython at biopython.org
> http://biopython.org/mailman/listinfo/biopython
-- 
Frank Kauff
Dept. of Biology
Duke University
Box 90338
Durham, NC 27708
USA

Phone 919-660-7382
Fax 919-660-7293


