[Biopython] multiple sequence blast

Peter Cock p.j.a.cock at googlemail.com
Sun Jul 3 23:52:34 UTC 2011


On Sun, Jul 3, 2011 at 8:27 PM, Dilara Ally wrote:
> Hi Peter
>
> How long will it take then to do a big BLAST job that has over
> 600,000 contigs?

How long is a piece of string? ;)

What I mean is that this is hard to say without looking at your data.
Do you know the total sequence length of the contigs?

Try doing 60 representative contigs on your machine for an
estimate (note their lengths are important - shorter contigs
should be faster to run as BLAST queries).
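
As a rough sketch of that estimate (my own illustration, with made-up
numbers): time the sample, then scale up by total query length. The
linear scaling is an assumption. Real BLAST run time also depends on
the database and on how many hits each contig gets, so treat the
result as a ballpark figure only. The function names here are
hypothetical helpers, not part of Biopython.

```python
import random


def pick_sample(lengths, k=60, seed=0):
    """Pick up to k contig lengths at random, so the sample roughly
    mirrors the length mix of the full set."""
    rng = random.Random(seed)
    return rng.sample(lengths, min(k, len(lengths)))


def extrapolate_seconds(sample_seconds, sample_bases, total_bases):
    """Scale the timed sample up by total query length.

    Assumes run time grows linearly with the number of query bases,
    which is only an approximation."""
    return sample_seconds * total_bases / sample_bases


if __name__ == "__main__":
    # Hypothetical data: 600,000 invented contig lengths.
    lengths = [500, 1200, 800, 3000] * 150000
    sample = pick_sample(lengths)
    # Suppose BLASTing the 60 sampled contigs took 90 seconds (invented).
    estimate = extrapolate_seconds(90.0, sum(sample), sum(lengths))
    print("Estimated hours: %.1f" % (estimate / 3600))
```

Picking the sample at random, rather than taking the first 60 records,
matters if your assembler writes the contigs sorted by length.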

Remember that standalone BLAST+ can be run multi-threaded.
It will depend on the number of CPUs and how much RAM you
have.
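
For example, here is a minimal sketch of calling standalone BLAST+
with several threads from Python. It assumes blastn is on your PATH
and that you have a local nucleotide database. The file and database
names in the usage comment are placeholders, and blast_command and
run_blast are hypothetical helper names. The -num_threads option is
the BLAST+ switch for multi-threading, and -outfmt 5 asks for XML
output, which Bio.Blast.NCBIXML can parse.

```python
import os
import subprocess


def blast_command(query, db, out, threads=None, program="blastn"):
    """Build the BLAST+ command line as a list of arguments."""
    if threads is None:
        # Default to every core the OS reports (fall back to 1).
        threads = os.cpu_count() or 1
    return [
        program,
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "5",               # XML output
        "-num_threads", str(threads),
    ]


def run_blast(query, db, out, threads=None):
    """Run BLAST+ and raise an error if it exits with a failure code."""
    subprocess.run(blast_command(query, db, out, threads), check=True)


# Usage (placeholder names, adjust to your own files):
# run_blast("contigs.fasta", "nt", "contigs_vs_nt.xml", threads=8)
```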

> Wouldn't downloading the databases and doing a standalone
> BLAST take a lot of cpu memory?

Yes, it will take a lot of CPU time, and a moderate amount of
RAM (if you are doing genome assembly to get the contigs,
that will probably have needed far more RAM than running
BLAST will).

> Should I be doing this on a cluster?

It would probably be worthwhile. You *might* manage with a
powerful multicore desktop (like a recent Mac Pro or similar)
or a powerful server.

Peter



