[Biopython] Blast DB keeps crashing nodes

Dilara Ally dilara.ally at gmail.com
Sat Oct 15 21:55:21 UTC 2011


How many hits per sequence have you asked blastall to return? The
default is 250. I ran a BLAST search on ~600,000 contigs as
simultaneous jobs across 34 nodes, keeping only the top 20 hits. Each
file held 1000 FASTA-formatted sequences, and each node was given ~12
files. Even so, I had to run it in two parts to get all the sequences
blasted: I waited until the first set finished before setting up the
second BLAST job. The whole job finished in 2 days. Before running it
on the cluster, I tested a single file to see how long it took and how
much memory it used. The cluster I used had 34 compute nodes, each with
16-48 cores and 16-64 GB of memory.
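
For what it's worth, here is a rough sketch of the setup described
above, not the exact scripts I ran. The function names, the chunk file
naming, the blastn program, and the XML output format are all
assumptions made for illustration; only the blastall flags themselves
(-p, -d, -i, -o, -v, -b, -m) are standard. It splits a FASTA file into
chunks of 1000 records, builds a blastall command capped at 20 hits,
and works out how many submission rounds the chunks need.

```python
import math
from pathlib import Path


def split_fasta(path, out_dir, per_file=1000):
    """Write consecutive chunks of `per_file` FASTA records; return the chunk paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks, out, count = [], None, 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Open a new chunk file every `per_file` records.
                if count % per_file == 0:
                    if out is not None:
                        out.close()
                    chunk_path = out_dir / f"chunk_{len(chunks):05d}.fasta"
                    out = open(chunk_path, "w")
                    chunks.append(chunk_path)
                count += 1
            # Anything before the first header line is skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return chunks


def blastall_command(query, db, out, program="blastn", hits=20):
    """Build (but do not run) a legacy blastall command limited to `hits` hits."""
    return [
        "blastall",
        "-p", program,      # search program; blastn is an assumption here
        "-d", str(db),      # name of the formatted database
        "-i", str(query),   # one chunk file as the query
        "-o", str(out),     # output file for this chunk
        "-v", str(hits),    # number of one-line descriptions to report
        "-b", str(hits),    # number of alignments to report (default 250)
        "-m", "7",          # XML output (assumed choice of format)
    ]


def rounds_needed(n_files, n_nodes, files_per_node):
    """Submission rounds needed if each node takes `files_per_node` files per round."""
    return math.ceil(n_files / (n_nodes * files_per_node))


# ~600,000 contigs at 1000 per file is ~600 files; 34 nodes x 12 files
# covers 408 per round, which is why the run took two parts.
print(rounds_needed(600, 34, 12))
```

Running one blastall process per chunk also means whatever memory a
search used is handed back to the node when that process exits, which
makes it easier to test a single file first and size the jobs from
that.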

Hope that helps.

On 10/15/11 1:59 PM, Willis, Jordan R wrote:
> Hello Biopython,
>
> I was wondering if anyone has worked extensively with the Blast Database locally.
>
> I am blasting millions of sequences using Biopython as my backend framework. I am using a high-throughput computer cluster to blast each sequence. Rather than submit two million jobs, I have divided the FASTA files up into 50 or so.
>
> The problem I am facing is a memory issue. I'm not sure, but I think that the database is caching itself and not clearing before the next sequence is queried. As a result, the next job calls upon the database again, and so on...
>
> The memory builds up until it finally crashes the node. Has anyone dealt with this issue before?
>
> Thanks,
> Jordan
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


