[Biopython] Parsing large blast files

Stefanie Lück lueck at ipk-gatersleben.de
Tue Apr 28 08:23:02 UTC 2009


Thanks Peter!
>You could set the expectation threshold (I don't think there is an
>identity threshold which would be ideal for your example).

I can't say in advance what the expectation threshold should be, so this won't work.

>If you only want the single BEST hit for a query, set the number of
>alignments and/or descriptions to show to just one (these do different
>things in the plain text output - maybe for XML output you only need
>to limit the number of alignments).  This should give a much smaller
>file, which will be fast to parse.

This is too risky. There might be several 100 % hits, and I need all of them.
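
To illustrate keeping every 100 % hit rather than only the top one, here is a minimal sketch. It assumes the BLAST results were saved in the 12-column tab-separated tabular format (query ID, subject ID, percent identity, ...) instead of XML, and it uses only the Python standard library, not Biopython. The function name perfect_hits and the sample IDs are made up for illustration. Note that percent identity is computed over the aligned region only, so a short partial alignment can also reach 100 %.

```python
import csv
import io
from collections import defaultdict


def perfect_hits(handle, min_identity=100.0):
    """Group subject IDs by query ID, keeping hits at or above min_identity.

    Assumes 12-column tabular BLAST output: column 1 is the query ID,
    column 2 the subject ID, column 3 the percent identity.
    """
    hits = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (some tabular variants add them).
        if not row or row[0].startswith("#"):
            continue
        query_id, subject_id, identity = row[0], row[1], float(row[2])
        if identity >= min_identity:
            hits[query_id].append(subject_id)
    return dict(hits)


# Made-up example rows; a real run would pass an open file handle instead.
sample = io.StringIO(
    "query1\tsubjA\t100.00\t50\t0\t0\t1\t50\t1\t50\t1e-20\t99.0\n"
    "query1\tsubjB\t100.00\t50\t0\t0\t1\t50\t5\t54\t1e-20\t99.0\n"
    "query1\tsubjC\t98.00\t50\t1\t0\t1\t50\t1\t50\t1e-18\t93.0\n"
)
for query, subjects in perfect_hits(sample).items():
    print(query, subjects)
```

Because the file is read row by row, memory use depends on the number of hits kept, not on the size of the results file.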

>Finally, and perhaps most importantly - don't do an individual BLAST
>query for each record.  Instead, prepare a FASTA file of ALL your
>queries, and use that as the input to BLAST.  This way there is only
>one command line call, and the BLAST database is only loaded into
>memory once.

Cool, I didn't know that this would work! Great, that's very nice! That's a
50 % speed-up!
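
For anyone reading along, a minimal sketch of the "one FASTA file for all queries" idea. It is plain Python without Biopython. The function name write_multi_fasta, the file name and the sequences are placeholders for illustration, not part of my actual script.

```python
import os
import tempfile


def write_multi_fasta(records, path, width=60):
    """Write (name, sequence) pairs to a single FASTA file.

    Returns the number of records written. Sequence lines are wrapped
    at `width` characters.
    """
    count = 0
    with open(path, "w") as out:
        for name, seq in records:
            out.write(">%s\n" % name)
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")
            count += 1
    return count


# Placeholder queries; in practice these would come from the real input data.
queries = [("query1", "ATGGCGTACGTTAGC"), ("query2", "TTGACCGGTA")]
fasta_path = os.path.join(tempfile.mkdtemp(), "all_queries.fasta")
write_multi_fasta(queries, fasta_path)
```

The resulting file is then given to BLAST as the query input in a single run, so the database is loaded only once.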

Thanks Peter and have a nice day!
Stefanie



