[Biopython] Looking for a way to apply pairwise2 but really fast
Ivan Gregoretti
ivangreg at gmail.com
Fri Jul 12 12:59:46 UTC 2013
Hello Biopythonians,
The pairwise2 module provides a very convenient way of aligning two
sequences. For example:
from Bio import pairwise2
aln = pairwise2.align.globalms(qseq1, sseq1, 2, -1, -.5, -.1)  # match, mismatch, gap open, gap extend
where qseq1 and sseq1 are, to use BLAST jargon, query and subject sequences.
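For reference, `globalms` takes a match score (2), a mismatch score (-1), a gap-open penalty (-0.5) and a gap-extension penalty (-0.1). Below is a minimal pure-Python sketch of Gotoh's affine-gap global alignment that computes an optimal score under that scheme. It is not pairwise2's code, and `global_affine_score` is a hypothetical name. It assumes, as I understand pairwise2's default global behaviour, that a gap of length k costs open + (k - 1) * extend and that end gaps are penalized like internal ones.

```python
NEG_INF = float("-inf")


def global_affine_score(a: str, b: str, match: float = 2, mismatch: float = -1,
                        gap_open: float = -0.5, gap_extend: float = -0.1) -> float:
    """Optimal global alignment score of a vs b with affine gaps (Gotoh).

    Hypothetical helper: a gap of length k scores gap_open + (k - 1) * gap_extend,
    and leading/trailing (end) gaps are scored the same way as internal gaps.
    """
    n, m = len(a), len(b)
    # Three rolling rows over j = 0..m for the previous value of i:
    #   M: alignment ends with a[i-1] paired to b[j-1]
    #   X: alignment ends with a[i-1] paired to a gap
    #   Y: alignment ends with b[j-1] paired to a gap
    prev_M = [NEG_INF] * (m + 1)
    prev_X = [NEG_INF] * (m + 1)
    prev_Y = [NEG_INF] * (m + 1)
    prev_M[0] = 0.0  # aligning two empty prefixes
    for j in range(1, m + 1):
        prev_Y[j] = gap_open + (j - 1) * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        cur_M = [NEG_INF] * (m + 1)
        cur_X = [NEG_INF] * (m + 1)
        cur_Y = [NEG_INF] * (m + 1)
        cur_X[0] = gap_open + (i - 1) * gap_extend  # leading gap in b
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur_M[j] = s + max(prev_M[j - 1], prev_X[j - 1], prev_Y[j - 1])
            # Open a new gap from M or Y, or extend an existing gap in X.
            cur_X[j] = max(prev_M[j] + gap_open, prev_X[j] + gap_extend,
                           prev_Y[j] + gap_open)
            cur_Y[j] = max(cur_M[j - 1] + gap_open, cur_Y[j - 1] + gap_extend,
                           cur_X[j - 1] + gap_open)
        prev_M, prev_X, prev_Y = cur_M, cur_X, cur_Y
    return max(prev_M[m], prev_X[m], prev_Y[m])
```

Each pair costs O(len(a) * len(b)) cell updates, and in pure Python that inner loop dominates, so multiplying it by hundreds of millions of pairs is hopeless. If only the score is needed, pairwise2 also accepts `score_only=True`, which skips building the alignments.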
Routinely, though, I need to compare qseq1 to a set of many subject
sequences, for example [sseq1, sseq2, ..., sseq300].
When I do that, pairwise2 is extremely slow.
It gets worse: most of the time I need to align a million query
sequences against the same set of 300 subjects, which is 300 million
pairwise alignments. At that scale pairwise2 is simply not usable.
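To put a number on that, here is a back-of-envelope estimate. `estimated_hours` is a hypothetical helper, and the per-alignment cost fed to it is an assumption: the 1 ms used in the usage line below is a made-up figure for illustration, not a measured pairwise2 timing.

```python
def estimated_hours(n_queries: int, n_subjects: int, ms_per_alignment: float,
                    workers: int = 1) -> float:
    """Rough wall-clock hours to align every query against every subject.

    ms_per_alignment is an assumed input (measure it on your own data);
    the division by workers assumes perfect parallel speed-up.
    """
    n_alignments = n_queries * n_subjects
    total_seconds = n_alignments * ms_per_alignment / 1000
    return total_seconds / 3600 / workers
```

With the hypothetical 1 ms per alignment, `estimated_hours(1_000_000, 300, 1.0)` comes to 300,000 s, about 83 hours on one core; even ideal spreading over 8 processes would still take roughly 10.4 hours.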
Can somebody suggest a strategy that makes this many pairwise
comparisons feasible within Biopython?
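One possible strategy, sketched under assumptions: each query is scored independently against the subject set, so the work can be split across processes, and identical queries (common in large read sets, though that depends on the data) only need to be scored once. `placeholder_score`, `best_hit` and `best_hits` are hypothetical names; `placeholder_score` is a crude ungapped stand-in that you would replace with a real aligner call, for example `pairwise2.align.globalms(query, subject, 2, -1, -.5, -.1, score_only=True)`.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from typing import Dict, List, Sequence, Tuple


def placeholder_score(query: str, subject: str) -> float:
    """Hypothetical ungapped stand-in: +2 per identical position, -1 per
    differing position over the common length; overhangs are ignored."""
    overlap = min(len(query), len(subject))
    matches = sum(1 for q, s in zip(query, subject) if q == s)
    return 2.0 * matches - (overlap - matches)


def best_hit(query: str, subjects: Sequence[str]) -> Tuple[int, float]:
    """(index, score) of the highest-scoring subject; ties go to the first."""
    scores = [placeholder_score(query, s) for s in subjects]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def best_hits(queries: Sequence[str], subjects: Sequence[str], workers: int = 1,
              chunksize: int = 1000) -> List[Tuple[int, float]]:
    """Best subject for every query, scoring each distinct query only once."""
    unique = list(dict.fromkeys(queries))  # deduplicate, keep first-seen order
    job = partial(best_hit, subjects=subjects)
    if workers == 1:
        results = list(map(job, unique))
    else:
        # Queries are independent, so they can be farmed out to worker processes.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(job, unique, chunksize=chunksize))
    lookup: Dict[str, Tuple[int, float]] = dict(zip(unique, results))
    return [lookup[q] for q in queries]
```

`best_hits(queries, subjects, workers=8)` spreads the distinct queries over 8 processes. On platforms that start workers with spawn (Windows, macOS) the call must sit under an `if __name__ == "__main__":` guard. Parallelism divides wall time by at most the number of cores, so on its own it does not make the per-pair cost any smaller.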
Note: I tried BLASTing from within Python. Although it works, with a
large number of sequences it is only a matter of time before a BLAST
output bug shows up and stalls the analysis pipeline. Not cool.
Thank you.
Ivan
Ivan Gregoretti, PhD