[Biopython] Looking for a way to apply pairwise2 but really fast

Ivan Gregoretti ivangreg at gmail.com
Fri Jul 12 12:59:46 UTC 2013


Hello Biopythonians,

The pairwise2 module provides a very convenient way of aligning two
sequences. For example:

from Bio import pairwise2
aln = pairwise2.align.globalms(qseq1, sseq1, 2, -1, -.5, -.1)

where qseq1 and sseq1 are, to use BLAST jargon, query and subject sequences.


Now, I routinely need to compare qseq1 to a set of many subject
sequences, for example [sseq1, sseq2, ..., sseq300]. When I do that,
pairwise2 is extremely slow.
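
To make the one-query-against-many-subjects pattern concrete, here is a
minimal sketch of the loop. The names best_subject and naive_identity and
the toy sequences are hypothetical, introduced only for illustration. The
stand-in scorer just counts matching positions so the example runs without
Biopython; in real use it would be replaced by the globalms call shown
above.

```python
def naive_identity(query, subject):
    """Stand-in scorer (hypothetical): count matching positions, no gaps.

    In real use this would be replaced by a call such as
    pairwise2.align.globalms(query, subject, 2, -1, -.5, -.1).
    """
    return sum(1 for q, s in zip(query, subject) if q == s)


def best_subject(query, subjects, score=naive_identity):
    """Score one query against every subject; return (name, score) of the best."""
    scores = {name: score(query, seq) for name, seq in subjects.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy subject set standing in for [sseq1, sseq2, ..., sseq300]
subjects = {
    "sseq1": "ACGTACGTAC",
    "sseq2": "ACGTTTGTAC",
    "sseq3": "TTTTACGTAC",
}

print(best_subject("ACGTTTGTAC", subjects))
```

The cost of this loop is one scorer call per subject, so whatever a single
alignment costs is multiplied by the size of the subject set.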


It gets worse: most of the time I need to pairwise align a million
query sequences to the set of 300 subjects. At that scale, pairwise2 is
simply not a workable solution.
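
A rough back-of-the-envelope sketch of that workload follows. The
per-alignment time is an assumed placeholder, not a measurement; only the
counts of queries and subjects come from my actual use case.

```python
n_queries = 1_000_000
n_subjects = 300

# Every query is aligned against every subject.
n_alignments = n_queries * n_subjects

# ASSUMPTION: hypothetical cost of one alignment, in seconds.
# This is a placeholder for illustration, not a benchmark of pairwise2.
seconds_per_alignment = 0.001

total_hours = n_alignments * seconds_per_alignment / 3600
print(f"{n_alignments:,} alignments, about {total_hours:,.0f} hours")
```

Even under that assumed 1 ms per alignment, 300 million alignments add up
to more than three days of serial compute.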

Can somebody offer a strategy to make pairwise comparisons a doable
task within Biopython?

Note: I tried BLASTing from within Python. It works, but with a large
number of sequences it is only a matter of time before a BLAST output
bug shows up and stalls the analysis pipeline. Not cool.

Thank you.

Ivan


Ivan Gregoretti, PhD


