[Biopython] comparing sequences question

George Devaniranjan devaniranjan at gmail.com
Wed Feb 8 01:01:31 UTC 2012


Hi,

I have a list of more than 200,000 UNIQUE short sequences, all of EQUAL length.
I do the following:

I am comparing ALL sequences against ALL sequences, so there will be
(200000 * 199999) / 2 comparisons.
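For scale, n * (n - 1) / 2 with n = 200,000 comes to roughly 2 x 10^10 pairs. A quick check of the arithmetic in Python (`number_of_pairs` is just an illustrative helper name):

```python
import math

def number_of_pairs(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

n = 200_000
pairs = number_of_pairs(n)
assert pairs == math.comb(n, 2)  # math.comb needs Python 3.8+
print(f"{pairs:,}")  # 19,999,900,000
```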
For each pair, if the two sequences differ by only ONE letter, I then do a
second, more detailed alignment using a BLOSUM matrix.

Currently I use the pairwise alignment code in Biopython for both
comparisons. For the simple comparison I set
match = 0
mismatch = -1
If the total alignment score equals -1 (meaning only one mismatch), I go a
step further and do a BLOSUM alignment.
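Since all the sequences have the same length and only single-letter substitutions are of interest, one option (assuming gaps are not wanted, which fits the one-letter-difference criterion) is to score each pair without gaps, just summing a per-position score. With match = 0 and mismatch = -1 the ungapped score is minus the number of mismatches, so -1 means exactly one. The helper names below are hypothetical; the same function could take a substitution matrix instead, e.g. Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")` returns an array indexable as `blosum62[x, y]`, so `lambda x, y: blosum62[x, y]` could be passed as `score`.

```python
from typing import Callable

def ungapped_score(a: str, b: str, score: Callable[[str, str], float]) -> float:
    """Score two equal-length sequences position by position, with no gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(score(x, y) for x, y in zip(a, b))

def simple_score(x: str, y: str) -> int:
    """match = 0, mismatch = -1, as in the simple comparison described above."""
    return 0 if x == y else -1

# Three matching positions and one mismatch (E vs F) give -1.
print(ungapped_score("ACDE", "ACDF", simple_score))  # -1
```

Note this is not identical to a gapped alignment: an aligner allowed to open gaps may find a different optimum, so the two approaches agree only when gaps are effectively forbidden.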

This works, but it's taking a very long time. I suspect it's because I am
running TWO alignments, and I think doing the first, simple comparison
WITHOUT the pairwise alignment code would speed up the calculation.
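For equal-length sequences, the first step does not need an alignment at all: counting positions where the letters differ (the Hamming distance) is enough, and the loop can stop as soon as a second difference is found. A minimal sketch, assuming both inputs have the same length (`differs_by_one` is a hypothetical name):

```python
def differs_by_one(a: str, b: str) -> bool:
    """Return True if equal-length sequences a and b differ at exactly one position."""
    mismatches = 0
    for x, y in zip(a, b):  # assumes len(a) == len(b)
        if x != y:
            mismatches += 1
            if mismatches > 1:
                return False  # a second difference: no need to look further
    return mismatches == 1

print(differs_by_one("ACDE", "ACDF"))  # True
print(differs_by_one("ACDE", "GCDF"))  # False (two differences)
print(differs_by_one("ACDE", "ACDE"))  # False (identical)
```

This makes each comparison cheap, but it still visits every pair.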
Unfortunately I only have a desktop machine for this, so I would appreciate
any suggestions for a quicker approach.
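Even with a cheap check, looping over roughly 2 x 10^10 pairs in pure Python is slow. One alternative, a general hashing trick rather than anything provided by Biopython, avoids the all-against-all loop: for every sequence and every position i, use (i, the sequence with letter i deleted) as a dictionary key. Two unique, equal-length sequences share such a key exactly when they differ only at position i, so each qualifying pair shows up in exactly one bucket. Work grows roughly with n x L (L = sequence length) instead of n squared, at the cost of storing n x L keys of length L - 1. A sketch under those assumptions (the function name is hypothetical):

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

def one_letter_neighbours(seqs: Iterable[str]) -> List[Tuple[str, str]]:
    """Return all pairs of sequences that differ at exactly one position.

    Assumes the sequences are unique and all of the same length.
    """
    buckets: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for s in seqs:
        for i in range(len(s)):
            # Key: the position plus the sequence with that letter removed.
            buckets[(i, s[:i] + s[i + 1:])].append(s)
    pairs: List[Tuple[str, str]] = []
    for members in buckets.values():
        # Unique sequences sharing (i, rest) differ only at position i.
        pairs.extend(combinations(members, 2))
    return pairs

print(sorted(one_letter_neighbours(["ACDE", "ACDF", "GCDE", "TTTT"])))
# [('ACDE', 'ACDF'), ('ACDE', 'GCDE')]
```

Only the pairs returned here would then need the more detailed BLOSUM comparison.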

Thank you,
George


