[Biopython] comparing sequences question
George Devaniranjan
devaniranjan at gmail.com
Wed Feb 8 01:01:31 UTC 2012
Hi,
I have a list of more than 200,000 UNIQUE, short, EQUAL-length sequences.
Here is what I do:
I compare ALL sequences against ALL sequences, so there are
(200000 * 199999) / 2 comparisons, roughly 2 x 10^10.
For each pair, if the two sequences differ from one another by only ONE
letter, I then do a second, more detailed alignment using a BLOSUM matrix.
Currently I use the pairwise sequence comparison code in Biopython for
both steps. For the first, simple comparison I set
match = 0
mismatch = -1
If the total alignment score is equal to -1 (meaning exactly one mismatch),
I go a step further and do a BLOSUM alignment.
This works, but it is taking a very long time. I suspect that is because I
am running TWO alignments, and I think that doing the first, simple
comparison WITHOUT the pairwise alignment code would speed up the
calculation.
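
To illustrate the kind of shortcut meant here: for equal-length sequences
compared position by position (assuming no gaps are wanted), the "score
equals -1" test amounts to counting mismatches. A minimal sketch in plain
Python follows; the function name differs_by_one is made up for this
example and is not part of Biopython.

```python
def differs_by_one(a, b):
    """Return True if equal-length strings a and b differ at exactly one position.

    Hypothetical helper, not a Biopython function. Assumes an ungapped,
    position-by-position comparison.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > 1:
                # Stop early: this pair can no longer qualify.
                return False
    return mismatches == 1
```

Only pairs for which this returns True would then need the BLOSUM step.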
Unfortunately I don't have much more than a desktop to do this, so if
someone can suggest a quicker way to do this, I would appreciate it.
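
One possible direction, sketched here as an assumption rather than a tested
solution: because the sequences are unique and of equal length, two
sequences differ by exactly one letter precisely when they become identical
after deleting the same position from both. Grouping sequences by that
"one position removed" key finds the candidate pairs without the
all-against-all loop. The function name one_mismatch_pairs is made up for
this example.

```python
from collections import defaultdict
from itertools import combinations


def one_mismatch_pairs(seqs):
    """Yield pairs of sequences that differ at exactly one position.

    Hypothetical helper. Assumes all sequences in seqs are unique and
    have the same length.
    """
    seqs = list(seqs)
    if not seqs:
        return
    length = len(seqs[0])
    # Handle one position at a time so only one dictionary is in memory.
    for i in range(length):
        buckets = defaultdict(list)
        for s in seqs:
            # Key is the sequence with position i removed.
            buckets[s[:i] + s[i + 1:]].append(s)
        for group in buckets.values():
            # Unique sequences sharing a key differ only at position i.
            for a, b in combinations(group, 2):
                yield a, b


if __name__ == "__main__":
    example = ["ACDE", "ACDF", "AADF", "WWWW"]
    for a, b in one_mismatch_pairs(example):
        print(a, b)
```

Each qualifying pair is produced once, since a pair that differs at a single
position can only share a key for that position. The work grows with
(number of sequences) x (sequence length) plus the number of pairs found,
instead of with the square of the number of sequences.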
Thank you,
George