[Biopython] Levenshtein vs. blast sequence similarity

Frederico Moraes Ferreira ferreirafm at usp.br
Tue Mar 25 13:48:35 UTC 2014


Biopython list,
Sorry if this is off-topic: my question concerns the use of sequence 
similarity tools rather than their algorithmic implementation. Feel 
free to reply directly to my e-mail if you judge it inappropriate for 
the list.
I would like to compare the sequence similarity (the BLAST "Positives" 
output) and/or the Levenshtein score of four groups of sequences 
(variable region!) against a given peptide, and then use a multiple 
comparison test to support the hypothesis that the peptide is more 
closely related to one group than to another. My original 
implementation used the ratio between the BLAST Positives value and the 
peptide length. However, I've read that the Levenshtein distance is 
generally considered more suitable as a distance measure for biological 
sequences. On the other hand, similarity includes additional 
information, such as conservative and semi-conservative replacements. 
So I'm writing to ask your opinion on this topic and perhaps to get 
another score function to tackle this problem. Any comments are 
appreciated.
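
To make the Levenshtein side of the comparison concrete, here is a minimal 
pure-Python sketch of the unit-cost edit distance, plus a length-normalized 
similarity so that sequences of different lengths can be compared. The 
function names and the choice to normalize by the longer sequence are 
assumptions made for illustration, not part of the original implementation. 
Note that this treats every substitution alike, so it carries none of the 
conservative-replacement information that the BLAST Positives value does.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance computed with two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def normalized_similarity(peptide: str, sequence: str) -> float:
    """Return 1 - distance / longer length, a value in [0, 1].

    Normalizing by the longer sequence is an assumption of this sketch.
    """
    longest = max(len(peptide), len(sequence))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(peptide, sequence) / longest
```

Computing normalized_similarity(peptide, seq) for every sequence in each of 
the four groups would give one score per sequence, which could then be fed 
to the multiple comparison test.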
Best,
Fred

P.S.: at the moment I'm ignoring multiple BLAST HSP matches and 
considering only the highest positives value per comparison partner.
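
The selection described in the P.S. can be sketched as follows, assuming the 
HSPs have already been parsed into (subject_id, positives) pairs. The 
function name, the pair layout, and the example values are hypothetical and 
only illustrate the idea of keeping the best HSP per subject and dividing by 
the peptide length.

```python
from collections import defaultdict


def best_positives_ratio(hsps, peptide_length):
    """Map each subject ID to its highest positives / peptide_length."""
    best = defaultdict(int)
    for subject_id, positives in hsps:
        # Keep only the HSP with the most positives for each subject.
        best[subject_id] = max(best[subject_id], positives)
    return {sid: pos / peptide_length for sid, pos in best.items()}


# Hypothetical input: one (subject_id, positives) pair per HSP.
hsps = [("seqA", 7), ("seqA", 9), ("seqB", 5)]
ratios = best_positives_ratio(hsps, peptide_length=10)
```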




