[Biopython] Levenshtein vs. blast sequence similarity
Frederico Moraes Ferreira
ferreirafm at usp.br
Tue Mar 25 13:48:35 UTC 2014
Biopython list,
Sorry about this perhaps off-topic question, which concerns the use of
sequence similarity tools more than their algorithmic implementation.
Feel free to reply directly to my e-mail if you judge it inappropriate
for the list's content.
I would like to compare the sequence similarity (BLAST "Positives"
output) and/or the Levenshtein score of four groups of sequences
(variable region!) against a given peptide, and use a multiple comparison
test to support the hypothesis that the peptide is more closely related
to one group than to another. My original implementation used the
ratio between the BLAST positives count and the peptide length. However,
I've read that the Levenshtein distance is generally considered to be
more suitable as a distance measure for biological sequences. On the
other hand, similarity includes additional information such as conservative
and semi-conservative replacements. So, I'm writing to ask your opinion
on this topic and perhaps to get another score function to tackle this
problem. Any comments are appreciated.
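
To make the two candidate scores concrete, here is a minimal sketch in plain
Python. The function names (levenshtein, levenshtein_similarity,
positives_ratio) are hypothetical and not part of Biopython; the positives
count is assumed to come from a BLAST run done elsewhere, and normalising the
edit distance by the longer sequence length is only one possible choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(peptide: str, sequence: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer sequence."""
    longest = max(len(peptide), len(sequence))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein(peptide, sequence) / longest


def positives_ratio(positives: int, peptide_length: int) -> float:
    """Ratio of the BLAST positives count to the peptide length."""
    return positives / peptide_length
```

Note that the Levenshtein score treats every substitution alike, whereas the
positives ratio depends on the substitution matrix used by BLAST, so the two
numbers are not directly interchangeable.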
Best,
Fred
P.S.: at the moment I'm ignoring multiple BLAST HSP matches and
considering only the highest positives count per pairwise comparison.
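
The HSP selection described in the P.S. could be sketched as follows. The Hsp
record and the best_positives helper are hypothetical stand-ins: the sketch
assumes each HSP exposes a positives count and does not reproduce any
particular parser's objects.

```python
from typing import Iterable, NamedTuple


class Hsp(NamedTuple):
    """Hypothetical stand-in for one parsed BLAST HSP."""
    positives: int     # number of positive-scoring aligned positions
    align_length: int  # length of the alignment


def best_positives(hsps: Iterable[Hsp]) -> int:
    """Return the highest positives count among the HSPs of one comparison.

    Returns 0 when the comparison produced no HSP at all (an assumption).
    """
    return max((hsp.positives for hsp in hsps), default=0)
```

For example, best_positives([Hsp(5, 9), Hsp(8, 10)]) picks the second HSP's
count and ignores the first.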