[Biopython] Calculating the Hamming distance
Michiel de Hoon
mjldehoon at yahoo.com
Thu Jun 27 07:13:25 UTC 2013
Dear all,
I am trying to align a small RNA sequence to a (shortish) DNA sequence.
The alignment I am looking for is:
AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGAT--TCCCGGTCAGGGAACCA-
GGATGATCCCGGTCAGGGAACCAA
where the first sequence is the DNA and the second sequence is the RNA.
The Hamming distance is 4 (the initial mismatch, the 2 insertions, and the gap at the end).
If I try to calculate this alignment with Bio.pairwise2, I get the following if I use
globalms(dna, rna, 0, -1, -1, -1, penalize_end_gaps=True):
AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACC-A
-GGAT--G--------A---------------------TCCCGGTCAGGGAACCAA
However, if I set penalize_end_gaps to False, I get
-----------------------AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACCA
GGATGATCCCGGTCAGGGAACCAA------------------------------------------------------
I guess the solution is to penalize end gaps in the DNA but not in the RNA.
I could modify Bio.parwise2 to allow for that possibility, but before I do so, I was wondering if there are any other ways to find the desired alignment with Biopython (preferably without using 3rd-party software).
Thanks,
-Michiel.
More information about the Biopython
mailing list