[Biopython] Calculating the Hamming distance

Michiel de Hoon mjldehoon at yahoo.com
Thu Jun 27 07:13:25 UTC 2013


Dear all,

I am trying to align a small RNA sequence to a (shortish) DNA sequence.
The alignment I am looking for is:


AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGAT--TCCCGGTCAGGGAACCA-
                                  GGATGATCCCGGTCAGGGAACCAA

where the first sequence is the DNA and the second sequence is the RNA.
The Hamming distance is 4 (the initial mismatch, the 2 insertions, and the gap at the end).

If I try to calculate this alignment with Bio.pairwise2, I get the following if I use
globalms(dna, rna, 0, -1, -1, -1, penalize_end_gaps=True):

AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACC-A
-GGAT--G--------A---------------------TCCCGGTCAGGGAACCAA

However, if I set penalize_end_gaps to False, I get

-----------------------AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACCA
GGATGATCCCGGTCAGGGAACCAA------------------------------------------------------

I guess the solution is to penalize end gaps in the DNA but not in the RNA.
I could modify Bio.parwise2 to allow for that possibility, but before I do so, I was wondering if there are any other ways to find the desired alignment with Biopython (preferably without using 3rd-party software).

Thanks,
-Michiel.



More information about the Biopython mailing list