[Biopython] Calculating the Hamming distance

Philipp Schiffer philipp.schiffer at gmail.com
Thu Jun 27 07:47:37 UTC 2013


Hi Michiel, 

maybe I am being thick here (or lack the biological knowledge), but to me it looks as if your sequences just don't match. Thus the Bio.pairwise2 alignment is 'correct' in terms of alignment.

Cheers

Philipp 

-- 
Philipp Schiffer


On Thursday, 27. June 2013 at 09:13, Michiel de Hoon wrote:

> Dear all,
> 
> I am trying to align a small RNA sequence to a (shortish) DNA sequence.
> The alignment I am looking for is:
> 
> 
> AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGAT--TCCCGGTCAGGGAACCA-
>                                   GGATGATCCCGGTCAGGGAACCAA
> 
> where the first sequence is the DNA and the second sequence is the RNA.
> The Hamming distance is 4 (the initial mismatch, the 2 insertions, and the gap at the end).
> 
> If I try to calculate this alignment with Bio.pairwise2, I get the following if I use
> globalms(dna, rna, 0, -1, -1, -1, penalize_end_gaps=True):
> 
> AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACC-A
> -GGAT--G--------A---------------------TCCCGGTCAGGGAACCAA
> 
> However, if I set penalize_end_gaps to False, I get
> 
> -----------------------AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACCA
> GGATGATCCCGGTCAGGGAACCAA------------------------------------------------------
> 
> I guess the solution is to penalize end gaps in the DNA but not in the RNA.
> I could modify Bio.pairwise2 to allow for that possibility, but before I do so, I was wondering if there are any other ways to find the desired alignment with Biopython (preferably without using 3rd-party software).
> 
> Thanks,
> -Michiel.
> 
> 
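
For what it's worth, the scoring described above (end gaps free in the RNA row, penalized in the DNA row) can be sketched without touching Bio.pairwise2 as a plain dynamic-programming edit distance in which unaligned DNA before and after the match costs nothing. This is only a rough illustration: the function name fitting_distance and the unit costs for mismatches and gaps are my own assumptions, not Biopython API, and it returns the distance only, not the alignment itself. The alignment quoted above has cost 4, so the result for these two sequences should be no larger than that.

```python
def fitting_distance(target, query):
    """Minimum edit distance between query and any substring of target.

    Hypothetical helper, not part of Biopython. Target residues left
    unaligned before and after the match are free; every query residue
    must be matched, substituted, or placed opposite a gap.
    """
    # prev[j]: best cost for the query prefix processed so far, with the
    # alignment ending at target position j. Row 0 is all zeros, which
    # makes a skipped target prefix free.
    prev = [0] * (len(target) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # query prefix aligned against nothing: i gaps
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j - 1] + (q != t),  # match or mismatch
                prev[j] + 1,             # query residue opposite a gap
                curr[j - 1] + 1,         # target residue opposite a gap
            ))
        prev = curr
    # Taking the minimum over the last row makes a skipped target suffix free.
    return min(prev)


# Sequences copied from the message above, gaps removed.
dna = "AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACCA"
rna = "GGATGATCCCGGTCAGGGAACCAA"
print(fitting_distance(dna, rna))
```

Strictly speaking this is an edit distance rather than a Hamming distance, since it counts gaps as well as mismatches, but that seems to be what the count of 4 above refers to.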




