[Biopython] Calculating the Hamming distance

Olav Zimmermann olav.zimmermann at fz-juelich.de
Thu Jun 27 11:31:02 UTC 2013


Hi Michiel,

you can use globalmd instead of globalms and then choose different gap
penalties for DNA and RNA, e.g.

align.localmd(dna,rna,3,-1,-2,-1,-6,-4), i.e.
align.globalmd(dna,rna,3,-1,-2,-1,-6,-4,penalize_end_gaps=False)

should give your preferred alignment.
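
To illustrate the idea behind the call above (separate gap penalties for the two sequences, and no charge for end gaps), here is a minimal pure-Python sketch. It is not Bio.pairwise2 code: the function name fit_align and its parameters are made up for this example, it uses a single linear penalty per gap character instead of the open/extend pairs that globalmd takes, and it only frees the end gaps in the row of the shorter sequence.

```python
def fit_align(long_seq, short_seq, match=1, mismatch=-1,
              gap_long=-2, gap_short=-2):
    """Align all of short_seq against a stretch of long_seq.

    gap_long:  penalty per gap character placed in long_seq's row
    gap_short: penalty per internal gap character in short_seq's row
    Gap characters in short_seq's row before its first letter and after
    its last letter are free (hypothetical helper, linear gap costs).
    """
    n, m = len(long_seq), len(short_seq)
    # score[i][j]: best score for aligning all of short_seq[:j]
    # against some suffix of long_seq[:i]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        score[0][j] = j * gap_long
    # score[i][0] stays 0: leading letters of long_seq are skipped for free
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if long_seq[i - 1] == short_seq[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap_short,
                              score[i][j - 1] + gap_long)
    # Trailing letters of long_seq are free too: pick the best end row
    end = max(range(n + 1), key=lambda i: score[i][m])
    best = score[end][m]

    # Trace back from (end, m) until every letter of short_seq is placed
    row_long, row_short = [], []
    i, j = end, m
    while j > 0:
        if i > 0:
            s = match if long_seq[i - 1] == short_seq[j - 1] else mismatch
        if i > 0 and score[i][j] == score[i - 1][j - 1] + s:
            row_long.append(long_seq[i - 1])
            row_short.append(short_seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_short:
            row_long.append(long_seq[i - 1])
            row_short.append("-")
            i -= 1
        else:
            row_long.append("-")
            row_short.append(short_seq[j - 1])
            j -= 1

    # Re-attach the unaligned ends of long_seq opposite free gaps
    aligned_long = long_seq[:i] + "".join(reversed(row_long)) + long_seq[end:]
    aligned_short = "-" * i + "".join(reversed(row_short)) + "-" * (n - end)
    return best, aligned_long, aligned_short


best, top, bottom = fit_align("TTACGTTT", "ACGT")
print(best)
print(top)
print(bottom)
```

Making gap_long more negative than gap_short discourages gaps in the longer sequence's row relative to gaps in the shorter one's, which is the same lever the separate penalties in globalmd provide.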

Cheers

Olav



On 06/27/13 11:08, Michiel de Hoon wrote:
> Hi Philipp,
>
> Maybe the sequence alignment doesn't show up clearly in the email, but the two sequences do match very well. The Hamming distance is only 4 (i.e. 4 mismatches/insertions/deletions).
>
> Best,
> -Michiel.
>
>
>
>
> ________________________________
>  From: Philipp Schiffer <philipp.schiffer at gmail.com>
> To: Michiel de Hoon <mjldehoon at yahoo.com>
> Cc: "biopython at biopython.org" <biopython at biopython.org>
> Sent: Thursday, June 27, 2013 4:47 PM
> Subject: Re: [Biopython] Calculating the Hamming distance
>
>
>
> Hi Michiel,
>
> maybe I am thick here (or lack the biological knowledge), but to me it looks as if your sequences just don't match. Thus the Bio.pairwise2 alignment is 'correct' in terms of alignment.
>
> Cheers
>
> Philipp
>
>
> --
> Philipp Schiffer
> Sent with Sparrow
>
> On Thursday, 27. June 2013 at 09:13, Michiel de Hoon wrote:
>> Dear all,
>>
>> I am trying to align a small RNA sequence to a (shortish) DNA sequence.
>> The alignment I am looking for is:
>>
>>
>>
>>
>> AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGAT--TCCCGGTCAGGGAACCA-
>>                                   GGATGATCCCGGTCAGGGAACCAA
>>
>>
>> where the first sequence is the DNA and the second sequence is the RNA.
>> The Hamming distance is 4 (the initial mismatch, the 2 insertions, and the gap at the end).
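
The count of 4 can be checked with a few lines of plain Python. This is only a sketch: count_differences is a name made up here, the leading blanks of the RNA row are written as "-", and the RNA is assumed to start at column 35 so that both rows end in the same column. Columns before the first and after the last RNA letter are skipped, so the unaligned start of the DNA is not counted.

```python
def count_differences(row_a, row_b, gap="-"):
    """Count differing columns within the span covered by row_b's letters."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Columns in which row_b has a letter rather than a gap
    letters = [k for k, c in enumerate(row_b) if c != gap]
    if not letters:
        return 0
    start, stop = letters[0], letters[-1] + 1
    # A column counts if the two rows differ (mismatch or gap vs. letter)
    return sum(1 for a, b in zip(row_a[start:stop], row_b[start:stop])
               if a != b)


dna_row = "AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGAT--TCCCGGTCAGGGAACCA-"
rna_row = "-" * 34 + "GGATGATCCCGGTCAGGGAACCAA"
print(count_differences(dna_row, rna_row))  # prints 4
```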
>>
>>
>> If I try to calculate this alignment with Bio.pairwise2, I get the following if I use
>> globalms(dna, rna, 0, -1, -1, -1, penalize_end_gaps=True):
>>
>>
>> AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACC-A
>> -GGAT--G--------A---------------------TCCCGGTCAGGGAACCAA
>>
>>
>> However, if I set penalize_end_gaps to False, I get
>>
>>
>> -----------------------AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACCA
>> GGATGATCCCGGTCAGGGAACCAA------------------------------------------------------
>>
>>
>> I guess the solution is to penalize end gaps in the DNA but not in the RNA.
>> I could modify Bio.pairwise2 to allow for that possibility, but before I do so, I was wondering if there are any other ways to find the desired alignment with Biopython (preferably without using 3rd-party software).
>>
>>
>> Thanks,
>> -Michiel.
>> _______________________________________________
>> Biopython mailing list  - Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython


