[Biopython] Calculating the Hamming distance

Michiel de Hoon mjldehoon at yahoo.com
Fri Jun 28 08:44:18 UTC 2013


Hi Olav,
Thanks for your reply. I agree that using different gap penalties for DNA and RNA will result in the preferred alignment, but I was hoping there was some easy way to keep the same scoring scheme while penalizing gaps at the ends of the alignment in one sequence but not the other. In the end, I modified Bio.pairwise2 to generalize the penalize_end_gaps parameter so that it also accepts a tuple of two Booleans, specifying how gaps at the alignment ends should be treated for each of the two sequences separately. See
https://github.com/biopython/biopython/commit/c65e3cf17fe58c859d071fe061deca1ee2a16a5d#Bio/pairwise2.py

Thanks,
-Michiel.
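
To illustrate what treating end gaps differently for the two sequences means, here is a minimal pure-Python sketch. It is not the Bio.pairwise2 code from the commit above: the function name semiglobal_score and its parameters are made up for illustration, and it assumes a linear gap cost (the open = extend = -1 case from the original question). End gaps placed in the second sequence (the first sequence overhanging at either end) are free, while gaps placed in the first sequence are always charged.

```python
def semiglobal_score(a, b, match=0, mismatch=-1, gap=-1):
    """Best score for aligning all of b against a, with free overhang of a.

    Hypothetical helper, not part of Biopython. Assumes a linear gap cost.
    End gaps in b (a overhangs) cost nothing; gaps in a cost `gap` per column.
    """
    # prev[j] holds the best score for a[:i] versus b[:j].
    # Row i = 0: a prefix of b sits opposite gaps in a, which is charged.
    prev = [j * gap for j in range(len(b) + 1)]
    best = prev[-1]
    for i in range(1, len(a) + 1):
        # Column j = 0: a[:i] sits opposite leading gaps in b, which is free.
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # a[i-1] opposite a gap in b
            left = cur[j - 1] + gap  # b[j-1] opposite a gap in a
            cur.append(max(diag, up, left))
        # Stopping after a[:i] leaves a[i:] as a free trailing overhang.
        best = max(best, cur[-1])
        prev = cur
    return best


dna = "AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACCA"
rna = "GGATGATCCCGGTCAGGGAACCAA"
print(semiglobal_score(dna, rna))
```

With match = 0 and mismatch = gap = -1, the negated score is the number of mismatch and gap columns in the best such alignment, not counting the DNA overhang.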






________________________________
 From: Olav Zimmermann <olav.zimmermann at fz-juelich.de>
To: biopython at lists.open-bio.org 
Sent: Thursday, June 27, 2013 8:31 PM
Subject: Re: [Biopython] Calculating the Hamming distance
 

Hi Michiel,

you can use globalmd instead of globalms and then choose different gap
penalties for DNA and RNA, e.g.

align.localmd(dna,rna,3,-1,-2,-1,-6,-4) , i.e.
align.globalmd(dna,rna,3,-1,-2,-1,-6,-4,penalize_end_gaps=False)

should give your preferred alignment.

Cheers

Olav



On 06/27/13 11:08, Michiel de Hoon wrote:
> Hi Philipp,
>
> Maybe the sequence alignment doesn't show up clearly in the email, but the two sequences do match very well. The Hamming distance is only 4 (i.e. 4 mismatches/insertions/deletions).
>
> Best,
> -Michiel.
>
>
>
>
> ________________________________
>  From: Philipp Schiffer <philipp.schiffer at gmail.com>
> To: Michiel de Hoon <mjldehoon at yahoo.com>
> Cc: "biopython at biopython.org" <biopython at biopython.org>
> Sent: Thursday, June 27, 2013 4:47 PM
> Subject: Re: [Biopython] Calculating the Hamming distance
>
>
>
> Hi Michiel,
>
> maybe I am thick here (or lack the biological knowledge), but to me it looks as if your sequences just don't match. Thus the Bio.pairwise2 alignment is 'correct' in terms of alignment.
>
> Cheers
>
> Philipp
>
>
> --
> Philipp Schiffer
> Sent with Sparrow
>
> On Thursday, 27. June 2013 at 09:13, Michiel de Hoon wrote:
>> Dear all,
>>
>> I am trying to align a small RNA sequence to a (shortish) DNA sequence.
>> The alignment I am looking for is:
>>
>>
>>
>>
>> AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGAT--TCCCGGTCAGGGAACCA-
>>                                   GGATGATCCCGGTCAGGGAACCAA
>>
>>
>> where the first sequence is the DNA and the second sequence is the RNA.
>> The Hamming distance is 4 (the initial mismatch, the 2 insertions, and the gap at the end).
>>
>>
>> If I try to calculate this alignment with Bio.pairwise2, I get the following if I use
>> globalms(dna, rna, 0, -1, -1, -1, penalize_end_gaps=True):
>>
>>
>> AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACC-A
>> -GGAT--G--------A---------------------TCCCGGTCAGGGAACCAA
>>
>>
>> However, if I set penalize_end_gaps to False, I get
>>
>>
>> -----------------------AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGATTCCCGGTCAGGGAACCA
>> GGATGATCCCGGTCAGGGAACCAA------------------------------------------------------
>>
>>
>> I guess the solution is to penalize end gaps in the DNA but not in the RNA.
>> I could modify Bio.pairwise2 to allow for that possibility, but before I do so, I was wondering if there are any other ways to find the desired alignment with Biopython (preferably without using 3rd-party software).
>>
>>
>> Thanks,
>> -Michiel.
>> _______________________________________________
>> Biopython mailing list  - Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
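
The distance of 4 quoted in the original question can be checked by counting the differing columns of the desired alignment. The sketch below is only an illustration: the helper name count_edits is hypothetical, and it assumes that end gaps in the second row (the RNA) are not counted while every other gap or mismatch column counts as one. The leading blanks in the RNA row of the email are written here as 34 gap characters.

```python
def count_edits(aligned_a, aligned_b, gap="-"):
    """Count mismatch and gap columns, ignoring end gaps in aligned_b.

    Hypothetical helper for illustration, not part of Biopython.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    # Skip the columns where aligned_b has only leading or trailing gaps.
    start = len(aligned_b) - len(aligned_b.lstrip(gap))
    end = len(aligned_b.rstrip(gap))
    return sum(x != y for x, y in zip(aligned_a[start:end], aligned_b[start:end]))


dna_row = "AGGATTCGGCGCTCTCACCGCCGCGGCCCGGGTTCGAT--TCCCGGTCAGGGAACCA-"
rna_row = "-" * 34 + "GGATGATCCCGGTCAGGGAACCAA"
print(count_edits(dna_row, rna_row))  # prints 4
```

The four counted columns are the initial C/G mismatch, the two gap columns in the DNA opposite "GA", and the final gap in the DNA opposite the last "A" of the RNA.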



