[Biojava-l] unwanted gap in alignments
Andrew Walsh
drandrewwalsh at gmail.com
Fri Jan 14 16:22:55 UTC 2011
Changing the gap penalty isn't making a difference because, in both examples, the
two versions have the same number of gaps and gaps of the same length. Penalizing
end gaps differently from internal gaps might address the first example, but not
the second, where all the gaps are internal.
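Here is a small plain-Java sketch of that point (not BioJava code; the class name is hypothetical, and the gap convention is an assumption). It charges each run of n gaps gop + (n - 1) * gep and treats end gaps like internal gaps. Under those assumptions, the two PSA layouts cost the same for any gop/gep, and so do the two MSA layouts, so tuning those values cannot separate them. BioJava's exact affine convention may differ, but any cost that depends only on the number and lengths of gap runs leads to the same conclusion. Requires Java 11+ for String.repeat.

```java
final class GapCostCheck {
    /**
     * Affine gap cost of one gapped row. Assumed convention: each maximal run of
     * n '-' characters costs gop + (n - 1) * gep, and end gaps are charged like
     * internal gaps.
     */
    static double gapCost(String row, double gop, double gep) {
        double cost = 0;
        int run = 0;
        // Iterate one past the end so a trailing (end) gap run is also closed and charged.
        for (int i = 0; i <= row.length(); i++) {
            boolean gap = i < row.length() && row.charAt(i) == '-';
            if (gap) {
                run++;
            } else if (run > 0) {
                cost += gop + (run - 1) * gep;
                run = 0;
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        // Tails of the PSA rows: internal 19-gap run vs. end 19-gap run.
        String psaUnwanted = "YYCA" + "-".repeat(19) + "R";
        String psaExpected = "YYCAR" + "-".repeat(19);
        // MSA rows around the deletion: the same 17-gap run, shifted by one column.
        String msaUnwanted = "LEWV" + "-".repeat(17) + "GRFTIS";
        String msaExpected = "LEWVG" + "-".repeat(17) + "RFTIS";
        double[][] params = {{10, 1}, {12, 0.5}, {5, 2}};
        for (double[] p : params) {
            System.out.printf("gop=%.1f gep=%.1f  PSA %.1f vs %.1f  MSA %.1f vs %.1f%n",
                    p[0], p[1],
                    gapCost(psaUnwanted, p[0], p[1]), gapCost(psaExpected, p[0], p[1]),
                    gapCost(msaUnwanted, p[0], p[1]), gapCost(msaExpected, p[0], p[1]));
        }
    }
}
```

Every printed pair matches, which is why changing gop/gep alone had no visible effect.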
Since the gaps are the same (from the point of view of how the algorithms
score gaps), what actually drives the output is the substitution scores.
In the PSA example, the expected alignment has an 'R' substituted for a
'G', whereas the unwanted output has 'R' substituted for 'S'. The latter
is a more common substitution, since it is more conservative from the
point of view of amino acid chemistry and may also require fewer mutations
(although that depends on the codon usage for both 'R' and 'S'). It
therefore gets a smaller penalty (a higher substitution score), so most
algorithms will prefer the unwanted PSA over your expected output.
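To make this concrete, here is a self-contained Java sketch (again not BioJava code; class and method names are hypothetical). It collects the aligned residue pairs of both PSA outputs and scores only the pairs that differ between them. It assumes BLOSUM62, which the Cookbook3 examples appear to use, and hardcodes just the two entries needed (R/G = -2, R/S = -1). Only the tail after the shared prefix is used, since everything earlier is aligned identically in both outputs and cancels out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PairDiff {
    // Only the BLOSUM62 entries this example needs; keys list the two residues alphabetically.
    static final Map<String, Integer> BLOSUM62_SUBSET = Map.of("GR", -2, "RS", -1);

    static int blosum62(char a, char b) {
        String key = a <= b ? "" + a + b : "" + b + a;
        Integer s = BLOSUM62_SUBSET.get(key);
        if (s == null) throw new IllegalArgumentException("pair not in subset: " + key);
        return s;
    }

    /** Multiset of aligned residue pairs (gap columns skipped), keyed "top residue + bottom residue". */
    static Map<String, Integer> alignedPairs(String top, String bottom) {
        if (top.length() != bottom.length()) throw new IllegalArgumentException("row lengths differ");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < top.length(); i++) {
            char a = top.charAt(i), b = bottom.charAt(i);
            if (a != '-' && b != '-') counts.merge("" + a + b, 1, Integer::sum);
        }
        return counts;
    }

    /** Pairs that occur more often in x than in y (multiset difference x - y). */
    static List<String> onlyIn(Map<String, Integer> x, Map<String, Integer> y) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            for (int k = y.getOrDefault(e.getKey(), 0); k < e.getValue(); k++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Sum of substitution scores over the pairs that distinguish alignment x from y. */
    static int distinguishingScore(Map<String, Integer> x, Map<String, Integer> y) {
        int total = 0;
        for (String p : onlyIn(x, y)) total += blosum62(p.charAt(0), p.charAt(1));
        return total;
    }

    public static void main(String[] args) {
        String top = "YYCAGYDYGNFDYWGQGTTLTVSS";         // tail of the first sequence
        String unwanted = "YYCA" + "-".repeat(19) + "R"; // R ends up under S
        String expected = "YYCAR" + "-".repeat(19);      // R ends up under G
        Map<String, Integer> u = alignedPairs(top, unwanted);
        Map<String, Integer> e = alignedPairs(top, expected);
        // prints: unwanted only: [SR] score -1
        System.out.println("unwanted only: " + onlyIn(u, e) + " score " + distinguishingScore(u, e));
        // prints: expected only: [GR] score -2
        System.out.println("expected only: " + onlyIn(e, u) + " score " + distinguishingScore(e, u));
    }
}
```

With the gap costs equal, that one-point difference (-1 vs -2) is enough for the aligner to pick the unwanted layout.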
Similar reasoning applies to the MSA example. In the unwanted version,
the aligner matches 'G' to 'G', which is not a substitution at all and
thus scores higher than the 'V' to 'G' substitution required for the
expected output.
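The same check for the MSA case, as a sketch: it scores the second sequence against the first around the deletion, treated as a plain pairwise alignment, using BLOSUM62 identity scores for the residues involved plus G/V = -3, and the same assumed affine gap convention as before (hypothetical class name, Java 11+). It looks at only one pair of rows; the objective BioJava's MSA actually optimizes (for example, how profiles are scored) may differ, so this illustrates the reasoning rather than reproducing the aligner.

```java
import java.util.Map;

final class ProjectionScore {
    // BLOSUM62 identity scores for the residues in these fragments.
    private static final Map<Character, Integer> SELF = Map.of(
            'L', 4, 'E', 5, 'W', 11, 'V', 4, 'G', 6, 'R', 5, 'F', 6, 'T', 5, 'I', 4, 'S', 4);
    private static final int G_V = -3; // BLOSUM62 G/V

    static int sub(char a, char b) {
        if (a == b) {
            Integer s = SELF.get(a);
            if (s != null) return s;
        } else if ((a == 'G' && b == 'V') || (a == 'V' && b == 'G')) {
            return G_V;
        }
        throw new IllegalArgumentException("pair not in this BLOSUM62 subset: " + a + b);
    }

    /**
     * Substitution total minus affine gap cost (assumed: gop + (n - 1) * gep per run,
     * end gaps charged like internal ones). Assumes no column is a gap in both rows.
     */
    static double score(String top, String bottom, double gop, double gep) {
        if (top.length() != bottom.length()) throw new IllegalArgumentException("row lengths differ");
        double total = 0;
        int runTop = 0, runBottom = 0;
        for (int i = 0; i <= top.length(); i++) {
            boolean end = i == top.length();
            boolean gapTop = !end && top.charAt(i) == '-';
            boolean gapBottom = !end && bottom.charAt(i) == '-';
            // Extend or close the gap run in each row independently.
            if (gapTop) runTop++;
            else if (runTop > 0) { total -= gop + (runTop - 1) * gep; runTop = 0; }
            if (gapBottom) runBottom++;
            else if (runBottom > 0) { total -= gop + (runBottom - 1) * gep; runBottom = 0; }
            if (!end && !gapTop && !gapBottom) total += sub(top.charAt(i), bottom.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        String top = "LEWVVWRVEQVVEKAFANSVNGRFTIS";            // first MSA row around the deletion
        String unwanted = "LEWV" + "-".repeat(17) + "GRFTIS";  // G aligned under G
        String expected = "LEWVG" + "-".repeat(17) + "RFTIS";  // G aligned under V
        // prints: unwanted=28.0 expected=19.0
        System.out.println("unwanted=" + score(top, unwanted, 10, 1) + " expected=" + score(top, expected, 10, 1));
    }
}
```

Because the gap run has the same length in both layouts, the unwanted version wins by 9 points (G/G = 6 vs. G/V = -3) whatever gop/gep you choose.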
Now, I can understand why, in the PSA example, an end gap seems more
likely than an internal gap, and why, in the MSA example, one deletion
event seems more likely than two similar but slightly different deletion
events. But the scoring used by the traditional alignment algorithms just
won't favor those outputs.
Unfortunately, I don't have a good answer for how to make BioJava output
your desired result. But it is my hope that clarifying the problem
might be a useful step in arriving at a solution.
Incidentally, does your desired output come directly from a particular
alignment algorithm, or have they been hand-adjusted?
-Andy Walsh
On 1/14/2011 10:45 AM, Andreas Prlic wrote:
> Looks a bit like an end-gap issue to me. I think the global alignment
> algorithm does not penalize end gaps. Try a local alignment (Smith-Waterman)
> instead.
>
> Andreas
>
>
>
> On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari
> <khalil.elmazouari at gmail.com> wrote:
>> Hi All,
>>
>> I am testing the PSA and MSA examples from Cookbook3.
>>
>> Sometimes gaps are introduced in "unwanted" places in the alignments, for example:
>>
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R
>>
>> The expected PSA was:
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR-------------------
>>
>>
>> The same happens with the MSA:
>> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
>> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
>> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>>
>> The expected MSA:
>> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
>> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
>> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>>
>>
>> I have tried different gop/gep values and both LOCAL and GLOBAL PSA, with no success!
>>
>> How can I force or avoid gap creation at specific positions?
>>
>> Many thanks.
>>
>> Khalil
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>