[Biojava-l] Calculating edit distance between 2 DNA Sequences

Hannes Brandstätter-Müller biojava at hannes.oib.com
Wed Nov 9 04:01:15 EST 2011


Thanks.

I am thinking about implementing a modified MatrixAligner to fit my
needs here. A plain Levenshtein distance is not exactly right for my
application, because there can be anywhere from 0 to n insertions/deletions.

A direct LevensteinDistance as implemented in would not give me all
the information I need.
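
To illustrate the kind of information meant here, the following is a
minimal sketch in plain Java that fills the standard edit-distance matrix
and then walks back through it to count substitutions, insertions and
deletions separately. The class and method names (EditOpCounter, Counts,
count) are hypothetical and are not part of BioJava. When several optimal
alignments exist, the traceback prefers the diagonal, so the split between
operation types is one valid answer, not the only one.

```java
/** Hypothetical helper, not part of BioJava: counts edit operations between two strings. */
final class EditOpCounter {

    /** Operation counts for one optimal alignment of a against b. */
    static final class Counts {
        final int substitutions;
        final int insertions;
        final int deletions;

        Counts(int substitutions, int insertions, int deletions) {
            this.substitutions = substitutions;
            this.insertions = insertions;
            this.deletions = deletions;
        }

        int total() {
            return substitutions + insertions + deletions;
        }
    }

    static Counts count(String a, String b) {
        int n = a.length();
        int m = b.length();

        // d[i][j] = edit distance between the first i chars of a and the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }

        // Trace back from the bottom-right corner, preferring the diagonal move.
        int subs = 0, ins = 0, dels = 0;
        int i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                if (d[i][j] == d[i - 1][j - 1] + cost) {
                    subs += cost; // a match adds 0, a mismatch adds 1
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                dels++; // character of a has no partner in b
                i--;
            } else {
                ins++;  // character of b has no partner in a
                j--;
            }
        }
        return new Counts(subs, ins, dels);
    }
}
```

Usage would look like EditOpCounter.count("ACGT", "AGT"), which reports a
single deletion relative to the first sequence.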

Thanks for the hints and input.

Hannes

PS we seriously need more examples in the Cookbook - I'll submit some
later, but the modified N-W Aligner mentioned below would make a good
example too, don't you think? ;)
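
As a rough starting point for such an example, here is a minimal sketch of
the scoring scheme from the quoted reply below (match 0, mismatch -1, gap
-1), written as a plain Java dynamic program instead of against the BioJava
alignment API. The class name BoundedNwDistance, the method distance and the
maxErrors parameter are assumptions made for illustration. The early abort
relies on every step scoring 0 or -1: the final score can never be better
than the best score in any row, so once a whole row is worse than the limit
the comparison can stop.

```java
/** Hypothetical sketch, not BioJava code: global alignment score used as an edit distance. */
final class BoundedNwDistance {

    static final int MATCH = 0;
    static final int MISMATCH = -1;
    static final int GAP = -1;

    /**
     * Returns the number of errors between a and b (the absolute value of the
     * global alignment score), or -1 if more than maxErrors errors are needed.
     */
    static int distance(String a, String b, int maxErrors) {
        int m = b.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];

        // First row: aligning the empty prefix of a against j chars of b costs j gaps.
        for (int j = 0; j <= m; j++) prev[j] = j * GAP;

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i * GAP;
            int best = curr[0];
            for (int j = 1; j <= m; j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                curr[j] = Math.max(prev[j - 1] + sub,
                        Math.max(prev[j] + GAP, curr[j - 1] + GAP));
                best = Math.max(best, curr[j]);
            }
            // Scores only decrease along a path, so the final score is <= best.
            if (-best > maxErrors) return -1;

            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        int errors = -prev[m];
        return errors > maxErrors ? -1 : errors;
    }
}
```

Only two rows of the matrix are kept, which is enough for the score but not
for recovering the alignment itself.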

On Mon, Nov 7, 2011 at 16:03,  <forumjspro at gmail.com> wrote:
> Hi Hannes,
>
> You could do such a comparison by using the Needleman-Wunsch aligner with the gap penalty set to -1 and the substitution matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors.
>
> But it will not stop when a maximal number of errors is reached ...
>
> JS
>
> On Nov 7, 2011, at 15:42, Andreas Prlic wrote:
>
>> Hi Hannes,
>>
>> you are right, this does not exist yet. Somebody else asked the same
>> question a few weeks ago. As such it would be great if you could
>> provide a patch, there might be other people interested in that, too.
>>
>> Andreas
>>
>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller
>> <biojava at hannes.oib.com> wrote:
>>> Following up:
>>>
>>> If there is no such thing, should I make it available if I write it?
>>>
>>> Hannes
>>>
>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
>>> <biojava at hannes.oib.com> wrote:
>>>> Hi!
>>>>
>>>> Is there a Class/Method in Biojava that calculates the Levenshtein
>>>> distance between two sequences? I could not find anything in the docs
>>>> at first search.
>>>>
>>>> I need to compare 2 DNASequences (or Strings) and get the number of
>>>> insertions, deletions, and substitutions. Ideally, there would be an
>>>> option to abort the comparison if the number of mismatches exceeds a
>>>> certain number.
>>>>
>>>> Hannes
>>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>
>
