[Biojava-l] Calculating edit distance between 2 DNA Sequences

Andreas Prlic andreas at sdsc.edu
Tue Nov 15 10:53:50 EST 2011


You could send it to the list here and ask for feedback.

Andreas

On Tue, Nov 15, 2011 at 6:10 AM, Hannes Brandstätter-Müller
<biojava at hannes.oib.com> wrote:
> Well, I have implemented a first version that is running quite well
> for me and my needs/specifications, although I did not integrate it
> directly into the biojava class hierarchy yet.
> Is anyone interested in taking a look at it and giving me some
> feedback on whether I should invest the time and work to make it
> suitable for inclusion in BioJava?
>
> Hannes
>
> On Wed, Nov 9, 2011 at 10:01, Hannes Brandstätter-Müller
> <biojava at hannes.oib.com> wrote:
>> Thanks.
>>
>> I am thinking about implementing a modified MatrixAligner to fit my
>> needs here. A direct Levenshtein distance is not exactly right for my
>> application, because there are 0 to n insertions/deletions.
>>
>> A direct LevensteinDistance as implemented in would not give me all
>> the information I need.
>>
>> Thanks for the hints and input.
>>
>> Hannes
>>
>> PS we seriously need more examples in the Cookbook - I'll submit some
>> later, but the modified N-W Aligner mentioned below would make a good
>> example too, don't you think? ;)
>>
>> On Mon, Nov 7, 2011 at 16:03,  <forumjspro at gmail.com> wrote:
>>> Hi Hannes,
>>>
>>> You could do such a comparison by using the Needleman-Wunsch aligner with the gap penalty set to -1 and the substitution matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is then exactly the number of errors.
>>>
>>> But it will not stop when a maximum number of errors is reached ...
>>>
>>> JS
>>>
>>> On Nov 7, 2011, at 15:42, Andreas Prlic wrote:
>>>
>>>> Hi Hannes,
>>>>
>>>> you are right, this does not exist yet. Somebody else asked the same
>>>> question a few weeks ago. As such it would be great if you could
>>>> provide a patch, there might be other people interested in that, too.
>>>>
>>>> Andreas
>>>>
>>>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller
>>>> <biojava at hannes.oib.com> wrote:
>>>>> Following up:
>>>>>
>>>>> If there is no such thing, should I make it available if I write it?
>>>>>
>>>>> Hannes
>>>>>
>>>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
>>>>> <biojava at hannes.oib.com> wrote:
>>>>>> Hi!
>>>>>>
>>>>>> Is there a Class/Method in Biojava that calculates the Levenshtein
>>>>>> distance between two sequences? I could not find anything in the docs
>>>>>> at first search.
>>>>>>
>>>>>> I need to compare 2 DNASequences (or Strings) and get the number of
>>>>>> insertions, deletions, and substitutions. Ideally, there would be an
>>>>>> option to abort the comparison if the number of mismatches exceeds a
>>>>>> certain number.
>>>>>>
>>>>>> Hannes
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>
>>>>
>>>
>>>
>>
>
>
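
For anyone following the thread, here is a minimal sketch of the scoring
scheme JS suggests in the quoted message (match 0, mismatch -1, gap -1).
It is written as a plain-Java dynamic program and does not use BioJava's
aligner classes; the names NwEditDistance, globalScore and editDistance
are made up for illustration and are not part of BioJava. With these
scores, the negated global alignment score should equal the Levenshtein
distance:

```java
class NwEditDistance {
    // Scoring scheme from the thread: match 0, mismatch -1, linear gap penalty -1.
    static final int MATCH = 0;
    static final int MISMATCH = -1;
    static final int GAP = -1;

    /** Global (Needleman-Wunsch) alignment score of a against b, using two rows. */
    static int globalScore(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Aligning an empty prefix of a against j characters of b costs j gaps.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j * GAP;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i * GAP;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                int up = prev[j] + GAP;       // gap in b
                int left = curr[j - 1] + GAP; // gap in a
                curr[j] = Math.max(diag, Math.max(up, left));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** The score is never positive here, so negating it gives the error count. */
    static int editDistance(String a, String b) {
        return -globalScore(a, b);
    }

    public static void main(String[] args) {
        System.out.println(editDistance("ACGT", "AGT"));
    }
}
```

As JS notes, this always fills the whole table; it has no cut-off for a
maximum number of errors.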

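The original request in the thread (separate counts of insertions,
deletions and substitutions, with an option to abort once too many errors
are found) could be sketched roughly as below. This is not Hannes's
implementation, which was not posted here, and nothing in it is BioJava
API: BoundedEditOps, Counts, compare and maxErrors are hypothetical names.
It works on plain Strings, keeps the full table so that it can trace back
one optimal edit script, and gives up as soon as a whole row exceeds the
limit. When several optimal scripts exist, the split between the three
counts depends on the tie-breaking order in the traceback.

```java
class BoundedEditOps {
    /** Edit counts along one optimal script that turns a into b. */
    static final class Counts {
        final int insertions;
        final int deletions;
        final int substitutions;

        Counts(int insertions, int deletions, int substitutions) {
            this.insertions = insertions;
            this.deletions = deletions;
            this.substitutions = substitutions;
        }

        int total() {
            return insertions + deletions + substitutions;
        }
    }

    /** Returns the counts, or null if the distance exceeds maxErrors. */
    static Counts compare(String a, String b, int maxErrors) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            d[i][0] = i;
            int rowMin = d[i][0];
            for (int j = 1; j <= m; j++) {
                int diag = d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                d[i][j] = Math.min(diag, Math.min(d[i - 1][j], d[i][j - 1]) + 1);
                rowMin = Math.min(rowMin, d[i][j]);
            }
            // Every complete alignment passes through this row, so the
            // final distance cannot be smaller than the row minimum.
            if (rowMin > maxErrors) {
                return null;
            }
        }
        if (d[n][m] > maxErrors) {
            return null;
        }
        // Trace back from the bottom-right corner, preferring the diagonal.
        int ins = 0;
        int del = 0;
        int sub = 0;
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            boolean same = i > 0 && j > 0 && a.charAt(i - 1) == b.charAt(j - 1);
            if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + (same ? 0 : 1)) {
                if (!same) {
                    sub++;
                }
                i--;
                j--;
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                del++; // character of a with no partner in b
                i--;
            } else {
                ins++; // character of b with no partner in a
                j--;
            }
        }
        return new Counts(ins, del, sub);
    }
}
```

A call such as BoundedEditOps.compare("ACGT", "AGT", 2) would report a
single deletion, while a null result means the limit was exceeded.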


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


