[Biopython] I've written a library for executing fuzzy searches...

Tal Einat taleinat at gmail.com
Sun Nov 17 17:40:47 UTC 2013


On Sun, Nov 17, 2013 at 6:24 PM, c0d3g33k <c0d3g33k at gmail.com> wrote:

>  On 11/17/2013 04:14 AM, Tal Einat wrote:
>
>  There are already many libraries to compute various distance
> metrics between two strings, but that is not the purpose of the library I'm
> developing (fuzzysearch). My goal is to build a library for searching in
> strings or other sequences (e.g. DNA), allowing finding nearly matching
> parts instead of just full matches.
>
>   That's what made me think of it.  *It covers your use case* and seems
> to be well researched, so I thought it might be of interest as you
> implement your own library.
>

I'm sorry, but I don't see how it covers my use case. Calculating a
similarity measure between a short string/sequence and a very long one
isn't quite the same as searching for all of the matching or nearly
matching sub-sequences. It's close but not quite the same, especially with
regard to which algorithms are efficient to use. Or am I missing something?
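To make the distinction concrete, here is a minimal sketch of the classic dynamic-programming approach to approximate substring search, usually attributed to Sellers. It is not fuzzysearch's implementation or API, and the function name `approx_match_ends` is hypothetical. A plain Levenshtein distance between a short pattern and a whole long text returns one large number. Keeping the DP's "empty pattern" row at zero for every text position lets a match start anywhere, so the algorithm reports every end position where some substring is within `max_dist` edits of the pattern.

```python
def approx_match_ends(pattern, text, max_dist):
    """Hypothetical helper: return (end_index, distance) pairs for every
    position in `text` where some substring ending there is within
    `max_dist` Levenshtein edits of `pattern`. end_index is exclusive."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any substring
    # of text ending at the current position. Before reading any text,
    # pattern[:i] can only be matched by skipping its i characters.
    col = list(range(m + 1))
    matches = []
    for j, ch in enumerate(text, start=1):
        # Row 0 stays 0: the empty pattern prefix matches anywhere, which
        # is what allows a match to begin at any position in the text.
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(
                col[i - 1] + cost,  # match or substitution
                col[i] + 1,         # extra character in the text
                new[i - 1] + 1,     # pattern character missing from the text
            )
        col = new
        if col[m] <= max_dist:
            matches.append((j, col[m]))
    return matches


print(approx_match_ends("ACGT", "TTACGTTACTT", 0))  # exact matches only
print(approx_match_ends("ACGT", "TTACGTTACTT", 1))  # allow one edit
```

This runs in O(len(pattern) * len(text)) time and reports overlapping near-matches around each hit; a real search library would typically add smarter filtering and merge or trim those overlapping results.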


> The other nice thing from a usability perspective was that it offered the
> option of normalised output in addition to the raw output of the original
> algorithms, which made it easier to compare results when running a series
> of metrics on a given set of strings.
>

That does indeed sound useful. If I get to the point where the library
supports multiple metrics, I'll take a look at how they normalize the
outputs.
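For comparison, one common way to normalise an edit distance is to divide it by the length of the longer string, which maps every result into the range 0.0 to 1.0. This is only an assumed convention for illustration; the library discussed above may normalise differently, and the function names below are hypothetical.

```python
def levenshtein(a, b):
    """Standard Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def normalized_levenshtein(a, b):
    """Hypothetical helper: raw distance divided by the longer length.
    The raw distance never exceeds that length, so the result is in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))             # raw distance
print(normalized_levenshtein("kitten", "sitting"))  # normalised distance
```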

- Tal



More information about the Biopython mailing list