[Biopython] I've written a library for executing fuzzy searches...

Tal Einat taleinat at gmail.com
Fri Nov 15 19:08:42 UTC 2013


Hi Martin!

I'm really excited to get such a response! I would love feedback and
suggestions on how this could be made more useful for biological uses. If
you could expand on specific biological use-cases and their details, for
example, that would be lovely!

- Tal



On Fri, Nov 15, 2013 at 1:38 PM, Martin Mokrejs <mmokrejs at fold.natur.cuni.cz> wrote:

> Hello Tal,
>   it is interesting. I needed something like this a while ago, and the
> alternatives were difflib.SequenceMatcher() and
> https://github.com/facebook/pyre2 . I had problems with pyre2 crashing, so
> I use difflib.SequenceMatcher(None, str1, str2) at the moment.
>   I would prefer that you keep fuzzysearch as a separate package and have
> biopython just import it as an optional dependency. There are a lot more
> people looking for fuzzy search tools under python, and there is no reason
> to hide it under biopython. Search for Longest Common Subsequence (LCS) on
> the internet.
>   Finally, I miss a comparison to existing tools in the README. ;-) Would
> you mind looking into that?
>
>   I should be able to give some more feedback later on if you want, with
> respect to biology. I would ask for something looser in searches to cope
> with under-called and over-called nucleotides in 454 sequences. The
> Levenshtein distance is not the best measure for these data, and we need
> something that better reflects reality.
> Martin
>
> Tal Einat wrote:
> > Hi everyone,
> >
> > (I'm not on this list, so please make sure to reply to me as well as the
> > list.)
> >
> > In response to a stackoverflow question
> > <http://stackoverflow.com/questions/19725127/>, I've written a Python
> > library for fuzzy searches called 'fuzzysearch'
> > <https://github.com/taleinat/fuzzysearch>.
> > Currently, it allows searching for a string inside a longer string,
> > returning the best sub-string which matches up to a given maximum
> > Levenshtein distance. This is done quite efficiently, and there is more
> > optimization to be done, as needed.
> >
> > Is there any interest in this library and its further development? One
> > thing which I think might be useful is support for BioPython Sequence
> > types.
> >
> > This is open-source with a very liberal license (the MIT license).
> >
> > I'd be happy to collaborate on this!
> >
> > - Tal Einat
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
> >
> >
>
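
For reference, the difflib.SequenceMatcher(None, str1, str2) call mentioned in the quoted message can be used roughly as follows. This is only a minimal sketch with the standard library: the helper names similarity and longest_common_block are made up for illustration, and passing autojunk=False is an assumption (it turns off difflib's "popular element" heuristic, which can otherwise skew results on long sequences over a small alphabet such as DNA).

```python
from difflib import SequenceMatcher


def similarity(str1, str2):
    """Return difflib's similarity ratio, a float between 0 and 1."""
    # The first argument (None) means no characters are treated as junk.
    matcher = SequenceMatcher(None, str1, str2, autojunk=False)
    return matcher.ratio()


def longest_common_block(str1, str2):
    """Return the longest contiguous block shared by both strings."""
    matcher = SequenceMatcher(None, str1, str2, autojunk=False)
    match = matcher.find_longest_match(0, len(str1), 0, len(str2))
    return str1[match.a:match.a + match.size]


if __name__ == "__main__":
    print(similarity("ACGT", "ACGA"))
    print(longest_common_block("GATTACA", "TTTACAG"))
```

Note that this compares two whole strings; it does not search for a short pattern inside a longer text, which is the problem fuzzysearch addresses.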

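To make "searching for a string inside a longer string ... up to a given maximum Levenshtein distance" concrete, here is a small pure-Python sketch of the textbook dynamic-programming approach (often attributed to Sellers). It is not fuzzysearch's actual implementation or API; the function name approx_match_ends and its return format are hypothetical, chosen only for this illustration.

```python
def approx_match_ends(pattern, text, max_dist):
    """Return (end_index, distance) pairs for approximate matches.

    A pair (end, d) is reported when some substring of `text` ending just
    before index `end` is within Levenshtein distance d <= max_dist of
    `pattern`. Hypothetical helper, not part of any library.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    m = len(pattern)
    # prev[i]: best distance between pattern[:i] and any substring of the
    # text ending at the previous position. Before any text is read, the
    # only option is deleting i pattern characters.
    prev = list(range(m + 1))
    matches = []
    for j, ch in enumerate(text):
        # An empty pattern prefix matches anywhere at zero cost; this is
        # what lets a match start at any position in the text.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # pattern character missing
        if cur[m] <= max_dist:
            matches.append((j + 1, cur[m]))
        prev = cur
    return matches


if __name__ == "__main__":
    print(approx_match_ends("ACGT", "TTACGTTT", 0))
    print(approx_match_ends("ACGT", "TTACTTT", 1))
```

This runs in time proportional to len(pattern) * len(text) and reports only where matches end; recovering the start positions, or weighting insertions and deletions differently for a particular sequencing technology, would need further work.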

