[Biopython] I've written a library for executing fuzzy searches...

Peter Cock p.j.a.cock at googlemail.com
Fri Nov 15 11:08:31 UTC 2013


On Tue, Nov 12, 2013 at 5:59 PM, Tal Einat <taleinat at gmail.com> wrote:
> Hi everyone,
>
> (I'm not on this list, so please make sure to reply to me as well as the
> list.)
>
> In response to a stackoverflow
> question<http://stackoverflow.com/questions/19725127/>,
> I've written a Python library for fuzzy searches called
> 'fuzzysearch'<https://github.com/taleinat/fuzzysearch>.
> Currently, it allows searching for a string inside a longer string,
> returning the best sub-string which matches up to a given maximum Levenshtein
> distance. This is done quite efficiently, and there is more optimization to
> be done, as needed.
>
> Is there any interest in this library and its further development? One
> thing which I think might be useful is support for BioPython Sequence types.
>
> This is open-source with a very liberal license (the MIT license).
>
> I'd be happy to collaborate on this!
>
> - Tal Einat

Hi Tal,

This does sound interesting, yes. It might fit nicely into
Biopython as Bio/SeqUtils/fuzzysearch.py? I agree it would
be good to ensure that your code will accept Biopython's
(string-like) Seq objects as well as plain strings.
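
To make the idea concrete, here is a minimal sketch of searching for
a pattern inside a longer string within a maximum Levenshtein
distance. It is not taken from fuzzysearch and does not reflect its
API or its optimizations: the function name, the Match fields, and
the choice of a plain dynamic-programming scan are all assumptions
made for illustration. Calling str() on the inputs is one simple way
to accept string-like objects as well as plain strings.

```python
from collections import namedtuple

# Hypothetical result type; the field names are illustrative only.
Match = namedtuple("Match", ["end", "dist"])


def find_near_matches_sketch(pattern, sequence, max_dist):
    """Report every end position in `sequence` where some substring
    matches `pattern` within `max_dist` edits (Levenshtein distance).

    Hypothetical helper for illustration, not fuzzysearch's API.
    """
    # str() lets string-like objects through as well as plain strings.
    pattern = str(pattern)
    sequence = str(sequence)
    m = len(pattern)

    # prev[i] is the smallest edit distance between pattern[:i] and
    # any substring of `sequence` ending at the previous position.
    prev = list(range(m + 1))
    matches = []
    for j, ch in enumerate(sequence, 1):
        # cur[0] stays 0 because a match may start at any position.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in sequence
                         cur[i - 1] + 1)      # missing pattern character
        if cur[m] <= max_dist:
            matches.append(Match(end=j, dist=cur[m]))
        prev = cur
    return matches
```

With max_dist=0 this reduces to exact matching. A real implementation
would also want to report start positions and pick the best of several
overlapping hits, which this sketch leaves out.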

In terms of the license, I presume you'd be happy to accept the
Biopython licence (or the 3-clause BSD licence which we are
looking at switching to), which are both quite similar to the MIT
licence?

In terms of dependencies, you are using namedtuple which
is fine (it wasn't in Python 2.5 but we've dropped that now).

Also I see you are already supporting Python 2.6, 2.7
and 3.2, 3.3 with a single code base - which is good and
perfect for integration into Biopython (we've recently
dropped 2to3 which we used to use).

In terms of unit tests, it is great to see you've done this
already. You are using unittest2 where we're still using
unittest (v1), but that shouldn't be a problem.
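
One common way to bridge the two, shown here only as a sketch, is
to import unittest2 when it is installed and fall back to the
standard library unittest otherwise. The test class below is a
made-up placeholder, not one of the fuzzysearch tests.

```python
try:
    # Use the unittest2 backport if it happens to be installed.
    import unittest2 as unittest
except ImportError:
    # Otherwise fall back to the standard library module.
    import unittest


class ExampleTest(unittest.TestCase):
    """Placeholder test case, for illustration only."""

    def test_upper(self):
        self.assertEqual("acgt".upper(), "ACGT")
```

Test code written this way only runs under plain unittest if it
avoids features that exist only in unittest2.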

Peter


