[Biopython] I've written a library for executing fuzzy searches...

c0d3g33k c0d3g33k at gmail.com
Fri Nov 15 20:12:40 UTC 2013


Hi Tal,

This is only tangentially related to your original post, but I thought 
I'd point out the existence of Simmetrics, a Java-based similarity 
metrics library (GPL v2).  I thought that at some point there was a 
Python port, but I could be confusing that with using the library myself 
under Jython.  Though it is implemented in Java, it might provide a 
solid foundation for a Python library/API should you find it 
interesting.  It's fairly comprehensive, so it might at least provide 
inspiration for extending your current efforts.  It seems to be 
unmaintained at present, but source code is available both at the 
original SourceForge page and on GitHub, where someone cloned the project.

http://sourceforge.net/projects/simmetrics/
https://github.com/Simmetrics/simmetrics
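
For anyone following the thread who wants the search described in the
quoted message below made concrete, here is a minimal sketch of finding
the best-matching substring within a maximum Levenshtein distance. It
uses the textbook dynamic-programming approach for approximate substring
matching; it is not taken from fuzzysearch or Simmetrics, and the
function name find_best_match and its return format are assumptions made
up for illustration.

```python
def find_best_match(pattern, text, max_dist):
    """Return (start, end, distance) for the substring text[start:end]
    closest to pattern, or None if nothing is within max_dist.

    Illustrative sketch only; the name and return format are hypothetical.
    """
    m = len(pattern)
    # prev[i] holds (distance, start) for the best alignment of pattern[:i]
    # against some substring of text ending at the previous text position.
    prev = [(i, 0) for i in range(m + 1)]
    # The empty substring at position 0 is a candidate if deleting the
    # whole pattern is cheap enough.
    best = (0, 0, m) if m <= max_dist else None

    for end, ch in enumerate(text, start=1):
        # An empty pattern prefix matches the empty substring starting here.
        curr = [(0, end)]
        for i in range(1, m + 1):
            diag_dist, diag_start = prev[i - 1]
            substitute = (diag_dist + (pattern[i - 1] != ch), diag_start)
            extra_text_char = (prev[i][0] + 1, prev[i][1])
            missing_text_char = (curr[i - 1][0] + 1, curr[i - 1][1])
            curr.append(min(substitute, extra_text_char, missing_text_char))
        dist, start = curr[m]
        # Keep the first match with the smallest distance seen so far.
        if dist <= max_dist and (best is None or dist < best[2]):
            best = (start, end, dist)
        prev = curr
    return best


if __name__ == "__main__":
    hit = find_best_match("PATTERN", "aaaPATERNaaa", max_dist=1)
    if hit is not None:
        start, end, dist = hit
        print("aaaPATERNaaa"[start:end], dist)
```

This runs in time proportional to len(pattern) * len(text) and keeps only
two columns of the table in memory. A real library would likely add
early termination and other optimizations on top of this.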


On 11/15/2013 2:08 PM, Tal Einat wrote:
> Hi Martin!
>
> I'm really excited to get such a response! I would love feedback and
> suggestions on how this could be made more useful for Biological uses. If
> you could expand on specific biological use-cases and their details, for
> example, that would be lovely!
>
> - Tal
>
>
> Tal Einat wrote:
>>> Hi everyone,
>>>
>>> (I'm not on this list, so please make sure to reply to me as well as the
>>> list.)
>>>
>>> In response to a stackoverflow
>>> question<http://stackoverflow.com/questions/19725127/>,
>>> I've written a Python library for fuzzy searches called
>>> 'fuzzysearch'<https://github.com/taleinat/fuzzysearch>.
>>> Currently, it allows searching for a string inside a longer string,
>>> returning the best sub-string which matches up to a given maximum
>>> Levenshtein distance. This is done quite efficiently, and there is
>>> more optimization to be done, as needed.
>>>
>>> Is there any interest in this library and its further development? One
>>> thing which I think might be useful is support for BioPython Sequence
>>> types. This is open-source with a very liberal license (the MIT license).
>>>
>>> I'd be happy to collaborate on this!
>>>
>>> - Tal Einat
>>> _______________________________________________
>>> Biopython mailing list  -  Biopython at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>>
>>>



