[Biopython] I've written a library for executing fuzzy searches...

Martin Mokrejs mmokrejs at fold.natur.cuni.cz
Fri Nov 15 11:38:11 UTC 2013


Hello Tal,
  it is interesting. I needed something like this a while ago and the alternatives
were difflib.SequenceMatcher() and https://github.com/facebook/pyre2 . I had problems
with pyre2 crashing so I use difflib.SequenceMatcher(None, str1, str2) at the moment.
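
For readers who have not used it, here is a minimal sketch of the difflib.SequenceMatcher(None, str1, str2) call mentioned above. The two sequences are made up for illustration; only the standard-library difflib module is assumed.

```python
from difflib import SequenceMatcher

# Hypothetical example sequences, chosen only for illustration.
str1 = "ACGTTGCA"
str2 = "ACGTAGCA"

# The first argument is the "junk" filter; None disables it.
matcher = SequenceMatcher(None, str1, str2)

# ratio() is 2*M/T, where M is the number of matched characters and
# T is the combined length of both strings.
similarity = matcher.ratio()

# Longest contiguous block shared by both strings.
match = matcher.find_longest_match(0, len(str1), 0, len(str2))
longest_block = str1[match.a:match.a + match.size]

print(similarity, longest_block)
```

Note that SequenceMatcher scores overall similarity between two strings; it does not by itself search for a short pattern inside a long text within a fixed error budget.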
  I would prefer that you keep fuzzysearch as a separate package and that Biopython just import
it as an optional dependency. There are a lot more people looking for fuzzy search tools
in Python, and there is no reason to hide it inside Biopython. Search for Longest Common Subsequence
(LCS) on the internet.
  Finally, I miss a comparison with existing tools in the README. ;-) Would you mind
looking into that?

  I should be able to give some more feedback later on if you want, with respect to biology.
I would ask for something looser in searches to cope with under-called and over-called
nucleotides in 454 sequences. The Levenshtein distance is not the best measure for these data
and we need something that better reflects reality.
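
To make the concern concrete, here is a small sketch of how plain Levenshtein distance penalizes a homopolymer over-call. The sequences are invented, and collapse_homopolymers() is a hypothetical helper used only for illustration; it is one possible looser comparison, not something fuzzysearch or Biopython provides.

```python
from itertools import groupby


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def collapse_homopolymers(seq):
    """Hypothetical helper: reduce every run of identical bases to one base."""
    return "".join(base for base, _run in groupby(seq))


# Invented example: the read over-calls a run of four T's as six T's.
reference = "ACGTTTTGCA"
read = "ACGTTTTTTGCA"

plain = levenshtein(reference, read)
collapsed = levenshtein(collapse_homopolymers(reference),
                        collapse_homopolymers(read))
print(plain, collapsed)
```

Plain Levenshtein charges one edit per extra base, while the collapsed comparison ignores run length entirely. That is too permissive for real use, since genuinely different run lengths also compare as equal, but it shows the kind of trade-off a 454-aware measure would have to make.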
Martin

Tal Einat wrote:
> Hi everyone,
> 
> (I'm not on this list, so please make sure to reply to me as well as the
> list.)
> 
> In response to a stackoverflow
> question<http://stackoverflow.com/questions/19725127/>,
> I've written a Python library for fuzzy searches called
> 'fuzzysearch'<https://github.com/taleinat/fuzzysearch>.
> Currently, it allows searching for a string inside a longer string,
> returning the best sub-string which matches up to a given maximum Levenshtein
> distance. This is done quite efficiently, and there is more optimization to
> be done, as needed.
> 
> Is there any interest in this library and its further development? One
> thing which I think might be useful is support for BioPython Sequence types.
> 
> This is open-source with a very liberal license (the MIT license).
> 
> I'd be happy to collaborate on this!
> 
> - Tal Einat
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
> 
> 


