[Biopython] Python equivalent of the Perl String::Approx module for approximate matching?

Tal Einat taleinat at gmail.com
Wed Mar 12 15:55:33 UTC 2014


Kevin wrote:

> @Saket: You're right.. I have already been in touch for the past two days
> with "taleinat" the person who developed that code :) You will see in his
> github that in agreement with him, I suggested my feature as a possible
> enhancement of his package (issue #2
> https://github.com/taleinat/fuzzysearch/issues), and he agreed to consider
> it for future development. No promised release date, but:
>  1) I wouldn't dare to ask for one as I am already asking for a huge favor
> for someone else to program that "for me" and the community
>  2) I am not particularly rushed, his Levenshtein distance does an
> acceptable job for the time being. I would love to be able to write the
> code myself, but my PhD thesis is more about using scripts to gain biology
> knowledge, while my issue would be better dealt with by someone with a much
> stronger low-level programming skillset using abstract mathematical notions
> to optimise the code beyond anything I could do with my scripting skills.

Hi again guys,

I'm the author of the fuzzysearch Python library. I mentioned it on
this list a few months ago, thinking it might be useful. The
fuzzysearch library is meant for searching for approximate matches of
a pattern within a longer text, which isn't really what you're doing.
As far as I can tell, it isn't a good fit for your purpose as it
stands. I'll be happy to help if I can, however, especially given the
additional interest expressed here!

The python-Levenshtein library supports generating a sequence of
operations transforming one string into another. For example (from the
docs):

>>> from Levenshtein import editops
>>> editops('spam', 'park')
[('delete', 0, 0), ('insert', 3, 2), ('replace', 3, 3)]

However, the requirement you described is significantly different:
telling whether one string can be transformed into another using at
most a given number of substitutions and insertions, but no deletions.
editops() reports only one minimal edit sequence, which may include
deletions even when a deletion-free alternative exists. The example
above could also be transformed without deletions, using 4
substitutions (s->p, p->a, a->r, m->k)!
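
To make the idea concrete, here is a rough sketch of how such a check
might look in pure Python. This is only an illustration under my
reading of the requirement, not code from fuzzysearch or
python-Levenshtein; the function names min_substitutions_no_deletions
and can_transform are made up for this example. It relies on the
observation that, without deletions, the number of insertions is fixed
at len(target) - len(source), so only the minimal number of
substitutions needs to be computed.

```python
def min_substitutions_no_deletions(source, target):
    """Minimal number of substitutions needed to turn source into target
    when only substitutions and insertions are allowed (no deletions).

    Returns None if target is shorter than source, since that would
    require at least one deletion.
    """
    n, m = len(source), len(target)
    if m < n:
        return None
    inf = float("inf")
    # prev[j]: minimal substitutions turning source[:i-1] into target[:j].
    # For the empty source prefix, every target character is an insertion.
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)  # cells with j < i are unreachable
        for j in range(i, m + 1):
            # Align source[i-1] with target[j-1] (free if they are equal).
            aligned = prev[j - 1] + (source[i - 1] != target[j - 1])
            # Treat target[j-1] as an inserted character.
            inserted = cur[j - 1]
            cur[j] = min(aligned, inserted)
        prev = cur
    return prev[m]


def can_transform(source, target, max_substitutions, max_insertions):
    """True if source can become target within the given limits,
    using no deletions at all."""
    insertions = len(target) - len(source)
    if insertions < 0 or insertions > max_insertions:
        return False
    return min_substitutions_no_deletions(source, target) <= max_substitutions


print(can_transform("spam", "park", 4, 0))
print(can_transform("spam", "park", 3, 0))
```

The first call corresponds to the 'spam' -> 'park' case above with 4
substitutions allowed; the second allows only 3. The running time is
proportional to len(source) * len(target), which should be acceptable
for short sequences but would need work for long ones.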

I'd be happy to collaborate on this, including writing code, if you
like. I believe that what you need can be implemented relatively
easily.

- Tal Einat


