[Biopython] Python equivalent of the Perl String::Approx module for approximate matching?

Kevin Rue kevin.rue at ucdconnect.ie
Wed Mar 12 11:32:11 UTC 2014


Hi all,

Some may consider this a repeat of my StackOverflow post (
http://stackoverflow.com/questions/22328884/python-equivalent-of-the-perl-stringapprox-amatch-function)
but over there I didn't mention the possibility of implementing the feature
in Biopython.

I am looking for a function which, given sequence1 and sequence2, would
return whether sequence1 matches a subsequence of sequence2 allowing up to
I insertions, D deletions, and S substitutions.
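
To make the requirement concrete, here is a minimal pure-Python sketch of such a function. It is not the algorithm used by String::Approx and it is not optimised. The name approx_match and its parameters are hypothetical. It also assumes a convention the description above does not fix: an "insertion" is an extra character in sequence2, and a "deletion" is a character of sequence1 that is missing from sequence2.

```python
def approx_match(pattern, text, max_ins=0, max_del=0, max_sub=0):
    """Return True if `pattern` matches some substring of `text` using at
    most max_ins insertions, max_del deletions and max_sub substitutions.

    Assumed convention (hypothetical, chosen for this sketch):
      insertion    = extra character present in `text`
      deletion     = character of `pattern` absent from `text`
      substitution = character of `pattern` replaced by another in `text`
    """
    m, n = len(pattern), len(text)
    # A state is (i, j, ins, dele, sub): i characters of the pattern
    # consumed, j = current position in text, plus the edits used so far.
    # The match may start at any position in text.
    stack = [(0, j, 0, 0, 0) for j in range(n + 1)]
    seen = set(stack)
    while stack:
        i, j, ins, dele, sub = stack.pop()
        if i == m:
            return True  # whole pattern consumed within the budgets
        candidates = []
        if dele < max_del:
            # Skip pattern[i] without consuming any text.
            candidates.append((i + 1, j, ins, dele + 1, sub))
        if j < n:
            if pattern[i] == text[j]:
                # Exact match of one character.
                candidates.append((i + 1, j + 1, ins, dele, sub))
            elif sub < max_sub:
                # Mismatch paid for with one substitution.
                candidates.append((i + 1, j + 1, ins, dele, sub + 1))
            if ins < max_ins:
                # Skip text[j] as an inserted character.
                candidates.append((i, j + 1, ins + 1, dele, sub))
        for state in candidates:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return False


print(approx_match("ACGT", "ACTGT", max_ins=1))    # one extra T in the text
print(approx_match("ACGT", "ACTGT", max_sub=1))    # substitutions alone fail
print(approx_match("ACGT", "TTAGTTT", max_del=1))  # C missing from the text
```

The number of states grows with len(pattern) * len(text) * I * D * S, so this is only reasonable for short sequences and small budgets.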

So far, all I could find in Python were fuzzy matching functions using edit
distances (Levenshtein and others), but none of those distances distinguish
between insertions, deletions and substitutions (
http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison
).
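
As an illustration of the limitation, a textbook Levenshtein distance (written out here as a small sketch; the function name is just a placeholder, not taken from any of the packages linked above) returns the same value for one insertion, one deletion or one substitution, so the type of edit cannot be constrained:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # drop ca from a
                cur[j - 1] + 1,            # add cb to a
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


print(levenshtein("ACGT", "AGGT"))   # one substitution
print(levenshtein("ACGT", "ACCGT"))  # one insertion
print(levenshtein("ACGT", "AGT"))    # one deletion
```

All three calls give a distance of 1, which is why a single distance threshold cannot express "up to I insertions, D deletions and S substitutions".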

There is a Perl module called String::Approx (
http://search.cpan.org/~jhi/String-Approx-3.26/Approx.pm), where the
function amatch() does exactly what I want, except that it is in Perl. A
quick-and-dirty fix could be to make an external call to that Perl function
from my Python script, but it would be so much cleaner (and probably
faster) if I could avoid external calls and being dependent on multiple
interpreters.

I believe that the feature I described could rapidly become popular if
implemented in Biopython, but after reading the Perl module code and not
understanding most of it, I think any Python module I could write to do the
job wouldn't be nearly as optimised or as fast (an external call to the Perl
module would surely be faster than my Python implementation).

So...
- What are your thoughts?
- Did I miss the magic Python package that does what I want?
- Does anyone else think such a package would be useful to the
bioinformatics community?
- Did anyone solve the same issue I'm having in a different way? (I haven't
found a "think outside the box" idea yet)
- Does anyone feel like implementing this feature?

Many thanks for your advice!


-- 
Kévin RUE-ALBRECHT
Wellcome Trust Computational Infection Biology PhD Programme
University College Dublin
Ireland
http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en
