[Biopython] Python equivalent of the Perl String::Approx module for approximate matching?

Ivan Gregoretti ivangreg at gmail.com
Wed Mar 12 13:38:31 UTC 2014


If that Perl function existed in Biopython, I would use it every day, night
and day. I sense that I would not be the only one.

Ivan



Ivan Gregoretti, PhD
Bioinformatics

On Wed, Mar 12, 2014 at 7:32 AM, Kevin Rue <kevin.rue at ucdconnect.ie> wrote:

> Hi all,
>
> Some may consider this a repeat of my StackOverflow post (
>
> http://stackoverflow.com/questions/22328884/python-equivalent-of-the-perl-stringapprox-amatch-function
> )
> but over there I didn't mention the possibility of implementing the feature
> in Biopython.
>
> I am looking for a function which, given sequence1 and sequence2, would
> return whether sequence1 matches a subsequence of sequence2 allowing up to
> I insertions, D deletions, and S substitutions.
>
> So far, all I could find in Python were fuzzy matching functions using edit
> distances (Levenshtein and others), but none of those distances distinguish
> between insertions, deletions and substitutions (
>
> http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison
> ).
>
> There is a Perl module called String::Approx (
> http://search.cpan.org/~jhi/String-Approx-3.26/Approx.pm), where the
> function amatch() does exactly what I want, except in Perl. A
> quick-and-dirty fix could be to make an external call to that Perl function
> from my Python script, but it would be so much cleaner (and probably
> faster) if I could avoid external calls and being dependent on multiple
> interpreters.
>
> I believe that the feature I described could rapidly become popular if
> implemented in Biopython, but after reading the Perl module code and not
> understanding most of it, I think any Python module I could write to do the
> job wouldn't be nearly as optimised and fast. (An external call to the Perl
> module would surely be faster than my Python implementation.)
>
> So....
> - What are your thoughts?
> - Did I miss the magic Python package that does what I want?
> - Does anyone else think such a package would be useful to the
> bioinformatics community?
> - Did anyone solve the same issue I'm having in a different way? (I haven't
> found a "think outside the box" idea yet)
> - Does anyone feel like implementing this feature?
>
> Many thanks for your advice!
>
>
> --
> Kévin RUE-ALBRECHT
> Wellcome Trust Computational Infection Biology PhD Programme
> University College Dublin
> Ireland
> http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
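
For reference, here is a minimal pure-Python sketch of the behaviour
described in the quoted message: testing whether sequence1 matches some
substring of sequence2 with separate limits on insertions, deletions and
substitutions. The function name approx_match and its parameters are
hypothetical. This is not Biopython code and not a port of String::Approx.
It assumes an "insertion" is an extra character in sequence2, a "deletion"
is a character of sequence1 missing from sequence2, and a "substitution" is
a mismatch; amatch() may define these differently. It is a memoised
recursive search, so it is only practical for short patterns and small
budgets.

```python
from functools import lru_cache


def approx_match(pattern, text, max_ins=0, max_del=0, max_sub=0):
    """Return True if `pattern` matches some substring of `text` using at
    most max_ins insertions, max_del deletions and max_sub substitutions.

    Hypothetical helper for illustration only.
    """
    m, n = len(pattern), len(text)

    @lru_cache(maxsize=None)
    def search(i, j, ins, dele, sub):
        # i: next pattern index, j: next text index,
        # ins/dele/sub: edits of each kind still allowed.
        if i == m:
            return True  # the whole pattern has been accounted for
        if j < n:
            # Exact match of the current characters.
            if pattern[i] == text[j] and search(i + 1, j + 1, ins, dele, sub):
                return True
            # Substitution: consume one character from each, spend one sub.
            if sub > 0 and search(i + 1, j + 1, ins, dele, sub - 1):
                return True
            # Insertion: skip an extra character in the text.
            if ins > 0 and search(i, j + 1, ins - 1, dele, sub):
                return True
        # Deletion: the pattern character is absent from the text.
        if dele > 0 and search(i + 1, j, ins, dele - 1, sub):
            return True
        return False

    # Try every possible start position in the text.
    return any(search(0, j, max_ins, max_del, max_sub) for j in range(n + 1))
```

Example calls: approx_match("ACGT", "TTAGGTTT", max_sub=1) allows the single
C-to-G mismatch, whereas approx_match("ACGT", "TTAGTTT", max_sub=1) fails
because the missing C needs a deletion, not a substitution. That is the
distinction a single edit-distance threshold cannot express.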



