[Biopython] Get all alignments of a sequence against another

Tal Einat taleinat at gmail.com
Fri Mar 14 10:53:18 UTC 2014


On Fri, Mar 14, 2014 at 11:16 AM, Kevin Rue <kevin.rue at ucdconnect.ie> wrote:
> >>> import fuzzysearch
> >>> fuzzysearch.find_near_matches_with_ngrams("GGGTTLTTSS","XXXXXXXXXXXXXXXXXXXGGGTTVTTSSAAAAAAAAAAAAAGGGTTLTTSSAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBGGGTTLTTSS",
> >>> 1)
>
> The output will find two matches.
> Out[7]: [Match(start=89, end=99, dist=0), Match(start=89, end=99, dist=0)]
>
> BUG:
> I did notice that the second match is reported twice. I assume this is a
> bug where the first match was somehow replaced by the second, which is why
> I copied Tal (the developer of this package) on this email.
>
> Another example, where I added your sequence (with a mismatch) a third time:
>
> >>> fuzzysearch.find_near_matches_with_ngrams("GGGTTLTTSS","XXXXXXXXXXXXXXXXXXXGGGTTVTTSSAAAAAAAAAAAAAGGGTTVTTSSAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBGGGTTLTTSS",
> >>> 1)
>
> returns
> Out[9]:
> [Match(start=42, end=52, dist=1),
>  Match(start=99, end=109, dist=0),
>  Match(start=99, end=109, dist=0)]
>
> You can see three matches. One of the mismatched sequences was detected
> correctly (edit distance of 1), but the bug seems to duplicate the last
> match and replace the second-to-last match with it.
>
> Tal, can you fix that? I will add the issue to your repository :)

Thanks for bringing this to my attention! Fixed.

Upgrade to version 0.2.1 and your example will work as expected.

(To upgrade, run: pip install --upgrade fuzzysearch)
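
As a sanity check after upgrading, a brute-force scan can confirm the
expected matches independently of fuzzysearch. The following is only a
minimal sketch and not fuzzysearch's algorithm: it assumes substitutions
only (Hamming distance over windows of the pattern's length), and the names
Match and brute_force_matches are hypothetical stand-ins chosen for
illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the Match tuples shown above; this is not
# fuzzysearch's own class.
Match = namedtuple("Match", ["start", "end", "dist"])


def brute_force_matches(pattern, text, max_dist):
    """Return every window of len(pattern) in text that differs from
    pattern in at most max_dist positions (substitutions only)."""
    n = len(pattern)
    matches = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # Hamming distance: count the positions where the characters differ.
        dist = sum(a != b for a, b in zip(pattern, window))
        if dist <= max_dist:
            matches.append(Match(start, start + n, dist))
    return matches


# A shortened version of the example above: one copy with a single
# mismatch (V instead of L), followed by one exact copy.
text = "X" * 5 + "GGGTTVTTSS" + "A" * 5 + "GGGTTLTTSS"
for match in brute_force_matches("GGGTTLTTSS", text, 1):
    print(match)
```

On this shortened input the scan reports two distinct matches, one at 5-15
with dist=1 and one at 20-30 with dist=0. A Levenshtein-based search may
also allow insertions and deletions, so its results can differ from this
check on other inputs.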

- Tal Einat


