[Biopython] allow ambiguities in sequence matching?

Peter biopython at maubp.freeserve.co.uk
Fri Nov 20 10:03:15 UTC 2009


On Thu, Nov 19, 2009 at 11:42 PM, Cedar McKay <cmckay at u.washington.edu> wrote:
> Hello all,
> Apologies if this is covered in the tutorial anywhere, if so I didn't see
> it.
>
> I am trying to test whether sequence A appears anywhere in sequence B. The
> catch is I want to allow n mismatches. Right now my code looks like:
>
> #record is a SeqRecord
> #query_seq is a string
> if query_seq in record.seq:
>        do something
>
>
> If I want query_seq to match despite n nucleotide mismatches, how should I
> do that? It seems like something that would be pretty common for people
> working with DNA probes. Is this even a biopython problem? Or is it just a
> general python problem?

We have in general tried to keep the Seq object API as close to that of
the Python string as is reasonable, for example the find, startswith and
endswith methods. Likewise, the "in" operator on the Seq object works
like a Python string: it uses plain string matching (see Bug 2853; this was
added in Biopython 1.51).
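
As a rough illustration of that string-like behaviour, here is a minimal
sketch. The sequences are made up for the example, and the fallback to a
plain str is an assumption added only so the snippet runs without Biopython
installed. Note how a single mismatch is enough to make the "in" test fail.

```python
try:
    from Bio.Seq import Seq
except ImportError:
    # Assumption for this sketch only: use str when Biopython is not installed.
    Seq = str

# Hypothetical example data, not taken from the original question.
target = Seq("ACGTTGCAGATTACA")
query = "GATTACA"

found = query in target             # exact substring test, no mismatches allowed
position = target.find(query)       # 0-based start index, or -1 when absent
at_start = target.startswith("ACG")
near_miss = "GATTTCA" in target     # differs from the target at one base
```

Here found is true and near_miss is false, which is exactly the limitation
described in the question.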

It sounds like you want some kind of fuzzy find... one solution would
be regular expressions, another might be to use the Bio.Motif module.
There have been similar discussions on the mailing list before, but no
clear consensus - see for example Bug 2601.
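
To make the fuzzy find idea concrete, here is a minimal pure-Python sketch
of two approaches. Neither is part of Biopython: the function names
find_with_mismatches and ambiguous_to_regex are hypothetical, and the first
one is a simple sliding-window mismatch count rather than anything from
Bio.Motif. It only handles substitutions (no insertions or deletions). The
second builds a regular expression from IUPAC ambiguity codes in the query;
ambiguity codes in the target are not interpreted.

```python
import re


def find_with_mismatches(query, target, max_mismatches):
    """Return 0-based start positions where query matches a window of
    target with at most max_mismatches substituted letters."""
    query = str(query).upper()
    target = str(target).upper()
    hits = []
    # Slide a window of len(query) along target and count differing letters.
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        mismatches = sum(1 for a, b in zip(query, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def ambiguous_to_regex(query):
    """Compile a regex in which each ambiguity code in query matches
    any of the unambiguous bases it stands for."""
    return re.compile("".join(IUPAC_DNA[base] for base in str(query).upper()))


# Hypothetical example data.
target = "ACGTTGCAGATTACA"
one_off = find_with_mismatches("GATTTCA", target, 1)
exact_only = find_with_mismatches("GATTTCA", target, 0)
match = ambiguous_to_regex("GATTNCA").search(target)
```

Since both functions call str() on their arguments, passing record.seq
should work as well as a plain string. The sliding window costs roughly
len(query) * len(target) comparisons, which is fine for short probes but
slow for very long sequences.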

Peter



