[Biopython] allow ambiguities in sequence matching?

Christian Schaefer schafer at rostlab.org
Fri Nov 20 11:55:58 EST 2009


Hey Cedar,

I'm currently doing something similar with protein sequences. A simple
brute-force method could work like this:
Slide the short sequence 'underneath' the long sequence. After each step,
translate the current overlap into a bit-string where 1 indicates a
match and 0 a mismatch. You can then easily apply a regex to this
bit-string to look for particular patterns, such as 'at most n
mismatches allowed'.
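
Here is a rough sketch of that sliding approach in plain Python, in case
it is useful. The function names (mismatch_bitstrings,
find_with_mismatches) and the particular regex are only my illustration,
not anything provided by Biopython, and it assumes both sequences are
plain strings (e.g. str(record.seq)) and that only full-length overlaps
are of interest:

```python
import re


def mismatch_bitstrings(query, target):
    """Yield (offset, bits) for every full-length overlap of query and target.

    bits is a string with '1' where the two residues match and '0' where
    they differ.
    """
    width = len(query)
    for offset in range(len(target) - width + 1):
        window = target[offset:offset + width]
        bits = "".join("1" if a == b else "0" for a, b in zip(query, window))
        yield offset, bits


def find_with_mismatches(query, target, max_mismatches):
    """Return the offsets where query matches target with at most
    max_mismatches mismatching positions (hypothetical helper)."""
    # The whole bit-string may contain at most max_mismatches zeros,
    # each surrounded by any number of ones.
    pattern = re.compile(r"1*(?:01*){0,%d}" % max_mismatches)
    return [
        offset
        for offset, bits in mismatch_bitstrings(query, target)
        if pattern.fullmatch(bits)
    ]


hits = find_with_mismatches("ACGT", "TTACGATT", 1)
print(hits)
```

Other regexes on the same bit-string can express stricter rules, for
example requiring the first few positions to match exactly. This is
quadratic in the worst case, so it is probably only sensible for short
probes.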

Hope that helps.
Chris


Cedar McKay wrote:
> Hello all,
> Apologies if this is covered in the tutorial anywhere, if so I didn't 
> see it.
> 
> I am trying to test whether sequence A appears anywhere in sequence B. 
> The catch is I want to allow n mismatches. Right now my code looks like:
> 
> #record is a SeqRecord
> #query_seq is a string
> if query_seq in record.seq:
>     do something
> 
> 
> If I want query_seq to match despite n nucleotide mismatches, how should 
> I do that? It seems like something that would be pretty common for 
> people working with DNA probes. Is this even a biopython problem? Or is 
> it just a general python problem?
> 
> thanks,
> Cedar
> 
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

