[Biopython] alignment/matching algorithm which allows mismatches at certain positions

Peter biopython at maubp.freeserve.co.uk
Tue Sep 29 13:30:00 UTC 2009


On Tue, Sep 29, 2009 at 1:50 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
> Hi everybody!
>
> Does anyone know an algorithm to search for sequence similarity while allowing mismatches at certain positions?
>
> E.g.
> Looking in a sequence database for
>
> ATGCTCGCGCTCGCTCGCGCA
>
> while allowing a mismatch at positions [3] and [18].
>
> I can do it via regular expressions but I guess it would be quite slow.

When you say "sequence database" do you mean a set of local files
(e.g. a big FASTA file), a real database (e.g. BioSQL), or something
else like an online database (e.g. GenBank)?

I would have suggested trying regular expressions, because they
let you deal with the specific positions where you allow a mismatch,
e.g. ATG.TCGCGCTCGCTCGC.CA as a regular expression.
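
As a rough sketch of that idea in Python, using only the standard
library re module: the helper name pattern_with_wildcards and the toy
records below are made up for illustration, and I am assuming that
positions [3] and [18] are counted from zero (which is what the regular
expression above implies).

```python
import re

def pattern_with_wildcards(query, positions):
    """Build a regex from query, putting '.' at the given 0-based positions."""
    wild = set(positions)
    return "".join("." if i in wild else re.escape(base)
                   for i, base in enumerate(query))

query = "ATGCTCGCGCTCGCTCGCGCA"
regex = re.compile(pattern_with_wildcards(query, [3, 18]))

# Hypothetical toy "database": record name -> sequence.
records = {
    "exact": "GG" + query + "TT",
    "two_mismatches": "ATGATCGCGCTCGCTCGCTCA",   # differs at 3 and 18 only
    "wrong_position": "TTGCTCGCGCTCGCTCGCGCA",   # differs at 0, not allowed
}

# Start offsets of every (non-overlapping) match in each record.
hits = {name: [m.start() for m in regex.finditer(seq)]
        for name, seq in records.items()}
```

Compiling the pattern once and reusing it for every record keeps the
per-sequence cost low, so it is worth timing this on real data before
assuming it is too slow.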

In effect you want to look for ATGNTCGCGCTCGCTCGCNCA using IUPAC
ambiguity codes, which I think would work with something like fuzznuc
from EMBOSS:

http://emboss.sourceforge.net/apps/release/6.1/emboss/apps/fuzznuc.html
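
If you would rather stay in Python, an IUPAC pattern can be translated
into a regular expression by hand. The following is only a sketch of
that translation, not a reimplementation of fuzznuc; the names
iupac_to_regex and find_all are hypothetical, and it assumes the target
sequences contain only unambiguous A, C, G and T.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Translate an IUPAC nucleotide pattern into a regex string."""
    return "".join(IUPAC[code] for code in pattern.upper())

def find_all(pattern, sequence):
    """Return 0-based start offsets of all hits, including overlapping ones."""
    # The lookahead makes each match zero-width, so overlapping hits are kept.
    regex = re.compile("(?=(%s))" % iupac_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence.upper())]
```

For example, find_all("ATGNTCGCGCTCGCTCGCNCA", seq) gives the start
offsets of every hit in seq, with any base accepted at the two N
positions.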

Peter



