[Biopython] alignment/matching algorithm which allows mismatches at certain positions

Stefanie Lück lueck at ipk-gatersleben.de
Tue Sep 29 14:23:57 UTC 2009


I mean big FASTA files.
Thanks for all the suggestions, I'll have a look at them and decide what to use!

Stefanie
----- Original Message ----- 
From: "Peter" <biopython at maubp.freeserve.co.uk>
To: "Stefanie Lück" <lueck at ipk-gatersleben.de>
Cc: <biopython at lists.open-bio.org>
Sent: Tuesday, September 29, 2009 3:30 PM
Subject: Re: [Biopython] alignment/matching algorithm which allows mismatches at certain positions


On Tue, Sep 29, 2009 at 1:50 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
> Hi everybody!
>
> Does anyone know of an algorithm to search for sequence similarity while
> allowing mismatches at certain positions?
>
> E.g.
> Looking in a sequence database for
>
> ATGCTCGCGCTCGCTCGCGCA
>
> while allowing a mismatch at positions [3] and [18].
>
> I can do it via regular expressions but I guess it would be quite slow.

When you say "sequence database" do you mean a set of local files
(e.g. a big FASTA file), a real database (e.g. BioSQL), or something
else like an online database (e.g. GenBank)?

I would have suggested you try regular expressions, because they
let you deal with the specific positions where you allow a mismatch.
For example, ATG.TCGCGCTCGCTCGC.CA as a regular expression?
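
A minimal sketch of the regular expression approach in Python, using only
the standard library. The helper names (wildcard_pattern, read_fasta,
search) and the two example records are made up for illustration, and the
positions are assumed to be 0-based, which is what the pattern above
implies. Each record is joined into a single string before searching, so a
hit that spans a line break in the file is not missed.

```python
import re


def wildcard_pattern(query, positions):
    """Compile a regex for query with '.' at the given 0-based positions."""
    chars = list(query)
    for i in positions:
        chars[i] = "."
    return re.compile("".join(chars))


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            name, parts = line[1:], []
        elif line:
            parts.append(line.upper())
    if name is not None:
        yield name, "".join(parts)


def search(lines, pattern):
    """Yield (name, start, matched_text) for each non-overlapping hit."""
    for name, seq in read_fasta(lines):
        for m in pattern.finditer(seq):
            yield name, m.start(), m.group()


pattern = wildcard_pattern("ATGCTCGCGCTCGCTCGCGCA", [3, 18])

# Hypothetical records; with a real file, pass an open file handle instead.
example = [
    ">seq1", "TTATGATCGCGCTCGCTCGCTCAGG",
    ">seq2", "ATGCTCGCGCTCGCTCGCGCC",
]
hits = list(search(example, pattern))
```

Compiling the pattern once and reusing it for every record keeps the cost
per sequence low. For real data, Bio.SeqIO.parse(handle, "fasta") could
stand in for the hand-written read_fasta.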

Put another way, you want to look for ATGNTCGCGCTCGCTCGCNCA using
IUPAC ambiguity codes (N matches any base), which I think would work
with something like fuzznuc from EMBOSS:

http://emboss.sourceforge.net/apps/release/6.1/emboss/apps/fuzznuc.html
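
As a rough illustration of what searching with IUPAC codes amounts to, the
sketch below translates an IUPAC nucleotide pattern into a Python regular
expression. This is not how fuzznuc is implemented, only a stand-in for the
idea; the names IUPAC_DNA and iupac_to_regex and the test sequence are
invented for the example.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC nucleotide pattern into a regular expression."""
    return re.compile("".join(IUPAC_DNA[c] for c in pattern.upper()))


regex = iupac_to_regex("ATGNTCGCGCTCGCTCGCNCA")

# Hypothetical subject sequence containing one hit.
match = regex.search("TTATGATCGCGCTCGCTCGCTCAGG")
```

One difference from the plain "." wildcard: here N matches only A, C, G or
T in the subject sequence, so a stretch containing an ambiguous N in the
database would not count as a hit.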

Peter



