[Biopython] alignment/matching algorithm which allows mismatches at certain positions

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Sep 29 13:23:29 UTC 2009


Hi,

On Sep 29, 2009, at 8:50 AM, Stefanie Lück wrote:

> Hi everybody!
>
> Does anyone know an algorithm to search for sequence similarity while
> allowing mismatches at certain positions?
>
> E.g.
> Looking in a sequence database for
>
> ATGCTCGCGCTCGCTCGCGCA
>
> while allowing a mismatch at positions [3] and [18].
>
> I can do it via regular expressions but I guess it would be quite  
> slow.

You can use bowtie:

http://bowtie-bio.sourceforge.net/index.shtml

You can't tell it where to allow the mismatches, but you can tell it how
many mismatches to allow. The output file is easy to parse, and it
also reports the position of each mismatch and which nucleotide was
changed to which in order to make the match.

Pros:
* Insanely fast aligner.

Cons:
* You'll have to do a bit of work at the command line.
* You need an index file for the "database" of sequences you are
searching against (not the ones you are querying with). There are
several provided on the site; otherwise it's also quite easy to build
your own (though that requires a lot of memory).
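
If the mismatches really do have to be restricted to specific positions
(which, as noted above, bowtie won't do for you), the regular expression
route from the original question can be sketched in a few lines of plain
Python. This is only a rough illustration and not part of bowtie or
Biopython: the function names build_pattern and find_hits are made up
for the example, positions are assumed to be 0-based, and the alphabet
is assumed to be plain A/C/G/T.

```python
import re


def build_pattern(query, wildcard_positions):
    """Compile a regex that matches `query`, except that any base is
    accepted at the given 0-based positions (assumed convention)."""
    parts = []
    for i, base in enumerate(query):
        if i in wildcard_positions:
            parts.append("[ACGT]")          # any nucleotide allowed here
        else:
            parts.append(re.escape(base))   # must match exactly
    # Wrap in a lookahead so that overlapping hits are reported too.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_hits(pattern, target):
    """Return (start, matched_sequence) for every hit in `target`."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(target)]


if __name__ == "__main__":
    query = "ATGCTCGCGCTCGCTCGCGCA"
    pattern = build_pattern(query, {3, 18})

    # Hypothetical target: the query with substitutions at positions
    # 3 and 18, padded with two bases on either side.
    mutated = query[:3] + "A" + query[4:18] + "T" + query[19:]
    for start, hit in find_hits(pattern, "GG" + mutated + "TT"):
        print(start, hit)
```

The pattern is compiled once and can be reused for every sequence in the
database, so for a modest number of sequences this may be fast enough;
for genome-scale searches an indexed aligner is the better fit.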

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact




