[Biopython] alignment/matching algorithm which allows mismatches at certain positions
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Sep 29 13:23:29 UTC 2009
Hi,
On Sep 29, 2009, at 8:50 AM, Stefanie Lück wrote:
> Hi everybody!
>
> Does anyone know an algorithm to search for sequence similarity while
> allowing mismatches at certain positions?
>
> E.g.
> Looking in a sequence database for
>
> ATGCTCGCGCTCGCTCGCGCA
>
> while allowing a mismatch at positions [3] and [18].
>
> I can do it via regular expressions but I guess it would be quite
> slow.
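For reference, the regular-expression approach from the question takes only a few lines of Python. Each position where a mismatch is tolerated becomes a `.` wildcard, and a lookahead group lets overlapping hits be reported. This is a minimal sketch that assumes the positions are 1-based (so [3] and [18] are the 3rd and 18th bases); the helper names are hypothetical.

```python
import re

def positional_mismatch_pattern(query, allowed, one_based=True):
    """Hypothetical helper: compile a regex that matches `query` but accepts
    any character at the positions in `allowed` (1-based by default)."""
    offset = 1 if one_based else 0
    wildcard = {p - offset for p in allowed}
    parts = ["." if i in wildcard else re.escape(base)
             for i, base in enumerate(query)]
    # The zero-width lookahead with a capture group also reports overlapping hits.
    return re.compile("(?=(" + "".join(parts) + "))")

def find_hits(pattern, target):
    """Return (0-based start, matched substring) for every hit in `target`."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(target)]

query = "ATGCTCGCGCTCGCTCGCGCA"
pattern = positional_mismatch_pattern(query, allowed=[3, 18])
# Variant with position 3 (G>A) and position 18 (C>T) changed, embedded in flanks.
target = "GG" + "ATACTCGCGCTCGCTCGTGCA" + "TT"
print(find_hits(pattern, target))
```

Note that `.` accepts any character, including `N`; replace it with `[ACGT]` if that matters. The scan is linear in the size of the searched sequence for every query, which is where an indexed aligner starts to pay off on large databases.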
You can use bowtie:
http://bowtie-bio.sourceforge.net/index.shtml
You can't tell it where to allow the mismatches, but you can tell it how
many mismatches to allow. The output file is easy to parse, and it
also reports the position of each mismatch and which nucleotide was
changed to which in order to make the match.
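Because bowtie only limits the number of mismatches, the position restriction from the question could be applied afterwards by filtering its hits. A minimal sketch, assuming bowtie 1.x default output, where the 8th tab-separated column is a comma-separated list of mismatch descriptors of the form `offset:reference-base>read-base`, with a 0-based offset from the 5' end of the read (worth checking against the manual for your version). `allowed` holds 0-based read offsets, so {2, 17} corresponds to [3] and [18] if those were meant as 1-based; the function names are hypothetical.

```python
def parse_mismatches(field):
    """Turn a descriptor list such as '2:G>A,17:C>T' into
    (offset, reference_base, read_base) tuples. An empty field gives []."""
    result = []
    for descriptor in filter(None, field.split(",")):
        offset, change = descriptor.split(":")
        ref_base, read_base = change.split(">")
        result.append((int(offset), ref_base, read_base))
    return result

def keep_hit(line, allowed):
    """Hypothetical filter: True if every mismatch in one bowtie output line
    falls at an offset listed in `allowed`."""
    columns = line.rstrip("\n").split("\t")
    # Assumed layout: the 8th column (index 7) holds the descriptors and may be empty.
    field = columns[7] if len(columns) > 7 else ""
    return all(offset in allowed for offset, _, _ in parse_mismatches(field))
```

Applied to a results file, this would be something like `[line for line in fh if keep_hit(line, {2, 17})]`.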
Pros:
* Insanely fast aligner.
Cons:
* You'll have to do a bit of work at the command line.
* You need an index file for your "database" of sequences you are
searching against (not querying with). Several are provided on
the site; otherwise it's also quite easy to make your own (though it
requires a lot of memory).
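The command-line part can also be driven from Python. A sketch assuming bowtie 1.x is on the PATH: `bowtie-build <reference.fa> <index_base>` creates the index, and `bowtie -v N <index_base> -c <sequence>` aligns a sequence given directly on the command line, allowing at most N mismatches. The file names and helper functions are placeholders, not part of bowtie itself.

```python
import subprocess

def build_index_cmd(reference_fasta, index_base):
    # bowtie-build <reference_in> <index_base>; memory-hungry on large references.
    return ["bowtie-build", reference_fasta, index_base]

def align_cmd(index_base, query, max_mismatches=2):
    # -v: end-to-end alignment with at most `max_mismatches` mismatches.
    # -c: the query sequence is given on the command line instead of in a file.
    return ["bowtie", "-v", str(max_mismatches), index_base, "-c", query]

def run(cmd):
    """Run a command and return its standard output (requires bowtie installed)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Typical use would be `run(build_index_cmd("db.fa", "db"))` once, then `run(align_cmd("db", "ATGCTCGCGCTCGCTCGCGCA"))` per query, with the reported mismatch positions checked afterwards.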
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact