[Biopython] matching sequences from fasta files

Fri Mar 12 00:36:07 UTC 2010

--- On Thu, 3/11/10, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython] matching sequences from fasta files
> To: "Vincent Davis" <vincent at vincentdavis.net>
> Cc: "biopython" <biopython at lists.open-bio.org>
> Date: Thursday, March 11, 2010, 6:06 AM
> On Thu, Mar 11, 2010 at 12:47 AM, Vincent Davis
> <vincent at vincentdavis.net> wrote:
> > So I had an idea and wanted to get some feedback.
> > I could make all possible single-position mismatches for the sequences. I
> > have 230,000 now, and that would give me 17,250,000 (3 * 25 * 230,000). Then
> > use BLAST to look for perfect matches. I would probably do this
> > incrementally, maybe even just BLAST for each sequence. The advantage I see
> > in this is that BLAST can run multi-core, and I am running it on an 8-core
> > machine with 48 GB of memory. So it seems that this would be the fastest
> > way to do this, and very straightforward, as there is very little parsing.
> > There is either a match or not. I am purely guessing that generating the
> > list is faster than parsing the results.
> 
Nexalign can do exactly what you are trying to do.
See http://genome.gsc.riken.jp/osc/english/dataresource/.
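
For reference, the enumeration described in the quoted message (3 * 25 variants per sequence) could be sketched in Python roughly as below. This is only an illustration under assumptions: the reads are taken to be 25-mers over plain A/C/G/T with no ambiguity codes, and the function name and the example probe are made up for the sketch. It says nothing about how Nexalign works internally.

```python
from typing import Iterator

DNA_ALPHABET = "ACGT"  # assumption: plain DNA, no ambiguity codes


def single_mismatch_variants(seq: str, alphabet: str = DNA_ALPHABET) -> Iterator[str]:
    """Yield every sequence that differs from seq at exactly one position."""
    seq = seq.upper()
    for i, original in enumerate(seq):
        for base in alphabet:
            if base != original:
                # Substitute one base at position i, keep the rest unchanged.
                yield seq[:i] + base + seq[i + 1:]


# Hypothetical 25-mer, used only to check the count from the quoted message.
probe = "ACGTACGTACGTACGTACGTACGTA"
variants = list(single_mismatch_variants(probe))
print(len(variants))  # 3 alternatives at each of 25 positions
```

For 230,000 such sequences it would probably be better to consume the generator lazily and write the variants straight to a FASTA file than to hold all 17,250,000 strings in memory at once.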

--Michiel.