[Biopython] matching sequences from fasta files

Vincent Davis vincent at vincentdavis.net
Thu Mar 11 00:47:49 UTC 2010


So I had an idea and wanted to get some feedback.
I could generate every possible single-position mismatch for each sequence. I
have 230,000 sequences now, and that would give me 17,250,000 variants
(3 * 25 * 230,000). Then I would use BLAST to look for perfect matches. I
would probably do this incrementally, maybe even just running BLAST once per
sequence. The advantage I see is that BLAST can run on multiple cores, and I
am running it on an 8-core machine with 48 GB of memory. So it seems this
would be the fastest way to do it, and very straightforward, since there is
very little parsing: there is either a match or there is not. I am purely
guessing that generating the list is faster than parsing the results.
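
A rough sketch of what I mean by generating the mismatches, assuming the
sequences are 25-base DNA strings over A, C, G and T. The function names and
the dictionary layout are made up for illustration. The second function is
not BLAST at all, just a plain dictionary lookup for exact matches, included
only to show that the "match or not" test needs no parsing.

```python
ALPHABET = "ACGT"  # assumption: plain DNA, no ambiguity codes


def single_mismatches(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    seq = seq.upper()
    for i, original in enumerate(seq):
        for base in ALPHABET:
            if base != original:
                # Swap in one alternative base at position i only.
                yield seq[:i] + base + seq[i + 1:]


def mismatch_index(probes):
    """Map each single-mismatch variant to the names of the probes it came from.

    probes is a dict of {name: sequence} (hypothetical layout).
    """
    index = {}
    for name, seq in probes.items():
        for variant in single_mismatches(seq):
            index.setdefault(variant, set()).add(name)
    return index


if __name__ == "__main__":
    probe = "ACGTACGTACGTACGTACGTACGTA"  # made-up 25-mer
    variants = list(single_mismatches(probe))
    print(len(variants))  # 3 alternatives * 25 positions
```

With 25 positions and 3 alternative bases at each one, every sequence yields
75 variants, which is where the 3 * 25 * 230,000 figure comes from.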

*Vincent Davis
720-301-3003 *
vincent at vincentdavis.net
 my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>


On Wed, Mar 10, 2010 at 2:56 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Wed, Mar 10, 2010 at 6:10 PM, Vincent Davis <vincent at vincentdavis.net>
> wrote:
> > I don't have a favorite, I have only tried BLAST  :)
> > Is there an example of how to interface between python and
> > BLAST. I have no idea where to start. I have never done
> > anything similar.
>
> There are examples of how to call BLAST and parse its
> (XML) output with Biopython in our tutorial:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Peter
>
> P.S. I am reminded of the old saying, "When all you have is
> a hammer, everything looks like a nail." (by which I mean
> even if it is not the best tool for the job, you could do it with
> BLAST).
>


