[Biopython] (no subject)

Sun Apr 15 10:41:08 EDT 2012

On Sat, Apr 14, 2012 at 7:41 AM, Matthias Schade
<matthiasschade.de at googlemail.com> wrote:
> Hello everyone,
>
>
> I would like to run a blastn-query of a small nucleotide-sequence against a
> genome. The code works already, but my queries are still slow and mostly
> ineffective, so I would like to ask:
>
> Is there a way to tell the blastn-algorithm that once a 'perfect match' has
> been found it can stop and send back the results?
>
> Background: I am interested in only the first full match because I would
> like to design a nucleotide-probe which -if possible- has no(!) known match
> in a host-genome, neither in RNA nor DNA. Actually, I would reject all
> perfect-matches and all single-mismatches but allow every sequence with two
> or more mismatches.
>
> Currently, I use this line of code, with seq_now being about 15-30 nt long:
> result_handle = NCBIWWW.qblast("blastn", "nt", seq_now,
>     entrez_query="Canis familiaris[orgn]")
>
>
> I am still new to this. Thank you for your help and input,
>
> Matt
>

Hi Matt,

Since you're already restricting the search to one organism, this
should already be reasonably fast, right? You can play with the BLAST
sensitivity cutoffs and reporting thresholds (qblast accepts expect,
word_size and hitlist_size arguments, for example), but I don't think
BLAST can be told to stop at the first perfect match; for that you'd
need an algorithm other than BLAST.

If speed is crucial, you might be interested in USEARCH, which does
have the feature you're looking for, but isn't wrapped in Biopython
yet:
http://www.drive5.com/usearch/
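
To make the "stop at the first hit" idea concrete, here is a rough
plain-Python sketch. It is not how USEARCH or BLAST work internally; it
is a brute-force scan, and the function names (reverse_complement,
first_near_match) are made up for this example. It assumes the target
sequence is already in memory as a string, and it only counts
substitutions (no gaps). It would be far too slow for a whole genome,
but it shows the early exit you asked about:

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string.

    Anything other than A, C, G, T is mapped to N (an assumption made
    for this sketch).
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement.get(base, "N") for base in reversed(seq))


def first_near_match(probe, target, max_mismatches=1):
    """Find the first window of target within max_mismatches of probe.

    Both strands are checked (forward first). Returns a tuple
    (strand, start, mismatches) for the first qualifying window, or
    None if there is no such window. Only substitutions are counted.
    """
    probe = probe.upper()
    target = target.upper()
    n = len(probe)
    for strand, pattern in (("+", probe), ("-", reverse_complement(probe))):
        for start in range(len(target) - n + 1):
            mismatches = 0
            for a, b in zip(pattern, target[start:start + n]):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # window already too different, move on
            else:
                # Inner loop finished without breaking: this window is
                # close enough, so stop scanning and report it.
                return strand, start, mismatches
    return None
```

With max_mismatches=1 this matches your rule: a probe is rejected as
soon as a perfect or single-mismatch site turns up, and kept if the
function returns None.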

Cheers,
Eric