[Biopython] send SeqIO.parse to NcbiblastnCommandline

Wed Nov 2 17:04:19 UTC 2011

On Wed, Nov 2, 2011 at 4:21 PM, Matthew MacManes <macmanes at gmail.com> wrote:
> Hi All,
>
> I am trying to take a large fasta file, send sequences one by one
> to NcbiblastnCommandline, sending results to a unique file based on the
> query ID. So far I have
>
> MUSDATABASE='/media/hd/blastdb/mouse.rna'
>
> from Bio import SeqIO
> from Bio.Blast.Applications import NcbiblastnCommandline
> for seq_record in SeqIO.parse("test1.fa", "fasta"):
>     cl = NcbiblastnCommandline(cmd="/home/matthew/ncbi-blast/bin/blastn",
>                                query=seq_record.seq,
>                                db=MUSDATABASE, evalue=0.0000000001,
>                                outfmt="'10 qseqid qseq sseqid sseq bitscore'",
>                                out=seq_record.id,
>                                max_target_seqs=1,
>                                num_threads=15)
>     print cl
>     stdout, stderr = cl()
>
>
> This seems like a promising approach, but the issue is that the query
> argument expects a file, not the sequence itself.  From reading the BLAST+
> manual, blastn can accept a sequence from standard input via query="-",
> but I cannot get this to work; BLAST does not pick up the sequence.
>
>
> Any pointers greatly appreciated.
> Matt

You need to do two things: (1) tell BLAST to read the sequence from
stdin, and (2) supply the FASTA-formatted sequence on stdin.

Try something along these lines:

cline = NcbiblastnCommandline(..., query="-", ...)

stdout, stderr = cline(stdin=seq_record.format("fasta"))
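
If it helps to see the stdin mechanics in isolation, here is a minimal
sketch that uses only the standard library's subprocess module rather
than the Biopython wrapper. The helper names build_blastn_command and
run_with_stdin, the 1e-10 e-value (your 0.0000000001) and the
/path/to/db placeholder are assumptions for illustration only. The demo
at the bottom uses a child Python process as a stand-in for blastn, so
it runs without BLAST+ installed:

```python
import subprocess
import sys


def build_blastn_command(db, out_path, blastn="blastn"):
    """Return a blastn argument list that reads its query from stdin."""
    return [
        blastn,
        "-query", "-",  # "-" asks BLAST+ to read the query from stdin
        "-db", db,
        "-evalue", "1e-10",
        # No inner shell quotes here: each list item is passed to the
        # program as a single argument, spaces included.
        "-outfmt", "10 qseqid qseq sseqid sseq bitscore",
        "-out", out_path,
        "-max_target_seqs", "1",
    ]


def run_with_stdin(command, fasta_text):
    """Run command, feeding fasta_text to its standard input."""
    result = subprocess.run(
        command,
        input=fasta_text,      # written to the child's stdin, then closed
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on non-zero exit
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    fasta_text = ">query1\nACGTACGT\n"
    # Stand-in for blastn: a child process that echoes its stdin.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]
    stdout, _ = run_with_stdin(echo, fasta_text)
    print(stdout, end="")
    # Hypothetical database path and output file name.
    print(build_blastn_command("/path/to/db", "query1.csv"))
```

In your loop, fasta_text would be seq_record.format("fasta") and
out_path something derived from seq_record.id. Bear in mind that FASTA
IDs can contain characters such as "|" or "/" that are awkward in file
names, so you may want to sanitise the ID first.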

Peter