[Biopython] Problem running blastp

Peter Cock p.j.a.cock at googlemail.com
Wed Jan 29 16:33:11 UTC 2014


On Wed, Jan 29, 2014 at 4:26 PM, John Connolly
<j.connolly at sheffield.ac.uk> wrote:
> Hi Peter,
>
> Thank you for your reply.
>
> I realised that the line you mentioned was unnecessary after I'd sent the
> message, but I didn't know how to update the mailing list. Sorry about that.
>
> Here's the program after I've modified it a little:
>
> "from Bio.Blast.Applications import NcbiblastpCommandline
>
> cline = NcbiblastpCommandline(query="seqs.txt", db="NADB", outfmt=5)
>
> cline
> NcbiblastpCommandline(cmd='blastp', query='seqs.txt', db='NADB', outfmt=5)
> print(cline)
> #blastp -query seqs.txt -db NADB -outfmt 5 -remote
> stdout, stderr = cline()"
>
> It runs fine, but I thought I knew how to assign the results of the blast to
> a file_handle, which I could then parse. I thought that the results would be
> in cline(). I know how to get the results to a file, but I would like to
> parse them in the same program (I have a parsing program that does exactly
> what I need).

As written, BLAST's output will be sent to stdout (the default
behaviour), and is therefore captured as a (potentially large) string
in the stdout variable returned by cline(). You could turn this into
a handle with StringIO:

from io import StringIO
handle = StringIO(stdout)

Don't use this StringIO approach for large output - the entire
result has to sit in memory as a single string.
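
To make the handle idea concrete, here is a minimal sketch. It assumes
the captured string is BLAST XML (outfmt=5). The XML below is a
hand-written toy stand-in, not real BLAST output, and hit_titles is a
hypothetical helper that uses the standard library's ElementTree only so
that the sketch runs on its own. In practice you would pass the handle
to your existing parsing code instead.

```python
from io import StringIO
import xml.etree.ElementTree as ET

# Toy stand-in for the string returned as stdout by cline().
# Real BLAST XML contains many more elements than this.
stdout = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit><Hit_def>toy hit A</Hit_def></Hit>
        <Hit><Hit_def>toy hit B</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

# Wrap the string so it can be read like an open text file.
handle = StringIO(stdout)


def hit_titles(handle):
    """Return the Hit_def text of every hit read from a file-like handle."""
    tree = ET.parse(handle)
    return [hit.findtext("Hit_def") for hit in tree.iter("Hit")]


titles = hit_titles(handle)
print(titles)
```

Anything that expects a file handle can read from the StringIO object in
the same way, which is why no temporary file is needed for small results.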

What I would normally do is ask BLAST to save the output to
a file, and open the file for reading to get a handle.

This also lets you separate running BLAST (usually slow) from
processing the output (usually fast). I often need to adjust the
parsing code, so I want to repeat that step many times while
working on it, without having to rerun BLAST each time.
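
As a rough sketch of that workflow, something like the following would
do. It assumes blastp is on your PATH, and the function names and any
output filename such as results.xml are made up for illustration. It
calls blastp through subprocess rather than the NcbiblastpCommandline
wrapper so that it depends only on the standard library. With the
wrapper, passing an out argument should have the same effect. The hit
counting is just a placeholder for your own parser.

```python
import os
import subprocess
import xml.etree.ElementTree as ET


def run_blast_once(query, db, out_path):
    """Run blastp only if out_path does not exist yet; return out_path."""
    if not os.path.exists(out_path):
        # -outfmt 5 asks for XML; -out writes it to a file instead of stdout.
        subprocess.check_call(
            ["blastp", "-query", query, "-db", db,
             "-outfmt", "5", "-out", out_path]
        )
    return out_path


def count_hits(out_path):
    """Stream through the saved XML file and count the <Hit> elements."""
    hits = 0
    with open(out_path, "rb") as handle:
        for _event, elem in ET.iterparse(handle):
            if elem.tag == "Hit":
                hits += 1
            elem.clear()  # discard finished elements to keep memory low
    return hits
```

Calling run_blast_once("seqs.txt", "NADB", "results.xml") a second time
skips the slow BLAST step, so only the parsing code is rerun while you
work on it. Delete the output file to force a fresh search.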

Peter


