[Biopython] Feeding XML stream from BLAST directly into SeqIO.parse()

Peter Cock p.j.a.cock at googlemail.com
Tue Jan 5 12:27:56 UTC 2016


On Mon, Jan 4, 2016 at 4:51 AM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
>
> Hi,
>   I want to avoid creation of the huge XML files and feed BLAST
> results directly into SeqIO.

I guess you mean SearchIO rather than SeqIO?

> I think the following clearly sends all output into _stdout.
>
> _stdout, _stderr = subprocess.Popen(_cmdline, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
>

Calling .communicate() waits for the process to finish and then returns
ALL of stdout and stderr as strings held in memory. With large BLAST
XML output, that just moves the problem from disk to RAM.

> Can I do it directly inside my python? Just passing a handle/iterator
> to SeqIO.parse()?

Yes, see for example the "MUSCLE using stdin and stdout" example
in the AlignIO chapter of the Tutorial. You would pass the child
process's .stdout handle to the Biopython parse function.
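
A minimal sketch of that idea, not a tested recipe for BLAST. The
helper name stream_parse and the stand-in child command are made up
for illustration. With Biopython the parse argument would be something
like lambda h: SearchIO.parse(h, "blast-xml"); whether that parser
wants a text or a binary handle may depend on the Biopython version,
so treat that part as an assumption to check.

```python
import subprocess
import sys


def stream_parse(cmdline, parse):
    """Run cmdline and yield records parsed lazily from its stdout.

    parse is any callable that takes a file-like handle and returns an
    iterator of records (hypothetical helper, not a Biopython function).
    """
    # stderr is deliberately not piped here: a piped stderr that nobody
    # reads can block the child once the pipe buffer fills up.
    child = subprocess.Popen(cmdline, shell=False, stdout=subprocess.PIPE,
                             universal_newlines=True)
    try:
        for record in parse(child.stdout):
            yield record
    finally:
        child.stdout.close()
        child.wait()


# Stand-in child process so the sketch runs without BLAST installed.
demo_cmd = [sys.executable, "-c", "print('hit1'); print('hit2')"]
records = list(stream_parse(demo_cmd,
                            lambda handle: (line.strip() for line in handle)))
print(records)
```

Nothing beyond the record currently being parsed has to be held in
memory, because the handle is consumed as the child writes to it.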

> In Biopython's TUTORIAL I only see in section 7.3 result_handle
> using an existing disk file. Chapter 8 dedicated to SearchIO does
> not go beyond file-based examples either. Is it so uncommon? ;-)

It is much harder to debug (especially if there are any errors), so
personally I tend to avoid parsing stdout from subprocess.
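
One way to keep some of the error information is sketched below,
assuming the child signals failure through a non-zero exit status. The
helper name stream_parse_checked and the failing stand-in command are
hypothetical. stderr goes to a temporary file rather than a pipe, so it
cannot stall the child while stdout is being read. Note that if the
parser itself raises on truncated output, that exception is what you
will see first.

```python
import subprocess
import sys
import tempfile


def stream_parse_checked(cmdline, parse):
    """Yield records parsed from the child's stdout; raise if it fails."""
    # Spool stderr to a temporary file instead of a pipe.
    with tempfile.TemporaryFile(mode="w+") as err:
        child = subprocess.Popen(cmdline, shell=False,
                                 stdout=subprocess.PIPE, stderr=err,
                                 universal_newlines=True)
        try:
            for record in parse(child.stdout):
                yield record
        finally:
            child.stdout.close()
            returncode = child.wait()
        # Only reached when the parser consumed all of stdout cleanly.
        if returncode != 0:
            err.seek(0)
            raise RuntimeError("child exited with %d: %s"
                               % (returncode, err.read().strip()))


# Stand-in for a failing tool: one line of output, then exit status 2.
failing_cmd = [sys.executable, "-c",
               "import sys; print('hit1'); "
               "sys.stderr.write('bad database'); sys.exit(2)"]
seen = []
message = None
try:
    for rec in stream_parse_checked(
            failing_cmd, lambda handle: (line.strip() for line in handle)):
        seen.append(rec)
except RuntimeError as exc:
    message = str(exc)
print(seen, message)
```

The caller may already have processed some records by the time the
error surfaces, which is part of what makes this style harder to debug
than parsing a finished file.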

Peter

