[Biopython] Feeding XML stream from BLAST directly into SeqIO.parse()

Wibowo Arindrarto w.arindrarto at gmail.com
Tue Jan 5 12:23:51 UTC 2016


Hi Martin,

If you want to stay inside Python, you should be able to do something like this:

import subprocess
from Bio import SearchIO

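# stdout=subprocess.PIPE is needed; otherwise blast_process.stdout is None
blast_process = subprocess.Popen(..., stdout=subprocess.PIPE)

for record in SearchIO.parse(blast_process.stdout, 'blast-xml'):
    # process each record
    pass

blast_process.wait()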
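On the memory concern in the quoted message: `communicate()` collects all of the output in memory before returning, whereas reading from `blast_process.stdout` as you go lets the OS pipe buffer (a fixed size, commonly 64 KiB on Linux) throttle BLAST, which simply blocks until the parser catches up. The stdlib-only sketch below shows this with `xml.etree.ElementTree.iterparse`. `FAKE_BLAST` and `stream_query_names` are hypothetical: the fake producer stands in for something like `["blastn", "-query", "query.fa", "-db", "nt", "-outfmt", "5"]` (file and database names are placeholders), and its element names only mimic BLAST XML. To my knowledge SearchIO's blast-xml parser also parses incrementally, but this sketch does not rely on Biopython.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Hypothetical stand-in for blastn: a child Python process that prints a tiny
# BLAST-like XML document, so the example runs without BLAST installed.
FAKE_BLAST = r'''
import sys
sys.stdout.write("<BlastOutput><BlastOutput_iterations>\n")
for i in range(3):
    sys.stdout.write("<Iteration><Iteration_query-def>query%d</Iteration_query-def></Iteration>\n" % i)
sys.stdout.write("</BlastOutput_iterations></BlastOutput>\n")
'''


def stream_query_names(cmd):
    """Yield one query name per <Iteration> while the child is still writing."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # iterparse pulls data from the pipe in chunks; when we stop reading,
        # the pipe buffer fills and the child blocks, so memory stays bounded.
        for event, elem in ET.iterparse(proc.stdout, events=("end",)):
            if elem.tag == "Iteration":
                yield elem.findtext("Iteration_query-def")
                elem.clear()  # drop the finished record's children
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    for name in stream_query_names([sys.executable, "-c", FAKE_BLAST]):
        print(name)
```

The key design choice is never calling `communicate()` or `read()` on the whole stream: each record is handled and discarded before the next one is parsed.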

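If you don't mind doing part of the work outside Python (running BLAST in
the shell and piping its output into your script), you can replace
`blast_process.stdout` above with `sys.stdin` (after `import sys`). In
general, the parse functions accept both string file names and file-like
handles.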
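To illustrate the "file name or handle" behaviour, here is a stdlib-only sketch of a helper that accepts either. `as_handle` and `count_lines` are hypothetical names for illustration; I believe Biopython has an internal helper in `Bio.File` serving a similar purpose, but this is not its code.

```python
import contextlib
import os


@contextlib.contextmanager
def as_handle(path_or_handle, mode="rb"):
    """Yield an open handle, given either a file name or an existing handle.

    Hypothetical helper for illustration, not Biopython's actual code.
    """
    if isinstance(path_or_handle, (str, os.PathLike)):
        # A file name: open it here and close it when the block exits.
        with open(path_or_handle, mode) as handle:
            yield handle
    else:
        # Already a handle (a pipe, sys.stdin.buffer, io.BytesIO, ...): pass it
        # through unchanged and leave closing it to the caller.
        yield path_or_handle


def count_lines(source):
    """Count lines from a file name or from any binary handle."""
    with as_handle(source) as handle:
        return sum(1 for _ in handle)
```

With a shell pipe such as `blastn -query query.fa -db nt -outfmt 5 | python script.py` (placeholder file and database names), the script would pass `sys.stdin.buffer` (Python 3) as the handle; with a subprocess it would pass `blast_process.stdout`.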

(P.S. I'm using SearchIO as an example. Both SeqIO and SearchIO use
the same file-handling function, IIRC.)

Hope this helps,
Bow

On Sun, Jan 3, 2016 at 8:51 PM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
> Hi,
>   I want to avoid creating huge XML files and instead feed BLAST results
> directly into SeqIO. I think the following clearly sends all output into
> _stdout.
>
> _stdout, _stderr = subprocess.Popen(_cmdline, shell=False,
> stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
>
>
>   Wrapping _stdout with cStringIO.StringIO would give a file-like object to
> be fed into SeqIO.parse(), but that does not help with my primary problem:
> how to throttle the BLASTN output, or perhaps limit Popen's buffer, so that
> it does not consume all of my computer's memory.
>
>   I know one can prepare the left- and right-hand sides of a UNIX pipe
> separately, but I am not running SeqIO.parse() on the right-hand side of a
> pipe as a separate process. An example of that setup is here:
> http://seqanswers.com/forums/showpost.php?s=9904bb037c254042dfe282d032f8c07d&p=140253&postcount=6
> I use something like this elsewhere, but here I cannot think of a way to do
> it other than moving the SeqIO.parse() caller into a separate program and
> executing it as the consumer on the right-hand side of a pipe. That does not
> sound very elegant.
> Can I do it directly inside my Python code, just by passing a handle/iterator
> to SeqIO.parse()?
>
>   What do you think of this:
> http://eyalarubas.com/python-subproc-nonblock.html
>
>
>   In Biopython's Tutorial, section 7.3 only shows result_handle reading from
> an existing disk file. Chapter 8, dedicated to SearchIO, does not go beyond
> file-based examples either. Is this so uncommon? ;-)
>
> Thank you for clues,
> Martin
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
