[Biopython] Feeding XML stream from BLAST directly into SeqIO.parse()

Martin Mokrejs mmokrejs at fold.natur.cuni.cz
Sun Jan 3 19:51:30 UTC 2016


Hi,
   I want to avoid creating huge XML files and instead feed BLAST results directly into SeqIO. As far as I can tell, the following sends all of the output into _stdout:

import subprocess
# communicate() reads all of stdout and stderr into memory before it returns
_stdout, _stderr = subprocess.Popen(_cmdline, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()


   Wrapping _stdout with cStringIO.StringIO would give a file-like object that could be fed into SeqIO.parse(), but that does not solve my main problem: how to throttle the BLASTN output, or perhaps limit the Popen buffer, so that it does not consume all of my computer's memory?
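A minimal sketch of the streaming alternative, assuming Python 3: instead of calling communicate(), keep the Popen object and let a parser read proc.stdout while BLAST is still writing. To keep the example runnable without BLAST, a child Python process stands in for BLASTN (FAKE_BLAST and count_hits_streaming are hypothetical names), and the standard-library ET.iterparse stands in for Biopython's parser. Note that SeqIO has no BLAST XML format; with Biopython, the same handle would instead be passed to Bio.SearchIO.parse(proc.stdout, "blast-xml") or Bio.Blast.NCBIXML.parse(proc.stdout) (not tested here).

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the BLAST command line: a child Python process
# that writes a BLAST-like XML document with many <Hit> elements to stdout.
FAKE_BLAST = [
    sys.executable, "-c",
    "import sys\n"
    "w = sys.stdout.write\n"
    "w('<BlastOutput><BlastOutput_iterations>')\n"
    "for i in range(5000):\n"
    "    w('<Hit><Hit_id>hit%d</Hit_id></Hit>' % i)\n"
    "w('</BlastOutput_iterations></BlastOutput>')\n",
]


def count_hits_streaming(cmdline):
    """Parse the child's XML output incrementally instead of via communicate()."""
    # stderr is discarded here so that only one pipe needs to be read.
    proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    n_hits = 0
    try:
        # iterparse pulls data from the pipe in chunks; the child blocks
        # whenever the OS pipe buffer is full, so it cannot run far ahead.
        for _event, elem in ET.iterparse(proc.stdout, events=("end",)):
            if elem.tag == "Hit":
                n_hits += 1
                elem.clear()  # drop the finished element's children and text
    finally:
        proc.stdout.close()
        proc.wait()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmdline)
    return n_hits


if __name__ == "__main__":
    print(count_hits_streaming(FAKE_BLAST))
```

Because the pipe has a finite OS buffer, the writer is effectively throttled to the speed of the reader, so there is no need to limit Popen's buffer explicitly.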

   I know one can set up the left- and right-hand sides of a UNIX pipe separately (an example: http://seqanswers.com/forums/showpost.php?s=9904bb037c254042dfe282d032f8c07d&p=140253&postcount=6), and I use something like that elsewhere. Here, however, SeqIO.parse() is not running as a separate process on the right-hand side of a pipe, and the only way I can think of is to move the SeqIO.parse() caller into a separate program and run it as the consumer on the right side of a pipe. That does not sound very elegant.
Can I do it directly inside my Python code, just by passing a handle/iterator to SeqIO.parse()?
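A parser only needs a file-like handle, and the read end of a pipe is one, so the calling Python process can itself act as the right-hand side of the pipe. A minimal sketch, assuming Python 3, of a hypothetical command_output() context manager: it hands proc.stdout to whatever parser the caller uses, then closes the pipe and checks the exit status. stderr is discarded to keep the sketch short; a caller that stops reading early may see a non-zero status from the child.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def command_output(cmdline):
    """Yield the stdout handle of a running command (hypothetical helper).

    The caller parses the handle in-process; on exit the pipe is closed and
    a non-zero exit status is turned into CalledProcessError.
    """
    proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    try:
        yield proc.stdout
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmdline)


if __name__ == "__main__":
    child = [sys.executable, "-c", "for i in range(3): print('record', i)"]
    with command_output(child) as handle:
        # Any parser that accepts a binary handle could consume it here.
        for line in handle:
            print(line.decode().rstrip())
```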

   What do you think of this: http://eyalarubas.com/python-subproc-nonblock.html
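For this use case non-blocking reads should not be necessary, since a parser that blocks on stdout simply waits for BLAST. The thing to watch is that the command above also sets stderr=subprocess.PIPE, and the subprocess documentation warns that reading one pipe while the other fills up can deadlock. A minimal sketch, assuming Python 3.5+, of one common workaround: a background thread drains stderr while the main thread parses stdout. run_streaming() and the consume callback are hypothetical names.

```python
import subprocess
import sys
import threading


def run_streaming(cmdline, consume):
    """Run cmdline, pass its stdout handle to consume(), and return
    (consume's result, captured stderr bytes).

    stderr is drained by a background thread so that a chatty child cannot
    block on a full stderr pipe while the main thread waits on stdout.
    """
    proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    err_chunks = []
    drainer = threading.Thread(
        target=lambda: err_chunks.append(proc.stderr.read()))
    drainer.start()
    try:
        result = consume(proc.stdout)
    finally:
        proc.stdout.close()
        drainer.join()  # returns once the child closes stderr (usually at exit)
        proc.stderr.close()
        proc.wait()
    err = b"".join(err_chunks)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmdline,
                                            stderr=err)
    return result, err


if __name__ == "__main__":
    # Child writes more to stderr than a typical pipe buffer holds,
    # then a few lines to stdout.
    noisy = [sys.executable, "-c",
             "import sys\nsys.stderr.write('x' * 200000)\n"
             "sys.stderr.flush()\nfor i in range(3): print(i)\n"]
    values, err = run_streaming(noisy, lambda h: [int(ln.strip()) for ln in h])
    print(values, len(err))
```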


   In the Biopython Tutorial, section 7.3 only shows result_handle being read from an existing disk file, and Chapter 8, dedicated to SearchIO, does not go beyond file-based examples either. Is this really so uncommon? ;-)

Thank you for any clues,
Martin

