[Biopython-dev] subprocess and calling application wrappers

Tue Jul 20 12:03:44 EDT 2010

On Wed, Jun 2, 2010 at 12:36 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Jun 1, 2010 at 4:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Tue, Jun 1, 2010 at 2:23 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> I'd suggest having an option to not capture stdout and stderr, which
>>> would help users avoid those cases where a program spews a lot to
>>> stdout and it's unwieldy to capture and stick it into a string.
>>
>> We need to avoid any risk of deadlocks, so I guess the safe
>> implementation here would be to call subprocess with stdout and
>> stderr sent to dev null.
>
> How does this look? Tested on Mac and Windows:
> http://github.com/peterjc/biopython/tree/app-exec2
>
> Example usage without capturing the output:
>
>    from Bio.Emboss.Applications import WaterCommandline
>    water_cmd = WaterCommandline(gapopen=10, gapextend=0.5, stdout=True,
>                                 asequence="a.fasta", bsequence="b.fasta")
>    print "About to run:\n%s" % water_cmd
>    return_code = water_cmd()
>    print "Return code: %i" % return_code
>
> Example usage with stdout and stderr capture:
>
>    from Bio.Emboss.Applications import WaterCommandline
>    water_cmd = WaterCommandline(gapopen=10, gapextend=0.5, stdout=True,
>                                 asequence="a.fasta", bsequence="b.fasta")
>    print "About to run:\n%s" % water_cmd
>    stdout, stderr, return_code = water_cmd(capture=True)
>    print "Return code: %i" % return_code
>    print "Tool output:\n%s" % stdout
>
> Note in this implementation it either returns an integer error level
> (the default) or a tuple of stdout, stderr and the error level return
> code. If we opt for adding methods rather than using __call__
> these could be different methods instead.
>
> Another potentially useful option would be to copy the
> subprocess.check_call() function in Python 2.5+ which verifies
> the return code (error level) is zero and raises an exception if not
> (probably only sensible if not capturing the output?). Maybe this
> could even be the default behaviour?
>
> [I would prefer to keep the interface as simple as possible though,
> fewer options are better! KISS principle.]
>
> Peter

Interestingly, in Python 2.7 subprocess gained a new function called
check_output which returns stdout as a string (optionally with stderr
merged into the same string). If the return code is non-zero you get
a CalledProcessError exception (carrying the return code and output):
http://docs.python.org/library/subprocess.html
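
For anyone who has not tried it, here is a small sketch of that
behaviour. It is only an illustration: the child commands are made up
(they just re-run the Python interpreter), and on Python 3 the output
comes back as bytes rather than a str.

```python
import subprocess
import sys

# Made-up child command: prints to stdout and exits with return code 0.
ok_cmd = [sys.executable, "-c", "print('hello')"]
output = subprocess.check_output(ok_cmd)

# Made-up child command: writes to stderr and exits with return code 3.
bad_cmd = [sys.executable, "-c",
           "import sys; sys.stderr.write('oops'); sys.exit(3)"]
try:
    # stderr=subprocess.STDOUT merges stderr into the captured output.
    subprocess.check_output(bad_cmd, stderr=subprocess.STDOUT)
    failure = None
except subprocess.CalledProcessError as err:
    # The exception carries both the return code and the captured output.
    failure = (err.returncode, err.output)
```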

In some ways there are too many choices - how unpythonic ;)

Having thought about this for a while, I realised that I have almost
never cared about the exact return code, only whether it is zero
(success) or not (failure). Therefore the behaviour of the subprocess
functions check_call (Python 2.5+) and check_output (Python 2.7+)
seems desirable (you get an exception if the return code is non-zero).

That just leaves what to return: stdout and/or stderr. I personally
have never needed to merge stderr and stdout into a single pipe
or string - the only use case for this I can think of is to capture the
output into a file for logging purposes. Generally it makes more sense
to keep them separate. That leaves the question: should we return
just stdout, or both? Sometimes stderr is useful, so I think both.

So, in yet-another-branch, I wrote a __call__ implementation which
raises an exception on non-zero return codes, but otherwise returns
stdout and stderr as a tuple of two strings:

http://github.com/peterjc/biopython/commits/app-exec3

I'm pretty confident this will suffice for most use cases, and propose
we implement this in Biopython 1.55.
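
To make the proposed behaviour concrete, here is a rough standalone
sketch. It is not the code in the branch: run_command is a hypothetical
helper name, and reusing subprocess.CalledProcessError as the exception
is my assumption for this example only.

```python
import subprocess
import sys


def run_command(args):
    """Run args and return (stdout, stderr) as two strings.

    Hypothetical helper for illustration only, not the Biopython
    __call__ method. Raises subprocess.CalledProcessError if the
    return code is non-zero.
    """
    child = subprocess.Popen(args,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    # communicate() reads both pipes together, which avoids the deadlock
    # that can happen if one pipe fills up while we block on the other.
    stdout, stderr = child.communicate()
    if child.returncode != 0:
        raise subprocess.CalledProcessError(child.returncode, args)
    return stdout, stderr


# Made-up child command that just re-runs the Python interpreter.
out, err = run_command([sys.executable, "-c", "print('done')"])
```

With something like this the caller only needs a try/except around the
call when it cares about failures, and stdout and stderr stay separate.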

Thoughts?

Peter