[Biopython] Alternatives to Bio.Application for invoking command line tools?
Ivan Gregoretti
ivangreg at gmail.com
Sun May 10 16:31:54 UTC 2020
Hi Peter.
I am afraid that I do not have a satisfactory answer for you.
Whenever I have used subprocess in conjunction with biopython it has
been to run a command line child process either piping-in, piping-out,
or both.
I always found it hard to code. That would be the con.
The pro is that I have always found a solution to my problem. The
subprocess module is, indeed, very comprehensive.
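
For illustration, here is a minimal sketch of that pattern, assuming
Python 3.7 or later. The build_args helper is hypothetical (it is not
part of Biopython or subprocess), and the child process is a stand-in
Python one-liner rather than a real tool such as blastx.

```python
import subprocess
import sys


def build_args(program, **options):
    """Hypothetical helper: turn keyword options into a single-dash
    argument list, the style used by NCBI BLAST+."""
    args = [program]
    for name, value in options.items():
        args.extend([f"-{name}", str(value)])
    return args


# Argument list for the blastx call discussed in this thread.
# It is only built here, not executed, since blastx may not be installed.
blastx_args = build_args(
    "blastx",
    query="opuntia.fasta",
    db="nr",
    evalue=0.001,
    outfmt=5,
    out="opuntia.xml",
)

# Piping in and out: the stand-in child reads stdin and writes it back
# upper-cased to stdout. A real tool would take the place of this list.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
result = subprocess.run(
    child,
    input=">seq1\nacgt\n",  # text sent to the child's stdin
    capture_output=True,     # collect the child's stdout and stderr
    text=True,               # work with str rather than bytes
    check=True,              # raise CalledProcessError on non-zero exit
)
piped_output = result.stdout
```

Passing the arguments as a list avoids shell=True, so file names
containing spaces or shell metacharacters need no extra quoting.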
Thank you.
Ivan
On Sun, May 10, 2020 at 6:17 AM Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Thanks Ivan,
>
> We already recommend subprocess when the default execution mechanism in our command line wrappers is not enough (e.g. using pipes).
>
> How would you construct your command lines though?
>
> Peter
>
> On Sun, 10 May 2020 at 00:26, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>
>> Hello Peter.
>>
>> I find the subprocess library to be the most attractive.
>> It comes with a large set of functionalities around it.
>> It is well established as well. I think that, should a user encounter
>> an error, it would be relatively easy to successfully search for
>> answers.
>>
>> This is just my perspective. Hopefully disagreeing views will
>> contribute to my education.
>>
>> Thank you Peter for asking for the opinion of the community.
>>
>> Ivan
>>
>>
>> On Sat, May 9, 2020 at 12:15 PM Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> >
>> > Dear Biopythoneers,
>> >
>> > Biopython has a lot of command line tool wrappers, based around the objects in Bio/Application/__init__.py, for building a command line string and running it. Some time ago I started to think that we might actually be better off dropping our in-house command line wrappers, and recommending a standard or third party library approach for defining and executing command line strings instead.
>> >
>> > Taking an example in our tutorial, running the blastx tool from NCBI BLAST+. Currently Biopython provides a specific object for the blastx command, which knows all the expected command arguments, can do some validation, and even has some basic help text included for each of them:
>> >
>> >
>> > >>> from Bio.Blast.Applications import NcbiblastxCommandline
>> > >>> help(NcbiblastxCommandline)
>> > ...
>> > >>> blastx_cline = NcbiblastxCommandline(query="opuntia.fasta", db="nr", evalue=0.001, outfmt=5, out="opuntia.xml")
>> > >>> blastx_cline
>> > NcbiblastxCommandline(cmd='blastx', out='opuntia.xml', outfmt=5, query='opuntia.fasta',
>> > db='nr', evalue=0.001)
>> > >>> print(blastx_cline)
>> > blastx -out opuntia.xml -outfmt 5 -query opuntia.fasta -db nr -evalue 0.001
>> > >>> stdout, stderr = blastx_cline()
>> >
>> >
>> > This works quite nicely, but writing a unique class for each command line tool we wish to support is a lot of quite tedious work, especially if including minimal documentation for the arguments or argument validation. This is also an on-going maintenance problem - one of the issues I think we should fix before the next Biopython release is updating the NCBI BLAST+ wrappers as new arguments have been added.
>> >
>> > Some tools have a rather cryptic command line API, and in those cases perhaps our efforts are sensible. However, with tools like NCBI BLAST+ where there is a clear command line API, I don't see that our efforts actually add a great deal over constructing the string in code and calling subprocess:
>> >
>> >
>> > >>> import subprocess
>> > >>> cmd = "blastx -query opuntia.fasta -db nr -out opuntia.xml -evalue 0.001 -outfmt 5"
>> > >>> subprocess.check_call(cmd, shell=True)
>> >
>> >
>> > There are third party libraries which might be easier? For example, the sh library supports our current style with keyword arguments:
>> >
>> >
>> > >>> from sh import blastx
>> > >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml", evalue="0.001", outfmt="5", _long_prefix="-")
>> >
>> >
>> > You can avoid repeating the extra argument due to the NCBI not following the minus-minus prefix convention, e.g.:
>> >
>> >
>> > >>> import sh
>> > >>> blastx = sh.blastx.bake(_long_prefix="-")
>> > >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml", evalue="0.001", outfmt="5")
>> >
>> >
>> > See https://github.com/amoffat/sh
>> >
>> > This is close to the same usability our wrapper offers, but with no ongoing maintenance burden. It would need more investigation (especially commands where the order is critical, often seen on macOS but not Linux), but Windows support aside it seems attractive.
>> >
>> > If there was a cross-platform system which offered this Python-like syntax for specifying the command line arguments, that would be a tempting alternative. I don't think plumbum (Latin for lead, as used for pipes in the past) does, and I find this form heavy:
>> >
>> > >>> from plumbum import local
>> > >>> cmd = local["blastx"]["-query", "opuntia.fasta", "-db", "nr", "-out", "opuntia.xml", "-evalue", "0.001", "-outfmt", "5"]
>> > >>> cmd()
>> > ''
>> >
>> > See https://github.com/tomerfiliba/plumbum
>> >
>> > What do people think? Do you have a favourite third party library for this kind of thing?
>> >
>> > Peter
>> >
>> > _______________________________________________
>> > Biopython mailing list - Biopython at mailman.open-bio.org
>> > https://mailman.open-bio.org/mailman/listinfo/biopython