[Biopython] Alternatives to Bio.Application for invoking command line tools?

Ivan Gregoretti ivangreg at gmail.com
Sat May 9 23:26:00 UTC 2020


Hello Peter.

I find the subprocess library to be the most attractive.
It comes with a large set of functionality around it.
It is also well established. I think that, should a user encounter
an error, it would be easier to search successfully for
answers.
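To illustrate the error-handling point, here is a minimal sketch using only the standard library's subprocess.run with check=True, which raises CalledProcessError when the command exits with a non-zero status. The helper name run_tool is hypothetical. The current Python interpreter (sys.executable) stands in for a real command line tool so the example runs anywhere.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command given as an argument list and return its stdout.

    Hypothetical helper. check=True makes subprocess raise
    CalledProcessError on a non-zero exit status; a missing
    executable raises FileNotFoundError instead.
    """
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout


# The Python interpreter stands in for a real tool such as blastx.
print(run_tool([sys.executable, "-c", "print('hello')"]), end="")

try:
    run_tool([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as err:
    # The exception carries the exit status (and any captured output).
    print("exit status:", err.returncode)
```

Both exception types are part of the standard library, so their messages are easy to search for.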

This is just my perspective. Hopefully disagreeing views will
contribute to my education.

Thank you Peter for asking for the opinion of the community.

Ivan


On Sat, May 9, 2020 at 12:15 PM Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Dear Biopythoneers,
>
> Biopython has a lot of command line tool wrappers, based around the objects in Bio/Application/__init__.py, for building a command line string and running it. Some time ago I started to think that we might actually be better off dropping our in-house command line wrappers, and recommending a standard or third party library approach for defining and executing command line strings instead.
>
> Take an example from our tutorial: running the blastx tool from NCBI BLAST+. Currently Biopython provides a specific object for the blastx command, which knows all the expected command arguments, can do some validation, and even has some basic help text included for each of them:
>
>
> >>> from Bio.Blast.Applications import NcbiblastxCommandline
> >>> help(NcbiblastxCommandline)
> ...
> >>> blastx_cline = NcbiblastxCommandline(query="opuntia.fasta", db="nr", evalue=0.001, outfmt=5, out="opuntia.xml")
> >>> blastx_cline
> NcbiblastxCommandline(cmd='blastx', out='opuntia.xml', outfmt=5, query='opuntia.fasta',
> db='nr', evalue=0.001)
> >>> print(blastx_cline)
> blastx -out opuntia.xml -outfmt 5 -query opuntia.fasta -db nr -evalue 0.001
> >>> stdout, stderr = blastx_cline()
>
>
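For comparison, a single generic helper can turn keyword arguments into an argument list for any tool. This covers the string-building part of a per-tool wrapper class, though not its validation or help text. It is a sketch under assumptions: build_args is a hypothetical name, not part of Biopython. Treating True as a bare flag and None/False as "omit this option" is one possible convention, not an established one.

```python
def build_args(cmd, prefix="-", **params):
    """Turn keyword arguments into an argument list for subprocess.

    Hypothetical helper, not part of Biopython: each keyword becomes
    "<prefix><name> <value>". True gives a bare flag; None or False
    omits the option. Keyword order is preserved (dicts keep
    insertion order in Python 3.7+).
    """
    args = [cmd]
    for name, value in params.items():
        if value is None or value is False:
            continue
        args.append(prefix + name)
        if value is not True:
            args.append(str(value))
    return args


args = build_args("blastx", query="opuntia.fasta", db="nr",
                  evalue=0.001, outfmt=5, out="opuntia.xml")
print(" ".join(args))
# blastx -query opuntia.fasta -db nr -evalue 0.001 -outfmt 5 -out opuntia.xml
```

Unlike the Biopython wrapper, which prints the arguments in its own order, this keeps the caller's keyword order. The resulting list can be passed directly to subprocess without going through a shell.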
> This works quite nicely, but writing a unique class for each command line tool we wish to support is a lot of quite tedious work, especially if including minimal documentation for the arguments or argument validation. This is also an ongoing maintenance problem: one of the issues I think we should fix before the next Biopython release is updating the NCBI BLAST+ wrappers, since new arguments have been added.
>
> Some tools have a rather cryptic command line API, and in those cases perhaps our efforts are sensible. However, with tools like NCBI BLAST+ where there is a clear command line API, I don't see that our efforts add a great deal over constructing the string in code and calling subprocess:
>
>
> >>> import subprocess
> >>> cmd = "blastx -query opuntia.fasta -db nr -out opuntia.xml -evalue 0.001 -outfmt 5"
> >>> subprocess.check_call(cmd, shell=True)
>
>
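A variant of the subprocess call above avoids shell=True by passing an argument list, which sidesteps shell quoting problems with file names containing spaces or shell metacharacters. The standard library's shlex.split converts the same command string into such a list. The run itself is guarded, so this sketch only calls blastx if it is on the PATH and the example input opuntia.fasta exists.

```python
import os
import shlex
import shutil
import subprocess

cmd = "blastx -query opuntia.fasta -db nr -out opuntia.xml -evalue 0.001 -outfmt 5"

# shlex.split applies POSIX shell-style quoting rules to produce an argument list.
argv = shlex.split(cmd)
print(argv)

# Passing a list (and no shell=True) runs the program directly, so arguments
# are not reinterpreted by a shell.
if shutil.which("blastx") and os.path.exists("opuntia.fasta"):
    subprocess.check_call(argv)
else:
    print("blastx or opuntia.fasta not found; skipping the run")
```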
> There are third party libraries which might be easier. For example, the sh library supports our current style with keyword arguments:
>
>
> >>> from sh import blastx
> >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml", evalue="0.001", outfmt="5", _long_prefix="-")
>
>
> You can avoid repeating the extra argument (needed because NCBI does not follow the double-minus prefix convention for long options), e.g.:
>
>
> >>> import sh
> >>> blastx = sh.blastx.bake(_long_prefix="-")
> >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml", evalue="0.001", outfmt="5")
>
>
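The idea behind sh's bake() can be approximated with the standard library's functools.partial, pre-binding the option prefix and any options shared by every call. This is only an analogy, not how sh works internally. build_args and blastx_nr are hypothetical names, and here the result is just the argument list rather than a running process.

```python
import functools


def build_args(cmd, prefix="-", **params):
    # Hypothetical helper: keyword name -> "<prefix><name>", value -> str(value).
    args = [cmd]
    for name, value in params.items():
        args.extend([prefix + name, str(value)])
    return args


# Fix the single-dash prefix and the options shared by every call once,
# then supply only what varies per call.
blastx_nr = functools.partial(build_args, "blastx", prefix="-", db="nr", outfmt=5)

print(blastx_nr(query="opuntia.fasta", out="opuntia.xml", evalue=0.001))
# ['blastx', '-db', 'nr', '-outfmt', '5', '-query', 'opuntia.fasta', '-out', 'opuntia.xml', '-evalue', '0.001']
```

Keywords given at call time override the pre-bound ones (for example db="swissprot"), much as later arguments extend a baked sh command.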
> See https://github.com/amoffat/sh
>
> This is close to the usability our wrapper offers, but with no ongoing maintenance burden. It would need more investigation (especially for commands where the argument order is critical, as is often the case on macOS but not Linux), but Windows support aside (sh does not support Windows), it seems attractive.
>
> If there were a cross-platform system offering this Python-like syntax for specifying the command line arguments, that would be a tempting alternative. I don't think plumbum (Latin for lead, as used for pipes in the past) does, and I find its syntax heavy:
>
> >>> from plumbum import local
> >>> cmd = local["blastx"]["-query", "opuntia.fasta", "-db", "nr", "-out", "opuntia.xml", "-evalue", "0.001", "-outfmt", "5"]
> >>> cmd()
> ''
>
> See https://github.com/tomerfiliba/plumbum
>
> What do people think? Do you have a favourite third party library for this kind of thing?
>
> Peter
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> https://mailman.open-bio.org/mailman/listinfo/biopython


