[Biopython] Alternatives to Bio.Application for invoking command line tools?

Peter Cock p.j.a.cock at googlemail.com
Sun May 10 10:17:02 UTC 2020


Thanks Ivan,

We already recommend subprocess when the default execution mechanism in
our command line wrappers is not enough (e.g. when using pipes).

How would you construct your command lines though?
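
For instance, one option would be a small helper that turns keyword
arguments into an argument list for subprocess. This is only a rough
sketch: build_cmd and its long_prefix parameter are hypothetical names,
not anything in Biopython or the standard library, and it assumes every
option takes the single-dash form that BLAST+ uses.

```python
import shlex
import subprocess


def build_cmd(program, long_prefix="-", **options):
    """Return an argument list such as ['blastx', '-query', 'x.fasta'].

    Hypothetical helper for illustration; not part of Biopython.
    """
    args = [program]
    for name, value in options.items():
        if value is None or value is False:
            continue  # option not set, leave it out
        args.append(long_prefix + name)
        if value is not True:
            # True means a bare switch; anything else is passed as a value
            args.append(str(value))
    return args


cmd = build_cmd(
    "blastx",
    query="opuntia.fasta",
    db="nr",
    out="opuntia.xml",
    evalue=0.001,
    outfmt=5,
)
# Shell-quoted string, handy for logging (shlex.join needs Python 3.8+)
print(shlex.join(cmd))

# To actually run it (needs BLAST+ installed and on the PATH):
# subprocess.run(cmd, check=True)
```

Passing a list rather than a string with shell=True avoids shell quoting
problems with filenames containing spaces, but a helper like this does no
validation of the argument names.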

Peter

On Sun, 10 May 2020 at 00:26, Ivan Gregoretti <ivangreg at gmail.com> wrote:

> Hello Peter.
>
> I find the subprocess library to be the most attractive.
> It comes with a large set of functionality around it.
> It is well established as well. I think that, should a user encounter
> an error, it would be relatively easy to find answers by searching.
>
> This is just my perspective. Hopefully disagreeing views will
> contribute to my education.
>
> Thank you Peter for asking for the opinion of the community.
>
> Ivan
>
>
> On Sat, May 9, 2020 at 12:15 PM Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >
> > Dear Biopythoneers,
> >
> > Biopython has a lot of command line tool wrappers, based around the
> > objects in Bio/Application/__init__.py, for building a command line string
> > and running it. Some time ago I started to think that we might actually be
> > better off dropping our in-house command line wrappers, and recommending a
> > standard or third party library approach for defining and executing command
> > line strings instead.
> >
> > Take an example from our tutorial: running the blastx tool from NCBI
> > BLAST+. Currently Biopython provides a specific object for the blastx
> > command, which knows all the expected command arguments, can do some
> > validation, and even has some basic help text included for each of them:
> >
> >
> > >>> from Bio.Blast.Applications import NcbiblastxCommandline
> > >>> help(NcbiblastxCommandline)
> > ...
> > >>> blastx_cline = NcbiblastxCommandline(query="opuntia.fasta", db="nr",
> > ...     evalue=0.001, outfmt=5, out="opuntia.xml")
> > >>> blastx_cline
> > NcbiblastxCommandline(cmd='blastx', out='opuntia.xml', outfmt=5, query='opuntia.fasta', db='nr', evalue=0.001)
> > >>> print(blastx_cline)
> > blastx -out opuntia.xml -outfmt 5 -query opuntia.fasta -db nr -evalue 0.001
> > >>> stdout, stderr = blastx_cline()
> >
> >
> > This works quite nicely, but writing a unique class for each command
> > line tool we wish to support is a lot of quite tedious work, especially if
> > including minimal documentation for the arguments or argument validation.
> > This is also an on-going maintenance problem - one of the issues I think we
> > should fix before the next Biopython release is updating the NCBI BLAST+
> > wrappers as new arguments have been added.
> >
> > Some tools have a rather cryptic command line API, and in those cases
> > perhaps our efforts are sensible. However, with tools like NCBI BLAST+,
> > where there is a clear command line API, I don't see that our efforts
> > actually add a great deal over constructing the string in code and calling
> > subprocess:
> >
> >
> > >>> import subprocess
> > >>> cmd = "blastx -query opuntia.fasta -db nr -out opuntia.xml -evalue 0.001 -outfmt 5"
> > >>> subprocess.check_call(cmd, shell=True)
> >
> >
> > There are third party libraries which might be easier. For example, the
> > sh library supports our current style with keyword arguments:
> >
> >
> > >>> from sh import blastx
> > >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml",
> > ...        evalue="0.001", outfmt="5", _long_prefix="-")
> >
> >
> > You can avoid repeating the extra argument due to the NCBI not following
> > the minus-minus prefix convention, e.g.:
> >
> >
> > >>> import sh
> > >>> blastx = sh.blastx.bake(_long_prefix="-")
> > >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml",
> > ...        evalue="0.001", outfmt="5")
> >
> >
> > See https://github.com/amoffat/sh
> >
> > This is close to the same usability our wrapper offers, but with no
> > ongoing maintenance burden. It would need more investigation (especially
> > commands where the order is critical, often seen on macOS but not Linux),
> > but Windows support aside it seems attractive.
> >
> > If there was a cross-platform system which offered this Python-like
> > syntax for specifying the command line arguments, that would be a tempting
> > alternative. I don't think plumbum (Latin for lead, as used for pipes in
> > the past) does, and I find this form heavy:
> >
> > >>> from plumbum import local
> > >>> cmd = local["blastx"]["-query", "opuntia.fasta", "-db", "nr",
> > ...     "-out", "opuntia.xml", "-evalue", "0.001", "-outfmt", "5"]
> > >>> cmd()
> > ''
> >
> > See https://github.com/tomerfiliba/plumbum
> >
> > What do people think? Do you have a favourite third party library for
> > this kind of thing?
> >
> > Peter
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at mailman.open-bio.org
> > https://mailman.open-bio.org/mailman/listinfo/biopython
>