[Biopython] Alternatives to Bio.Application for invoking command line tools?

Peter Cock p.j.a.cock at googlemail.com
Sun May 10 17:09:41 UTC 2020


Yes, subprocess is very powerful but tricky.
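
For the piping case Ivan describes, a minimal sketch might look like the
following. It uses the Python interpreter itself as a stand-in child
process (an assumption made purely so the example runs without any
external tool installed); in practice the argument list would name the
real command.

```python
import subprocess
import sys

# Stand-in child process: upper-cases whatever arrives on stdin.
# In real use this would be the external tool's argument list.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

result = subprocess.run(
    child,
    input=">seq1\nacgt\n",  # text piped in on the child's stdin
    capture_output=True,     # collect the child's stdout and stderr
    text=True,               # work with str rather than bytes
    check=True,              # raise CalledProcessError on non-zero exit
)
piped_out = result.stdout
```

The tricky part tends to be choosing between subprocess.run for simple
cases like this and subprocess.Popen when data has to be streamed.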

Sadly it doesn't help with giving a nice syntax for building command
lines, which for me was the main thing our Bio.Application framework added.
Perhaps someone else on the list will have a good suggestion.
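
To illustrate the kind of syntax I mean, here is a rough sketch of a
helper that turns keyword arguments into an argument list for subprocess.
The name build_cmd and its prefix parameter are made up for this example;
nothing like it exists in Biopython or the standard library, and it does
no validation of the arguments at all.

```python
import subprocess


def build_cmd(program, prefix="-", **options):
    """Hypothetical helper: keyword arguments to an argument list."""
    args = [program]
    for name, value in options.items():
        if value is None or value is False:
            continue  # option not wanted, leave it out
        args.append(prefix + name)
        if value is not True:  # True means a bare switch with no value
            args.append(str(value))
    return args


cmd = build_cmd("blastx", query="opuntia.fasta", db="nr",
                evalue=0.001, outfmt=5, out="opuntia.xml")
# Running it needs NCBI BLAST+ on the PATH, so it is left commented out:
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell=True and the
quoting problems that come with it.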

Peter

On Sun, May 10, 2020 at 5:32 PM Ivan Gregoretti <ivangreg at gmail.com> wrote:

> Hi Peter.
>
> I am afraid that I do not have a satisfactory answer for you.
>
> Whenever I have used subprocess in conjunction with biopython it has
> been to run a command line child process either piping-in, piping-out,
> or both.
>
> I always found it to be hard to code. That would be the con.
>
> The pro is that I have always found a solution for my problem. The
> subprocess module is, indeed, very comprehensive.
>
> Thank you.
>
> Ivan
>
>
> On Sun, May 10, 2020 at 6:17 AM Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >
> > Thanks Ivan,
> >
> > We already recommended subprocess when the default execution mechanism
> > in our command line wrappers is not enough (e.g. using pipes).
> >
> > How would you construct your command lines though?
> >
> > Peter
> >
> > On Sun, 10 May 2020 at 00:26, Ivan Gregoretti <ivangreg at gmail.com> wrote:
> >>
> >> Hello Peter.
> >>
> >> I find the subprocess library to be the most attractive.
> >> It comes with a large set of functionalities around it.
> >> It is well established as well. I think that, should a user encounter
> >> an error, it would be relatively easy to search successfully for
> >> answers.
> >>
> >> This is just my perspective. Hopefully disagreeing views will
> >> contribute to my education.
> >>
> >> Thank you Peter for asking for the opinion of the community.
> >>
> >> Ivan
> >>
> >>
> >> On Sat, May 9, 2020 at 12:15 PM Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >> >
> >> > Dear Biopythoneers,
> >> >
> >> > Biopython has a lot of command line tool wrappers, based around the
> >> > objects in Bio/Application/__init__.py, for building a command line
> >> > string and running it. Some time ago I started to think that we might
> >> > actually be better off dropping our in-house command line wrappers,
> >> > and recommending a standard or third party library approach for
> >> > defining and executing command line strings instead.
> >> >
> >> > Take an example from our tutorial, running the blastx tool from NCBI
> >> > BLAST+. Currently Biopython provides a specific object for the blastx
> >> > command, which knows all the expected command arguments, can do some
> >> > validation, and even has some basic help text included for each of
> >> > them:
> >> >
> >> >
> >> > >>> from Bio.Blast.Applications import NcbiblastxCommandline
> >> > >>> help(NcbiblastxCommandline)
> >> > ...
> >> > >>> blastx_cline = NcbiblastxCommandline(query="opuntia.fasta",
> >> > ...     db="nr", evalue=0.001, outfmt=5, out="opuntia.xml")
> >> > >>> blastx_cline
> >> > NcbiblastxCommandline(cmd='blastx', out='opuntia.xml', outfmt=5, query='opuntia.fasta', db='nr', evalue=0.001)
> >> > >>> print(blastx_cline)
> >> > blastx -out opuntia.xml -outfmt 5 -query opuntia.fasta -db nr -evalue 0.001
> >> > >>> stdout, stderr = blastx_cline()
> >> >
> >> >
> >> > This works quite nicely, but writing a unique class for each command
> >> > line tool we wish to support is a lot of quite tedious work,
> >> > especially if including minimal documentation for the arguments or
> >> > argument validation. This is also an ongoing maintenance problem: one
> >> > of the issues I think we should fix before the next Biopython release
> >> > is updating the NCBI BLAST+ wrappers, as new arguments have been added.
> >> >
> >> > Some tools have a rather cryptic command line API, and in those cases
> >> > perhaps our efforts are sensible. However, with tools like NCBI
> >> > BLAST+, where there is a clear command line API, I don't see that our
> >> > efforts actually add a great deal over constructing the string in code
> >> > and calling subprocess:
> >> >
> >> >
> >> > >>> import subprocess
> >> > >>> cmd = "blastx -query opuntia.fasta -db nr -out opuntia.xml -evalue 0.001 -outfmt 5"
> >> > >>> subprocess.check_call(cmd, shell=True)
> >> >
> >> >
> >> > There are third party libraries which might be easier? For example,
> >> > the sh library supports our current style with keyword arguments:
> >> >
> >> >
> >> > >>> from sh import blastx
> >> > >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml",
> >> > ...     evalue="0.001", outfmt="5", _long_prefix="-")
> >> >
> >> >
> >> > You can avoid repeating the extra argument (needed because NCBI BLAST+
> >> > does not follow the minus-minus prefix convention), e.g.:
> >> >
> >> >
> >> > >>> import sh
> >> > >>> blastx = sh.blastx.bake(_long_prefix="-")
> >> > >>> blastx(query="opuntia.fasta", db="nr", out="opuntia.xml",
> >> > ...     evalue="0.001", outfmt="5")
> >> >
> >> >
> >> > See https://github.com/amoffat/sh
> >> >
> >> > This is close to the same usability our wrapper offers, but with no
> >> > ongoing maintenance burden. It would need more investigation
> >> > (especially commands where the order is critical, often seen on macOS
> >> > but not Linux), but Windows support aside it seems attractive.
> >> >
> >> > If there was a cross-platform system which offered this Python-like
> >> > syntax for specifying the command line arguments, that would be a
> >> > tempting alternative. I don't think plumbum (Latin for lead, as used
> >> > for pipes in the past) does, and I find this form heavy:
> >> >
> >> > >>> from plumbum import local
> >> > >>> cmd = local["blastx"]["-query", "opuntia.fasta", "-db", "nr",
> >> > ...     "-out", "opuntia.xml", "-evalue", "0.001", "-outfmt", "5"]
> >> > >>> cmd()
> >> > ''
> >> >
> >> > See https://github.com/tomerfiliba/plumbum
> >> >
> >> > What do people think? Do you have a favourite third party library
> >> > for this kind of thing?
> >> >
> >> > Peter
> >> >