[Biopython-dev] Fwd: Where to put command line wrappers

Thu Apr 16 18:53:03 UTC 2009

On 4/16/09, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> Hi All,
>
>   On Thu, Apr 16, 2009 at 5:45 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>  > For EMBOSS we have a single file, Bio/Emboss/Applications.py, which
>  > has about 15 wrappers (all very similar as the EMBOSS applications are
>  > very consistent).  This is nice in that all the wrappers are in the
>  > Bio.Emboss.Applications namespace.
>  >
>  > Bartek and I have been having a similar discussion for Motif tools,
>  > and if the AlignAce wrappers should go in Bio.Motif.Applications to
>  > match.  For now Bio.Motif has just one wrapper for AlignACE and sister
>  > tool CompareACE.  Now giving each tool-set its own file is possible
>  > (Bio/Motif/Applications/AlignAce.py) but would one (large) file be
>  > simpler? (i.e. Bio/Motif/Applications.py).
>  >
>  I think that there is a difference between EMBOSS and
>  Bio.[Motif|Align]. In EMBOSS we have a very nicely commoditized
>  set of tools with similar interfaces, while for both multiple
>  alignment and motif searching the tools vary a lot. For
>  multiple alignment this is only with respect to parameters and
>  output format, while in motif searching there are also many
>  differences in the types of input (background models etc.).

That is a good argument for using Bio/Align/Applications/XXX.py and
Bio/Motif/Applications/XXX.py while also having
Bio/Emboss/Applications.py.
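A directory with one file per tool set can still offer a single import point. The sketch below is a hypothetical illustration, not Biopython's actual code: it writes a throwaway package to a temporary directory in which each wrapper lives in its own module and the package's `__init__.py` re-exports them. The package name `demo_tools` and the `MuscleCommandline`/`ClustalwCommandline` classes are made up for the example.

```python
import os
import sys
import tempfile
import textwrap

# Build a throwaway package on disk to show the layout. All names here
# (demo_tools, Muscle.py, ClustalW.py, *Commandline) are hypothetical.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_tools", "Applications")
os.makedirs(pkg)
open(os.path.join(root, "demo_tools", "__init__.py"), "w").close()

files = {
    # One file per tool set, each of which can be maintained separately.
    "Muscle.py": "class MuscleCommandline:\n    program = 'muscle'\n",
    "ClustalW.py": "class ClustalwCommandline:\n    program = 'clustalw2'\n",
    # The package __init__ gathers every wrapper into one namespace.
    "__init__.py": textwrap.dedent("""\
        from .Muscle import MuscleCommandline
        from .ClustalW import ClustalwCommandline
        """),
}
for name, source in files.items():
    with open(os.path.join(pkg, name), "w") as handle:
        handle.write(source)

# Users import from the package, not from the individual files.
sys.path.insert(0, root)
from demo_tools.Applications import MuscleCommandline, ClustalwCommandline

print(MuscleCommandline.program, ClustalwCommandline.program)  # muscle clustalw2
```

Each tool's maintainer edits only their own file, while users see one flat namespace.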

>  Also, quite likely the parsers for different tools will be written by
>  different people.

Biopython's command line wrappers can be quite separate from the
parsers; this is a natural break.  One can be useful without the
other, and keeping them separate allows you, for example, to use a
Biopython wrapper with another parser, or vice versa.
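To illustrate that split, here is a minimal sketch assuming a hypothetical wrapper class `ToolCommandline` that only builds an argument list, and a hypothetical parser `parse_scores` that reads "name score" lines from any text handle. Neither is a Biopython API; the point is only that each works without the other.

```python
import io


class ToolCommandline:
    """Hypothetical wrapper: it only knows how to build a command line."""

    def __init__(self, program, **options):
        self.program = program
        self.options = options

    def args(self):
        # Turn options into an argument list, e.g. ["tool", "-n", "10"];
        # options are sorted by name so the order is predictable.
        cmd = [self.program]
        for name, value in sorted(self.options.items()):
            cmd.extend(["-" + name, str(value)])
        return cmd

    def __str__(self):
        return " ".join(self.args())


def parse_scores(handle):
    """Hypothetical parser: reads 'name score' lines from any text handle."""
    scores = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, score = line.split()
        scores[name] = float(score)
    return scores


# The wrapper is useful without the parser...
cline = ToolCommandline("tool", infile="input.fa", n=10)
print(cline)  # tool -infile input.fa -n 10

# ...and the parser accepts output from anywhere, not just this wrapper.
print(parse_scores(io.StringIO("# header\nmotif1 3.5\nmotif2 1.25\n")))
# {'motif1': 3.5, 'motif2': 1.25}
```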

>  In this case, I think that it's much easier from the maintainers' point
>  of view to have a directory with separate files rather than a single
>  module. [...]

True.

>  >> I'm not sure how many wrappers we might eventually expect for multiple
>  >> sequence alignments, maybe ten or twenty, mostly from different tool
>  >> sets.  Maybe Bio/Align/Applications/Muscle.py etc is the way to go,
>  >> but we can then import all the command line objects under the
>  >> Bio.Align.Applications namespace.
>
>  +1 from me.
>
>  > Bio/AlignAce/Applications.py does use Bio.Application, but we are
>  > planning to replace this module with Bio.Motif which gives us a chance
>  > to review the API without worrying too much about backwards
>  > compatibility.  As part of moving it to Bio.Motif, I would remove the
>  > run methods from AlignAceCommandline and CompareAceCommandline (none
>  > of the other Biopython command line objects have them as far as I
>  > know), and also remove the AlignAce and CompareAce helper functions
>  > (in Bio/AlignAce/AlignAceStandalone.py and
>  > Bio/AlignAce/CompareAceStandalone.py). Internally these all call the
>  > Bio.Application.generic_run function, and return stdout and stderr as
>  > wrapped StringIO handles.
>  >
>  > Because it reads all the stdout and stderr output into memory, the
>  > Bio.Application.generic_run function is only suitable for tools which
>  > print very little to the console (or nothing, in which case the return
>  > values can be ignored).  This method is useless on things like BLAST
>  > XML output to stdout, which can be hundreds of megabytes in size.  I
>  > would generally discourage the use of the Bio.Application.generic_run
>  > function; instead we should give examples using the command line
>  > object together with the subprocess module (Python 2.3 doesn't have
>  > subprocess, but Biopython 1.50 will be the last release to care about
>  > this), which lets the user choose which handles, if any, they care about.
>
>  Motif finding programs usually output a lot less than there is input. Normally,
>  you don't want to see more than 10 motifs and each contributes ~1 kB, so
>  I don't see this as a huge problem in this case.

I can see that the Bio.Application.generic_run function is often handy,
but sometimes it is quite inappropriate.  For AlignAce it has
obviously sufficed.
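To make the trade-off concrete, here is a hedged sketch of two ways to run a tool using only the standard `subprocess` module. `run_and_capture` is a hypothetical helper in the spirit of generic_run (it is not Biopython's code): it reads everything into memory and hands back StringIO handles. `run_to_file`, also hypothetical, streams stdout straight to disk so the output size does not matter. A Python one-liner stands in for a real tool.

```python
import os
import subprocess
import sys
import tempfile
from io import StringIO


def run_and_capture(args):
    """Hypothetical generic_run-style helper (not Biopython's code).

    communicate() reads ALL of stdout and stderr into memory before
    returning, which is fine for small outputs such as a few motifs.
    """
    child = subprocess.Popen(args, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, universal_newlines=True)
    out, err = child.communicate()
    return child.returncode, StringIO(out), StringIO(err)


def run_to_file(args, out_path):
    """Stream stdout straight to a file and discard stderr.

    Suitable for very large output (e.g. BLAST XML on stdout): the data
    goes to disk and is never held in a Python string.
    """
    with open(out_path, "w") as out_handle:
        return subprocess.call(args, stdout=out_handle,
                               stderr=subprocess.DEVNULL)


# A stand-in "tool": a Python one-liner that prints many lines.
tool = [sys.executable, "-c",
        "for i in range(100000): print('line', i)"]

code, out_handle, err_handle = run_and_capture(tool)
captured_lines = len(out_handle.readlines())

out_path = os.path.join(tempfile.mkdtemp(), "tool_output.txt")
streamed_code = run_to_file(tool, out_path)
with open(out_path) as handle:
    streamed_lines = sum(1 for _ in handle)  # read back one line at a time

print(code, captured_lines, streamed_code, streamed_lines)  # 0 100000 0 100000
```

Both give the same result here; the difference only shows up in memory use once the output grows large.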

>  To be honest, I'm not too keen on rewriting this old code (as well as
>  the MEME parser, which was contributed by Jason Hackney). But if there
>  will be any new motif parsers (I'd like to have Weeder and RSAT ones
>  one day...) I'm happy to conform to any (reasonable) policy.

In the AlignAce case, I wasn't suggesting rewriting in the above, but
rather removing some of what I saw as redundant bits (in an effort
at consistency).

On reflection, perhaps the core Bio.Application.AbstractCommandline
object might benefit from some "run"-like methods?  However, they do
morph it from a command line string representation into something
bigger...  feature creep! ;)
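One way to limit that creep would be to keep any run method as a thin, optional layer over the string representation and let the caller choose the handles. The class below is a hypothetical sketch (`BaseCommandline` is not Biopython's `AbstractCommandline`), built only on the standard `subprocess` and `shlex` modules.

```python
import shlex
import subprocess
import sys


class BaseCommandline:
    """Hypothetical base class: builds a command line, optionally runs it."""

    def __init__(self, program, *arguments):
        self.program = program
        self.arguments = list(arguments)

    def __str__(self):
        # The core job: a shell-quoted string representation of the command.
        return " ".join(shlex.quote(a) for a in [self.program] + self.arguments)

    def run(self, stdout=subprocess.PIPE, stderr=subprocess.PIPE):
        """Optional convenience: run the tool, letting the caller pick handles.

        Pass stdout=open_file_handle to stream to disk, or None to inherit
        the console. Returns a subprocess.CompletedProcess (text mode).
        """
        return subprocess.run([self.program] + self.arguments,
                              stdout=stdout, stderr=stderr,
                              universal_newlines=True)


cline = BaseCommandline(sys.executable, "-c", "print('hello')")
result = cline.run()
print(result.returncode, result.stdout.strip())  # 0 hello
```

The string form stays the primary interface; `run` adds no state, so a user who prefers to call `subprocess` directly loses nothing.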

Peter