[Biopython-dev] Fwd: Where to put command line wrappers
Peter
biopython at maubp.freeserve.co.uk
Thu Apr 16 14:53:03 EDT 2009
On 4/16/09, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> Hi All,
>
> On Thu, Apr 16, 2009 at 5:45 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > For EMBOSS we have a single file, Bio/Emboss/Applications.py, which
> > has about 15 wrappers (all very similar as the EMBOSS applications are
> > very consistent). This is nice in that all the wrappers are in the
> > Bio.Emboss.Applications namespace.
> >
> > Bartek and I have been having a similar discussion for Motif tools,
> > and if the AlignAce wrappers should go in Bio.Motif.Applications to
> > match. For now Bio.Motif has just one wrapper for AlignACE and sister
> > tool CompareACE. Now giving each tool-set its own file is possible
> > (Bio/Motif/Applications/AlignAce.py) but would one (large) file be
> > simpler? (i.e. Bio/Motif/Applications.py).
> >
> I think that there is a difference between EMBOSS and
> Bio.[Motif|Align]. In EMBOSS we have a very nicely commoditized
> set of tools with similar interfaces, while both for multiple
> alignment and motif searching the tools vary a lot. In case of
> multiple alignments this is only with respect to parameters and
> output format, while in motif searching there are also a lot of
> differences in the types of input (background models etc.).
That is a good argument for using Bio/Align/Applications/XXX.py and
Bio/Motif/Applications/XXX.py while also having
Bio/Emboss/Applications.py.
> Also, quite likely the parsers for different tools will be written by
> different people.
Biopython's command line wrappers can be quite separate from the
parsers - this is a natural break. One can be useful without the
other, and keeping them separate allows you to for example use a
Biopython wrapper with another parser, or vice versa.
> In this case, I think that it's much easier from the maintainer's point
> of view to have a directory with separate files rather than a single
> module. [...]
True.
> >> I'm not sure how many wrappers we might eventually expect for multiple
> >> sequence alignments, maybe ten or twenty, mostly from different tool
> >> sets. Maybe Bio/Align/Applications/Muscle.py etc is the way to go,
> >> but we can then import all the command line objects under the
> >> Bio.Align.Applications namespace.
>
> +1 from me.
>
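To make the "one file per tool, one shared namespace" idea concrete, here is a
rough sketch. It builds a throwaway package in a temporary directory rather
than touching Biopython itself; the package name toyapps, the private module
names and the two class names are all made up for illustration, and toyapps
stands in for something like Bio/Align/Applications/.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: one private module per tool, with the package
# __init__.py re-exporting every command line class.
files = {
    "toyapps/__init__.py": (
        "from toyapps._muscle import MuscleCommandline\n"
        "from toyapps._clustalw import ClustalwCommandline\n"
    ),
    "toyapps/_muscle.py": "class MuscleCommandline:\n    program = 'muscle'\n",
    "toyapps/_clustalw.py": "class ClustalwCommandline:\n    program = 'clustalw'\n",
}

# Write the toy package to a temporary directory.
root = Path(tempfile.mkdtemp())
for relative_path, source in files.items():
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

# Import it and list what the shared namespace exposes.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
toyapps = importlib.import_module("toyapps")
exported = sorted(name for name in vars(toyapps) if name.endswith("Commandline"))
```

With this arrangement each tool can be maintained in its own file, while
users only ever import from the one package.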
> > Bio/AlignAce/Applications.py does use Bio.Application, but we are
> > planning to replace this module with Bio.Motif which gives us a chance
> > to review the API without worrying too much about backwards
> > compatibility. As part of moving it to Bio.Motif, I would remove the
> > run methods from AlignAceCommandline and CompareAceCommandline (none
> > of the other Biopython command line objects have them as far as I
> > know), and also remove the AlignAce and CompareAce helper functions
> > (in Bio/AlignAce/AlignAceStandalone.py and
> > Bio/AlignAce/CompareAceStandalone.py). Internally these all call the
> > Bio.Application.generic_run function, and return stdout and stderr as
> > wrapped StringIO handles.
> >
> > Because it reads in all the stdout and stderr output into memory,
> > the Bio.Application.generic_run function is only suitable for tools which
> > print very little to the console (or nothing, in which case the return
> > values can be ignored). This method is useless on things like BLAST
> > XML output to stdout which can be hundreds of megabytes in size. I
> > would generally discourage the use of the Bio.Application.generic_run
> > function and instead we should give examples using the command line
> > object together with the subprocess module (Python 2.3 doesn't have
> > subprocess, but Biopython 1.50 will be the last release to care about
> > this) which lets the user choose what if any handles they care about.
>
> Motif finding programs usually output a lot less than there is input. Normally,
> you don't want to see more than 10 motifs and each contributes ~1kb so
> I don't see this as a huge problem in this case.
I can see that Bio.Application.generic_run function is often handy,
but sometimes it is quite inappropriate. For AlignAce obviously it
has sufficed.
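
As a rough illustration of the subprocess approach, something along these
lines would do. This is only a sketch: ToyCommandline is a made-up stand-in
for a real command line object (all it shares with them is that str() gives
the command line to run), count_output_lines is a hypothetical helper, and
the Python interpreter plays the part of the external tool so the example
runs anywhere.

```python
import shlex
import subprocess
import sys


class ToyCommandline:
    """Hypothetical stand-in for a Biopython command line object."""

    def __init__(self, program, *args):
        self.program = program
        self.args = args

    def __str__(self):
        # str() gives the full command line, quoted for a POSIX-style shell.
        return " ".join(shlex.quote(part) for part in (self.program, *self.args))


def count_output_lines(cline):
    """Run the command, reading stdout one line at a time.

    Only the current line is held in memory, and stderr is discarded
    because this caller does not care about it.
    """
    count = 0
    with subprocess.Popen(
        shlex.split(str(cline)),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    ) as child:
        for _line in child.stdout:
            count += 1
    # Leaving the "with" block waits for the child, so returncode is set.
    return count, child.returncode


# The "tool" here is just Python printing 1000 lines.
cline = ToyCommandline(sys.executable, "-c", "for i in range(1000): print(i)")
line_count, return_code = count_output_lines(cline)
```

The point is that the caller decides which handles to capture and how to
consume them, instead of getting everything back as in-memory strings.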
> To be honest, I'm not too keen on rewriting this old code (as well as
> MEME parser which was contributed by Jason Hackney). But if there
> will be any new motif parsers (I'd like to have weeder and RSAT one
> day...) I'm happy to conform to any (reasonable) policy.
In the AlignAce case, in the above I wasn't suggesting rewriting,
rather removing some of what I saw as redundant bits (in an effort
at consistency).
On reflection, perhaps the core Bio.Application.AbstractCommandline
object might benefit from some "run" like methods? However they do
morph it from a command line string representation into something
bigger... feature creep! ;)
Peter