[Biopython-dev] Fwd: Where to put command line wrappers

Bartek Wilczynski bartek at rezolwenta.eu.org
Thu Apr 16 17:37:29 UTC 2009


Hi All,

 On Thu, Apr 16, 2009 at 5:45 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> For EMBOSS we have a single file, Bio/Emboss/Applications.py, which
> has about 15 wrappers (all very similar as the EMBOSS applications are
> very consistent).  This is nice in that all the wrappers are in the
> Bio.Emboss.Applications namespace.
>
> Bartek and I have been having a similar discussion for Motif tools,
> and if the AlignACE wrappers should go in Bio.Motif.Applications to
> match.  For now Bio.Motif has just one wrapper for AlignACE and sister
> tool CompareACE.  Now giving each tool-set its own file is possible
> (Bio/Motif/Applications/AlignAce.py) but would one (large) file be
> simpler? (i.e. Bio/Motif/Applications.py).
>
I think that there is a difference between EMBOSS and Bio.[Motif|Align].
In EMBOSS we have a very nicely commoditized set of tools with similar
interfaces, while for both multiple alignment and motif searching the
tools vary a lot. In the case of multiple alignments the variation is
only in parameters and output format, while in motif searching there are
also many differences in the types of input (background models etc.).
Also, quite likely the parsers for different tools will be written by
different people.

In this case, I think that it's much easier from the maintainer's point
of view to have a directory with separate files rather than a single
module. If people are scared by nested namespaces, we can import the
important classes into the higher level.

>> I'm not sure how many wrappers we might eventually expect for multiple
>> sequence alignments, maybe ten or twenty, mostly from different tool
>> sets.  Maybe Bio/Align/Applications/Muscle.py etc is the way to go,
>> but we can then import all the command line objects under the
>> Bio.Align.Applications namespace.
>>
+1 from me.
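
A rough sketch of that layout, one file per tool with the classes
re-exported one level up. All names here (demo_align, _Muscle.py,
_Clustalw.py, the class bodies) are made up for illustration and are not
an agreed Biopython layout; the script just builds a throwaway package
in a temporary directory so it can be run as-is.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: demo_align/Applications/ holds one module per tool,
# and its __init__.py re-exports the wrapper classes.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_align", "Applications")
os.makedirs(pkg_dir)

files = {
    os.path.join(root, "demo_align", "__init__.py"): "",
    os.path.join(pkg_dir, "_Muscle.py"):
        "class MuscleCommandline(object):\n    program = 'muscle'\n",
    os.path.join(pkg_dir, "_Clustalw.py"):
        "class ClustalwCommandline(object):\n    program = 'clustalw'\n",
    # The package __init__ pulls each tool's class into the shared namespace.
    os.path.join(pkg_dir, "__init__.py"):
        "from ._Muscle import MuscleCommandline\n"
        "from ._Clustalw import ClustalwCommandline\n",
}
for path, text in files.items():
    with open(path, "w") as handle:
        handle.write(text)

# Make the throwaway package importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

# Users see a flat namespace even though each tool lives in its own file.
from demo_align.Applications import MuscleCommandline, ClustalwCommandline

print(MuscleCommandline.__module__)
print(ClustalwCommandline.__module__)
```

Each maintainer then only touches their own file, and the import lines
in __init__.py are the only shared piece.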

>
> Bio/AlignAce/Applications.py does use Bio.Application, but we are
> planning to replace this module with Bio.Motif which gives us a chance
> to review the API without worrying too much about backwards
> compatibility.  As part of moving it to Bio.Motif, I would remove the
> run methods from AlignAceCommandline and CompareAceCommandline (none
> of the other Biopython command line objects have them as far as I
> know), and also remove the AlignAce and CompareAce helper functions
> (in Bio/AlignAce/AlignAceStandalone.py and
> Bio/AlignAce/CompareAceStandalone.py). Internally these all call the
> Bio.Application.generic_run function, and return stdout and stderr as
> wrapped StringIO handles.
>
> Because it reads in all the stdout and stderr output into memory,
> Bio.Application.generic_run function is only suitable for tools which
> print very little to the console (or nothing, in which case the return
> values can be ignored).  This method is useless on things like BLAST
> XML output to stdout which can be hundreds of megabytes in size.  I
> would generally discourage the use of the Bio.Application.generic_run
> function and instead we should give examples using the command line
> object together with the subprocess module (Python 2.3 doesn't have
> subprocess, but Biopython 1.50 will be the last release to care about
> this) which lets the user choose what if any handles they care about.
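
A minimal sketch of the subprocess approach described above, written for
a current Python. The child process is only a stand-in (a Python
one-liner that prints three lines) so that the example runs anywhere;
with a real wrapper the command would be built from the command line
object instead. The variable names are made up for illustration.

```python
import subprocess
import sys

# Stand-in for a real tool: a child Python process printing three lines.
cmd = [sys.executable, "-c", "for i in range(3): print('motif %i' % i)"]

# The caller chooses the handles: stdout is piped, stderr is discarded.
child = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
    universal_newlines=True,
)

# Iterate over stdout line by line instead of reading it all at once,
# so large output never has to sit in memory in one piece.
line_count = 0
first_line = None
for line in child.stdout:
    if first_line is None:
        first_line = line.rstrip()
    line_count += 1

child.stdout.close()
return_code = child.wait()
print(return_code, line_count, first_line)
```

For output as large as BLAST XML the stdout handle could be passed
straight to a parser or redirected to a file rather than looped over
like this.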

Motif finding programs usually output a lot less than they take as input.
Normally, you don't want to see more than 10 motifs and each contributes
~1 kB, so I don't see this as a huge problem in this case. To be honest,
I'm not too keen on rewriting this old code (as well as the MEME parser,
which was contributed by Jason Hackney). But if there are any new motif
parsers (I'd like to have Weeder and RSAT one day...) I'm happy to
conform to any (reasonable) policy.

cheers
Bartek



