[Biopython-dev] Where to put command line wrappers

Peter biopython at maubp.freeserve.co.uk
Thu Apr 16 17:16:10 UTC 2009


On Thu, Apr 16, 2009 at 5:45 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> We were recently discussing alignment tools like MUSCLE and ClustalW
> and putting together a set of command line wrappers under Bio.Align
> for them.  I think Bio.Align.Applications was suggested to match
> Bio.Emboss.Applications.
>
> For EMBOSS we have a single file, Bio/Emboss/Applications.py, which
> has about 15 wrappers (all very similar as the EMBOSS applications are
> very consistent).  This is nice in that all the wrappers are in the
> Bio.Emboss.Applications namespace.
>
> Bartek and I have been having a similar discussion for Motif tools,
> and if the AlignACE wrappers should go in Bio.Motif.Applications to
> match.  For now Bio.Motif has just one wrapper for AlignACE and sister
> tool CompareACE.  Now giving each tool-set its own file is possible
> (Bio/Motif/Applications/AlignAce.py) but would one (large) file be
> simpler? (i.e. Bio/Motif/Applications.py).
>
> I'm not sure how many wrappers we might eventually expect for multiple
> sequence alignments, maybe ten or twenty, mostly from different tool
> sets.  Maybe Bio/Align/Applications/Muscle.py etc is the way to go,
> but we can then import all the command line objects under the
> Bio.Align.Applications namespace.
>
> Any comments?
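The layout floated above (one module per tool set, with every wrapper re-exported from the package namespace) can be sketched with a throwaway package. This is a hypothetical, self-contained demonstration: it writes a package called `align_applications` into a temporary directory, and `MuscleCommandline` / `ClustalwCommandline` here are empty placeholders, not real Biopython wrappers.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package contents: one module per tool set, plus an __init__.py
# that re-exports every wrapper so users only need the package namespace.
PACKAGE_FILES = {
    "__init__.py": (
        "from .Muscle import MuscleCommandline\n"
        "from .Clustalw import ClustalwCommandline\n"
    ),
    "Muscle.py": "class MuscleCommandline(object):\n    program = 'muscle'\n",
    "Clustalw.py": "class ClustalwCommandline(object):\n    program = 'clustalw'\n",
}


def build_package(root, name):
    """Write the placeholder package into root/name and return its path."""
    package_dir = os.path.join(root, name)
    os.mkdir(package_dir)
    for filename, source in PACKAGE_FILES.items():
        with open(os.path.join(package_dir, filename), "w") as handle:
            handle.write(source)
    return package_dir


root = tempfile.mkdtemp()
build_package(root, "align_applications")
sys.path.insert(0, root)
importlib.invalidate_caches()

# Users import from the package itself, never from the per-tool modules.
align_applications = importlib.import_module("align_applications")
print(align_applications.MuscleCommandline.program)
```

Under this layout, adding a new tool set means adding one module and one line to `__init__.py`, while the user-facing import (for the real thing, something like `from Bio.Align.Applications import MuscleCommandline`) stays the same whether there are two files or twenty.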

For any that missed the thread last week, I'd like to link back to the
end of my post:
http://lists.open-bio.org/pipermail/biopython-dev/2009-March/005658.html

I see introducing Bio.Align.Applications as a chance to get a more
consistent approach to Biopython's command line wrappers established
(replacing Bio.Clustalw).  And as I wrote last month, I think we
should focus on the Bio.Application command line wrapper object.  For
reasons explained in the linked email, I would want to rewrite
Bio.Blast.NCBIStandalone in the same way (probably putting the command
line wrapper classes in Bio.Blast.Applications, and if there is
interest, include other variants like WUBlast).  Are there any
other wrappers not using Bio.Application which I have forgotten about?
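To make the "command line object" idea concrete, here is a minimal, hypothetical stand-in (it is not Bio.Application's actual API): an object that only knows how to build an argument list and a printable command string, and deliberately has no run method. The `-in`, `-out` and `-quiet` switches are MUSCLE's classic options, used purely as sample data.

```python
import shlex


class ToolCommandline(object):
    """Hypothetical wrapper: it builds a command line but never runs it."""

    def __init__(self, program, **options):
        self.program = program
        self.options = options

    def to_args(self):
        """Return the command as a list, suitable for the subprocess module."""
        args = [self.program]
        for name in sorted(self.options):      # sorted for a stable order
            value = self.options[name]
            if value is True:
                args.append("-" + name)        # boolean switch, no value
            elif value is not None and value is not False:
                args.extend(["-" + name, str(value)])
        return args

    def __str__(self):
        """Return a shell-quoted command string, e.g. for logging."""
        return " ".join(shlex.quote(arg) for arg in self.to_args())


# "in" is a Python keyword, so the options are passed via a dict.
cline = ToolCommandline("muscle", **{"in": "input.fasta",
                                     "out": "aligned.fasta",
                                     "quiet": True})
print(cline)
```

Keeping execution out of the object means the caller, not the wrapper, decides how the tool is launched and what happens to its output.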

Bio/AlignAce/Applications.py does use Bio.Application, but we are
planning to replace this module with Bio.Motif which gives us a chance
to review the API without worrying too much about backwards
compatibility.  As part of moving it to Bio.Motif, I would remove the
run methods from AlignAceCommandline and CompareAceCommandline (none
of the other Biopython command line objects have them as far as I
know), and also remove the AlignAce and CompareAce helper functions
(in Bio/AlignAce/AlignAceStandalone.py and
Bio/AlignAce/CompareAceStandalone.py). Internally these all call the
Bio.Application.generic_run function, and return stdout and stderr as
wrapped StringIO handles.
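The generic_run behaviour described above (run the tool, read everything it printed, return StringIO handles) can be approximated with the standard library. `run_and_capture` is a hypothetical name and this is a sketch of the pattern, not Biopython's code; note that `communicate()` holds the complete output in memory before it returns.

```python
import io
import subprocess
import sys


def run_and_capture(args):
    """Hypothetical helper: run a command, read ALL of stdout and stderr
    into memory, then hand them back as file-like StringIO handles."""
    child = subprocess.Popen(args,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    out, err = child.communicate()   # waits for exit; whole output kept in RAM
    return child.returncode, io.StringIO(out), io.StringIO(err)


# A tiny stand-in "tool" that prints one line to each stream.
code, out_handle, err_handle = run_and_capture(
    [sys.executable, "-c",
     "import sys; print('aligned'); sys.stderr.write('done\\n')"])
```

This is convenient when a tool prints a few lines, but the memory cost grows with the size of the output.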

Because it reads all of the stdout and stderr output into memory, the
Bio.Application.generic_run function is only suitable for tools which
print very little to the console (or nothing, in which case the return
values can be ignored).  It is useless for things like BLAST XML
output written to stdout, which can be hundreds of megabytes in size.  I
would generally discourage the use of the Bio.Application.generic_run
function; instead we should give examples using the command line
object together with the subprocess module (Python 2.3 doesn't have
subprocess, but Biopython 1.50 will be the last release to care about
this), which lets the user choose which handles, if any, they care about.
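By contrast, here is a minimal sketch of the subprocess approach, in which the caller decides what happens to each handle: stdout is streamed line by line and stderr is discarded. It assumes Python 3.3+ for `subprocess.DEVNULL`, and `count_output_lines` is a hypothetical name. The child command is a stand-in for a chatty tool; in practice `args` would come from a command line object, and `child.stdout` could be handed directly to any parser that accepts a file handle.

```python
import subprocess
import sys


def count_output_lines(args):
    """Stream a tool's stdout line by line instead of buffering it all."""
    child = subprocess.Popen(args,
                             stdout=subprocess.PIPE,      # we want stdout...
                             stderr=subprocess.DEVNULL,   # ...but not stderr
                             universal_newlines=True)
    count = 0
    for line in child.stdout:   # only the current line is held in memory
        count += 1
    child.stdout.close()
    return_code = child.wait()
    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, args)
    return count


# Stand-in for a tool that writes a lot to stdout (e.g. BLAST XML output).
n = count_output_lines([sys.executable, "-c", "for i in range(100000): print(i)"])
print(n)
```

Swapping `DEVNULL` for `PIPE` or an open file is all it takes to keep stderr as well, which is the flexibility a one-size-fits-all helper cannot offer.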

Peter
