[Bioperl-l] tree building, analysis interfaces (long)

Jason Stajich jason@cgt.mc.duke.edu
Wed, 18 Sep 2002 10:52:46 -0400 (EDT)


Shawn/Elia -

Can we try to agree on a standard for tree-building interfaces? I'd
like to respect the stream nature of some tree building (parsimony, for
example, may produce multiple trees) and go with a next_tree method.  This
is not criticism so much as a plea for us to move in the same direction so
these objects can really be generic building blocks.  If you have better
ideas, feel free to provide counter-arguments.

I'd suggest we use the Bio::Factory::TreeFactoryI interface, which the
Molphy and PAML result objects use.  create_tree is a little restrictive:
it assumes a single tree when a method can produce several, and it couples
running and parsing in the same method, which may not work so well on some
systems.

In case you get lost reading all of the below: basically I'd like to see
us move to a Result object which implements the TreeFactoryI interface and
has a next_tree method.  My vision of how execs should happen is:

# setup object, init variables
$obj->parameters({ .. } );

# run the app, get back a result status and result object
my ($rc,$result) = $obj->run();

while( my $tree = $result->next_tree ) {
}

In doing this we decouple the running a bit more from the parsing, as
Aaron and I have done in the PAML and Molphy wrappers.  There we in fact
produce a result object which handles the parsing of multiple files.  So
the wrapper will return a return-code status and a result object.  The
result object can be queried for various statistics and can iterate
through the available trees.
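
A minimal sketch of that contract, for illustration only.  My::MockRunner
and My::MockResult are hypothetical stand-ins, not real BioPerl classes;
the statistic() method, the 'model' parameter and the canned Newick
strings are likewise assumptions made up for the example.  The point is
only the shape: run() returns a status plus a result object, and the
result object streams trees through next_tree.

```perl
use strict;
use warnings;

# Hypothetical result object: owns the parsed output and hands trees out
# one at a time, stream style.
package My::MockResult;

sub new {
    my ($class, %args) = @_;
    my $self = {
        trees => [ @{ $args{-trees} || [] } ],   # copy, so the caller's list is untouched
        stats => { %{ $args{-stats} || {} } },
    };
    return bless $self, $class;
}

# Return the next tree, or undef once the stream is exhausted.
sub next_tree {
    my ($self) = @_;
    return shift @{ $self->{trees} };
}

# Look up a named statistic (undef if it was never recorded).
sub statistic {
    my ($self, $name) = @_;
    return $self->{stats}{$name};
}

# Hypothetical program wrapper: knows how to run, not how to parse.
package My::MockRunner;

sub new {
    my ($class) = @_;
    return bless { params => {} }, $class;
}

sub parameters {
    my ($self, $params) = @_;
    $self->{params} = { %$params } if $params;
    return $self->{params};
}

# Pretend to run the program; return (return code, result object).
sub run {
    my ($self) = @_;
    my @newick = ('(A,(B,C));', '((A,B),C);');   # canned output, illustration only
    my $result = My::MockResult->new(
        -trees => \@newick,
        -stats => { tree_count => scalar @newick },
    );
    return (0, $result);
}

package main;

my $obj = My::MockRunner->new;
$obj->parameters({ model => 'jtt' });            # made-up parameter name
my ($rc, $result) = $obj->run();

my @seen;
while ( my $tree = $result->next_tree ) {
    push @seen, $tree;
}
print "rc=$rc, trees=", scalar(@seen), "\n";
```

Because the caller only ever sees next_tree, a program that yields one
tree and a program that yields fifty look the same from the outside.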

I added a proper synopsis to the Tools::Run::Phylo::Molphy::ProtML object,
which gives an example of how it works there.  I'd like to see something
similar, with delegation to parsers, in the
Bio::Tools::Run::Phylo::Phylip modules.  I will try to work something up
if this is too confusing, but let me know whether you think this seems
sane.

Incidentally, if those module names look really long and scary to you, I
hope that the work Martin is doing to build a general AnalysisFactory
object will allow us to do the following (names subject to change, of
course):

my $factory = new Bio::Tools::Run::AnalysisFactory(-type => 'local');

my $protml = $factory->program('protml'); # or some other coding

$protml->parameter( .. );
my ($rc,$results) = $protml->run();

and then substitute 'ebi.ac.uk:novella' or 'pasteur.fr:pise' (or
something more GRID-like) for 'local' to run these analyses in their
compute queues.  Hence the need for fairly standard and simple interfaces
to the applications, with all the details hidden in a result object, IMHO.
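
To make the idea concrete, here is a rough sketch of how such a factory
could dispatch on the -type argument.  Everything in it is hypothetical
(My::MockAnalysisFactory, My::Backend::Local, My::Backend::Remote); it is
not the AnalysisFactory that Martin is working on, and for brevity run()
returns a plain string where a real wrapper would return a result object.

```perl
use strict;
use warnings;

# Hypothetical backends.  Both honour the same contract:
# run() returns (return code, result).
package My::Backend::Local;

sub new {
    my ($class, %args) = @_;
    return bless { program => $args{-program} }, $class;
}

sub run {
    my ($self) = @_;
    return (0, "ran $self->{program} locally");
}

package My::Backend::Remote;

sub new {
    my ($class, %args) = @_;
    return bless { program => $args{-program}, host => $args{-host} }, $class;
}

sub run {
    my ($self) = @_;
    return (0, "queued $self->{program} on $self->{host}");
}

# Hypothetical factory: the -type string alone decides the backend.
package My::MockAnalysisFactory;

sub new {
    my ($class, %args) = @_;
    my $type = defined $args{-type} ? $args{-type} : 'local';
    return bless { type => $type }, $class;
}

sub program {
    my ($self, $name) = @_;
    return My::Backend::Local->new(-program => $name)
        if $self->{type} eq 'local';
    return My::Backend::Remote->new(-program => $name, -host => $self->{type});
}

package main;

my @messages;
for my $type ('local', 'ebi.ac.uk:novella') {
    my $factory = My::MockAnalysisFactory->new(-type => $type);
    my $protml  = $factory->program('protml');
    my ($rc, $results) = $protml->run();
    push @messages, $results;
    printf "%s: rc=%d, %s\n", $type, $rc, $results;
}
```

The calling code is identical for both types, which is the whole argument
for keeping the run/result interface small and uniform.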

Sorry for the long ramble.

-jason

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu