[Bioperl-l] Pise/bioperl

Jason Stajich jason@cgt.mc.duke.edu
Thu, 10 Jan 2002 14:35:49 -0500 (EST)


[cc bioperl so that others can see the thought process]

On Tue, 8 Jan 2002, Catherine Letondal wrote:

> Ok, I have to look more closely at BSANE, Novella and the EMBOSS
> interface.
>
> I am not sure, but I think there are some correspondences between
> the first two interfaces you describe and the Pise/bioperl classes and
> objects. If I take the diagram posted in my initial message:
>
>        Pise              PiseJob               PiseJobParser
>        ----              -------               -------------
>        factory     -->   actual job           (Perl/SAX) to parse
>        to set            (methods to           XHTML server output
>        parameters         actually submit
>        and launch         the http request
>        jobs               via LWP::UserAgent,
>                           get job state
>          |                and results)
>          |
>        +------+
>        | Pise |
>        +------+---------------------------------------+
>        | toppred fasta hmmbuild nnssp blast2 clustalw |
>        | genscan tacg dnapars wublast2 ...            |
>        |                                              |
>        +----------------------------------------------+
>
>
> - Pise/program instances could probably stand for the analysis
> application object (accepting data and parameters and producing as
> many analysis results as required)
>
> - there is no factory in the sense of a queue; the factory here is
> just the Pise/program class. What exactly do you mean by an analysis
> queue? A complex task? A strategy?
>
Just some generic factory objects that aggregate the information about
the central server and hand out handles for the individual applications.

# factories wrap the details of where and how the programs actually run
my $pisefactory   = new Bio::Factory::PISE(@params);
my $embossfactory = new Bio::Factory::EMBOSS(@params);

# remote PISE job
my $clustal = $pisefactory->get_Application_Handle('clustalw');
my $alnres  = $clustal->align(-data => [$seq1,$seq2]);

# remote EMBOSS job
my $primer3   = $embossfactory->get_Application_Handle('primer3');
my $primerres = $primer3->analyze( @parameters );

# same interface for a locally installed program
my $localfactory = new Bio::Factory::Local();
my $localblast   = $localfactory->get_Application_Handle('blastall');
my $blastres     = $localblast->search(-data => [$seq1,$seq2,$seq3]);


None of the method names above are meant to be the final names -
obviously I am simplifying what is a more complex problem.  But the idea
is there and I think it is worth debating.  I just want things to be
fairly transparent to the user, whether invoking an application means
submitting an HTTP request through LWP, going through CORBA, or running
a forked process.
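
To make that concrete (a purely hypothetical sketch - none of these
classes exist yet, the point is only that the transport lives inside the
factory and its handles, not in user code):

# whichever factory produced it, a handle answers the same methods,
# so the same user code works over HTTP, CORBA, or a local fork/exec
my $progname = 'clustalw';
for my $factory ( $pisefactory, $embossfactory, $localfactory ) {
    my $app = $factory->get_Application_Handle($progname);
    # ... pass data, wait for the job, collect results ...
}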

You have the JobQueue notion built into PISE; I think this is probably
better than my very simplified example above.  We might add a parameter
to the server or application code, something like BLOCKUNTILFINISHED, so
that users don't have to write while( ! done() ) { sleep(X) } themselves.
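
Roughly, with BLOCKUNTILFINISHED spelled here as a constructor flag just
for illustration (submit/done/results are placeholders, like the method
names above):

# polling style: the user baby-sits the job
my $job = $clustal->submit(-data => [$seq1,$seq2]);
while ( ! $job->done() ) {
    sleep(10);                      # poke the server until the job finishes
}
my $alignment = $job->results();

# blocking style: the factory/application layer waits for you
my $blocking = new Bio::Factory::PISE(@params,
                                      -block_until_finished => 1);
my $alignment2 = $blocking->get_Application_Handle('clustalw')
                          ->align(-data => [$seq1,$seq2]);  # returns only when done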


>
> - a PiseJob instance represents an analysis result, but in a very
> generic way: it provides access both to the output and to the input
> data; the only semantics here come from methods related to the
> "result type" (alignment, phylogenetic tree, distance matrix, ...);
> result parsing is outside the scope of these modules. I believe it is
> important to keep analysis running and parsing independent, so that it
> is still possible to enter the parsing level with already-gathered
> analysis data (files, urls, database entries, ...)
>

Yes - parsing should definitely be separated out.
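
For example, a job could simply hand back its raw output and the
existing bioperl parsers take over from there.  Bio::AlignIO below is
real; the output_fh accessor on the job is just a placeholder:

use Bio::AlignIO;

# the analysis layer only fetches output, parsing is a separate step
my $fh  = $job->output_fh('aln');          # placeholder job accessor
my $in  = Bio::AlignIO->new(-format => 'clustalw',
                            -fh     => $fh);
my $aln = $in->next_aln;                   # same parser works on a saved file or url dump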


I'm thinking I would have a better idea of the issues after drawing some
of this out at a whiteboard.  I am not looking to reinvent anything, just
trying to tie the different systems together in a nice perl layer.



> > Happy to do this at the hackathon unless we want to put something
> > together sooner over email.
> >
> > -jason
> >
> > --
> > Jason Stajich
> > Duke University
> > jason@cgt.mc.duke.edu
> >
>
> --
> Catherine Letondal -- Pasteur Institute Computing Center
>

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu