[Bioperl-l] bioperl-pipeline and runnables

shawnh@worf.fugu-sg.org shawnh@worf.fugu-sg.org
Mon, 20 May 2002 00:58:05 +0800 (SGT)


Hi all,
	We have been working on the bioperl-pipeline (heavily modelled after ensembl) for some time now.
The input/output/job management parts are working and are currently
undergoing testing. We are also in the process of writing as many runnables for the
pipeline as we can. Runnables are basically wrappers around executables: they take
inputs, pass them to the binary, parse the outputs, and return the results to the
RunnableDB, which talks to the database.
These runnables have a fairly fixed set of methods that must be present for the pipeline
to work. At the same time, we would like to reuse everything that has already been written inside Bio::Tools::Run.
With this in mind, we propose the following (Jason, I think we have discussed this before):

Currently bioperl has the following structure (from my view anyway)
1. Parsing Mechanism
AlignIO and SearchIO, which handle a variety of data formats.

2. Executable Wrappers
All the modules under Bio::Tools::Run, such as StandAloneBlast, Clustalw, TCoffee etc.,
which make use of the parsers in (1).

We propose adding the layer:

3. Pipeline Runnables (sitting in bioperl-pipeline)
This is basically a thin wrapper layer that makes use of the executable wrappers in (2) but
also contains the interface to the pipeline job management system. So in the
run function of the runnable, we instantiate the module in (2), execute it, and pass the output back
to the pipeline logic. We do not wish to embed the parsing logic in the runnable code, as it
would then be unavailable to those who just wish to run a single analysis outside the pipeline. This keeps
the runnable very light and easy to code. So for each new program that we want to include, we will
write the wrapper inside Bio::Tools::Run, plus whatever parsers are needed in (1), and the small runnable in bioperl-pipeline.
That way, it is usable in both the core and the pipeline.
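
To make the layering concrete, here is a rough sketch of the three layers.
It is written in Python purely for brevity and is not the actual
bioperl-pipeline code. Every name in it (parse_lines, UppercaseWrapper,
UppercaseRunnable, set_input, output) is made up for illustration, and the
"executable" is just the Python interpreter upper-casing its input so that
the sketch can run anywhere.

```python
import subprocess
import sys


def parse_lines(raw):
    """Layer 1 stand-in: turn raw program output into a list of records."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


class UppercaseWrapper:
    """Layer 2 stand-in: wraps an executable and reuses the layer 1 parser.

    Hypothetical: the wrapped 'binary' is the Python interpreter itself,
    asked to upper-case whatever arrives on stdin.
    """

    def __init__(self, command=None):
        self.command = command or [
            sys.executable,
            "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())",
        ]

    def run(self, text):
        # Feed the input to the executable and capture what it prints.
        done = subprocess.run(
            self.command,
            input=text,
            capture_output=True,
            text=True,
            check=True,
        )
        # Parsing lives here (via layer 1), so it is usable without a pipeline.
        return parse_lines(done.stdout)


class UppercaseRunnable:
    """Layer 3 stand-in: thin pipeline-facing object with no parsing logic."""

    def __init__(self, wrapper_factory=UppercaseWrapper):
        self.wrapper_factory = wrapper_factory
        self._input = None
        self._output = None

    def set_input(self, data):
        self._input = data

    def run(self):
        # Instantiate the layer 2 wrapper, execute it, keep the parsed result.
        wrapper = self.wrapper_factory()
        self._output = wrapper.run(self._input)

    def output(self):
        # The pipeline logic would collect the result from here.
        return self._output
```

A single-analysis user would call UppercaseWrapper().run(...) directly,
while the pipeline would only ever touch set_input(), run() and output() on
the runnable. The runnable stays small because it delegates everything else.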

 
We currently have Blast and Clustalw runnables (inside Bio::Pipeline::Runnable) and will also be modifying the DnaBlockAligner runnable
to fit this new scheme. Feel free to look at the code, though I must confess the documentation is quite poor,
but we are working on it :) All comments welcome.

shawn