[Bioperl-pipeline] Re: Changes to come (long long mail)
bala at tll.org.sg
Sun Aug 17 16:50:57 EDT 2003
Hi Shawn,
> est->
> Analysis: Run Blast against genome
> -> Chain_Output (with filter attached ) && (Output(store blast hit)
> {Optional})
> ->Analysis(setup_est2genome)
> Analysis: Est2Genome-> Output(store gene)
>
>
> We no longer need a temporary blast-hit database, but we can still
> store the hits if we want by attaching an additional output
> IOHandler.
I think this approach will be very helpful as our analyses are getting more and
more focused.
>
> The Guts
> ---------------
>
> What I'm proposing is to have a grouping of rules.
>
> A rule group means that I will chain a group of analysis in a single
> job.
>
> Sample rule table:
>
> +---------+---------------+---------+------+---------+
> | rule_id | rule_group_id | current | next | action  |
> +---------+---------------+---------+------+---------+
> |       1 |             1 |       1 |    2 | NOTHING |
> |       2 |             2 |       2 |    3 | CHAIN   |
> |       3 |             3 |       3 |    4 | NOTHING |
> +---------+---------------+---------+------+---------+
>
> Analysis1: InputCreate
> Analysis2: Blast
> Analysis3: SetupEst2Genome
> Analysis4: Est2Genome
>
> So here we have 3 rule groups. Each job will have its own rule group.
>
> For a single est input, it will create 3 jobs during the course of the
> pipeline execution.
> Job 1: InputCreate (fetch all ESTs and create blast jobs)
> Job 2: Blast (blast each EST against the database)
>        Output is chained to Analysis 3 (setup_est2genome) using an
>        IOHandler of type chain with a blast filter attached
> Job 3: Run Analysis 4 (est2genome) on the jobs created by Analysis 3
>
> Chaining occurs only between Analyses 2 and 3.
>
> If Job 2 fails, the blast and setup_est2genome analysis will have to be
> rerun.
>
> You could imagine multiple analyses chained within a rule_group.
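To make the grouping concrete, here is a minimal sketch (in Python, not actual Biopipe code; the names are illustrative) of how the rule table above could be resolved into jobs: a CHAIN action pulls the next analysis into the current job, while NOTHING starts a fresh job.

```python
# Hypothetical sketch: derive job groupings from the rule table above.
# CHAIN means "run the next analysis in the same job as the current one";
# NOTHING means "the next analysis gets its own job".

RULES = [  # (current_analysis, next_analysis, action)
    (1, 2, "NOTHING"),
    (2, 3, "CHAIN"),
    (3, 4, "NOTHING"),
]

def group_jobs(rules):
    groups = []                      # each group = analyses one job runs
    current_group = [rules[0][0]]    # start with the first analysis
    for current, nxt, action in rules:
        if action == "CHAIN":
            current_group.append(nxt)    # chain next analysis into this job
        else:
            groups.append(current_group) # close this job ...
            current_group = [nxt]        # ... and start a new one
    groups.append(current_group)
    return groups

print(group_jobs(RULES))  # [[1], [2, 3], [4]] -> 3 jobs, as in the example
```

This reproduces the three jobs described above: InputCreate alone, Blast chained with SetupEst2Genome, and Est2Genome alone.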
>
> I have working code for this. The next thing I'm still thinking about
> is a stronger form of datatype definition between the runnables, which
> is currently not strongly enforced. It will probably be based on
> Martin's (or Pise's or EMBOSS's) Analysis data definition interface.
> We can have this information at the runnable layer, at the bioperl-run
> wrappers layer, or at both.
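As a rough illustration of the kind of datatype enforcement being proposed (the table and type names below are hypothetical, not Biopipe or bioperl-run API), each runnable could declare its input and output types, and the pipeline could refuse to chain two analyses whose types do not line up:

```python
# Hypothetical sketch of datatype checking between runnables; the
# dictionary and the type names are illustrative, not real Biopipe API.

ANALYSIS_IO = {
    "Blast":           {"input": "Bio::Seq",         "output": "Bio::Search::Hit"},
    "SetupEst2Genome": {"input": "Bio::Search::Hit", "output": "Bio::Seq"},
    "Est2Genome":      {"input": "Bio::Seq",         "output": "Bio::SeqFeature::Gene"},
}

def compatible(upstream, downstream, io=ANALYSIS_IO):
    """True when upstream's declared output type matches downstream's input type."""
    return io[upstream]["output"] == io[downstream]["input"]

print(compatible("Blast", "SetupEst2Genome"))  # True: hit -> hit
print(compatible("Blast", "Est2Genome"))       # False: hit vs. sequence
```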
>
> Once this is done, we can have a hierarchical organization of the
> pipelines:
>
> - chaining analysis within rule groups
> - chaining rule groups (add a rule_group relationship table), defined
>   within one XML
>
> - chaining pipelines (add a meta_pipeline table), which means re-using
>   different XMLs as long as the inputs and outputs of the first and
>   last analyses of the pipelines match.
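The same idea extends to the pipeline level. A sketch (all names hypothetical, not Biopipe code) of the matching condition for chaining two pipelines:

```python
# Illustrative only: two pipelines chain cleanly when the last analysis
# of the first emits the datatype the first analysis of the second expects.

def pipelines_chainable(pipeline_a, pipeline_b):
    return pipeline_a[-1]["output"] == pipeline_b[0]["input"]

est2genome_xml = [
    {"input": "Bio::Seq", "output": "Bio::SeqFeature::Gene"},
]
annotation_xml = [
    {"input": "Bio::SeqFeature::Gene", "output": "Bio::Annotation"},
]

print(pipelines_chainable(est2genome_xml, annotation_xml))  # True
```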
>
>
> I would like some help with regard to this application definition
> interface, if people are interested or have comments...
I would like to chip in... and maybe after these changes we will have a much
updated version of Biopipe, with all the things we have done in the past
months...
> sorry for the long mail.. if you got to reading this point.
>
> shawn
>
>
bala
--------------------------------------------------------
This mail was sent through Intouch: http://www.techworx.net/