[Bioperl-pipeline] Plant-Biotech pipeline
Shawn Hoon
shawnh at stanford.edu
Wed Sep 17 11:44:21 EDT 2003
On Wednesday, September 17, 2003, at 7:28 AM, Joachim H. Strach wrote:
> Hello,
>
> first of all, thanks for your previous answers; they helped a lot with
> my understanding of the Biopipe workflow.
>
> Some more questions arose. I would be glad if you could either point
> me to the suitable documentation or give me some more answers.
>
> I took a closer look at genome_annotation_pipeline.xml:
> - What are the tags <transformer>, <input_iohandler_mapping> good for?
A transformer defines modules that may be used to operate on data
before and after it passes through iohandlers. This includes filtering
and other data transformation operations. Input transformers are
applied after fetching from iohandlers, while output transformers are
applied before storing to the database. For example, in a two-stage
blast that writes the result to the database both times, applying
filters when fetching the input and when writing the output, the flow
is:
   Input -> filter transformer -> blast 1 (analysis 1)
         -> filter transformer -> store to db
         -> filter transformer -> blast 2 (analysis 2)
         -> filter transformer -> store to db
Previously, biopipe required that all results be written to a database
before being fetched again for the next analysis. This is so that if,
say, analysis 2 fails, one would not need to rerun the first. Now,
sometimes the first analysis is some simple operation that runs fast,
and we don't want to bother with storing its results. So I have
recently committed some relatively new code for doing iohandler
chaining. The flow for the same analysis is slightly different:
   Input -> filter transformer -> blast 1
         -> filter transformer -> blast 2
         -> filter transformer -> store to db
So in this case, if blast 2 fails, we have to go back and rerun
blast 1. I don't think I have committed the xml for this. Will do so
when I get back from a dept retreat this week.
see mail:
http://bioperl.org/pipermail//bioperl-pipeline/2003-August/000387.html
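To make this a bit more concrete, a transformer declaration in the
pipeline xml looks roughly like the sketch below. The tag names
<transformer> and <input_iohandler_mapping> are the ones you asked
about; the module name and the nested elements are from memory and may
differ from what genome_annotation_pipeline.xml actually uses, so
treat this as a sketch rather than a reference:

```xml
<!-- sketch only: a transformer that filters results before they are
     passed on. Element names other than <transformer> and
     <input_iohandler_mapping> are from memory and may not match the
     current schema exactly. -->
<transformer id="1">
  <module>Bio::Pipeline::Utils::Filter</module>
  <method>
    <name>run</name>
    <rank>1</rank>
  </method>
</transformer>

<!-- attach the transformer when mapping an analysis's input
     iohandler to the one used by the next analysis -->
<input_iohandler_mapping>
  <prev_iohandler_id>1</prev_iohandler_id>
  <map_iohandler_id>2</map_iohandler_id>
  <transformer_id>1</transformer_id>
</input_iohandler_mapping>
```

The committed xml files under bioperl-pipeline/xml are the place to
check for the exact element names.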
> - What is the function of the <data_monger> ?
This is a 'special' runnable that is used to set up analyses.
Say you want to align cDNAs to genomes. What you may want to do is run
est2genome of the cDNA on the section of the genome where it hits,
found via blast. You wouldn't want to pass the entire chromosome, for
example, to est2genome. Instead you may need to figure out the region
of the blast hit, do some padding, and pass the slice of the genome
together with the cDNA to the next analysis. So you would figure out
the hit region and pass the start, end, and strand coordinates to the
est2genome input iohandler. To do this, we plug into the DataMonger a
Bio::Pipeline::InputCreate, which contains various 'hacky' modules
that set up jobs very specifically according to how your analysis
requires its inputs. This reconciles the fact that, a lot of the time,
the database adaptor modules do not return what you want to feed
directly into an analysis.
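For illustration, the DataMonger is declared as an analysis of its own
in the xml, wrapping an input_create module. The module name
"setup_cdna_hits" and the argument tags below are hypothetical names I
am using for the example, not the ones in any committed xml:

```xml
<!-- sketch only: a DataMonger analysis that creates inputs (padded
     genome slices plus the cDNA) for the next analysis.
     "setup_cdna_hits" and the argument tags are hypothetical. -->
<analysis id="1">
  <data_monger>
    <input_create>
      <module>setup_cdna_hits</module>
      <rank>1</rank>
      <argument>
        <tag>padding</tag>         <!-- bases to pad around the blast hit -->
        <value>1000</value>
      </argument>
    </input_create>
  </data_monger>
</analysis>
```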
> - At the rule section, in <action>: what does e.g. "COPY_ID" relate
> to?
>
Once a job is finished, the PipelineManager will look up what it
should do next with regard to this job. For COPY_ID, it will reuse the
same input id for the next analysis but may map the input iohandler to
a new one. For example, in RepeatMasker->Blast, both analyses use the
same input, say sequence_1, but the fetching of the sequence for blast
(via ensembl) would use fetch_repeatmasked_seq, while RepeatMasker
would fetch the unmasked seq as its input. So there is a reuse of the
input id and a change of the input iohandler.
see bioperl-pipeline/xml/README
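For reference, a rule carrying the input id over from RepeatMasker
(analysis 1) to blast (analysis 2) would look something like the
sketch below; the element names follow my recollection of the
xml/README, so check there for the exact form:

```xml
<!-- sketch only: after analysis 1 (RepeatMasker) finishes, run
     analysis 2 (blast) on the same input id; the input iohandler
     mapping then swaps in fetch_repeatmasked_seq for the fetch -->
<rule>
  <current_analysis_id>1</current_analysis_id>
  <next_analysis_id>2</next_analysis_id>
  <action>COPY_ID</action>
</rule>
```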
> - Shawn, why did you say "... return mostly bioperl objects". Which
> runnables do not and what do they return?
Uhm, okay, you got me. All committed runnables return bioperl objects.
However, sometimes we do write specific runnables that may return
ensembl objects (in genome annotation) or other objects that we use
for our own data schemas... not things we are proud of, so we do not
commit them yet... :)
> - My pipeline should perform two blast queries, where the second one
> gets as input the filtered output of the first one.
> How can I filter on the bioperl objects directly without using
> IO-handling? Or more general: How can I pass on the bioperl objects
> returned from a runnable to the runnable of the next analysis?
>
Ah, the iohandler chaining example described previously would be the
way. I will commit some examples this weekend.
cheers,
shawn
> Thanks for your advice.
>
> Joachim
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline