[Bioperl-pipeline] creating jobs, job_setup
Shawn Hoon
shawnh at fugu-sg.org
Fri Feb 13 03:18:58 EST 2004
Hi Alexandre,
okay, this is what I gather you are trying to do:
Run analysis 1 -> 2 -> 3 -> 4
The question is: what are your inputs? Are you running the four
analyses on the same input type? For example, do you have four BLAST
analyses that you run on sequences? If so, what you would do is use an
input create/data monger to create inputs for analysis 1. Then, in
your rules, you would specify COPY_ID for analysis 1 -> 2, 2 -> 3 and
3 -> 4, and the input id will be transferred between analyses.
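As a sketch, a COPY_ID rule for the 1 -> 2 hop would look like the
following (tag names recalled from the standard pipeline_setup.xml
templates, so do check them against your version; repeat the rule for
2 -> 3 and 3 -> 4):

```xml
<rule>
  <!-- when an analysis 1 job finishes, create an analysis 2 job -->
  <current_analysis_id>1</current_analysis_id>
  <next_analysis_id>2</next_analysis_id>
  <!-- COPY_ID carries the same input id over to the next analysis -->
  <action>COPY_ID</action>
</rule>
```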
If your input for analysis 2 for example is different from that of
analysis 1, then you need to do something different.
For this, there are 2 options:
1) If you require that each analysis 1 job is completed before the
corresponding analysis 2 job is started, then you need an analysis in
between 1 and 2 (so, as a result, 2 becomes 3).
The new analysis 2 would be an input_create which knows how to create
inputs for the old analysis 2 (now analysis 3). Basically, we are
assuming that this input creation is linked to the input of
analysis 1.
2) If you require that all of the jobs from analysis 1 are completed
before any analysis 2 jobs are started, you can use a WAITFORALL rule,
which would then launch a job of analysis 2 (which may or may not be
an input create).
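Again as a sketch (same caveat about verifying tag names against your
templates), a WAITFORALL rule between analyses 1 and 2 would be:

```xml
<rule>
  <current_analysis_id>1</current_analysis_id>
  <next_analysis_id>2</next_analysis_id>
  <!-- WAITFORALL: fire a single analysis 2 job only after every
       analysis 1 job has completed -->
  <action>WAITFORALL</action>
</rule>
```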
For your setup below, I don't see why analysis 4 should be executed at
startup. Can you provide the xml file?
shawn
On Feb 12, 2004, at 5:48 AM, Alexandre Dehne wrote:
> Hi Kiran,
>
> Thank you for answering me.
> Actually, your solution is very clean, but using it brought up other
> problems.
>
> Here is the current situation:
> So, I start a job on my first analysis with your suggestion. Then,
> more jobs on other analyses are created by placing
> "<action>COPY_ID_FILE</action>" or "<action>COPY_ID</action>" in their
> respective rules in the XML file.
> (Remember that for now, none of my analyses takes any input or gives
> any output. So, this way, everything is fine and works well.)
>
> Here comes the problem when I want to use an analysis that needs an
> input. For that, I am using the data monger. Since the data monger
> itself needs an input, it does not work. So, I am trying to create
> this input by using the following <input> tag:
>
> ...
> <analysis id="4">
>   <data_monger>
>     <input>
>       <name>$input_description</name>
>       <iohandler>1</iohandler>
>     </input>
>     <input_create>
> ....
>
>
> My initial data monger (analysis N.1) and the one previously described
> (analysis N.4) are now both called at the beginning of the pipeline.
> But analysis N.4 has to be called after the third one, as I specified
> in the rules.
>
> Do you have any suggestions on how to solve my problem, and why the
> rules are not followed?
> Please let me know if I am not clear.
>
> Thank you in advance,
>
> Alexandre
>
>
>
> On Wed, 2004-02-11 at 00:30, Kiran Kumar wrote:
>> Hi Alexandre,
>> It's nice to know that it fits into your work.
>>
>> In short, you would be able to create a job without inputs. The
>> direct way would be 'not to pass' any inputs to the "create_job"
>> function:
>>
>> my $job = $self->create_job($next_anal);
>> $self->dbadaptor->get_JobAdaptor->store($job);
>> That should make it 'righteous' :-)..
>>
>> Since you are following the Biopipe spirit, let me go on to explain
>> the
>> other aspects too.
>>
>>
>> On the xml level, you are right that the <job_setup> tag could be
>> used for this purpose.
>> The <job_setup> tag provides for specifying jobs directly inside the
>> XML file without using a Datamonger/InputCreate. Of course, this is
>> convenient if the number of jobs is a handful; otherwise it would
>> make the XML file very lengthy. This feature is still there but has
>> not been tested for a long time. We have stopped using it because of
>> a drawback it poses to the biopipe spirit, which is as follows.
>>
>> If the job needs inputs and they are specified using job_setup
>> options, the xml file becomes too specific, and anyone else trying to
>> re-use it would have to change all the input_ids each time they need
>> to run a different set of inputs. The Datamonger/InputCreate, on the
>> other hand, provides a clean separation of input names from the xml
>> pipeline specification. The InputCreates are expected to read the
>> input_names for the jobs they are going to create from a file,
>> directory or elsewhere (this location for the input_names is
>> specified as the input_create's parameters in the xml file).
>>
>> Hope I haven't left you more confused than before!
>>
>> Cheers,
>> Kiran
>>
>>
>>> Hi,
>>>
>>> First, I would like to congratulate the Biopipe team for having
>>> created such a useful tool.
>>>
>>>
>>> The context :
>>> For several reasons (some good and some not so good), some of my
>>> runnables take no input and return nothing.
>>>
>>> The problem :
>>> This type of runnable does not match the biopipe "spirit", so it is
>>> a problem to create jobs for these runnables via the "create_job"
>>> function, which needs an array of inputs.
>>>
>>> The "temporary" unrighteous solution :
>>> I have created an InputCreate module named setup_nothing which
>>> creates a void input like the following :
>>> my @input = $self->create_input("nothing", '', "infile");
>>> my $job = $self->create_job($next_anal,\@input);
>>> $self->dbadaptor->get_JobAdaptor->store($job);
>>> This way, I launch one job on my analysis as well as on the
>>> following ones by placing "<action>COPY_ID_FILE</action>" in their
>>> respective rules in the XML file.
>>>
>>>
>>> The questions :
>>> Is there a clean way to create jobs without any input (a just_do_it
>>> function?)?
>>> Perhaps the <job_setup> tag in the XML file?
>>> Also, could someone tell me more about this <job_setup> tag?
>>>
>>>
>>> Thank you in advance
>>>
>>>
>>> Alexandre
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> bioperl-pipeline mailing list
>>> bioperl-pipeline at bioperl.org
>>> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>>>
>>
>