[Bioperl-pipeline] creating jobs, job_setup

Shawn Hoon shawnh at fugu-sg.org
Fri Feb 13 03:18:58 EST 2004


Hi Alexandre,
	Okay, this is what I gather you are trying to do:

Run analysis 1 -> 2 -> 3 -> 4

The question is: what are your inputs? Are you running the four analyses
on the same input type? For example, do you have four BLAST analyses
that you run on the same sequences? If so, then what you would do is use
an input create/data monger to create the inputs for analysis 1. Then, in
your rules, you would specify COPY_ID for analysis 1 -> 2, 2 -> 3 and
3 -> 4, and the input id will be transferred from one analysis to the next.
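
As a concrete sketch of that same-input case (the <rule> child tags here
are written from memory of the template XMLs that ship with biopipe, so
double-check them against one of those templates), the rule for 1 -> 2
would look roughly like:

    <rule>
      <current_analysis_id>1</current_analysis_id>
      <next_analysis_id>2</next_analysis_id>
      <action>COPY_ID</action>
    </rule>

with the same again for 2 -> 3 and 3 -> 4, only the analysis ids changing.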

If your input for analysis 2, for example, is different from that of
analysis 1, then you need to do something different.
For this, there are 2 options:

	1) If you only require that an analysis 1 job is completed before the
corresponding analysis 2 job is started, then you need an analysis in
between 1 and 2 (so as a result 2 becomes 3). The new analysis 2 would be
an input_create which knows how to create the inputs for the old analysis
2 (now analysis 3). (Basically, we are assuming that this input creation
is linked to the input of analysis 1.)
	2) If you require that all of the analysis 1 jobs are completed
before any analysis 2 jobs are started, you can use a WAITFORALL rule
(sketched below), which would then launch a job of analysis 2 (which may
or may not be an input create).
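
Under the same caveat about tag names as above, the WAITFORALL variant of
the rule would look something like this; only once every analysis 1 job
has finished is the analysis 2 job launched:

    <rule>
      <current_analysis_id>1</current_analysis_id>
      <next_analysis_id>2</next_analysis_id>
      <action>WAITFORALL</action>
    </rule>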

For your definition below, I don't see why analysis 4 should be
executed at startup. Can you provide the XML file?
shawn


On Feb 12, 2004, at 5:48 AM, Alexandre Dehne wrote:

> Hi Kiran,
>
> Thank you for answering me.
> Actually, your solution is very clean, but when using it, other problems
> came up.
>
> Here is the current situation:
> I start a job on my first analysis following your suggestion. Then, more
> jobs on the other analyses are created by placing
> "<action>COPY_ID_FILE</action>" or "<action>COPY_ID</action>" in their
> respective rules in the XML file.
> (Remember that for now, none of my analyses take any input or give any
> output, so up to this point everything is fine and works well.)
>
> The problem comes when I want to use an analysis that needs an
> input. For that, I am using the data monger. Since the data monger
> itself needs an input, it does not work. So, I am trying to create this
> input by using the following <input> tag:
>
> ...
>    <analysis id="4">
>       <data_monger>
>         <input>
>           <name>$input_description</name>
>           <iohandler>1</iohandler>
>         </input>
>         <input_create>
> ....
>
>
> My initial data monger (analysis no. 1) and the one described above
> (analysis no. 4) are now both called at the beginning of the pipeline.
> But analysis no. 4 has to be called after the third one, as I
> specified in the rules.
>
> Do you have any suggestions on how to solve my problem, and any idea
> why the rules are not being followed?
> Please let me know if I am not clear.
>
> Thank you in advance,
>
> Alexandre
>
>
>
> On Wed, 2004-02-11 at 00:30, Kiran Kumar wrote:
>> Hi Alexandre,
>> It's nice to know that it fits into your work.
>>
>> In short, you can create jobs without inputs. The direct way
>> is simply not to pass any inputs to the "create_job" function.
>>
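>>       # create the job for the next analysis with no inputs, then store it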
>>       my $job   = $self->create_job($next_anal);
>>       $self->dbadaptor->get_JobAdaptor->store($job);
>> That should make it 'righteous' :-)..
>>
>> Since you are following the Biopipe spirit, let me go on to explain
>> the other aspects too.
>>
>>
>> At the XML level, you are right that the <job_setup> tag could be used
>> for this purpose.
>> <job_setup> provides for specifying jobs directly inside the XML file
>> without using a DataMonger/InputCreate. Of course, this is only
>> convenient if the number of jobs is a handful; otherwise it would make
>> the XML file very lengthy. The feature is still there but has not been
>> tested for a long time. We have stopped using it because of a drawback
>> it poses towards the Biopipe spirit, which is as follows.
>>
>> If the job needs inputs and they are specified using job_setup options,
>> the XML file becomes too specific, and anyone else trying to re-use it
>> would have to change all the input_ids each time they need to run on a
>> different set of inputs. The DataMonger/InputCreate, on the other hand,
>> provides a clean separation of the input names from the XML pipeline
>> specification. The InputCreates are expected to read the input_names
>> for the jobs they are going to create from a file, a directory, or
>> somewhere else (this location for the input_names is specified in the
>> input_create's parameters in the XML file).
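>>
>> As an example of that last point (the module and parameter names below
>> are just illustrative assumptions, not taken from your setup), an
>> InputCreate that reads its input names from a directory would be wired
>> into the DataMonger section roughly like this:
>>
>>    <input_create>
>>       <module>setup_file</module>
>>       <rank>1</rank>
>>       <parameters>
>>         <param>
>>           <name>inputdir</name>
>>           <value>/path/to/my/inputs</value>
>>         </param>
>>       </parameters>
>>    </input_create>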
>>
>> Hope I haven't left you more confused than before!
>>
>> Cheers,
>> Kiran
>>
>>
>>> Hi,
>>>
>>> First, I would like to congratulate the Biopipe team for having 
>>> created such a useful tool.
>>>
>>>
>>> The context:
>>> For several reasons (some good and some not so good), some of my
>>> runnables take nothing as input and return nothing.
>>>
>>> The problem:
>>> This type of runnable does not match the Biopipe "spirit", so it is
>>> a problem to create jobs for these runnables via the "create_job"
>>> function, which needs an array input.
>>>
>>> The "temporary" unrighteous solution :
>>> I have created an InputCreate module named setup_nothing which 
>>> creates a void input like the following :
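>>>    # create a dummy 'nothing' input purely so create_job gets an array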
>>>    my @input=$self->create_input("nothing",'',"infile");
>>>    my $job   = $self->create_job($next_anal,\@input);
>>>    $self->dbadaptor->get_JobAdaptor->store($job);
>>> This way, I launch one job on my analysis as well as on the 
>>> following ones by placing "<action>COPY_ID_FILE</action>" in their 
>>> respective rules in the XML file.
>>>
>>>
>>> The questions:
>>> Is there a clean way to create jobs without any input (a just_do_it
>>> function, perhaps)?
>>> Could the <job_setup> tag in the XML file be used for this?
>>> Also, could someone tell me more about this <job_setup> tag?
>>>
>>>
>>> Thank you in advance
>>>
>>>
>>> Alexandre
>>>
>>>
>>>
>>>
>>
>


