[Bioperl-pipeline] multiple pipelines

Shawn Hoon shawnh at fugu-sg.org
Wed Jul 2 02:51:21 EDT 2003


>
> But, for now, how safe is it to run two pipelines at once? Especially,
> has anyone done any workarounds to allow the PipelineManager to write
> to different tmp directories? (if not, I will do something very simple
> to keep the execution scripts separate)
>

Running two pipelines is not a problem as long as you have two pipeline
databases and run PipelineManager once against each.

Temp files written to the same tmp directory should not pose a problem:
the pipeline database id is part of each file name, so no conflicts
should occur.
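
Roughly, the naming idea is something like the sketch below (this is not
the actual pipeline code; the subroutine, arguments and file names are
made up for illustration):

use strict;
use warnings;
use File::Spec;

# Including the pipeline database id in each temp file name means two
# PipelineManager runs can safely share the same tmp directory.
sub job_tmpfile {
    my ($pipeline_dbid, $job_id, $suffix) = @_;   # hypothetical arguments
    my $tmpdir = $ENV{NFSTMP_DIR} || File::Spec->tmpdir;
    return File::Spec->catfile($tmpdir, "$pipeline_dbid.$job_id.$suffix");
}

# e.g. jobs with the same id from two different pipeline databases never collide:
print job_tmpfile('pipeline_db_1', 42, 'out'), "\n";
print job_tmpfile('pipeline_db_2', 42, 'out'), "\n";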


>> Absolutely. To achieve best performance you need:
>>
>> 1-Blast database local to node with best possible read speed (in our
>> case with 2  mirrored local hard disks)
>
> I don't know if you have any numbers or not, but I wonder what the
> approximate percent speed gain is from doing this... any idea? That is
> obviously a very aggressive setup... the type of setup I would expect
> on a heavily used/publicly accessible resource.
>

Something for Chen Peng to answer...
>>
>> 2-Write STDOUT and STDERR to local node, read results from there and
>> finally store results in database (no need to copy anywhere)
>>
>> The only current caveat with point 2 is that if a job fails, the error
>> file stays there...
>
>
> So, is doing this included in the current code? I didn't notice this...
> or is it not there due to the problem you mentioned.


The way I currently do things is to have STDOUT and STDERR written to
NFSTMP_DIR. These are the pipeline log files, which is more convenient
for me to access, and I haven't had major problems with that so far. By
default, the data input and output of the programs are handled by
bioperl-run, and these are written on the local node. Jobs run locally,
and the wrapper modules write their files to the local temp directory,
as should be the case: usually /tmp (on the local node) or whatever
your tempdir environment variable is set to. Results are parsed locally
and the objects are written to the database. If you are writing to
files instead, the files should get copied to some NFS-mounted result
directory; this location is usually set in the runnable parameters.
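
In case it helps, the local-temp pattern described above looks roughly
like this (a sketch only, not the actual bioperl-run wrapper code;
RESULT_DIR and the file names are made up):

use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use File::Copy qw(copy);

# Intermediate files go to the node-local temp dir, typically /tmp or
# whatever the tempdir environment variable points to; cleaned up on exit.
my $workdir = tempdir(DIR => $ENV{TMPDIR} || File::Spec->tmpdir, CLEANUP => 1);
my $raw_output = File::Spec->catfile($workdir, 'blast.out');   # hypothetical file

# ... run the program on the node, parse $raw_output into objects,
# and store the objects in the pipeline database ...

# Only if you write result *files* do they need to leave the node, e.g. to
# an NFS-mounted result directory (here a hypothetical RESULT_DIR setting):
my $result_dir = $ENV{RESULT_DIR} || '/nfs/results';
copy($raw_output, File::Spec->catfile($result_dir, 'blast.out')) if -s $raw_output;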

>
> Actually, initially, I was doing basically this. I set NFSTMP_DIR to
> /tmp, which is local on each machine. But, I had to stop doing that 
> when
> the pipeline started making subdirectories in NFSTMP_DIR. I think the
> pbs software was automatically copying (scp) the output to /tmp on the
> master node... I'm not exactly sure how that was working though.
>

We don't have PBS installed, so I'm not sure about this. But the
subdirectories are created so that the multitude of log files does not
pile up in a single directory, which would make it hard to access.
Splitting the files among subdirectories lessens the load in that
sense.
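
The splitting scheme is along these lines (an assumed sketch, not
necessarily the exact layout the pipeline uses; the bucket count and
file names are made up):

use strict;
use warnings;
use File::Spec;
use File::Path qw(make_path);

# Bucket log files into subdirectories of NFSTMP_DIR so that no single
# directory ends up holding tens of thousands of entries.
sub log_path {
    my ($job_id) = @_;
    my $bucket = $job_id % 100;                  # 100 subdirectories
    my $dir = File::Spec->catdir($ENV{NFSTMP_DIR} || '/tmp', $bucket);
    make_path($dir) unless -d $dir;
    return File::Spec->catfile($dir, "job_$job_id.out");
}

print log_path(12345), "\n";   # e.g. $NFSTMP_DIR/45/job_12345.out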



shawn


