[Bioperl-pipeline] multiple pipelines
Elia Stupka
elia at tll.org.sg
Tue Jul 1 16:11:29 EDT 2003
Hi Jeremy,
we are currently having an internal discussion about this; we are
actually trying to work towards a new multi-pipeline system, in which
one database could contain multiple pipelines. Files relating to jobs
would carry pipeline ids, etc., and the web manager would track
multiple pipelines. This is at the discussion stage at the moment,
though Juguang and Aaron over here seem set to work on it soon.
> One other note: with our setup, reading/writing from/to an nfs
> directory
> during a blast analysis is very io bound.
Absolutely. To achieve the best performance you need:
1-BLAST database local to the node, with the best possible read speed
(in our case two mirrored local hard disks)
2-Write STDOUT and STDERR to the local node, read the results from
there, and finally store them in the database (no need to copy anywhere)
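A minimal sketch of point 2, in Python rather than the pipeline's own Perl code: the job builds its command with output and error files on node-local scratch (here `/tmp`) instead of an NFS mount. The executable name, database path, and flags are illustrative (old-style `blastall` arguments), not the pipeline's actual runnable wrapper.

```python
import os
import tempfile


def build_local_blast_cmd(query, db="/data/blastdb/nr", blast_exe="blastall"):
    """Build a BLAST command whose STDOUT/STDERR land on node-local disk.

    All paths and the executable name are hypothetical; the point is
    that out_file/err_file live under /tmp, never on an NFS mount.
    """
    scratch = tempfile.mkdtemp(prefix="blast_", dir="/tmp")  # node-local scratch
    out_file = os.path.join(scratch, "result.out")
    err_file = os.path.join(scratch, "result.err")
    cmd = [blast_exe, "-p", "blastp", "-d", db, "-i", query, "-o", out_file]
    # On a real node you would then run it, e.g.:
    #   subprocess.run(cmd, stderr=open(err_file, "w"), check=True)
    return cmd, out_file, err_file
```

After the run, the job parses `out_file` in place and writes results straight to the database, so nothing ever needs to be copied back over NFS.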
The only current caveat with point 2 is that if a job fails, the error
file stays on the node, and there is currently no simple way to track
which node the job was running on. We are about to change the database
schema and the code to keep track of the node id a job is running on
once it has been submitted.
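The planned schema change could look something like the sketch below: when the job starts, record the node's hostname in the job record, so a failed job's local error file can later be found and cleaned up. The field names here are hypothetical, not the actual bioperl-pipeline schema.

```python
import socket
import time


def mark_job_running(job, node=None):
    """Record which node a job runs on (field names are illustrative).

    With node_id stored at start time, the error file left on a node's
    local disk after a failure can be located and inspected later.
    """
    job["node_id"] = node or socket.gethostname()  # default: current host
    job["status"] = "RUNNING"
    job["started_at"] = time.time()
    return job
```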
> then copied back to the nfs mounted directory the analysis was started
> in
If you are using a database (e.g. BioSQL or Ensembl) to store your
BLAST results, you don't even need this last step: you just parse the
file locally and then write the results back to the db.
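The parse-locally-then-store step can be sketched like this, again in Python for brevity; SQLite stands in for a BioSQL or Ensembl database, and the input is assumed to be tabular BLAST output (`-m 8` style, 12 tab-separated fields), which is much simpler to parse than the default report format.

```python
import sqlite3


def store_blast_hits(report_path, db_path):
    """Parse a tabular BLAST report from local disk and store the hits.

    report_path: tab-separated -m 8 output (query, subject, %identity,
    ..., e-value in column 11). SQLite is only a stand-in here for the
    real results database.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hit
           (query TEXT, subject TEXT, pct_id REAL, evalue REAL)"""
    )
    with open(report_path) as fh:
        for line in fh:
            f = line.rstrip("\n").split("\t")
            conn.execute(
                "INSERT INTO hit VALUES (?, ?, ?, ?)",
                (f[0], f[1], float(f[2]), float(f[10])),
            )
    conn.commit()
    conn.close()
```

The result file never leaves the node; only the parsed rows travel over the network to the database server.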
Elia
---
Bioinformatics Program Manager
Temasek Life Sciences Laboratory
1, Research Link
Singapore 117604
Tel. +65 6874 4945
Fax. +65 6872 7007