[Bioperl-pipeline] some newbie questions
Shawn Hoon
shawnh at stanford.edu
Wed Sep 17 04:12:55 EDT 2003
On Monday, September 15, 2003, at 12:19 AM, Marc Logghe wrote:
> Hi all,
> I am brand new to biopipe, so please forgive me if I ask some silly
> questions.
> I am currently playing with the idea of implementing the bioperl
> pipeline and for that I have done some homework by reading a number of
> biopipe documents. I might have missed a few relevant documents,
> though ;-)
>
Ah, I'm probably writing some of them. Documentation may come sooner or
later, depending on how soon I settle into school.
> However there is at least one thing that is not yet clear to me. Up to
> now, we are mirroring a number of databases, like wormbase, and
> handling it manually. This means, unpacking it, making the chromosomes
> and wormpep sequences blastable; genomewide blast to map some features
> in which we are interested; reformatting the database and custom
> mapping data to gff; import into gbrowse; ...
For data preparation, there is some support, but it may be limited. You
should be able to roll your own and plug it in. These come under
InputCreates: the Bio::Pipeline::InputCreate::* modules are responsible
for the various ways of setting up the inputs and jobs for the pipeline.
example a module that does file based blasting of sequences called
setup_file_blast will
a) given a file of input sequences in any format, split the file into a
specified number of chunks.
b) create a blast job in the pipeline for each chunk
c) create the specified working directory for storing the output files
d) format the db file for blasting if you are blasting against itself
if the option is specified
see bioperl-pipeline/xml/examples/xml/blast_file_pipeline.xml
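Step (a) above can be sketched generically. Biopipe's own implementation
lives in Perl under Bio::Pipeline::InputCreate and may differ; assuming a
FASTA input, a round-robin split looks something like this (hypothetical
standalone helper, names are mine):

```python
# Hypothetical sketch of step (a): split a FASTA file into N chunks
# so each chunk can become one blast job. Not biopipe's actual code.
def split_fasta(path, n_chunks, out_prefix):
    # Collect records: each record starts at a '>' header line.
    with open(path) as fh:
        records, current = [], []
        for line in fh:
            if line.startswith(">") and current:
                records.append("".join(current))
                current = []
            current.append(line)
        if current:
            records.append("".join(current))
    # Deal records round-robin so chunk sizes stay balanced.
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    # Write one file per chunk and return the paths.
    paths = []
    for i, chunk in enumerate(chunks):
        out = "%s.%d.fa" % (out_prefix, i)
        with open(out, "w") as fh:
            fh.write("".join(chunk))
        paths.append(out)
    return paths
```

Each returned path would then be registered as the input of one blast job.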
If, say, you want the blast output stored as gff files, you can
specify a data dumper as an output iohandler; see
bioperl-pipeline/xml/examples/xml/blast_db_flat.xml, which uses
Bio::Pipeline::Utils::Dumper.
Alternatively, you can probably use Bio::DB::GFF as an
output iohandler to take the blast features and store them directly in
the database using the SeqFeature gff_string method.
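The gist of that last approach is just formatting each hit as a
tab-delimited GFF line, which is roughly what a SeqFeature's gff_string
returns. A minimal sketch (hypothetical helper, not bioperl's actual API):

```python
# Hypothetical helper: format a blast hit as a GFF2-style line,
# roughly the shape that SeqFeature's gff_string produces.
# Names and the fixed "similarity" feature type are illustrative.
def hit_to_gff(seqid, source, start, end, score, strand, target):
    # GFF columns: seqid, source, feature, start, end, score,
    # strand, frame, group (here a Target attribute)
    fields = [seqid, source, "similarity",
              str(start), str(end), "%.1f" % score,
              strand, ".", "Target %s" % target]
    return "\t".join(fields)
```

Each such line can then be appended to a .gff file, or loaded into a
Bio::DB::GFF database, for gbrowse to pick up.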
For any customization you want to do, you should probably roll your own
module, which you can plug in as an output iohandler.
> From the documentation it is pretty clear that the genomewide blast
> is especially suited for biopipe.
> But what about all the rest, especially the preparation of the input
> data ? Also, how can you trigger the pipeline ? I mean, every week
> wget is fetching new wormbase data, and of course the pipeline should
> only be triggered when new data have arrived. How can you do that ?
Right now, the best bet would be to write a pipeline that reads new
sequences from some directory or file, either loading the sequences into
a db or treating them as a file, and carries out the blast. See
blast_file_pipeline.xml or blast_db_flat.xml for similar examples.
This would be triggered by some kind of cron job that checks the last
modification time of the data file. Nothing for this is currently
written, so you are welcome to give it a shot.
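The mtime check itself is simple; a minimal sketch of such a trigger
(hypothetical script, all paths and the commented-out pipeline invocation
are assumptions you would adapt to your install):

```python
#!/usr/bin/env python
# Hypothetical cron-driven trigger, not part of biopipe: launch the
# pipeline only when the mirrored data file is newer than a stamp
# file touched after the last successful run.
import os

def new_data_arrived(data_path, stamp_path):
    """True if data_path exists and is newer than stamp_path."""
    if not os.path.exists(data_path):
        return False
    if not os.path.exists(stamp_path):
        return True  # pipeline has never been run
    return os.path.getmtime(data_path) > os.path.getmtime(stamp_path)

def mark_run(stamp_path):
    # Touch the stamp file to record a successful run.
    with open(stamp_path, "a"):
        os.utime(stamp_path, None)

if __name__ == "__main__":
    data = "/data/mirror/wormbase/wormpep.fa"  # fetched weekly by wget (assumed path)
    stamp = "/var/run/biopipe/wormpep.last"    # assumed stamp location
    if new_data_arrived(data, stamp):
        # Replace with your actual pipeline invocation, e.g.
        # os.system("perl PipelineManager ...")
        mark_run(stamp)
```

Run it from crontab shortly after the weekly wget; if the fetch brought
nothing new, the script exits without touching the pipeline.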
> Can you use biopipe for tasks like installing the new version of acedb
> ?
>
I have no knowledge of installing acedb, and biopipe cannot do this, so
I can't say much. Biopipe is more suited for tasks where you want to
parallelize multiple jobs, or where you have some kind of workflow that
must execute in a certain order. It must be quite complex to set up
acedb if you need a pipeline to do so?
cheers,
shawn