[Bioperl-pipeline] Next set of questions.

Elia Stupka elia@fugu-sg.org
Fri, 30 Aug 2002 19:25:34 +0800 (SGT)


> I do not want any of these large datasets. I'll try to install 
> ensembl program without these, tell me if I am wrong and it is not 
> possible.

Yes, that is what you should do. Just check out the ensembl code using cvs,
as shown in our instructions, and for now ignore everything else (the
human data, the web mirrors, etc.): *just the code*

> 2., On the other hand it would be nice to have a small database to 
> play with while I am setting up our own system. Is there some small 
> (demo) database for this purpose?

Yes, the whole example tarball is up on the website. See:

http://www.fugu-sg.org/bioperl-pipeline/bioperl-pipeline-install.html

Read those documents and download the tarball, which has a small ensembl
database, a pipeline database, etc.

> 3., Of course I can not predict what information we need to have in 
> the database in the future. Is it possible to change the schema of 
> the database once it is already populated and being used?

Sure, no problem, that is what open source is for :)
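
To give a feel for what changing a populated schema involves, here is a
minimal sketch in Python using an in-memory SQLite database. It is not the
ensembl schema: the "feature" table, its columns, and the function name are
all made up for illustration, and your real database server may have its
own syntax details.

```python
import sqlite3


def add_column_to_populated_table():
    """Add a column to a table that already holds rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, NOT the ensembl schema.
    conn.execute(
        "CREATE TABLE feature ("
        "id INTEGER PRIMARY KEY, name TEXT, seq_start INTEGER, seq_end INTEGER)"
    )
    conn.executemany(
        "INSERT INTO feature (name, seq_start, seq_end) VALUES (?, ?, ?)",
        [("orf1", 10, 400), ("orf2", 500, 900)],
    )
    # The schema change: existing rows pick up the default value.
    conn.execute("ALTER TABLE feature ADD COLUMN source TEXT DEFAULT 'unknown'")
    # New information can then be filled in row by row.
    conn.execute("UPDATE feature SET source = 'glimmer' WHERE name = 'orf2'")
    rows = conn.execute("SELECT name, source FROM feature ORDER BY id").fetchall()
    conn.close()
    return rows
```

The existing rows survive the change; any code that reads or writes the
table has to be updated to know about the new column.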

> a try as a lab-rat. Could you point at the location of the binaries 
> to download? 

genewise: ftp://ftp.sanger.ac.uk/pub/birney/wise2/

genscan has to be requested from Chris Burge at:
http://genes.mit.edu/license.html

> 5., I would need results of bacterial gene finders, such as glimmer 
> which will hopefully work here if I finally succeed solving some 
> funny problem with it. And the result of GeneMarkS, which is provided 
> through the Internet, and I will have an output file through e-mail. 
> I can parse/convert any files to a Bioperl-readable format, if I have 
> to, that is not a problem. Can I then somehow inject these things 
> into the pipeline or right into the database?

Yes, absolutely, we will guide you through that once you have the basic
setup up and working. The principle of the pipeline is that it is easy to
extend to any data source for input and output.
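
As a rough idea of the first step (turning a gene finder's output into
objects), here is a small Python sketch. The input layout is an assumption
made for illustration only (whitespace-separated name, two coordinates and a
strand sign); it is not the actual glimmer or GeneMarkS format, and the
class and function names are hypothetical, not part of the pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GenePrediction:
    """One predicted gene (hypothetical container, not a pipeline class)."""
    name: str
    start: int
    end: int
    strand: int   # 1 for forward, -1 for reverse
    source: str   # which program produced the prediction


def parse_predictions(lines: Iterable[str], source: str) -> List[GenePrediction]:
    """Parse lines of the ASSUMED form: name coord1 coord2 strand(+/-)."""
    predictions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, first, second, strand = line.split()[:4]
        a, b = int(first), int(second)
        predictions.append(
            GenePrediction(
                name=name,
                start=min(a, b),  # normalise so that start <= end
                end=max(a, b),
                strand=1 if strand == "+" else -1,
                source=source,
            )
        )
    return predictions
```

Once predictions from different programs are in one common object form
like this, they can all be handled the same way downstream.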

> Sorry for the lame questions, but I am confused a bit as I am working 
> on every possible levels of the problem, starting from the special 
> chemistry of the sequencing reactions, to editing, finishing, 
> annotating, pipeline- WebServer- database-setup, as well as designing 
> and coding a search for a new kind of feature.

Absolutely no worries, it's normal! :)

> By the way the result of this one should also go into the database.
> Moreover I am confused because an OO database would have been closer 
> to my current knowledge, and I can not imagine how this whole thing 
> would work. I hope it will one day. I have read "Ensembl Tutorial" 
> which only shows how to get things out and not how to put them in.

All of the data that goes in and out is always in the form of objects,
even if it came from a file or a database, so the OO approach shouldn't be
too different from what you are used to.
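
To illustrate the idea of objects going in and out of a relational
database, here is a minimal Python sketch with SQLite. The Feature and
FeatureAdaptor classes, the table, and the method names are assumptions
for illustration; they are not the ensembl or pipeline API.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """The object your code works with (hypothetical)."""
    name: str
    start: int
    end: int


class FeatureAdaptor:
    """Translates between Feature objects and table rows (hypothetical)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS feature ("
            "name TEXT PRIMARY KEY, seq_start INTEGER, seq_end INTEGER)"
        )

    def store(self, feature: Feature) -> None:
        # Object in: the caller never writes SQL itself.
        self.conn.execute(
            "INSERT INTO feature (name, seq_start, seq_end) VALUES (?, ?, ?)",
            (feature.name, feature.start, feature.end),
        )

    def fetch_by_name(self, name: str) -> Optional[Feature]:
        # Object out: a row is turned back into a Feature.
        row = self.conn.execute(
            "SELECT name, seq_start, seq_end FROM feature WHERE name = ?",
            (name,),
        ).fetchone()
        return Feature(*row) if row else None
```

The calling code only ever sees Feature objects; the tables stay hidden
behind the adaptor, which is why the relational storage should not get in
your way.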

> 6., My "compute farm" is one piece of small SUN. Do I need these 
> fancy LoadSharing things like LSF and whatever the other one is 
> called?

No, not necessarily; those just improve your performance.

Elia

********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 6874 1467        *
* mobile: +65 9030 7613        *
* fax:    +65 6779 1117        *
********************************