[Bioperl-pipeline] Next set of questions.
Peter Kos
kos@rite.or.jp
Fri, 30 Aug 2002 15:57:00 +0900
Hi all,
Thank you for the advice. MySQL is now up. There were problems with
permissions, groups, the my.sock file and a few other things.
Next comes the setup of Ensembl and the specific Perl modules.
Today's questions about the pipeline:
1., According to the "Ensembl 7.3 Website Installation Instruction"
the Ensembl API is given an integer version number that corresponds
to the database schema that it was built for.
So the program to be installed seems to be specific to the release
version of the human genome, Drosophila and so on.
I do not want any of these large datasets. I will try to install
the Ensembl software without them; tell me if I am wrong and this is
not possible.
I just want to have a nicely annotated bacterial genome in a nice and
usable GUI. Of course it should contain gene predictions, BLAST
homologies, Pfam hits, clickable links to EMBL and SwissProt entries
and such, but it comes nowhere near the complexity of the human
genome project.
2., On the other hand it would be nice to have a small database to
play with while I am setting up our own system. Is there some small
(demo) database for this purpose?
3., Of course I cannot predict what information we will need to have
in the database in the future. Is it possible to change the schema of
the database once it is already populated and in use?
4., You are working on implementing Genscan and Genewise. I do not
know how they would perform on a bacterial genome, but I can give
them a try as a lab rat. Could you point me to the location of the
binaries to download? Or will they work through the Internet? I have
ONLY and EXCLUSIVELY port 80 open to the world, so if it is anything
other than HTTP, I will not be able to use it.
(So cvs, ftp, https, icq and ssh, among others, do not work.)
5., I would need the results of bacterial gene finders, such as
Glimmer, which will hopefully work here once I finally solve some
funny problem with it, and GeneMarkS, which is provided as a service
over the Internet and sends me an output file by e-mail.
I can parse/convert any files to a Bioperl-readable format if I have
to; that is not a problem. Can I then somehow inject these results
into the pipeline or right into the database?
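To make question 5 concrete, this is roughly the kind of conversion I
mean. It is only a sketch in Python under stated assumptions: the
whitespace-separated "id start end" input layout, the function name
predictions_to_gff and the names contig1 and hypothetical_finder are
made up for illustration and are NOT the real Glimmer or GeneMarkS
output formats. It writes generic nine-column GFF lines, which as far
as I know Bioperl can read.

```python
# Sketch only: the input layout is an assumption, not the real
# output format of Glimmer or GeneMarkS.
def predictions_to_gff(lines, seqid, source):
    """Convert 'gene_id start end' lines into 9-column GFF records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        gene_id, first, second = line.split()[:3]
        first, second = int(first), int(second)
        # Assumed convention: first > second means the reverse strand.
        strand = "+" if first <= second else "-"
        # GFF always wants start <= end; the strand column carries
        # the orientation.
        start, end = min(first, second), max(first, second)
        records.append("\t".join([
            seqid, source, "CDS", str(start), str(end),
            ".", strand, "0", 'gene_id "%s"' % gene_id,
        ]))
    return records


if __name__ == "__main__":
    demo = ["orf00001 120 890", "orf00002 2400 1500"]
    for record in predictions_to_gff(demo, "contig1", "hypothetical_finder"):
        print(record)
```

Whether such GFF lines can then be loaded into the pipeline or the
database is exactly what I am asking.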
Sorry for the lame questions, but I am a bit confused, as I am working
on every possible level of the problem, from the special chemistry of
the sequencing reactions, through editing, finishing and annotating,
to pipeline, web server and database setup, as well as designing and
coding a search for a new kind of feature.
By the way, the results of that search should also go into the
database.
I am also confused because an OO database would have been closer
to my current knowledge, and I cannot yet imagine how this whole thing
will work. I hope I will one day. I have read the "Ensembl Tutorial",
which only shows how to get things out and not how to put them in.
6., My "compute farm" is a single small Sun machine. Do I need these
fancy load-sharing things like LSF and whatever the other one is
called?
That's perhaps enough for today. :-)
Sorry for so many questions.
Regards
Peter
---------------------------
Peter B. Kos,
(RITE)
E-mail: kos@rite.or.jp