[Bioperl-l] bioperl-db

Jason Stajich jason@chg.mc.duke.edu
Tue, 11 Jul 2000 15:51:10 -0400 (EDT)


On Tue, 11 Jul 2000, Fernan Aguero wrote:

> I have some newbie questions about the following:
> 
> >bioperl-db is an effort to provide sequence database access and support
> >for updateable sequences (and the annotations) outside of the core bioperl
> >read-only seq db support.
> >
> >Currently this is being implemented using the Ensembl db structure (with
> >some additional tables) on a mysql db server.  The ultimate goal for
> >ensembl-lite (in my mind at least) is a reasonable framework for
> >small/midsized laboratories to store their genomic data and access to the
> >analysis pipelines (a lite-r version than the standard ensembl pipeline). 
> >When the DAS standard is completed we would also like to make an
> >ensembl-lite a DAS server.
> >
> >AceDB support would also be nice, to allow users to access data in the
> >ensembl-lite system transparently whether it is in the mysql db, an acedb
> >file/server, or a remote web db (GenBank, others ... ).  Other ideas or
> >additions will be welcome.
> >
> >I am in the middle stages of the updatableseqdb implementation, and still
> >designing the rest of the structures.  Suggestions, volunteers, and support
> >are welcome, of course.
> 
> 
> We are currently trying to implement an automated framework to do 
> analysis on sequences. Our approach uses PostgreSQL and some Perl 
> scripts to get sequence data into the database, do the analysis on it, 
> and store the results back in the DB.

> i) is this what ensembl does?

Yes, no, maybe... The Ensembl analysis pipeline works, but it is geared
towards heavy-duty analysis, not simple BLASTing.  Through my
discussions with Ewan, I understand that it would make more sense to have a
lite pipeline that works for smaller jobs.  Ewan can certainly give a
better description of what it does and does not do.  I'd obviously like to see
a wealth of sequence analysis, annotation, and prediction software as part
of the pipeline; whether or not that is really feasible will depend on the
number of hands on deck.
 
> ii) what is the development status of bioperl-db?

I would like to finish off Bio::EnsemblLite::UpdateableDB in the next few
days, but I don't want to check in unfinished code just yet...

But if there is definite interest, I can finish up my design document
and put it up for discussion.  (I'm thinking I really want a wiki for this;
maybe we'll put it on the Ensembl wiki, if that is okay, Ewan?)

Basically, I have proposed the UpdateableSeqI interface for sequences that
are 'changeable', in contrast to read-only databases.  The next step is
running analysis on these seqs and capturing those results.  Methods for
annotating would fall into this as well.  Some of these things are solved
by Ensembl and some are not; finding that line has been hard for me, but I
think I am beginning to see the big picture...
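To make the read-only versus 'changeable' distinction concrete, here is a minimal sketch in Python (not Perl, and not the actual Bioperl code). All names below (Seq, ReadOnlySeqDB, UpdateableSeqDB, InMemorySeqDB, get_seq_by_id, write_seq, remove_seq) are hypothetical stand-ins for the proposed UpdateableSeqI, assuming an updateable store is simply a read-only store plus write operations.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Seq:
    """Hypothetical minimal sequence record: an id, residues, and annotations."""
    id: str
    seq: str
    annotations: dict = field(default_factory=dict)


class ReadOnlySeqDB(ABC):
    """Hypothetical read-only interface: sequences can be fetched, never changed."""

    @abstractmethod
    def get_seq_by_id(self, seq_id: str) -> Seq:
        ...


class UpdateableSeqDB(ReadOnlySeqDB):
    """Hypothetical analogue of UpdateableSeqI: read access plus write operations."""

    @abstractmethod
    def write_seq(self, seq: Seq) -> None:
        ...

    @abstractmethod
    def remove_seq(self, seq_id: str) -> None:
        ...


class InMemorySeqDB(UpdateableSeqDB):
    """Toy dictionary-backed store, used only to exercise the interface."""

    def __init__(self) -> None:
        self._seqs: dict[str, Seq] = {}

    def get_seq_by_id(self, seq_id: str) -> Seq:
        # Raises KeyError if the id is unknown.
        return self._seqs[seq_id]

    def write_seq(self, seq: Seq) -> None:
        # Insert a new sequence, or overwrite an existing one with the same id.
        self._seqs[seq.id] = seq

    def remove_seq(self, seq_id: str) -> None:
        del self._seqs[seq_id]
```

For example, writing a second Seq with id "seq1" replaces the first, so get_seq_by_id("seq1") then returns the updated record. Having the updateable interface extend the read-only one means code that only reads sequences can accept either kind of database.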

> iii) any reasons to choose MySQL instead of PostgreSQL? I know that 
> the former is faster than the latter... any others?
> We have settled on PostgreSQL due to its ability to do 
> subqueries. Although we don't have any subqueries that we need to 
> do now, we thought that this capability could be useful. Any comments on 
> this?

MySQL was chosen because that is the db the Ensembl group chose, and I want
this to match up with their work as much as possible so we can take
advantage of their code when appropriate.  There is work underway to port
the underlying db connection in Ensembl to a more generic framework so
that multiple dbs can be supported more easily.  I'll be working on the
Sybase port once we agree on an object model for this; I'm sure a Postgres
port can be included as well if someone wants to tackle it.

> 
> And a proposal:
> My experience with Perl is limited, although I can usually get away 
> with what I want to do. If this description fits a volunteer, we can 
> start talking about what I can do for bioperl-db. Or maybe I can help 
> with some other task...
>

How about this: I'll have a first draft of the in-progress EnsemblLite
UpdateableDB code done by Friday.  I'll write up a design doc for what I
see needs to be worked on, and we can see what the interest level is for
volunteers and helpers; people can add to the document and we'll see
where it takes us.

BTW: I just checked in the Makefile, README, and the SQL code.  I will put
the EnsemblLite code in as soon as my first try at the implementation is finished.

-Jason

> 
> Fernan
> -- 
> Lic. Fernan Aguero                                Tel: (54-11) 4752-0021
> Instituto de Investigaciones Biotecnologicas      Fax: (54-11) 4752-9639
> Universidad Nacional de General San Martin
> 

Jason Stajich
Center for Human Genetics
Duke University Medical Center
jason@chg.mc.duke.edu
(919)684-1806 (office)
(919)684-2275 (fax)
http://wwwchg.mc.duke.edu/
http://galton.mc.duke.edu/~jason/