[Bioperl-l] new directions

Ewan Birney birney@ebi.ac.uk
Wed, 7 Mar 2001 17:35:13 +0000 (GMT)


On Wed, 7 Mar 2001, David Block wrote:

> On Wed, 7 Mar 2001, Jason Stajich wrote:
> 
> > Yes and no.  I think with sequences as the primary currency, writing out
> > to genbank is a reasonable solution for me.  This can get ugly with lots
> > of features, but then the thought is I should be using something
> > ensembl-ish if I am talking about tons of sequences.  
> 

[cc'ing back to bioperl just because I think it is relevant to a lot of
people on the list]

> Ewan - how hard is it to factor out the persistence code from
> ensembl?  Not everyone needs the full pipeline, and you guys have
> certainly stress-tested the db code :)
> 

Ensembl is well designed for genomic sequences of a certain "type". It
just won't work with, say, the Fuzzies (who would!) and/or all the reference
lines in embl or swissprot.

I feel I (or Ensembl) am holding up a sensible implementation of a
"bioperl-db" focused on more lab-orientated/simple bioinformatics
solutions, due to the fact that I either say "oh, we've already done that in
Ensembl" or people are worried about treading on my toes. I would encourage
people to (a) start a project like this if they find that Ensembl does not
fit their problem set and/or they have something to start this off with,
(b) make it compliant with the Bioperl interfaces and (c) look at the
Ensembl code, in particular, say, the Adaptor scheme, to get some ideas
about how to solve persistence/database handling nicely.

(http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/DatabaseObjectStandards.html)
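
For anyone who has not seen the Adaptor idea before, here is a minimal
sketch of the general pattern. It is written in Python with sqlite3 purely
for illustration and is not Ensembl's actual code: the class names (Seq,
SeqAdaptor), the table layout and the method names are all hypothetical.
The point it tries to show is that the business object carries no SQL,
and a separate adaptor object is the only place that talks to the
database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seq:
    """Plain business object: it knows nothing about the database."""
    display_id: str
    residues: str
    dbid: Optional[int] = None  # filled in by the adaptor once stored


class SeqAdaptor:
    """Hypothetical adaptor: the only class holding SQL for Seq objects."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Assumed one-table schema, invented for this sketch.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seq ("
            "seq_id INTEGER PRIMARY KEY, "
            "display_id TEXT UNIQUE, "
            "residues TEXT)"
        )

    def store(self, seq: Seq) -> int:
        """Insert the sequence and record its database id on the object."""
        cur = self.conn.execute(
            "INSERT INTO seq (display_id, residues) VALUES (?, ?)",
            (seq.display_id, seq.residues),
        )
        seq.dbid = cur.lastrowid
        return seq.dbid

    def fetch_by_display_id(self, display_id: str) -> Optional[Seq]:
        """Rebuild a Seq object from its row, or return None if absent."""
        row = self.conn.execute(
            "SELECT seq_id, display_id, residues FROM seq "
            "WHERE display_id = ?",
            (display_id,),
        ).fetchone()
        if row is None:
            return None
        return Seq(display_id=row[1], residues=row[2], dbid=row[0])


# Usage: an in-memory database stands in for a real server.
conn = sqlite3.connect(":memory:")
adaptor = SeqAdaptor(conn)
adaptor.store(Seq("clone_1", "ACGTACGT"))
fetched = adaptor.fetch_by_display_id("clone_1")
```

Swapping the storage backend then means writing a new adaptor, while the
code that uses Seq objects stays as it is.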


Of course, if you want to write a database for managing a vertebrate
genome (or significant parts of a vertebrate genome) then I think it is
insane not to use Ensembl ;). Indeed Ensembl really should be quite a good
fit for any large-ish eukaryote genome with some modifications.



Now we have good interfaces (SeqI, SeqFeatureI and GeneI) I think we can
really let rip for the implementations that "hit" these interfaces and
expect a large amount of interoperability between projects.
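
To make the interoperability point concrete, here is a small sketch, again
in Python rather than Perl and under stated assumptions: the SeqI protocol
below is a cut-down stand-in with two methods (display_id and seq), not
the real Bioperl interface, and InMemorySeq, LazyDbSeq and gc_content are
hypothetical names. It shows two unrelated implementations being used by
the same analysis code because both "hit" the same interface.

```python
from typing import Dict, Protocol


class SeqI(Protocol):
    """Cut-down stand-in for a sequence interface (an assumption)."""

    def display_id(self) -> str: ...

    def seq(self) -> str: ...


class InMemorySeq:
    """Implementation that simply holds the residues in memory."""

    def __init__(self, display_id: str, residues: str) -> None:
        self._id = display_id
        self._residues = residues

    def display_id(self) -> str:
        return self._id

    def seq(self) -> str:
        return self._residues


class LazyDbSeq:
    """Implementation that looks residues up on demand.

    A dict stands in for the database here.
    """

    def __init__(self, display_id: str, store: Dict[str, str]) -> None:
        self._id = display_id
        self._store = store

    def display_id(self) -> str:
        return self._id

    def seq(self) -> str:
        return self._store[self._id]


def gc_content(s: SeqI) -> float:
    """Fraction of G and C residues; works on anything satisfying SeqI."""
    residues = s.seq().upper()
    if not residues:
        return 0.0
    return sum(residues.count(base) for base in "GC") / len(residues)


# The same function accepts both implementations unchanged.
a = InMemorySeq("a", "GGCC")
b = LazyDbSeq("b", {"b": "ATGC"})
results = {s.display_id(): gc_content(s) for s in (a, b)}
```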


Word of warning: if you design a database you will design a database for
your needs (quite right too). Don't try to design a database to encompass
all of bioinformatics: it won't work ;). Ensembl is a database to handle
large fragmentary genomes and it hits bioperl interfaces. There shouldn't
really be a "bioperl-db" project but a
"bioperl-small-lab-db" project. Less catchy name of course...




ewan





-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------