[Bioperl-l] Bioperl + Oracle 8i Database

Ewan Birney birney at ebi.ac.uk
Tue Mar 11 08:29:46 EST 2003



On Tue, 11 Mar 2003, Elia Stupka wrote:

> > I am just reading Oracle Magazine ("Cracking the Code of Life") and it
> > said, "Unfortunately, the IT infrastructure that many research
> > facilities depend on today is little more than a collection of flat
> > files strung together with Perl scripts and proprietary algorithms."
> > Because of this, I'm considering using Oracle for storing data.
>
> In that same article it talks about the large databases and services
> provided by the EBI and the Sanger Centre in Cambridge, UK. You'd be
> interested to know that the largest resource provided there is Ensembl,
> a large genome annotation service, which actually relies on MySQL
> rather than Oracle, and builds upon some of the code from BioPerl.


Elia - it is sadly not true that Ensembl is the largest resource provided
here. That is definitely the EMBL Nucleotide Sequence Database (the GenBank
partner in Europe), which is stored internally in Oracle but dumped, for
legacy reasons, as flat files. I don't think Oracle table dumps are
provided, because there is not much demand for them and the database holds
a large amount of private/uninteresting data needed to cope with things
like patent submissions made before the patents are granted, etc.


For people interested in a real tour de force of relational modelling,
check out the MSD project at EBI, which deals with protein structure data
and provides a submission route into the global PDB archive (along with
their US partners in RCSB). They hold protein structures in an Oracle
database, with an impressive use of reference tables to ensure that every
amino acid has the right number of atoms with the right connections, etc.


(If you ask MSD nicely, I believe you might be able to get Oracle table
dumps for this beast. Lots of fun.)





