[Open-bio-l] Schema for genes & features & mappings to assemblies
Ewan Birney
birney@ebi.ac.uk
Tue, 23 Apr 2002 11:39:28 +0100 (BST)
On Tue, 23 Apr 2002, Thomas Down wrote:
> On Tue, Apr 23, 2002 at 05:24:09PM +0800, Elia Stupka wrote:
> >
> > > We do need to discuss assemblies. I vote for "flat" one level assemblies
> >
> > I guess the other bit missing from biosql at the moment is gene
> > structures, to really start thinking of being able to do things only with
> > biosql.
>
> Do you really want to special-case gene structures? I thought
> that the `idea' of BioSQL was to put everything into a single
> feature table, using tag-value fields for all the non-code bits
> of data on each feature, and an ontology to hold the whole lot
> together.
>
> Remember -- we have hierarchical features. Isn't that enough
> to do gene structures? Once you start adding gene/exon/transcript
> /etc. tables, then you end up with... Ensembl!
And is that such a bad thing?
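
For concreteness, here is a minimal sketch of the single-feature-table idea Thomas describes: one generic feature record with an ontology term, a parent link for the hierarchy, and tag-value pairs for everything else. This is not the actual BioSQL schema; the `Feature` class, its field names, and `gene_structure` are all hypothetical, and a Python list stands in for the table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Feature:
    """One row of a hypothetical generic feature table (not real BioSQL)."""
    feature_id: int
    type_term: str                      # ontology term: "gene", "transcript", "exon"
    start: int                          # 1-based, inclusive
    end: int
    strand: int
    parent_id: Optional[int] = None     # hierarchy link; None for top-level features
    tags: Dict[str, str] = field(default_factory=dict)  # tag-value annotations


def children_of(features: List[Feature], parent_id: int,
                type_term: Optional[str] = None) -> List[Feature]:
    """Return the direct children of a feature, optionally filtered by type."""
    return [f for f in features
            if f.parent_id == parent_id
            and (type_term is None or f.type_term == type_term)]


def gene_structure(features: List[Feature],
                   gene_id: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each transcript id of a gene to its sorted exon (start, end) pairs."""
    structure = {}
    for transcript in children_of(features, gene_id, "transcript"):
        exons = children_of(features, transcript.feature_id, "exon")
        structure[transcript.feature_id] = sorted((e.start, e.end) for e in exons)
    return structure


# Made-up example data: one gene, one transcript, two exons.
FEATURES = [
    Feature(1, "gene", 100, 900, 1, tags={"name": "example_gene"}),
    Feature(2, "transcript", 100, 900, 1, parent_id=1),
    Feature(3, "exon", 100, 250, 1, parent_id=2),
    Feature(4, "exon", 600, 900, 1, parent_id=2),
]
```

The gene structure comes out of the hierarchy, but only by walking it and trusting the type terms, which is the trade-off against dedicated gene/transcript/exon tables.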
>
> > > (b) zero level (Lincoln likes this). The schema stores contigs as
> > > "features" on DNA Sequences which are chromosome length.
> >
> > But with zero level reverse-engineering is hard, if you want to, for
> > example, do a local update, right?
> >
> > I think zero level is suitable for what comes later, data mining, which is
> > what we are planning to do for our multi-genome data-mining
> > pipeline. Because by that stage you really cannot care less why the
> > coordinates are what they are, you just want to use them (a la
> > ensembl-lite)
>
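
To make the reverse-engineering point concrete, here is a rough sketch of what mapping a chromosome coordinate back to a contig looks like when contigs are stored as features on a chromosome-length sequence. The `ContigFeature` record, the `to_contig_coord` function, and the example coordinates are all made up for illustration; it assumes the contig features are sorted by start and do not overlap.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class ContigFeature(NamedTuple):
    """A contig stored as a feature on a chromosome (hypothetical layout)."""
    name: str
    chr_start: int   # 1-based, inclusive chromosome coordinate
    chr_end: int
    strand: int      # +1 or -1: orientation of the contig on the chromosome


def to_contig_coord(contigs: List[ContigFeature], chr_pos: int) -> Tuple[str, int]:
    """Convert a chromosome position to (contig name, 1-based contig position).

    Assumes `contigs` is sorted by chr_start and non-overlapping.
    """
    starts = [c.chr_start for c in contigs]
    i = bisect_right(starts, chr_pos) - 1   # last contig starting at or before chr_pos
    if i < 0 or chr_pos > contigs[i].chr_end:
        raise LookupError("position %d falls in a gap between contigs" % chr_pos)
    contig = contigs[i]
    if contig.strand == 1:
        local = chr_pos - contig.chr_start + 1
    else:
        # Reverse-oriented contig: count from the chromosome end of the feature.
        local = contig.chr_end - chr_pos + 1
    return contig.name, local


# Made-up example: two contigs with a 100 bp gap, the second reversed.
CONTIGS = [
    ContigFeature("ctgA", 1, 1000, 1),
    ContigFeature("ctgB", 1101, 1600, -1),
]
```

Reading coordinates this way is cheap; the hard part of a local update is the other direction, where every feature downstream of a changed contig has to be shifted.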
> One thing to remember with zero-level type arrangements: you're
> potentially going to want to store whole chromosome sequences.
> A lot of databases will not be happy about this, especially if you
> then want to go back and efficiently pull out a small region
> from the middle of chromosome 1.
>
> One solution would be to have a new sequence-storage type in BioSQL
> (an alternative to the existing biosequence table), which stores
> the sequence in "shredded" (small chunks) form. This is different
> from assemblies, in that the use of shredded sequence behind
> the scenes should be completely hidden from the user. I remember
> talking to someone (Lincoln, I think) about this at Cape Town.
>
>
> Thomas.
>
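
A minimal sketch of the shredded-storage idea, assuming fixed-size chunks keyed by chunk index. The `ShreddedSequence` class and its `subseq` method are hypothetical names, and a dict stands in for what would be a chunk table in the database. The point is that a small region is pulled out by touching only the chunks that overlap it, and the caller never sees the chunking.

```python
class ShreddedSequence:
    """Sequence stored as fixed-size chunks; chunking is hidden from callers."""

    def __init__(self, sequence: str, chunk_size: int = 1000):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        self.length = len(sequence)
        # chunk index -> chunk text; stands in for rows of a hypothetical chunk table
        self.chunks = {
            offset // chunk_size: sequence[offset:offset + chunk_size]
            for offset in range(0, len(sequence), chunk_size)
        }

    def subseq(self, start: int, end: int) -> str:
        """Return bases start..end (1-based, inclusive).

        Only the chunks overlapping the requested region are read.
        """
        if not 1 <= start <= end <= self.length:
            raise ValueError("region out of range")
        first = (start - 1) // self.chunk_size
        last = (end - 1) // self.chunk_size
        joined = "".join(self.chunks[i] for i in range(first, last + 1))
        offset = (start - 1) - first * self.chunk_size
        return joined[offset:offset + (end - start + 1)]
```

For example, `ShreddedSequence(chromosome, chunk_size=100000).subseq(5000001, 5000500)` would read a single chunk rather than the whole chromosome; the chunk size here is an arbitrary choice, not a recommendation.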
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------