[Bioperl-l] Re: Bio::EnsemblLite::UpdateableDB

Ewan Birney birney@ebi.ac.uk
Mon, 17 Jul 2000 09:30:41 +0000 (GMT)


> 
> I guess this is the most correct assessment.  Obviously I'd like to see
> more overlap because that means less duplicate coding, but also means we
> have to break down the goals better. I didn't really want to fork from
> Ensembl, but it seems to be addressing data from a different perspective.
> Maybe we should talk about goals of EnsemblLite again.
> 

Jason -

I have suddenly realised that you might have rejected some of Ensembl
because it seems to be "contig" focused whereas most things are "just a
sequence" focused. I see single-sequence entries (e.g., GenBank) as
"clones which have one contig". This allows you to view both unfinished
sequences and finished/single-sequence/standard GenBank sequences in the
same schema.
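
A minimal sketch of the "clones which have one contig" idea, in Python
rather than Perl for brevity. The class names, fields, and accessions are
hypothetical and are not taken from the actual Ensembl schema; the point is
only that a single-sequence entry and an unfinished multi-contig clone can
share one representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contig:
    # Hypothetical fields, chosen for illustration only.
    contig_id: str
    sequence: str


@dataclass
class Clone:
    accession: str
    contigs: List[Contig] = field(default_factory=list)

    @property
    def is_single_sequence(self) -> bool:
        # A finished or plain single-sequence entry is a clone with one contig.
        return len(self.contigs) == 1


def clone_from_single_sequence(accession: str, sequence: str) -> Clone:
    """Wrap a plain sequence record as a clone with exactly one contig."""
    return Clone(accession, [Contig(f"{accession}.1", sequence)])


# Made-up accessions and sequences.
finished = clone_from_single_sequence("ACC0001", "ACGTACGT")
unfinished = Clone(
    "ACC0002",
    [Contig("ACC0002.1", "ACGT"), Contig("ACC0002.2", "TTGA")],
)
```

Code that walks clone -> contigs then works unchanged for both cases; the
single-sequence case is just a list of length one.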


I suspect that we will thrash out how these two projects work together when
we meet up at BOSC. I suspect this is going to need a group of us in front
of a whiteboard ;)

> I am most interested in better integrating the 'public domain'
> genome data with laboratory produced experimental data (ie 'OUR' sequences
> for BAC123X12 ).  In the best of all possible worlds - would like to be
> able to: 
> (ewan and I have had this discussion before, but I would like to throw it
> out there and see what the opinions are)
> 
> - build a virtual contig (from 100 kb to a couple of MB ) between marker
>   D2SXX and D2SXXX that consisted of data in public domain and
>   experimentally produced in-house.  
> - Annotations and features included and updated automagically from public
>   sources.
> - Analyze this X MB of sequence, finding and identifying known and
>   predicted genes (this is ensembl like stuff), match them up with
>   observed and reported data, find homologies, essentially try and know
>   what this sequence does because we think it might be involved in disease
>   Y.  
> 
> This is really hard to do right now, but is also really what I think
> researchers want to do.  Computers should make this easy, instead of
> clicking away at multiple genome web sites we should be able to put
> together the known information and sprinkle in our own data.  Maybe this
> is what commercial services provide and I am just not in the know... =)
>  
> > 
> > BTW - Jason - have you handled the "how to store a SeqFeature::Generic"
> > type problem in the SQL?
> 
> check out the schema sql/ensembl-lite-mysql-addon.sql (I'll have a pretty
> graphic on the ensembl wiki by next week)
> 
> dna_description - describe the sequence, accession number (didn't build in
>                   multiple accession numbers right now)
> generic_feature - a generic feature for a sequence
>                   (name, strand, source, start & end positions)
> feature_detail  - tag,value pairs that exist for a feature
> feature_detail_association - associate details with generic features.
> 

sounds very sane. I would like to reuse this over in Ensembl sometime.
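
For anyone following along, here is a rough sketch of how the four tables
described above might fit together. It is not the contents of
sql/ensembl-lite-mysql-addon.sql: the column names, keys, and types are
guesses based only on the descriptions in this thread, and it uses Python's
built-in sqlite3 module instead of MySQL so that it runs standalone.

```python
import sqlite3

# Assumed layout; only the table names come from the thread.
SCHEMA = """
CREATE TABLE dna_description (
    dna_id      INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL,   -- one accession per sequence for now
    description TEXT
);
CREATE TABLE generic_feature (
    feature_id INTEGER PRIMARY KEY,
    dna_id     INTEGER NOT NULL REFERENCES dna_description(dna_id),
    name       TEXT,
    strand     INTEGER,
    source     TEXT,
    seq_start  INTEGER,
    seq_end    INTEGER
);
CREATE TABLE feature_detail (
    detail_id INTEGER PRIMARY KEY,
    tag       TEXT NOT NULL,
    value     TEXT
);
CREATE TABLE feature_detail_association (
    feature_id INTEGER NOT NULL REFERENCES generic_feature(feature_id),
    detail_id  INTEGER NOT NULL REFERENCES feature_detail(detail_id),
    PRIMARY KEY (feature_id, detail_id)
);
"""


def store_feature(conn, dna_id, name, strand, source, start, end, tags):
    """Insert one generic feature plus its tag/value details."""
    cur = conn.execute(
        "INSERT INTO generic_feature "
        "(dna_id, name, strand, source, seq_start, seq_end) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (dna_id, name, strand, source, start, end),
    )
    feature_id = cur.lastrowid
    for tag, value in tags:
        detail = conn.execute(
            "INSERT INTO feature_detail (tag, value) VALUES (?, ?)",
            (tag, value),
        )
        conn.execute(
            "INSERT INTO feature_detail_association (feature_id, detail_id) "
            "VALUES (?, ?)",
            (feature_id, detail.lastrowid),
        )
    return feature_id


def load_tags(conn, feature_id):
    """Return the (tag, value) pairs attached to one feature."""
    rows = conn.execute(
        "SELECT d.tag, d.value FROM feature_detail d "
        "JOIN feature_detail_association a ON a.detail_id = d.detail_id "
        "WHERE a.feature_id = ? ORDER BY d.detail_id",
        (feature_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up example data.
conn.execute(
    "INSERT INTO dna_description (dna_id, accession, description) "
    "VALUES (1, 'ACC0001', 'example sequence')"
)
fid = store_feature(
    conn, 1, "exon", 1, "example_source", 10, 50,
    [("note", "example"), ("gene", "hypothetical")],
)
```

Calling load_tags(conn, fid) returns the tag/value pairs in insertion order,
which is roughly what would be needed to rebuild the tags of a
SeqFeature::Generic object from the database.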