[Biocorba-l] Annotations

Juha Muilu muilu@ebi.ac.uk
Mon, 04 Jun 2001 21:37:34 +0100


Ewan Birney wrote:
> 
> On Mon, 4 Jun 2001, Juha Muilu wrote:
> 
> >
> > How do people feel about separating the Annotation holder from the
> > Sequence (Annotatable)?
> >
> > + It can be useful if we later need new get/set methods for the
> > annotations and sequence features. The Sequence "line" may become
> > overloaded if we start extending it with new annotation methods.
> >
> > - It adds another level of indirection.
> >
> > Do we need composite annotations? For those we can have a new annotation
> > interface which inherits from both Annotation and AnnotationHolder.
> >
> > From a quick look, the GO annotations, for example, can be expressed
> > using composite annotations. Does this also work in practice? There was
> > recently a lot of discussion about the GO stuff on the bioPerl mailing
> > list. Did you reach a consensus?
> 
> This is something I would like to take on at BOSC. My feeling:
> 
>    - Annotation describes the association of comments, literature,
> other database references and indeed anything else dreamt up by a
> curator associated with "something" definite (often Genes, often
> Sequences).
> 
>      SeqFeatures *are not* by default Annotations. SeqFeatures need to be
> as light as possible as we generate, store, make etc millions of them.

In the simplest case the Annotation is just a name-value pair. In the BSA the
SeqFeature (SeqAnnotation there) is a specialized annotation that carries the
necessary location info. In theory, inheriting from Annotation does not have
to make the SeqFeature any fatter. Name-value info is needed there as well...
OK, there are cases where the value is unnecessary because the name (type of
feature) and the location are all that is needed :-)

I found the BSA solution rather elegant because it allows SeqFeatures and
Annotations to be handled uniformly. That matters (IMHO) because they can have
the same semantics. For example, one may want to know that a sequence is
annotated as having an exon without caring where on the sequence it lies.
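A minimal Python sketch of the BSA-style idea above, assuming hypothetical classes `Annotation` and `SeqFeature` and a hypothetical helper `has_annotation` (none of these are the actual BSA or BioCORBA IDL). The feature subclass adds only a location, the value may be left out, and a name-based query treats features and plain annotations alike:

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical atomic annotation: a plain name-value pair."""
    name: str
    value: Optional[Any] = None  # may stay None when the name says enough


@dataclass(frozen=True)
class SeqFeature(Annotation):
    """Hypothetical feature: an Annotation that also has a location."""
    start: int = 0
    end: int = 0


def has_annotation(items: Iterable[Annotation], name: str) -> bool:
    # Treats plain annotations and features alike; location is ignored.
    return any(item.name == name for item in items)


items = [
    Annotation("comment", "checked by a curator"),
    SeqFeature("exon", start=100, end=250),  # no value: type + location suffice
]
print(has_annotation(items, "exon"))  # True
```

The "is this sequence annotated with an exon?" question needs no special case for features, which is the uniformity argued for above.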

OK, I don't have a very strong opinion on this.


> 
>    - we need one level of indirection - but this is possibly already done
> by the annotation object. I think composition rather than multiple
> inheritance is fine. i.e.
> 
>     seq has-a annotation object which is a rather generic holder of
> annotations. I think annotation holder becomes equivalent to annotation.
> 
>   - Annotation objects should be very run-time query-able, something like
> 
>     @objects = $annotation->get_Annotation('Disease');

That is what we have in the AnnotationHolder. I will comment on this further in
a separate mail.

> 
>     - this is the sort of future extensibility which was kicked around on
> Bioperl. Problems:
> 
>          (a) do we constrain objects at all? Or do we go more like
> 
>     @objects = $annotation->get_Annotation_type('Disease','string');

That is perhaps one option. 

> 
>     to allow clients to request types here.
> 
>         (b) Just simple "type" queries. Or something richer? (NB - this is
> not the seqfeature problem, which is a separate querying task...)
> 
>         (c) naming. get_Annotation on an Annotation object. Sounds v. bad.

These names are used very differently in different projects. Here the
get_Annotation method is on the AnnotationHolder, and an Annotation is just the
"atomic" name-value pair. I will come back to this later.
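A minimal Python sketch of a run-time queryable holder in the sense discussed above, with `get_Annotation` on the holder and `Annotation` as the atomic name-value pair. The class names and the `add_Annotation` method are hypothetical; the two query methods only mirror the Perl calls quoted in the thread, and a Python type object (`str`) stands in for the `'string'` type name, which is an assumption:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical atomic annotation: a name-value pair."""
    name: str
    value: Any = None


class AnnotationHolder:
    """Hypothetical generic holder, queryable by name at run time."""

    def __init__(self) -> None:
        self._by_name: Dict[str, List[Annotation]] = defaultdict(list)

    def add_Annotation(self, ann: Annotation) -> None:
        self._by_name[ann.name].append(ann)

    def get_Annotation(self, name: str) -> List[Annotation]:
        # Unconstrained: everything stored under this name, whatever its type.
        return list(self._by_name.get(name, []))

    def get_Annotation_type(self, name: str, value_type: type) -> List[Annotation]:
        # Constrained: only annotations whose value is a value_type instance.
        return [a for a in self.get_Annotation(name) if isinstance(a.value, value_type)]


holder = AnnotationHolder()
holder.add_Annotation(Annotation("Disease", "example disease name"))
holder.add_Annotation(Annotation("Disease", {"db": "EXAMPLE", "id": "0001"}))
print(len(holder.get_Annotation("Disease")))  # 2
print(holder.get_Annotation_type("Disease", str))  # only the string-valued one
```

The typed variant lets a client that only understands strings ignore richer values without the holder constraining what can be stored.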

> 
> 
>     Basically we are in a sticky area here. There are probably a number of
> basic design patterns around this. Any suggestions?
> 
>   - We need an explicit extension of SeqFeature which has-a annotation.
> 
>    (or mix-in of SeqFeature and Annotation? Hmmmm)
> 
>      - most sequence features (>95%) *do not* have annotations in real
> life (believe me, I know) but certain ones have heavy annotation (eg
> Genes).
> 

Can we use some separate object (e.g. Gene) that is owned by the feature (or
actually stored as the value of the feature) to hold the annotations?

I was thinking that we could use the Identifier for this purpose in cases where
we do not (yet) have a proper specialized subclass for the "Feature value".

A good (or bad) point is that we would not have to make specialized sequence
features just to associate some known entities with a sequence location...
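A minimal Python sketch of the "feature owns a value object" idea, assuming hypothetical classes `SeqFeature`, `Gene` and `Identifier` and a hypothetical helper `annotations_of` (not BioCORBA interfaces). Most features carry no value at all; a heavily annotated one points at a `Gene` that holds the annotations, and an `Identifier` can stand in where no specialized value class exists yet:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(frozen=True)
class Identifier:
    """Hypothetical database reference used as a plain feature value."""
    database: str
    accession: str


@dataclass
class Gene:
    """Hypothetical heavyweight entity that owns the curated annotations."""
    name: str
    annotations: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class SeqFeature:
    """Lightweight feature: type, location and an optional value object."""
    feature_type: str
    start: int
    end: int
    value: Optional[Any] = None  # usually None; sometimes a Gene or Identifier


def annotations_of(feature: SeqFeature) -> List[str]:
    # Only features whose value is a rich entity carry annotations.
    if isinstance(feature.value, Gene):
        return feature.value.annotations
    return []


features = [
    SeqFeature("exon", 100, 250),
    SeqFeature("gene", 50, 900, Gene("exampleGene", ["comment: curated"])),
    SeqFeature("gene", 1200, 1800, Identifier("EXAMPLE_DB", "ACC0001")),
]
print(sum(1 for f in features if annotations_of(f)))  # 1
```

The feature class itself never grows annotation fields, so the common unannotated case stays light; the cost is one extra lookup through the value for the few heavily annotated features.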

> 
> 
> Lots of ideas to kick around. We don't do this well in Bioperl, Ensembl,
> Biopython or Biojava in my view (Brad/Jason/Matt/Thomas - thoughts?).
> 
> e.
> 
> >
> > --
> >  +--------------------------------------------------------------------+
> >  |Juha Muilu, Ph.D., EMBL Outstation| Email:  muilu@ebi.ac.uk         |
> >  |European Bioinformatics Institute | Phone:  +44 (0)1223 494 624     |
> >  |Wellcome Trust Genome Campus      | Fax:    +44 (0)1223 494 468     |
> >  |Hinxton, Cambridge CB10 1SD, UK   | http://industry.ebi.ac.uk/~muilu|
> >  +--------------------------------------------------------------------+
> > _______________________________________________
> > Biocorba-l mailing list
> > Biocorba-l@biocorba.org
> > http://www.biocorba.org/mailman/listinfo/biocorba-l
> >
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
