[Biocorba-l] Part III - SeqFeature / CompositeSeqFeature

Alan Robinson alan@ebi.ac.uk
Mon, 12 Feb 2001 18:29:26 +0000 (GMT Standard Time)


3) Currently, the 'Seq' interface has a method:

'all_SeqFeatures(in boolean sub_seqfeatures)'.

IMO, the problem with this is that if we request all SeqFeatures of a Seq,
including the sub_SeqFeatures, and then request the sub_SeqFeatures of
one of these SeqFeature objects, we end up getting some of the features
twice, either as duplicates or as equivalents! Plus, we don't know which
SeqFeatures have sub-SeqFeatures until we've called this method. (At
least that's my interpretation.)

This sounds like more than enough rope for someone to hang themselves
with...
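
To make the duplication concrete, here is a minimal sketch in Python
(not IDL or Perl, purely for brevity). The classes are hypothetical
in-memory stand-ins for the IDL interfaces, and the behaviour of
'all_SeqFeatures(true)' follows my interpretation above; it is an
assumption, not taken from any actual server implementation.

```python
# Hypothetical stand-ins for the IDL interfaces; not real BioCORBA code.
class SeqFeature:
    def __init__(self, name, subs=()):
        self.name = name
        self._subs = list(subs)

    def sub_SeqFeatures(self):
        return list(self._subs)


def _descendants(feature):
    """Collect every sub-feature below 'feature', depth first."""
    found = []
    for sub in feature.sub_SeqFeatures():
        found.append(sub)
        found.extend(_descendants(sub))
    return found


class Seq:
    def __init__(self, features):
        self._features = list(features)

    def all_SeqFeatures(self, sub_seqfeatures):
        # Assumed semantics: top-level features, plus all of their
        # descendants when sub_seqfeatures is true.
        result = []
        for feature in self._features:
            result.append(feature)
            if sub_seqfeatures:
                result.extend(_descendants(feature))
        return result


def naive_client_walk(seq):
    """A client that asks for everything, then also descends each feature."""
    seen = []
    for feature in seq.all_SeqFeatures(True):
        seen.append(feature.name)
        for sub in feature.sub_SeqFeatures():
            seen.append(sub.name)
    return seen


gene = SeqFeature("gene", [SeqFeature("exon1"),
                           SeqFeature("intron1"),
                           SeqFeature("exon2")])
seq = Seq([gene])
names = naive_client_walk(seq)
```

Here 'names' contains each exon and intron twice: once from the
flattened list and once from descending the 'gene' feature.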


Solution 1: Remove the 'in boolean sub_seqfeatures' parameter from all
methods. It is the responsibility of the client to descend the structure
and collect all sub-SeqFeatures (it is a little more work for the client,
but removes the potential for object duplication/equivalence and
confusion).

Solution 2: A sequence feature may be either singular, or composed of
other sequence features. This sounds to me like SeqFeature should be
modelled as a composite.

................

Personally, I would prefer to remove the 'boolean sub_seqfeatures'
parameter and model SeqFeatures and sub_SeqFeatures using a composite
model:

  interface Seq {
    // ...
    SeqFeatureVector get_SeqFeatures();
  };

The SeqFeatures returned are the top-level ones only; the client must
recursively descend those SeqFeatures that have sub-SeqFeatures. Yes,
this is more work, but it also means the client is less likely to get
into a mess with duplicate objects.

If a feature may have sub-features, then this should be modelled as a
composite:

  interface SeqFeature {
    // All the normal methods, bar the current 'sub_SeqFeatures()' method.
  };

  interface CompositeSeqFeature : SeqFeature {
    SeqFeatureVector sub_SeqFeatures();
  };


Thus for a sequence with a 'gene' feature made up of 'exons' and
'introns', the call:

  my $seqFeatureVector = $seq->get_SeqFeatures();

will return a SeqFeatureVector containing a 'gene' feature which is
actually a CompositeSeqFeature object (since it has sub-SeqFeatures of
exon and intron SeqFeatures).

So, as a composite, the 'sub_SeqFeatures()' method is available on this
'gene' CompositeSeqFeature and will return the 'exons' and 'introns' as
SeqFeature objects in a SeqFeatureVector object.
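
As a rough illustration of the composite model and the client-side
descent, here is a minimal sketch in Python (not IDL or Perl, purely for
brevity). The classes are hypothetical in-memory stand-ins for the
proposed interfaces, and 'walk' is a helper name I have made up. The
'isinstance' check stands in for whatever type test or narrowing a real
CORBA client would use; that part is an assumption.

```python
# Hypothetical stand-ins for the proposed interfaces; not real BioCORBA code.
class SeqFeature:
    def __init__(self, name):
        self.name = name


class CompositeSeqFeature(SeqFeature):
    def __init__(self, name, subs):
        super().__init__(name)
        self._subs = list(subs)

    def sub_SeqFeatures(self):
        return list(self._subs)


class Seq:
    def __init__(self, features):
        self._features = list(features)

    def get_SeqFeatures(self):
        # Top-level features only; no 'sub_seqfeatures' flag.
        return list(self._features)


def walk(features):
    """Yield each feature exactly once, depth first."""
    for feature in features:
        yield feature
        # Only composites expose sub_SeqFeatures(), so only they are descended.
        if isinstance(feature, CompositeSeqFeature):
            yield from walk(feature.sub_SeqFeatures())


gene = CompositeSeqFeature("gene", [SeqFeature("exon1"),
                                    SeqFeature("intron1"),
                                    SeqFeature("exon2")])
seq = Seq([gene])

top_level = [f.name for f in seq.get_SeqFeatures()]
everything = [f.name for f in walk(seq.get_SeqFeatures())]
```

With this layout 'top_level' holds only the 'gene' feature, while
'everything' holds the gene followed by its exons and introns, each
appearing once.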


N.B. A side-effect of having the CompositeSeqFeature object is that it
would be possible to have a parameter to specify whether the order of the
sub-SeqFeatures returned in the vector is significant or not. (I cannot
currently decide if this is appropriate.)


--
============================================================
Alan J. Robinson, D.Phil.             Tel:+44-(0)1223 494444
European Bioinformatics Institute     Fax:+44-(0)1223 494468
EMBL Outstation - Hinxton             Email:  alan@ebi.ac.uk
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK                http://industry.ebi.ac.uk/~alan/
============================================================