[Biocorba-l] Part IV - SeqFeatureLocation

Alan Robinson alan@ebi.ac.uk
Mon, 12 Feb 2001 18:29:50 +0000 (GMT Standard Time)


4) IMO we need to revisit the SeqFeatureLocation design, as BioCorba-0.2.0
   currently limits it to 'join' operations only.


A location typically looks like this:

  join(1..100, one-of(120..200, 156..200))

This may be represented as a location tree:

            join
           /    \
          /      \
      (1..100) one-of
               /    \
              /      \
       (120..200)  (156..200)

This tree has (a) locations (e.g. 1..100) and (b) operations that work on
locations, e.g. 'one-of' and 'join'.

We need a model that can represent both.
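
To make the tree idea concrete, here is a minimal Python sketch (not part
of BioCorba; the names Range, Operation and render are hypothetical and
chosen only for illustration). It builds the tree drawn above and turns it
back into the textual form:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Range:
    """A leaf of the tree: a simple start..end location (hypothetical name)."""
    start: int
    end: int


@dataclass
class Operation:
    """An inner node: an operator applied to child nodes (hypothetical name)."""
    name: str                  # e.g. "join" or "one-of"
    operands: List["Node"]


Node = Union[Range, Operation]


def render(node: Node) -> str:
    """Turn a location tree back into its textual form."""
    if isinstance(node, Range):
        return f"{node.start}..{node.end}"
    inner = ", ".join(render(child) for child in node.operands)
    return f"{node.name}({inner})"


# The tree from the diagram above.
tree = Operation("join", [
    Range(1, 100),
    Operation("one-of", [Range(120, 200), Range(156, 200)]),
])
print(render(tree))  # join(1..100, one-of(120..200, 156..200))
```

The point is only that leaves and operators are different kinds of node,
and that operators may nest to any depth.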


Solution 1: Stay as is, since we're only interested in join operations
(this is quite adequate for BioCorba-0.2).


Solution 2: Use a composite design with a SeqFeatureLocation that defines
a single location, and a CompositeSeqFeatureLocation that: 

  - Extends SeqFeatureLocation

  - Provides access to a number of SeqFeatureLocation objects of which 
    it is composed (possibly including further CompositeSeqFeatureLocation
    objects) that are available as sub-locations.

  - Defines how these sub-locations are combined (e.g. join, order, etc).

Problem: Does a CompositeSeqFeatureLocation object have a valid location
of its own, i.e. does CompositeSeqFeatureLocation really extend
SeqFeatureLocation? [cf. one-of(120..200, 156..200) as a
CompositeSeqFeatureLocation: what would be a valid location for this?]
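
The problem can be seen in a rough Python sketch of the composite design.
This is an illustration only: the start()/end() methods, the operator
string, and the choice to treat a join as spanning its outermost
coordinates are all assumptions, not anything defined in BioCorba.

```python
from typing import List


class SeqFeatureLocation:
    """A single start..end location."""

    def __init__(self, start: int, end: int) -> None:
        self._start = start
        self._end = end

    def start(self) -> int:
        return self._start

    def end(self) -> int:
        return self._end


class CompositeSeqFeatureLocation(SeqFeatureLocation):
    """Sub-locations plus the rule that combines them (assumed shape)."""

    def __init__(self, operator: str,
                 sub_locations: List[SeqFeatureLocation]) -> None:
        self.operator = operator
        self.sub_locations = sub_locations

    def start(self) -> int:
        if self.operator == "one-of":
            # The alternatives may start at different positions.
            raise ValueError("one-of has no single start")
        return min(loc.start() for loc in self.sub_locations)

    def end(self) -> int:
        if self.operator == "one-of":
            raise ValueError("one-of has no single end")
        return max(loc.end() for loc in self.sub_locations)


joined = CompositeSeqFeatureLocation(
    "join", [SeqFeatureLocation(1, 100), SeqFeatureLocation(120, 200)])
one_of = CompositeSeqFeatureLocation(
    "one-of", [SeqFeatureLocation(120, 200), SeqFeatureLocation(156, 200)])

print(joined.start(), joined.end())  # a join can at least report its extent
try:
    one_of.start()
except ValueError as err:
    print("inherited method cannot be honoured:", err)
```

A join can plausibly answer the inherited calls, but one-of cannot, which
is the doubt about the inheritance relationship raised above.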


Solution 3: Introduce a SeqFeatureLocationOperator object alongside the
current SeqFeatureLocation (no composite and no inheritance between
them) that just gives access to SeqFeatureLocations and defines how these
locations are combined. A request for the location of a SeqFeature will
return either a SeqFeatureLocation or a SeqFeatureLocationOperator (that
contains SeqFeatureLocations and defines how these are to be combined).

N.B. Since we don't know which type of object will be returned at run
time, we need to use the CORBA 'union' type in the IDL (which isn't a big
deal).

N.B. The 'location' returned for a SeqFeature refers to that 'type' of
SeqFeature only, i.e. the location shouldn't include locations specific to
sub-SeqFeatures. For example, the location of a 'gene' SeqFeature may be
modelled as the start of the initial 'exon' to the end of the final 'exon',
but it probably shouldn't be defined in terms of the locations of the other
constituent exons and introns. However, an 'mRNA' SeqFeature may have its
location specified using the locations of all the constituent exons.
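
As a small illustration of the gene/mRNA distinction, here is a Python
sketch. The exon coordinates are made up, and the helper names
gene_location and mrna_location are hypothetical; it also assumes the
exons do not overlap.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end); a hypothetical representation


def gene_location(exons: List[Exon]) -> str:
    """Gene: start of the initial exon to the end of the final exon."""
    ordered = sorted(exons)
    return f"{ordered[0][0]}..{ordered[-1][1]}"


def mrna_location(exons: List[Exon]) -> str:
    """mRNA: a join over every constituent exon."""
    ordered = sorted(exons)
    return "join(" + ", ".join(f"{s}..{e}" for s, e in ordered) + ")"


exons = [(1, 100), (250, 400), (600, 750)]  # invented example data
print(gene_location(exons))  # 1..750
print(mrna_location(exons))  # join(1..100, 250..400, 600..750)
```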

..............

Going for solution 3 - The location() method of SeqFeature will use a
CORBA 'union' to return either a SeqFeatureLocation or a
SeqFeatureLocationOperator (which itself contains SeqFeatureLocation
objects and a description of how these are to be combined).

[N.B. Some of the object names are rather long in the IDL. This is because
I like names to be descriptive rather than esoteric abbreviations or TLAs
('Three Letter Acronyms')].


  interface SeqFeature
  {
    // ... see BioCorba-0.2 ...

    // This method will return either a SeqFeatureLocation or a
    // SeqFeatureLocationOperator depending upon the structure of the
    // location. It provides the root of the 'SeqFeatureLocation tree'.
    SeqFeatureLocationUnion location();
  };

  // An interface that defines the type codes for the permissible
  // types of object that the SeqFeatureLocationUnion may return.
  interface SeqFeatureLocationUnionTypeCodes
  {
    const short LOCATION = 0;
    const short OPERATOR = 1;
  };

  // Forward declaration: the union below refers to
  // SeqFeatureLocationOperator before its full definition.
  interface SeqFeatureLocationOperator;

  // The CORBA 'union' as used by the 'SeqFeature' and
  // 'SeqFeatureLocationOperator' interfaces, which will hold either a
  // SeqFeatureLocation or a SeqFeatureLocationOperator.
  typedef short SeqFeatureLocationUnionTypeCode;
  union SeqFeatureLocationUnion switch (SeqFeatureLocationUnionTypeCode)
  {
    case SeqFeatureLocationUnionTypeCodes::LOCATION:
      SeqFeatureLocation location;
    case SeqFeatureLocationUnionTypeCodes::OPERATOR:
      SeqFeatureLocationOperator operator;
  };


  // The SeqFeatureLocationOperator object specifies how the
  // SeqFeatureLocation objects it contains are to be combined.
  interface SeqFeatureLocationOperator : GNOME::Unknown
  {
    typedef short SeqFeatureLocationOperatorTypeCode;
    typedef sequence<SeqFeatureLocationUnion> 
      SeqFeatureLocationUnionList;

    // Return the type code that specifies how the
    // SeqFeatureLocation objects are to be combined, e.g. 'JOIN'
    // or 'ORDER'.
    SeqFeatureLocationOperatorTypeCode operator();

    // Return the operands upon which the operator is to act; each
    // is either a SeqFeatureLocation or a nested
    // SeqFeatureLocationOperator. If the location cannot be
    // modelled by this IDL then an exception is thrown. The
    // exception's description string may carry the actual location
    // as a string for the client to parse.
    SeqFeatureLocationUnionList locations()
      raises (UnableToProcess);
  };


  // An interface that specifies the type codes for the permissible
  // location operators that may be performed on SeqFeatureLocation
  // object(s).
  interface SeqFeatureLocationOperatorTypeCodes
  {
    const short JOIN = 0;
    const short ORDER = 1;
    const short ONE_OF = 2;
    const short COMPLEMENT = 3;
  };
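
To show how a client might walk the result of location(), here is a
Python sketch that mimics the discriminated union with plain classes. It
is not generated from the IDL and uses no real ORB API: the classes
Location, Operator and LocationUnion, their field names, and describe()
are hypothetical stand-ins. Only the numeric codes are taken from the IDL
above.

```python
from dataclasses import dataclass
from typing import List, Union

# Discriminator values, mirroring SeqFeatureLocationUnionTypeCodes.
LOCATION, OPERATOR = 0, 1

# Operator codes, mirroring SeqFeatureLocationOperatorTypeCodes.
OPERATOR_NAMES = {0: "join", 1: "order", 2: "one-of", 3: "complement"}


@dataclass
class Location:          # stand-in for SeqFeatureLocation (fields assumed)
    start: int
    end: int


@dataclass
class Operator:          # stand-in for SeqFeatureLocationOperator
    code: int
    operands: List["LocationUnion"]


@dataclass
class LocationUnion:     # stand-in for SeqFeatureLocationUnion
    discriminator: int
    value: Union[Location, Operator]


def describe(u: LocationUnion) -> str:
    """Switch on the discriminator and recurse into operator operands."""
    if u.discriminator == LOCATION:
        return f"{u.value.start}..{u.value.end}"
    if u.discriminator == OPERATOR:
        inner = ", ".join(describe(child) for child in u.value.operands)
        return f"{OPERATOR_NAMES[u.value.code]}({inner})"
    raise ValueError(f"unknown discriminator {u.discriminator}")


def leaf(start: int, end: int) -> LocationUnion:
    """Convenience wrapper for a plain location."""
    return LocationUnion(LOCATION, Location(start, end))


root = LocationUnion(OPERATOR, Operator(0, [
    leaf(1, 100),
    LocationUnion(OPERATOR, Operator(2, [leaf(120, 200), leaf(156, 200)])),
]))
print(describe(root))  # join(1..100, one-of(120..200, 156..200))
```

The client checks the discriminator at every level, so nested operators
need no special casing beyond the recursion.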



--
============================================================
Alan J. Robinson, D.Phil.             Tel:+44-(0)1223 494444
European Bioinformatics Institute     Fax:+44-(0)1223 494468
EMBL Outstation - Hinxton             Email:  alan@ebi.ac.uk
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK                http://industry.ebi.ac.uk/~alan/
============================================================