[Biocorba-l] Biocorba IDL - Clarifications

Ewan Birney birney@ebi.ac.uk
Fri, 20 Oct 2000 18:00:22 +0100 (BST)


On Fri, 20 Oct 2000, Alan Robinson wrote:

> Hi all,
> 
> Could someone clarify the correct behaviour for the following situations
> for me, please?
> 
> I have an EMBL sequence with the following feature and location:
> 
>   FT mRNA join(397..627,1194..1339,1596..1682,2294..2473,2608..3327)
> 
> What should be the SeqFeature returned for this?
> 
> Currently, I simply generate and return:
> 
> 	type = mRNA
> 	source = <none>
> 	seq_primary_id = X02158
> *	start = 397
> *	end = 3327
> 	strand = +1
> 	qualifiers = <none>
> 	PrimarySeq_is_available = true
> 	
> It's the start and end values I'm concerned with. The current SeqFeature
> is rather minimal (not necessarily a bad thing!). But, I cannot see any
> way to specify the multiple locations of this feature cleanly in
> SeqFeature.
> 

Right. We punted on this one previously. But we have to solve it.


(this is one of the 'unsolved design problems in bioinformatics', in my
view)


options:

(a) Have an extended "range" object that is a composite of simple ranges,
i.e.:

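   interface Range {
      attribute long start;
      attribute long end;
      attribute long strand;
   };

   // IDL needs a named sequence type for an operation's return value
   typedef sequence<Range> RangeList;

   interface CompositeRange : Range {
      RangeList sub_Range();
   };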

A sequence feature has-a Range, which could be either a CompositeRange or
a simple Range.

   -- benefits - close to original embl/genbank ft model
               - close to biojava

   -- drawbacks - really mini-objects here - could this be a struct?
                - not close to bioperl
                - annoys ewan because it allows people to overload
                  semantics into features without making different
                  objects, e.g. the way the CDS line "means" a gene in
                  the EMBL feature table and you have to combine it with
                  an mRNA line to get the whole structure with UTRs
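
A minimal sketch of option (a), written in Python rather than IDL so that
it can be run. The names from_parts, sub_ranges and parts are illustrative
assumptions and are not part of biocorba.idl. It uses the join(...)
location from the question: the composite keeps the overall start and end
that the server returns today, and also exposes the individual parts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Range:
    """A simple range: inclusive coordinates plus a strand."""
    start: int
    end: int
    strand: int = 1

    def sub_ranges(self) -> List["Range"]:
        # A simple range is its own only part.
        return [self]


@dataclass
class CompositeRange(Range):
    """A range made of simple ranges (hypothetical helper names)."""
    parts: List[Range] = field(default_factory=list)

    @classmethod
    def from_parts(cls, parts, strand=1):
        parts = list(parts)
        # Overall start/end span every part, so clients that only
        # understand a plain Range still get sensible values.
        start = min(p.start for p in parts)
        end = max(p.end for p in parts)
        return cls(start, end, strand, parts)

    def sub_ranges(self) -> List[Range]:
        return list(self.parts)


# join(397..627,1194..1339,1596..1682,2294..2473,2608..3327)
exons = [Range(397, 627), Range(1194, 1339), Range(1596, 1682),
         Range(2294, 2473), Range(2608, 3327)]
mrna = CompositeRange.from_parts(exons)
print(mrna.start, mrna.end, len(mrna.sub_ranges()))  # 397 3327 5
```

A client written against Range alone can ignore sub_ranges() and still see
start = 397, end = 3327.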

(b) Same as above, but reverse the containment rules. I like this because,
in my experience with Ensembl, you want to have many ranges pointing to
the same sequence feature.


i.e., Ranges have-a sequence feature object, and sequences return a list
of ranges.

    -- benefits - as for (a) above

    -- drawbacks - as for (a) above, plus it makes the feature objects
                   pathetically small in some cases, with just
                   "primary_tag" and "source_tag"
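
A rough sketch of the reversed containment in option (b), again in Python
and under assumed names (FeatureRange, Seq, ranges_for are hypothetical).
Each range points at a shared feature object, and the sequence hands back
ranges rather than features.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SeqFeature:
    """The feature shrinks to little more than its two tags."""
    primary_tag: str
    source_tag: str


@dataclass
class FeatureRange:
    """A range that has-a feature (containment reversed)."""
    start: int
    end: int
    strand: int
    feature: SeqFeature


class Seq:
    """Hypothetical sequence object that returns a list of ranges."""

    def __init__(self, ranges):
        self._ranges = list(ranges)

    def ranges(self) -> List[FeatureRange]:
        return list(self._ranges)

    def ranges_for(self, feature: SeqFeature) -> List[FeatureRange]:
        # Many ranges may point at the same feature object.
        return [r for r in self._ranges if r.feature is feature]


mrna = SeqFeature("mRNA", "emblbank")
coords = [(397, 627), (1194, 1339), (1596, 1682),
          (2294, 2473), (2608, 3327)]
seq = Seq(FeatureRange(lo, hi, 1, mrna) for lo, hi in coords)
hits = seq.ranges_for(mrna)
print(len(hits), hits[0].start, hits[-1].end)  # 5 397 3327
```

The SeqFeature class here shows the drawback noted above: it carries
nothing but the two tags.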


(c) Let sequence features have sub sequence features

    -- benefits - close to bioperl
                - fewer objects
    -- drawbacks - still open to abuse as in (a) and also question about
                   how you map EMBL feature table lines to 
    
(d) Let people extend sequence feature with sub sequence features or more
complex locations, i.e.:


We would put in biocorba.idl a number of "common extensions", with complex
locations being one of them...


    typedef sequence<SeqFeature> SeqFeatureList;

    interface ComplexLocationSeqFeature : SeqFeature {
        // could discuss how we extend this
        attribute SeqFeatureList subSequenceFeatures;
        string embl_location_line(); // for anal parsing
    };

    -- benefits - using inheritance properly to solve this problem.
                  Novel route in bioinformatics, surely!

                - also can extend for fuzziness

    -- drawbacks - none?
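
To make option (d) concrete, here is a small Python sketch under stated
assumptions: the field names and the helper feature_from_join are
hypothetical, and the parser only handles plain forward-strand
join(a..b,c..d,...) strings, not complement(), fuzzy ends or remote
references.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class SeqFeature:
    """The minimal feature, roughly as in the current IDL."""
    type: str
    source: str
    start: int
    end: int
    strand: int = 1


@dataclass
class ComplexLocationSeqFeature(SeqFeature):
    """A 'common extension': sub-features plus the raw location."""
    sub_features: List[SeqFeature] = field(default_factory=list)
    location_line: str = ""

    def embl_location_line(self) -> str:
        return self.location_line


def feature_from_join(ftype: str, source: str, location: str):
    # Assumption: only simple 'join(a..b,c..d,...)' is supported here.
    match = re.fullmatch(r"join\((.*)\)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    subs = []
    for part in match.group(1).split(","):
        lo, hi = part.split("..")
        subs.append(SeqFeature(ftype, source, int(lo), int(hi)))
    start = min(s.start for s in subs)
    end = max(s.end for s in subs)
    return ComplexLocationSeqFeature(ftype, source, start, end, 1,
                                     subs, location)


loc = "join(397..627,1194..1339,1596..1682,2294..2473,2608..3327)"
feat = feature_from_join("mRNA", "emblbank", loc)
print(feat.start, feat.end, len(feat.sub_features))  # 397 3327 5
```

Because the extension is-a SeqFeature, a client that knows nothing about
complex locations still gets the overall start and end, while a client
that does can walk sub_features or re-parse embl_location_line().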


I'm going for (d) in this - but the floor is open for other opinions here.
  
Whaddya reckon?



> If I were to define each location separately, I cannot see how to describe
> their implicit relationship under the current IDL without some potentially
> hairy naming convention or qualifiers that stresses the life of the client
> writer.
> 
> Alternatively, one could return the location string via a new method, or 
> in the qualifier.
> 
> 
> Secondly, just to check - the 'source' variable of SeqFeature - Is this
> just containing the value of an /evidence qualifier?
> 

could be /evidence or could be "emblbank" or similar.

I would go for "emblbank". source is of more use for program output.

> 
> The sooner I get clarification on these - the sooner I can finish the
> server! ;)
> 
> 
> cheers,
> al
> 
> --
> ============================================================
> Alan J. Robinson, D.Phil.             Tel:+44-(0)1223 494444
> European Bioinformatics Institute     Fax:+44-(0)1223 494468
> EMBL Outstation - Hinxton             Email:  alan@ebi.ac.uk
> Wellcome Trust Genome Campus
> Hinxton, Cambridge
> CB10 1SD, UK                http://industry.ebi.ac.uk/~alan/
> ============================================================
> 
> 
> 
> 
> _______________________________________________
> Biocorba-l mailing list
> Biocorba-l@biocorba.org
> http://biocorba.org/mailman/listinfo/biocorba-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------