[Dynamite] Is this working now then?

Ian Holmes ihh@fruitfly.org
Sun, 5 Mar 2000 09:23:02 -0800 (PST)


On Sun, 5 Mar 2000, Ewan Birney wrote:

> 
> 
> On Sat, 4 Mar 2000, Ian Holmes wrote:
> 
> > > > > 	b) sequence object as a separate module (yeah!) but then 
> > > > > we need methods always to access it, meaning possibly internal
> > > > > sequence objects in some modules (yuk)
> > 
> > Come to think of it..
> > 
> > we don't even need methods if it's just a data structure - right?
> 
> Well - therein lies the rub. 
> 
> I think we are heading towards "doing something different" (option c)
> in my last mail. I am happy to go down this route if people think this
> is the best way.
> 
> The drawback of declaring things as data structures is that it forces
> implementations that could otherwise delay parts of the retrieval of
> sequence objects to build all the data members up-front.
> This can be a real pain if your "sequence object" is in fact a wrapper
> of a gene prediction which spans virtual contigs in a large database. To
> get the sequence, some pretty heavy DB interaction has to go on.
> 
> There are a lot of cases where people might pick up the sequence object
> just to get its name and/or part of the sequence, and we want
> to be able not to require everything to be there straight away.
> 
> An alternative is the following design pattern (speaking IDL) :
> 
> module Sequence {
> 
> 	// structs map to sized data structures in the
> 	// object.h 	
> 	struct Seq_str {
> 		string seq;
> 		string display_id;
> 		string accession_number;
> 		string primary_id; // could be called internal_id
> 	};
> 
> 	interface Seq {
> 		attribute string seq;
> 		attribute string display_id;
> 		attribute string accession_number;
> 		attribute string primary_id;
> 
> 		// so we can do smart things about getting sub-strings		
> 		string get_subseq(in long start,in long end);
> 
> 		// get everything in one go
> 		Seq_str get_str();
> 	};
> 
> };
> 
> 
> Interestingly, this is the CORBA design pattern for this, which is
> meant to be replaced by object-by-value which does this "for free"
> supposedly (except that the people in the know think the specification
> sucks...).
> 
> Finally, from my knowledge of interface writing now at Ensembl and
> Bioperl, we can do something where the interface definition actually
> is very clean but does not sacrifice the ability to get the data
> structure out, i.e., in IDL terms:
> 
> module Sequence {
> 
> 	// structs map to sized data structures in the
> 	// object.h 	
> 	struct Seq_str {
> 		string seq;
> 		string display_id;
> 		string accession_number;
> 		string primary_id; // could be called internal_id
> 	};
> 
> 	// Foreign_Seq is written by someone wishing to 
> 	// provide a sequence.
> 	interface Foreign_Seq {
> 		attribute string seq;
> 		attribute string display_id;
> 		attribute string accession_number;
> 		attribute string primary_id;
> 
> 		// so we can do smart things about getting sub-strings		
> 		string get_subseq(in long start,in long end);
> 	};
> 
> 	// External_Seq is provided by a module written by ourselves.
> 	// It caches the data members where appropriate and builds
> 	// a Seq_str 
> 
> 	interface External_Seq : Foreign_Seq {
> 		Seq_str get_Seq_str();
> 	};
> 
> 	// Factory method guaranteed by the module
> 
> 	interface External_Seq_Factory {
> 		External_Seq from_Foreign_Seq(in Foreign_Seq seq);
> 	};
> };
> 
> This all seems very heavy handed, but the point is we either have to
> 
> 	a) discard any idea of sequences being more than a completely
> exposed data structure, with no possibility of placing smarts behind it.
> (I think this is **bad**. We want to put these around DB handles
> sometime).
> 
> 	b) accept that sequences are proper first class objects
> that have to be accessed only through methods.
> 
> 	c) use one of the design patterns outlined above, or something
> similar to provide both views validly.
> 
> 
> I was suggesting going for b) until we know this does not work, but I
> am happy to go for c). a) is my least favoured choice.
> 

I think this is fine in principle. I believe your Sequence::Seq_str is
called a "Memento" pattern by Gamma et al.
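
For concreteness, here is a rough sketch of how the second IDL block
could behave, written in Python rather than IDL or C purely for brevity.
The class and function names follow the IDL. The fetch_seq callback, the
1-based inclusive coordinates in get_subseq, and the sample values are
all assumptions of mine, and the wrapper delegates to the Foreign_Seq
instead of inheriting from it, which simplifies the IDL somewhat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeqStr:
    """Plain snapshot of a sequence, mirroring the Seq_str struct."""
    seq: str
    display_id: str
    accession_number: str
    primary_id: str


class ForeignSeq:
    """Hypothetical provider that fetches the residues only on first use."""

    def __init__(self, display_id, accession_number, primary_id, fetch_seq):
        self.display_id = display_id
        self.accession_number = accession_number
        self.primary_id = primary_id
        self._fetch_seq = fetch_seq  # stand-in for the heavy DB interaction
        self._seq = None

    @property
    def seq(self):
        # Delay the expensive fetch until somebody actually asks for it.
        if self._seq is None:
            self._seq = self._fetch_seq()
        return self._seq

    def get_subseq(self, start, end):
        # Assumption: 1-based, inclusive coordinates (the IDL does not say).
        return self.seq[start - 1:end]


class ExternalSeq:
    """Wrapper that adds a cached SeqStr snapshot on top of a ForeignSeq."""

    def __init__(self, foreign):
        self._foreign = foreign
        self._snapshot = None

    def __getattr__(self, name):
        # Anything not defined here is forwarded to the wrapped ForeignSeq.
        return getattr(self._foreign, name)

    def get_seq_str(self):
        # Build the full data structure once, then reuse it.
        if self._snapshot is None:
            f = self._foreign
            self._snapshot = SeqStr(f.seq, f.display_id,
                                    f.accession_number, f.primary_id)
        return self._snapshot


def from_foreign_seq(foreign):
    """Factory corresponding to External_Seq_Factory.from_Foreign_Seq."""
    return ExternalSeq(foreign)


if __name__ == "__main__":
    fetch_log = []

    def fake_db_fetch():
        fetch_log.append("fetch")  # record each simulated DB round trip
        return "ACGTACGT"

    # "X00001" is a made-up accession, used only for illustration.
    ext = from_foreign_seq(ForeignSeq("demo", "X00001", "1", fake_db_fetch))
    print(ext.display_id, len(fetch_log))        # prints: demo 0
    print(ext.get_subseq(2, 4), len(fetch_log))  # prints: CGT 1
    print(ext.get_seq_str().seq, len(fetch_log)) # prints: ACGTACGT 1
```

Asking for the name costs nothing, the first request for residues
triggers the one fetch, and the full structure is only assembled when
somebody calls get_seq_str().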

Ian

> 
> There is no clean solution to this. :(.
> 
> 
> > 
> > At the risk of treading old ground here..
> > 
> > I vote for a lightweight sequence data structure containing two strings:
> > name and sequence data. This accession number stuff has nothing to do with
> > dynamic programming really. Besides -- having three different kinds of ID
> > with apparently nothing to distinguish them is somewhat idiosyncratic.
> > 
> > Ian
> > 
> > 
> > _______________________________________________
> > Dynamite mailing list  -  Dynamite@bioperl.org
> > http://www.bioperl.org/mailman/listinfo/dynamite
> > 
> 
>