[BioPython] bioperl idl

Ewan Birney birney@sanger.ac.uk
Thu, 16 Sep 1999 18:30:40 +0100 (BST)


As promised (though not compiled yet), but it is documented. It should
make interesting reading for "what" bioperl is.


//
// The purpose of this IDL is to define suitable Objects for
// biological sequence work from a developer's standpoint. In other
// words these objects are designed to make it easy for another
// implementation (or a partial implementation) to be made.
//
// This idl matches the implicit bioperl idl perfectly. Bioperl can
// implement this idl easily and also be a client to this
// idl. Although the bioperl project heavily influenced this idl, the
// idl is not bioperl biased - a Java/Python/C implementation would be
// as good.
//
// The server implementation bias is noticeable in the way that the
// objects do not have many complex methods at all - they look like
// very simple objects. In addition the difficult life-cycle issues
// of distributed systems are deliberately dodged - this idl expects
// well behaved, reliable clients which will call a release method
// before exiting.
//
// This means this is *not* a good idl at all for a client focused
// system. A client focused system should have many more helper
// methods around more complex objects, and also cope with distributed
// memory problems. A middle layer which translated this idl to a
// client focused idl would be ideal. A very sensible, client focused
// idl is the LSR-BSA submission.
//

//
// This explains a lot of the design decisions below
//
// a) I have stayed away from complex lifecycle issues. All objects
// expect well-behaved clients, which call obj->release
// (inherited from ReleaseableObject). Client has to be responsible!
//
// b) There are two separate Sequence objects. A lightweight Seq
// object which just represents a Sequence and a heavyweight AnnSeq
// object which provides a Sequence with Features. This is important.
// If we forced everything through a single object with features, when
// we retrieved sequences from a database, naive implementations would have
// to load up all (and for chromosomes - that is a lot) of the features into memory.
// By having two objects we are really asking the client to decide what it
// wants to do with the object up-front.
//
// c) There are no biological smarts. This idl is all about providing
// simple interfaces to objects. The biological smarts are expected to
// be provided in the client system, not this system. We want to make
// this as easy as possible for servers to provide.

module BioServer {
  
  // release means that no other calls on
  // this object will be made. It is just
  // a basic destructor/destroy type method
  interface ReleaseableObject {
    void release();
  };

  // Ranges are not found on their own (you might as well use start,end,strand tuples)
  // strand is 1 for forward, -1 for reverse, and 0 for strand agnostic (e.g. a simple
  // repeat is strand agnostic). On protein sequences, 1 and 0 are the only sensible values
  // for strand, and they are equivalent
  interface Range {
    long start();
    long end();
    short strand();
  };


  // The lightweight sequence object. The controversial thing here is
  // that we do not provide subtypes of the object, but this is
  // inferred programmatically through the type method. This is not
  // as mad as it seems, as there are no biological smarts to this system,
  // so sub types are simply not required (there are no revcom or
  // translate mechanisms). Subtyping is perfectly sensible in the
  // client system, but here it just doesn't help us much.

  enum seqtype { DNA, RNA, Protein };
  interface Seq : ReleaseableObject {
    string seq(); // the entire sequence in IUPAC codes
    string subseq(in long start, in long end); // a sub sequence in IUPAC codes
    string id(); // human readable name
    string accession(); // computer assigned name - see below
    string description(); // human readable description. Perfectly ok to be null
    seqtype type(); // type of sequence.
  };

  // A database cross reference refers to direct logical links between biological
  // objects - for example, a protein sequence would be cross referenced to its
  // parent DNA sequence and its protein structure. Dbxrefs can be a many to
  // many relationship
  interface Dbxref : ReleaseableObject {
    string database();
    string primary_key();
  };

  // A specialised dbxref to a published piece of literature. The primary key here
  // (medline number) is considerably less informative than the title, author line etc.
  // One issue here is that I have not provided any further parsing of the author line.
  // In general I feel that is not the role of this object. You should use the primary
  // key to retrieve a richer object with many more smarts to it.
  interface LiteratureReference : Dbxref {
    string title();
    string author_line();
    string location();
  };

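  // just a list of strings.
  // (IDL needs a named sequence type for return values, hence the typedef)
  typedef sequence<string> StringList;
  interface Comment : ReleaseableObject {
    StringList comments();
    boolean is_html();
  };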


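  typedef sequence<LiteratureReference> LiteratureReferenceList;
  typedef sequence<Comment> CommentList;
  typedef sequence<Dbxref> DbxrefList;
  interface Annotation : ReleaseableObject {
    LiteratureReferenceList litref();
    CommentList comment();
    DbxrefList dbxref(); // should return only non litref Dbxref's
  };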


  interface SeqFeature : Range , ReleaseableObject {
    string primary_key();
    string source_key();
    Annotation annotation();
    Seq seq();
    Seq entire_seq();
  };

  typedef sequence<Annotation> AnnotationList;
  typedef sequence<SeqFeature> SeqFeatureList;
  interface AnnSeq : ReleaseableObject {
    AnnotationList non_feature_annotation();
    SeqFeatureList seqfeature();
    Seq seq();
  };


  interface SearchHit : ReleaseableObject {
    string id();
    string description();
    float raw_score();
    float expectation_value();
  };

  typedef sequence<SearchHit> SearchHitList;
  interface SearchResult : ReleaseableObject {
    string query_id();
    string library_name();
    long   library_size();
    long   library_count();
    SearchHitList hits();
  };


  interface SeqDB : ReleaseableObject {
    Seq get_Seq_by_id(in string id);
    Seq get_Seq_by_acc(in string acc);
  };

  interface SeqStream : ReleaseableObject {
    Seq next_Seq();
  };

  interface AnnSeqDB : SeqDB {
    AnnSeq get_AnnSeq_by_id(in string id);
    AnnSeq get_AnnSeq_by_acc(in string acc);
  };


  
};
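
To make the "well-behaved client" contract from point a) concrete, here is a minimal Python sketch of how a client might guarantee that release is always called. Nothing here talks to a real ORB: the classes are hypothetical in-memory stand-ins that only mirror the method names in the idl, the released() helper is my own invention, and the 1-based inclusive coordinates in subseq are an assumption (the idl does not specify them).

```python
from contextlib import contextmanager


class ReleaseableObject:
    """Hypothetical local stand-in for BioServer::ReleaseableObject."""

    def __init__(self):
        self.released = False

    def release(self):
        # After this call the client promises to make no further calls.
        self.released = True


class Seq(ReleaseableObject):
    """Stand-in for the lightweight Seq interface (no features loaded)."""

    def __init__(self, id, seq, seqtype="DNA"):
        super().__init__()
        self._id, self._seq, self._type = id, seq, seqtype

    def id(self):
        return self._id

    def seq(self):
        return self._seq

    def subseq(self, start, end):
        # Assumption: 1-based, inclusive coordinates.
        return self._seq[start - 1:end]

    def type(self):
        return self._type


class SeqDB(ReleaseableObject):
    """Stand-in for SeqDB, backed by a plain dict of id -> sequence."""

    def __init__(self, records):
        super().__init__()
        self._records = records

    def get_Seq_by_id(self, id):
        return Seq(id, self._records[id])


@contextmanager
def released(obj):
    """Yield obj and call obj.release() on exit, even if an error occurs."""
    try:
        yield obj
    finally:
        obj.release()


def first_bases(db, seq_id, n):
    """Fetch a lightweight Seq, read its first n residues, then release it."""
    with released(db.get_Seq_by_id(seq_id)) as s:
        return s.subseq(1, n)


db = SeqDB({"HSTEST": "ACGTACGTAC"})
print(first_bases(db, "HSTEST", 4))  # prints ACGT
db.release()
```

The client here only ever asks for a Seq, never an AnnSeq, so a server would not need to load any features to answer it. That is the up-front decision point b) asks the client to make.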
  
  


-----------------------------------------------------------------
Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
<birney@sanger.ac.uk>
http://www.sanger.ac.uk/Users/birney/
-----------------------------------------------------------------