[BioPython] IDL semi-finalised
Ewan Birney
birney@ebi.ac.uk
Fri, 11 Feb 2000 12:34:22 +0000
With help from Matt (from biojava) and Kim Rutherford (from Artemis) we have a
semi-finalised IDL. The IDL is more Java friendly and provides a real
"just-the-sequence" object, called AnonymousSeq, designed so that people
who want to declare methods that *just work on sequences* without any
other information can get hold of them.
[There were a lot of other interesting discussions about iterators and
databases which I won't bore you with.]
I have updated the bioperl-corba-server distribution to work with this
IDL, so there is one working server people can download.
I would like the three different projects to generate a number of clients
and servers to this IDL so that we can really start throwing objects
around between them. Once this has happened then we can re-evaluate the
IDL on this basis and provide a final, frozen, IDL.
I would also be interested then in recruiting a team to work on providing
a stable, Java-based bridge between this IDL and the OMG-BSA IDL. I think
this would be a great project for an infrastructure company who wanted to
show that it was serious about supporting free software. (That's a heavy
hint to some people on this list.)
IDL below.
// Notes on the IDL.
//
// This IDL is designed to be pretty uncontroversial and simple. I am
// using very simple CORBA features, and so no valuetypes, anys or
// consts, all of which get implemented late in ORBs (if at all).
//
// This IDL is designed only for sequences and features. There is no
// provision for other stuff - we need to start with the things we can
// agree with and build on that.
//
// This IDL should work well as an internal IDL for an OMG compliant
// server. The OMG specification leaves a lot of the "magic" to the
// server, including memory management etc, and also uses a lot of
// "standard" OMG types and services which generally do not come
// with a standard free ORB. Hence the GNOME memory management
// model and the simple iterators.
//
// <birney@ebi.ac.uk>
//
//
// This comes directly from the GNOME Bonobo model.
// It allows memory management via ref and unref calls.
// The query_interface is not important for this case,
// but is here for completeness.
//
module GNOME {
interface Unknown {
void ref();
void unref();
Object query_interface(in string repoid);
};
};
//
// These are the actual biological objects that we are interested
// in. Nearly everything is an interface. It is not going to work
// well across large internet connections, so don't use it for that.
//
// The org.Biocorba.Seqcore package is so we look good in Java.
// Makes the C interface names waaaaay too long of course
module org {
module Biocorba {
module Seqcore
// changed indentation to give us more space for the main text
{
exception RequestTooLarge { string reason;
long suggested_size; };
// means you need to request a smaller number,
// ie, request only failed due to its size.
exception OutOfRange { string reason; }; // For when start/end points are out of range.
exception EndOfStream { }; // for end of streams
exception UnableToProcess { string reason; }; // All other errors
enum SeqType { PROTEIN,DNA,RNA };
// AnonymousSeq is just the sequence information and *nothing else*,
// not even names
interface AnonymousSeq : GNOME::Unknown {
SeqType type(); // server has to at least *guess* the type.
long length();
// the entire sequence. Use max_request_length to find the max
// size allowed
string get_seq() raises (RequestTooLarge);
// gets a sub sequence. the start,end are in biological
// coordinates, ie, 1-2 are the first two bases
string get_subseq(in long start,in long end) raises (OutOfRange,
RequestTooLarge);
// This is to find the largest string that can be passed back
long max_request_length();
};
// Primary sequences are just the sequence information plus enough
// identity information to process the sequence/store results/etc.
interface PrimarySeq : AnonymousSeq {
// Three different ids, which might be the same. The first,
// display_id, is what to show to a human. The second,
// primary_id, is what the implementation decides is the correct
// unique id for this sequence (in a lot of cases this will be the
// accession number). The final one is the accession number, which
// is the unique id in the biological database which it is from
// (this may be the same as the primary_id, but might not). Yes -
// we do need all three ids.
string display_id(); // id to display to humans
string primary_id(); // id to use as a unique id for this
// sequence. in some cases it could be
// byte position/file munged into a string for example
string accession_number(); // The unique id (commonly called accession number) in
// the biological database this comes from, not the particular
// instance of the database for the implementation.
long version(); // potential (unstable) version number for the sequence. 0 for
// things that don't have a version number
};
// Represents streaming through a single database, eg over a fasta file
// Don't forget to unref objects once you are done with them
interface PrimarySeqIterator : GNOME::Unknown {
PrimarySeq next() raises (EndOfStream,UnableToProcess);
boolean has_more(); // returns 1 when next() will give an object
};
// Provides a database mainly for database searching. Can make new
// streams and can retrieve sequences from the database.
interface PrimarySeqDB : GNOME::Unknown {
string database_name(); // This is to identify databases by name
short database_version(); // version of the database
PrimarySeqIterator make_PrimarySeqIterator(); // makes a new iterator object.
PrimarySeq get_PrimarySeq(in string primary_id) raises (UnableToProcess); // Retrieves one sequence
};
// We need to be able to pass back additional structured information
// in some cases. This gives us a way of doing it without specifying
// the structure at compile time. Try not to abuse this...
// This is equivalent to a hash of arrays in perl
typedef sequence <string> stringList;
struct NameValueSet {
string name;
stringList values;
};
typedef sequence <NameValueSet> NameValueSetList;
// SeqFeatures are features on a sequence. This is GFF
// compatible.
interface SeqFeature : GNOME::Unknown {
string type(); // exon, repeat etc.
string source(); // source of the SeqFeature mainly for GFF compatibility
string seq_primary_id(); // This gives the primary sequence id this is linked to.
long start(); // start in biological coordinates (1 is the first base)
long end(); // end in biological coordinates (1-2 are the first two bases in a sequence)
short strand(); // -1,0,1. -1 means reverse, 0 means either, 1 means forward. Irrelevant for proteins
NameValueSetList qualifiers(); // additional structured information
boolean PrimarySeq_is_available(); // returns 1 if the PrimarySeq is available, 0 if not.
PrimarySeq get_PrimarySeq() raises ( UnableToProcess ); // the Sequence may or may not be there.
// implementors are free to choose
};
typedef sequence <SeqFeature> SeqFeatureList;
// We have to handle large numbers of features.
interface SeqFeatureIterator : GNOME::Unknown {
SeqFeature next() raises (EndOfStream,UnableToProcess);
boolean has_more();
};
// Yes, we should inherit off SeqFeature for more complex things. Please
// inherit off SeqFeature for your favourite feature extension!
// Seq is one heavy object. This should really be a number of
// coordinating objects underneath. Notice that the Seq object
// both inherits from the PrimarySeq interface and also has-a
// PrimarySeq interface. This is deliberate so that clients can
// indicate when they really want to discard a complete sequence
// with features by freeing it but still hold on to the original
// primary sequence.
// Otherwise servers will have extremely large objects for every
// sequence in feature rich databases (bad).
interface Seq : PrimarySeq {
SeqFeatureList all_features() raises (RequestTooLarge);
SeqFeatureIterator all_features_iterator();
SeqFeatureList features_region(in long start,in long end)
raises (OutOfRange,UnableToProcess,RequestTooLarge);
SeqFeatureIterator features_region_iterator(in long start,in long end)
raises (OutOfRange,UnableToProcess);
long max_feature_request();
// This is put here so that clients can ask servers just for the
// sequence and then free the large, seqfeature containing sequence.
// It prevents a sequence with features having to stay in memory for ever.
PrimarySeq get_PrimarySeq();
};
typedef sequence <string> primaryidList;
interface SeqDB : PrimarySeqDB {
Seq get_Seq(in string primary_id) raises (UnableToProcess); // Retrieves one sequence
primaryidList get_primaryidList();
};
}; // end Seqcore module
}; // end Biocorba module
}; // end org module
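
As a rough illustration of how a client might use the interfaces above, here
is a minimal Python sketch. It does not talk to a real ORB: FakeAnonymousSeq
and FakeIterator are hypothetical in-memory stand-ins, the Python exception
classes merely mirror the IDL exceptions by name, and fetch_all and drain are
made-up helper names. It also assumes that next() hands the client a reference
which the client must unref, something the IDL itself does not spell out. The
sketch reads a sequence in pieces no larger than max_request_length(), using
1-based inclusive coordinates, and walks an iterator with has_more()/next().

```python
class OutOfRange(Exception): pass        # mirrors the IDL exception name
class RequestTooLarge(Exception): pass   # mirrors the IDL exception name
class EndOfStream(Exception): pass       # mirrors the IDL exception name


class FakeAnonymousSeq:
    """Hypothetical in-memory stand-in for an AnonymousSeq object (no ORB)."""

    def __init__(self, residues, max_len):
        self._residues = residues
        self._max_len = max_len
        self.refcount = 1  # assumption: the holder starts with one reference

    def unref(self):
        self.refcount -= 1

    def length(self):
        return len(self._residues)

    def max_request_length(self):
        return self._max_len

    def get_subseq(self, start, end):
        # Biological coordinates: 1-based, both ends inclusive.
        if start < 1 or end > len(self._residues) or start > end:
            raise OutOfRange("start/end outside sequence")
        if end - start + 1 > self._max_len:
            raise RequestTooLarge("ask for a smaller region")
        return self._residues[start - 1:end]


class FakeIterator:
    """Hypothetical stand-in for PrimarySeqIterator."""

    def __init__(self, seqs):
        self._seqs = list(seqs)
        self._pos = 0
        self.refcount = 1

    def unref(self):
        self.refcount -= 1

    def has_more(self):
        return self._pos < len(self._seqs)

    def next(self):
        if not self.has_more():
            raise EndOfStream()
        seq = self._seqs[self._pos]
        self._pos += 1
        return seq


def fetch_all(seq):
    """Read the whole sequence in chunks the server is willing to return."""
    chunk = seq.max_request_length()
    if chunk < 1:
        raise ValueError("server reported a non-positive max_request_length")
    total = seq.length()
    parts = []
    start = 1
    while start <= total:
        end = min(start + chunk - 1, total)  # inclusive end coordinate
        parts.append(seq.get_subseq(start, end))
        start = end + 1
    return "".join(parts)


def drain(iterator):
    """Collect every sequence from the iterator, releasing each object."""
    results = []
    while iterator.has_more():
        seq = iterator.next()
        try:
            results.append(fetch_all(seq))
        finally:
            seq.unref()  # release the object as soon as we are done with it
    iterator.unref()
    return results
```

With a real ORB the stand-in classes would be replaced by the stubs generated
from the IDL; the loop structure in fetch_all and drain is the part intended
to carry over.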
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230
<birney@ebi.ac.uk>
-----------------------------------------------------------------