Sat, 18 Sep 1999 12:46:51 +0100 (BST)
On Fri, 17 Sep 1999, Bradley Marshall wrote:
> > If I could suggest that you use the bioperl idl as more of a guide
> > than LSR.
> I like the bioperl IDL better anyway.
> A couple questions though:
> 1) Multiplicities. My assumptions are an Annotation can have multiple
> LitRefs, dbxrefs and comments; an AnnSeq can have multiple annotations
> and seq features. Can an AnnSeq have multiple sequences? (ie Several
> exons of a gene)
I decided not. As SeqFeatures have-a sequence, you can have a series of
exon objects (derived from SeqFeatures) which have different seq objects,
but they should not be stored in the same AnnSeq.
The problem comes down to coordinate systems. If you want to be able to write
   if( seq_feature_a.overlaps(seq_feature_b) )
then they have to have a common coordinate system. There are a number of
ways of skinning this particular cat, but by far the easiest is that they
should all lie on the same sequence, even if that sequence is a "virtual"
sequence built from other sequences. Most of the other ways of skinning
the cat provide the same end result (ie, that a virtual sequence is
This does cause problems for locations in EMBL/GenBank which are spread
over multiple sequences. At the moment bioperl is barfing at this. :(.
I have to get a good strategy in place for this by the end of the year...
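A minimal Python sketch of the coordinate-system point above. The class and function names (SeqFeature, to_virtual) and the offsets table are hypothetical, made up for illustration, and are not taken from the bioperl idl. It assumes 1-based, inclusive coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeqFeature:
    """Hypothetical feature: a 1-based, inclusive range on a named sequence."""
    seq_id: str
    start: int
    end: int

    def overlaps(self, other: "SeqFeature") -> bool:
        # Coordinates are only comparable when both features lie on the
        # same sequence, i.e. share a coordinate system.
        if self.seq_id != other.seq_id:
            raise ValueError("features lie on different sequences")
        return self.start <= other.end and other.start <= self.end


def to_virtual(feature, offsets, virtual_id="virtual"):
    """Re-express a feature in the coordinates of a virtual sequence.

    offsets maps each underlying sequence id to the number of residues
    that precede it in the virtual sequence (an assumed layout table).
    """
    shift = offsets[feature.seq_id]
    return SeqFeature(virtual_id, feature.start + shift, feature.end + shift)


# Two clones laid end to end: cloneA covers 1..1000, cloneB starts at 1001.
offsets = {"cloneA": 0, "cloneB": 1000}
exon_a = to_virtual(SeqFeature("cloneA", 900, 1000), offsets)
exon_b = to_virtual(SeqFeature("cloneB", 1, 50), offsets)
exon_c = to_virtual(SeqFeature("cloneA", 1000, 1100), offsets)
```

Once everything is mapped onto the one virtual sequence, overlaps() is a plain interval test; comparing features that still sit on different sequences raises an error rather than giving a meaningless answer.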
> 2) What is the purpose of Seq's SubSeq attribute?
For long sequences, when an application only wants a part of the sequence,
asking for the entire sequence and then truncating it on the client side
is asking for trouble. Basically, imagine the entire chromosome as one
sequence (100 MBases). This will be stored as a virtual sequence of say
3,000 odd clones in a database. When you get this object, you are unaware
that you have actually connected to a database (to you - it is a sequence
object). However, there are going to be tears if you want the sequence for
one exon, and the only way to get this is to retrieve all 100Mbases and
then truncate on your side.
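The scenario above can be sketched in Python as follows. VirtualSeq, its clone list and the fetched counter are hypothetical stand-ins for a database-backed object, not the real idl interface; coordinates are assumed to be 1-based and inclusive.

```python
class VirtualSeq:
    """Hypothetical virtual sequence assembled from clones held server-side."""

    def __init__(self, clones):
        # clones: ordered list of strings, standing in for database rows.
        self._clones = clones
        self.fetched = 0  # counts how many clones were actually read

    def length(self):
        # Assumed to be cheap metadata in a real server.
        return sum(len(c) for c in self._clones)

    def seq(self):
        """Whole sequence: every clone has to be read."""
        self.fetched += len(self._clones)
        return "".join(self._clones)

    def subseq(self, start, end):
        """Residues start..end, reading only the clones that overlap the range."""
        if not 1 <= start <= end <= self.length():
            raise ValueError("range outside sequence")
        pieces = []
        offset = 0  # residues that precede the current clone
        for clone in self._clones:
            clone_start, clone_end = offset + 1, offset + len(clone)
            if clone_end >= start and clone_start <= end:
                self.fetched += 1
                lo = max(start, clone_start) - offset - 1  # 0-based slice start
                hi = min(end, clone_end) - offset          # 0-based slice end, exclusive
                pieces.append(clone[lo:hi])
            offset = clone_end
        return "".join(pieces)


chromosome = VirtualSeq(["ACGT", "TTGA", "CCCC"])
exon = chromosome.subseq(3, 6)
```

Here the request for residues 3 to 6 touches two of the three clones; with 3,000 clones and a single exon the saving is what makes subseq worth having in the interface.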
The subseq allows better "smarts" to occur between the client and server
of the objects. In fact, it is the only "smarts" orientated part of the
> 3) What are the Primary_Key and Source_Key of SeqFeature for?
primary_key is like the key in the EMBL/GenBank feature table. This is heavily
influenced by Ian Korf's excellent seqfeature object from Bio::GSC - it
is in addition compatible with the GFF system that a lot of people are now
using (see http://www.sanger.ac.uk/Software/GFF/).
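As a rough illustration of the GFF compatibility, here is a Python sketch that reads one tab-separated GFF line into a feature record. The record type and function name are hypothetical. Mapping the GFF feature column to primary_key follows the description above; mapping the GFF source column to source_key is an assumption, since it is not spelled out here.

```python
from typing import NamedTuple


class SeqFeature(NamedTuple):
    """Hypothetical record; field names chosen to mirror the idl attributes."""
    seqname: str
    source_key: str   # assumed to correspond to the GFF "source" column
    primary_key: str  # the feature-table style key, e.g. "exon"
    start: int
    end: int
    strand: str


def feature_from_gff(line):
    """Parse one GFF line (seqname, source, feature, start, end, score, strand, ...)."""
    fields = line.rstrip("\n").split("\t")
    seqname, source, feature, start, end, _score, strand = fields[:7]
    return SeqFeature(seqname, source, feature, int(start), int(end), strand)


feat = feature_from_gff("clone1\tgenscan\texon\t100\t200\t0.9\t+\t0\n")
```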
> 4) Are your objects mutable?
The idl is an immutable idl (mutable objects in a distributed system
spell serious disaster). Of course, your objects can be mutable (bioperl
> 5) If using java, I don't think you'd need a releasable object because
> all objects have that built in. (In addition, Java doesn't support
> multiple inheritance) Does anyone know if it's necessary for python?
Right - most languages provide for garbage collection. But IDL/CORBA
doesn't and in many cases can't because different resource management
policies might be in place.
I am explicitly saying in this idl which resource management policy
I expect these objects to have. A **good thing** in my view.
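One way to picture an explicit release policy from the Python side is the sketch below. ReleasableSeq, SeqHandle and the reference-count scheme are hypothetical illustrations of the idea, not the releasable interface as defined in the idl.

```python
class ReleasableSeq:
    """Hypothetical server-side object with an explicit release policy."""

    def __init__(self, residues):
        self._residues = residues
        self._refs = 1  # the creator holds the first reference

    def reference(self):
        # A second client announces that it is also using the object.
        self._refs += 1

    def release(self):
        if self._refs == 0:
            raise RuntimeError("object has already been released")
        self._refs -= 1
        if self._refs == 0:
            self._residues = None  # the server may now free its resources

    @property
    def released(self):
        return self._refs == 0

    def seq(self):
        if self.released:
            raise RuntimeError("object has been released")
        return self._residues


class SeqHandle:
    """Client-side wrapper: calls release() when the with-block ends."""

    def __init__(self, remote):
        self._remote = remote

    def __enter__(self):
        return self._remote

    def __exit__(self, exc_type, exc, tb):
        self._remote.release()
        return False  # do not swallow exceptions


remote = ReleasableSeq("ACGT")
with SeqHandle(remote) as s:
    residues = s.seq()
```

The local garbage collector still looks after the wrapper; what the explicit release() adds is a way to tell the other side, which cannot see the client's references, that it is done with the object.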
> > One thing to think about is the BioPython sequence object both *wrapping*
> > an IDL BioSource::Seq object and serving/supplying a BioSource::Seq object
> > (ie, being both client and server).
> > I need to knock up a little example object (probably in C) which you can
> > then play with via Fnorb. Sounds good?
> Well I don't know C, and I've never heard of fnorb, but I guess
> now's as good a time as any to learn.
You don't need to know (except ./configure, make) and fnorb == a pure
> BioPython mailing list - BioPython@biopython.org
Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230