[BioPython] seq objects etc...

Ewan Birney birney@sanger.ac.uk
Sat, 11 Sep 1999 16:39:32 +0100 (BST)

Hi guys - popping over from bioperl ;).

It is great to see biopython alive and I am really looking forward
to this being the start of a really long lasting project...

I'd really like to influence you to look at the bioperl object model
and - if you like it - try to make biopython compatible with the
bioperl system.

Bioperl is moving towards an interface/implementation split, meaning that
we effectively have IDL (I am in the process of writing the IDL) for all
our objects. There are some key design decisions in the bioperl set
which I'd like to talk you through and see if you like ;).

a) lightweight sequence objects / heavy annotated sequence objects.

Unlike the LSR-BSA, at bioperl we have split sequence objects into two:

	Bio::Seq - lightweight object.
	Bio::AnnSeq - heavy object with references and sequence features.

AnnSeq has-a Seq object.

This scheme has some real pluses:

	a) the seq object can have manipulation functions (eg, truncation)
which avoid the difficult question of what happens to the coordinate
systems of seqfeatures on manipulations

	b) methods which declare that they take Seq objects are known to work
with a minimum of information. If the Seq object was 'heavy' like in the LSR
submission (ie, has annotations), it is unclear what information a method
actually requires

	c) The resources (in particular memory) are better handled, as
people are encouraged to use the Seq object whenever possible.

	d) SeqFeatures can link to Seq and yet not have a nasty circular
reference, as AnnSeq has seqfeatures as well as a sequence (this is why
AnnSeq -> Seq is a composition model, not inheritance).
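
As a rough illustration of the split in (a), here is a minimal Python
sketch. The class names Seq and AnnSeq mirror the bioperl ones, but the
methods (trunc, gc_fraction) and the 1-based inclusive coordinates are
assumptions made for this example, not actual bioperl or biopython code.

```python
class Seq:
    """Lightweight sequence: an identifier and the residues, nothing else."""

    def __init__(self, seq_id, residues):
        self.id = seq_id
        self.residues = residues

    def trunc(self, start, end):
        """Return a new Seq covering 1-based inclusive positions start..end.

        No features live on a Seq, so there are no feature coordinates
        to remap here.
        """
        if not 1 <= start <= end <= len(self.residues):
            raise ValueError("coordinates out of range")
        return Seq(self.id, self.residues[start - 1:end])


class AnnSeq:
    """Heavy object: has-a Seq (composition), plus features and references."""

    def __init__(self, seq):
        self.seq = seq
        self.features = []
        self.references = []


def gc_fraction(seq):
    """Declared to take a plain Seq, so it clearly needs only the residues."""
    s = seq.residues.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0
```

A caller holding an AnnSeq would pass ann.seq to gc_fraction, which makes
it obvious that the annotations are not needed for that call.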

Having done this a number of different ways (and been burnt!) this is
my preferred way.

b) Unlike the LSR again, bioperl is going for seqfeatures being an
object-relationship model that connects Sequences and Annotations.
SeqFeatures have a Seq and an Annotation object. AnnSeq has a
set of SeqFeatures, a set of Annotations not linked to features and
a sequence.
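
The relationships in (b) could be sketched in Python roughly as follows.
The field names, the start/end coordinates on the feature and the
residues() helper are assumptions for illustration only; the bioperl
objects are not defined this way.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Seq:
    id: str
    residues: str


@dataclass
class Annotation:
    description: str


@dataclass
class SeqFeature:
    # The feature is the relationship: it ties a region of a Seq
    # to an Annotation.
    seq: Seq
    annotation: Annotation
    start: int  # 1-based, inclusive (an assumption of this sketch)
    end: int

    def residues(self):
        """Residues of the underlying Seq covered by this feature."""
        return self.seq.residues[self.start - 1:self.end]


@dataclass
class AnnSeq:
    seq: Seq
    features: List[SeqFeature] = field(default_factory=list)
    # Annotations that are not linked to any feature.
    annotations: List[Annotation] = field(default_factory=list)
```

Note that a SeqFeature points at the Seq, never back at the AnnSeq that
holds it, so there is no cycle between the container and its features.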

c) SeqFormat IO code is abstracted out into Object streams, like

	seq_obj = input_stream_object->next_seq();
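
For a Python flavour of the same idea, here is a minimal sketch of an
object stream over FASTA text. FastaStream is a hypothetical name, the
FASTA handling is deliberately naive, and returning None at end of stream
is an assumption of this sketch; only the next_seq() call comes from the
line above.

```python
import io


class Seq:
    def __init__(self, seq_id, residues):
        self.id = seq_id
        self.residues = residues


class FastaStream:
    """Hypothetical object stream: wraps a file handle, hands back Seq objects."""

    def __init__(self, handle):
        self._handle = handle
        # Header line read ahead while finishing the previous record.
        self._pending = None

    def next_seq(self):
        """Return the next Seq, or None when the stream is exhausted."""
        header = self._pending
        self._pending = None
        # Skip forward to the next header line if we do not hold one yet.
        while header is None:
            line = self._handle.readline()
            if not line:
                return None
            if line.startswith(">"):
                header = line
        chunks = []
        while True:
            line = self._handle.readline()
            if not line:
                break
            if line.startswith(">"):
                self._pending = line
                break
            chunks.append(line.strip())
        words = header[1:].split()
        seq_id = words[0] if words else ""
        return Seq(seq_id, "".join(chunks))


# Usage: the caller only ever sees Seq objects, never the file format.
input_stream_object = FastaStream(io.StringIO(">a\nATG\nCCC\n>b\nGGG\n"))
seq_obj = input_stream_object.next_seq()
while seq_obj is not None:
    print(seq_obj.id, seq_obj.residues)
    seq_obj = input_stream_object.next_seq()
```

Supporting another format would then mean writing another stream class
with the same next_seq() method, leaving the calling code unchanged.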

I hope this gives you some things to think over. I guess I should get that
IDL stuff written for you.

Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230