[Biojava-dev] Feature interface change
Matthew Pocock
matthew_pocock@yahoo.co.uk
Thu, 22 Aug 2002 10:42:10 +0100
Hi.
This is part of an ongoing discussion Thomas and I have had. These
changes to features are not slated for any possible 1.3 release. They
may perhaps form part of BioJava 2.
The idea:
Entities like genes or repeat types are really first-class objects. You
could have all sorts of information attached to a Gene - phenotypes,
diseases, all that biological stuff. You can have hierarchies or
ontologies of these terms. They really exist independently of any
'material view' on a bit of DNA sequence.
Features on sequences live within some sequence/feature space as
modelled variously by bio{perl,python,sql,java,corba} - some sort of
coordinate system. In this world, a single gene may be represented by
multiple features, one for each coordinate system it is found in -
chromosome, clone, EMBL file etc.
It would be nice if we could model the semantically rich descriptive
object that is shared as a single entity, bound into multiple sequence
contexts.
The current model:
Sequence isa FeatureHolder
FeatureHolder isa Set<Feature>
Feature isa FeatureHolder
        hasa Location
So, features are located via a (Sequence, Location) pair.
The new scheme would be something like:
Sequence isa FeatureHolder
FeatureHolder isa Set<Feature>
Feature isa FeatureHolder
        hasa Location
        hasa FeatureCard
FeatureCard hasa Set<Feature>
In this case, the gene, exon or repeat object is the FeatureCard. All
the info specific to that type of biological feature goes into the
FeatureCard. The Feature object holds all the info about how it is
anchored to a specific region of the genome. So, whereas now we have
methods like getTranslation() on Feature, these would move to the
FeatureCard. The getStrand() method would stay on the Feature object, as
that is specific to where it is bound into a bit of sequence.
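As a rough illustration of the split described above, here is a minimal
sketch in modern Java. It is not the BioJava API: every type, field and
constructor here is hypothetical, the FeatureHolder nesting is left out
for brevity, and a String stands in for a real translation object. The
only point is that getTranslation() lives on the shared FeatureCard
while getStrand() stays on the per-sequence Feature.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch only; none of these types are the real BioJava API.
final class FeatureCardSketch {
    enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

    /** The biological entity (gene, exon, repeat), independent of any sequence. */
    static final class FeatureCard {
        private final String name;
        private final String translation; // assumption: a plain String stands in here
        private final Set<Feature> features = new LinkedHashSet<>();

        FeatureCard(String name, String translation) {
            this.name = name;
            this.translation = translation;
        }

        String getName() { return name; }
        String getTranslation() { return translation; }

        /** Every place this entity is anchored (FeatureCard hasa Set<Feature>). */
        Set<Feature> getFeatures() { return Collections.unmodifiableSet(features); }
    }

    /** One anchoring of a FeatureCard into a specific sequence. */
    static final class Feature {
        private final String sequenceName; // stands in for the owning Sequence
        private final int start;           // stands in for a Location
        private final int end;
        private final Strand strand;
        private final FeatureCard card;

        Feature(String sequenceName, int start, int end, Strand strand, FeatureCard card) {
            this.sequenceName = sequenceName;
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.card = card;
            card.features.add(this); // register this anchoring on the shared card
        }

        String getSequenceName() { return sequenceName; }
        int getStart() { return start; }
        int getEnd() { return end; }
        Strand getStrand() { return strand; } // specific to this binding
        FeatureCard getCard() { return card; }
    }

    public static void main(String[] args) {
        FeatureCard gene = new FeatureCard("geneX", "MKV");
        Feature onChromosome = new Feature("chr1", 1000, 2000, Strand.POSITIVE, gene);
        Feature onClone = new Feature("cloneA", 10, 1010, Strand.NEGATIVE, gene);

        // Both features share one card; only location and strand differ.
        System.out.println(onChromosome.getCard() == onClone.getCard());
        System.out.println(gene.getFeatures().size());
    }
}
```

Here the two Feature objects differ in sequence, location and strand,
but both hand back the same FeatureCard, which is where the translation
lives.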
This way, when feature information is projected into different
coordinate systems (via assemblies or DAS or whatever), the exact same
FeatureCard instance can be returned, and when you parse an Embl record
or look up what's on a micro-array spot, the same FeatureCard instance
could be reused. The names are bad, but that is easily improved.
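One way the "same instance" reuse might work is a shared registry that
hands out FeatureCards by some stable identifier. This is only a sketch
under that assumption: the registry, the String id and the lookup()
method are all hypothetical, and nothing above says how cards would
actually be keyed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not part of BioJava.
final class SharedCardSketch {
    /** Minimal stand-in for the shared biological entity. */
    static final class FeatureCard {
        private final String id;
        FeatureCard(String id) { this.id = id; }
        String getId() { return id; }
    }

    /** Hands out one FeatureCard per id, creating it on first request. */
    static final class FeatureCardRegistry {
        private final Map<String, FeatureCard> cards = new HashMap<>();

        FeatureCard lookup(String id) {
            // Assumption: a stable id exists for each biological entity.
            return cards.computeIfAbsent(id, FeatureCard::new);
        }

        int size() { return cards.size(); }
    }

    public static void main(String[] args) {
        FeatureCardRegistry registry = new FeatureCardRegistry();

        // Imagine one lookup coming from a flat-file parser and one from DAS.
        FeatureCard fromParser = registry.lookup("geneX");
        FeatureCard fromDas = registry.lookup("geneX");

        System.out.println(fromParser == fromDas); // prints true
    }
}
```

Any source of features (a parser, a DAS client, a projection through an
assembly) would ask the registry for the card, so they all end up
pointing at one object.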
Any thoughts anyone?
Matthew