[Biojava-dev] Feature interface change
Thomas Down
td2@sanger.ac.uk
Thu, 22 Aug 2002 17:12:28 +0100
Hi...
Here are a couple more thoughts on the FeatureCard/FeatureMapping
interfaces...
I'm wondering if it's worth allowing one FeatureCard to be a
specialization of another. This could be applied as follows in
the case of repeat features:
- One FeatureCard for all interspersed repeats
- One FeatureCard for all `Alu' repeats.
- One FeatureCard for `AluJo' repeats.
- (Optionally) a FeatureCard for one specific copy of AluJo
for which an annotator has noted some interesting feature.
I guess having a hierarchy like this implies that there should
probably be a `blank' FeatureCard which is the root of the
specialization hierarchy.
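
To make the hierarchy concrete, here is a rough sketch of how it might look. Everything in it is hypothetical: the `FeatureCard` class below is an illustration, not an existing BioJava type, and it assumes single inheritance purely to keep the example short.

```java
import java.util.Objects;

// Hypothetical sketch only: this FeatureCard is not part of the BioJava API.
final class FeatureCard {
    // The `blank' card at the root of the specialization hierarchy.
    static final FeatureCard ROOT = new FeatureCard("blank", null);

    private final String name;
    private final FeatureCard parent; // null only for ROOT

    private FeatureCard(String name, FeatureCard parent) {
        this.name = name;
        this.parent = parent;
    }

    // Create a more specific card underneath this one.
    FeatureCard specialize(String name) {
        return new FeatureCard(Objects.requireNonNull(name), this);
    }

    String getName() {
        return name;
    }

    // True if this card is `other' itself or a (transitive) specialization of it.
    boolean isSpecializationOf(FeatureCard other) {
        for (FeatureCard c = this; c != null; c = c.parent) {
            if (c == other) {
                return true;
            }
        }
        return false;
    }
}
```

With this, `FeatureCard.ROOT.specialize("repeat").specialize("Alu").specialize("AluJo")` gives the chain from the list above, and every card answers true to `isSpecializationOf(FeatureCard.ROOT)`.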
In this scheme, it may not be practical to have the double-binding
between FeatureCards and FeatureMappings. If a FeatureMapping
kept a reference to its most specific FeatureCard, to go from
card to mappings, you could do something like:
Sequence seq = ...
Set<FeatureMapping> mappings = seq.filter(new FeatureFilter.ByCard(aluJo));
(Hopefully you could also apply that query to a SequenceDB or other
container, like you can with FeatureFilters in the current trunk
of BioJava 1.x).
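
A self-contained sketch of what such a filter could do is below. `Card`, `Mapping` and `ByCard` are made-up stand-ins for FeatureCard, FeatureMapping and FeatureFilter.ByCard; none of them exist in BioJava. The sketch assumes each mapping holds only its most specific card, and that a filter on a general card should also match mappings whose card is a specialization of it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for FeatureCard (single inheritance assumed).
final class Card {
    final String name;
    final Card parent; // null for the root card

    Card(String name, Card parent) {
        this.name = name;
        this.parent = parent;
    }

    boolean isSpecializationOf(Card other) {
        for (Card c = this; c != null; c = c.parent) {
            if (c == other) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical stand-in for FeatureMapping: a location plus a card reference.
final class Mapping {
    final Card card; // the most specific card for this mapping
    final int start;
    final int end;

    Mapping(Card card, int start, int end) {
        this.card = card;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical stand-in for FeatureFilter.ByCard.
final class ByCard {
    private final Card card;

    ByCard(Card card) {
        this.card = card;
    }

    // Accept a mapping if its card is the target card or any specialization of it.
    boolean accept(Mapping m) {
        return m.card.isSpecializationOf(card);
    }

    // Collect accepted mappings, preserving the order in which they were seen.
    Set<Mapping> filter(Iterable<Mapping> mappings) {
        Set<Mapping> result = new LinkedHashSet<>();
        for (Mapping m : mappings) {
            if (accept(m)) {
                result.add(m);
            }
        }
        return result;
    }
}
```

The card-to-mappings direction is then just a query, so the cards never need to hold references back to their mappings.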
The attraction of this scheme is that it removes the concept
of feature `types' as opaque strings, and allows you to do
more meaningful things. I guess that, at least for some uses,
we'd probably still want to keep stringy type properties in the
system for the benefit of Genbank/GFF/whatever dumpers, but
this could potentially be relegated to being a normal tag-value
type property stored in with the rest of the data.
This proposal generates loads of issues. For instance:
- Multiple vs. single inheritance? (my gut says multiple).
- Do properties automatically get inherited from more
to less general FeatureCards?
- How does this play with feature hierarchy? (instinctively,
I don't think this will be a problem. The general `transcript'
card contains the general `exon' card, just as the card for
one specific transcript contains the cards for several specific
exons).
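
As one possible answer to the first two questions (certainly not the only one), here is a sketch with multiple inheritance, where a card sees the properties of its more general cards unless it overrides them. The class name `MultiCard`, the breadth-first lookup order, and the property keys and values are all assumptions made up for this illustration. It also shows the stringy `type' living as an ordinary tag-value property.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a card with any number of more general parent cards.
final class MultiCard {
    private final String name;
    private final List<MultiCard> parents;
    private final Map<String, Object> properties = new HashMap<>();

    MultiCard(String name, MultiCard... parents) {
        this.name = name;
        this.parents = Arrays.asList(parents.clone());
    }

    String getName() {
        return name;
    }

    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    // Breadth-first lookup: this card first, then its parents in declaration
    // order, then their parents, and so on. Returns null if no card has the key.
    Object getProperty(String key) {
        Deque<MultiCard> queue = new ArrayDeque<>();
        Set<MultiCard> seen = new HashSet<>();
        queue.add(this);
        while (!queue.isEmpty()) {
            MultiCard c = queue.remove();
            if (!seen.add(c)) {
                continue; // already visited via another inheritance path
            }
            if (c.properties.containsKey(key)) {
                return c.properties.get(key);
            }
            queue.addAll(c.parents);
        }
        return null;
    }
}
```

With multiple parents, two ancestors can define the same key, so some resolution order has to be picked. Nearest-first is just the choice made here.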
Issues aside, does anyone like this idea? Or think it's
completely stupid?
Thomas.