[Biojava-dev] Introducing feature-schemas
Thomas Down
td2@sanger.ac.uk
Sun, 24 Nov 2002 23:07:06 +0000
I've just checked in a patch which introduces a simple `feature
schema' mechanism for BioJava. This touches quite a lot of
files, but the good news is that the impact should be relatively
minor at first -- the only people who *have* to pay attention
to this are FeatureHolder implementors, and people who directly
use or manipulate MergeFeatureHolder objects.
A feature schema is simply a FeatureFilter which provides an
`upper bound' on a set of features. In the past, these have
already been used in various ad hoc ways. For example,
MergeFeatureHolder had a method (now removed) for specifying
a `membership filter' on a sub-FeatureHolder.
The new approach involves one new method on the FeatureHolder
interface:
public FeatureFilter getSchema();
This returns a FeatureFilter which will accept all top level
Features in the FeatureHolder. It is also possible to give
information about their child features. For example:
    new FeatureFilter.And(
        new FeatureFilter.ByType("transcript"),
        new FeatureFilter.OnlyChildren(
            new FeatureFilter.And(
                new FeatureFilter.Or(
                    new FeatureFilter.ByType("exon"),
                    new FeatureFilter.ByType("translation")
                ),
                FeatureFilter.leaf
            )
        )
    );
This schema indicates that:
- All top level features have type "transcript"
- Transcripts may have child features of type "exon" or "translation"
- There are no grandchild features.
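To make those three constraints concrete, here is a toy check written
without any BioJava classes. The Node class and the accepts method are
hypothetical stand-ins invented for this illustration; they are not
part of the BioJava API and only mirror what the example schema
promises about a feature tree.

```java
import java.util.Arrays;
import java.util.List;

public class SchemaCheckSketch {
    // Hypothetical stand-in for a feature: a type name plus child features.
    static final class Node {
        final String type;
        final List<Node> children;

        Node(String type, Node... children) {
            this.type = type;
            this.children = Arrays.asList(children);
        }
    }

    // True if a top-level feature satisfies the three constraints
    // listed for the example schema.
    static boolean accepts(Node top) {
        // 1. Top-level features must have type "transcript".
        if (!top.type.equals("transcript")) {
            return false;
        }
        for (Node child : top.children) {
            // 2. Children must be of type "exon" or "translation".
            boolean typeOk = child.type.equals("exon")
                    || child.type.equals("translation");
            // 3. Children must be leaves, so there are no grandchildren.
            boolean isLeaf = child.children.isEmpty();
            if (!typeOk || !isLeaf) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node good = new Node("transcript",
                new Node("exon"), new Node("translation"));
        Node bad = new Node("transcript",
                new Node("exon", new Node("fragment")));
        System.out.println(accepts(good));
        System.out.println(accepts(bad));
    }
}
```

A real schema is only an upper bound: a FeatureHolder whose schema is
the filter above may still contain no features at all.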
It is, of course, valid to return the non-informative schema,
FeatureFilter.all.
There are a number of possible uses for schemas. The primary
reason for their existence is query-optimization. It is possible
(using the FilterUtils.areDisjoint method) to compare a given
query FeatureFilter against a schema, and potentially prove that
this query will return an empty set. Other possible applications
include introspecting the available feature types (for display
to the user).
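The idea behind that optimization can be sketched in a simplified
form. The code below is not FilterUtils.areDisjoint and uses no
BioJava classes: it assumes, purely for illustration, that both the
query and the schema can be reduced to the set of feature type names
they might accept. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class DisjointSketch {
    // Toy model: a "filter" is reduced to the set of type names it accepts.
    // If the query and the schema share no type, no feature in the holder
    // can match the query, so the holder need not be searched at all.
    static boolean canSkipHolder(Set<String> queryTypes, Set<String> schemaTypes) {
        return Collections.disjoint(queryTypes, schemaTypes);
    }

    static Set<String> types(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Set<String> schema = types("transcript");

        // A query for repeats provably returns nothing from this holder.
        System.out.println(canSkipHolder(types("repeat"), schema));

        // A query for transcripts might match, so the holder must be searched.
        System.out.println(canSkipHolder(types("transcript", "gene"), schema));
    }
}
```

Note the asymmetry: a disjoint result proves the query is empty, while
an overlapping result only means the holder has to be searched.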
All FeatureHolder implementations in biojava-live should now
return valid schema information, although it is not always
as restrictive as it could be.
Let me know if there are any problems with this,
Thomas.