[Bioperl-l] Hilmar and Ewan debate SeqFeatures some more...

Matthew Pocock mrp@sanger.ac.uk
Mon, 22 Jan 2001 13:07:52 +0000


Hi. Just thought I'd have a short inane ramble. Please ignore everything that
you don't agree with. I'm really looking at this more as a user of the libraries
than as an implementer, so things may look different from your side of the fence.

If you intend to end up with multiple feature implementations and multiple types
of locations (point, range, fuzzy etc.) then you should definitely consider
composition - Location interface, Feature interface hasA Location.
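
A minimal sketch of that composition, written in Python purely for brevity. All
names here (Location, PointLocation, RangeLocation, Feature) are illustrative
assumptions, not the actual bioperl or biojava classes:

```python
from dataclasses import dataclass
from typing import Protocol


class Location(Protocol):
    """Anything that can report the lowest and highest index it covers."""

    @property
    def min(self) -> int: ...

    @property
    def max(self) -> int: ...


@dataclass(frozen=True)
class PointLocation:
    """A single position: min and max coincide."""
    pos: int

    @property
    def min(self) -> int:
        return self.pos

    @property
    def max(self) -> int:
        return self.pos


@dataclass(frozen=True)
class RangeLocation:
    """A contiguous run of positions, inclusive at both ends."""
    start: int
    end: int

    @property
    def min(self) -> int:
        return self.start

    @property
    def max(self) -> int:
        return self.end


@dataclass
class Feature:
    """A Feature has a Location; it is not itself a Location."""
    kind: str
    location: Location


# Hypothetical example data: any Location type plugs into any Feature.
snp = Feature("variation", PointLocation(42))
exon = Feature("exon", RangeLocation(100, 250))
```

New location types can then be added without touching the feature classes, and
the other way round.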

Please don't do things like having FuzzyFeature extend both Feature and
FuzzyLocation - if Feature must extend Location, then it should be the stupidest
extension possible - otherwise people will get really confused really quickly.

We make a lot of stuff very easy by defining that every Location has a min & max,
the lowest and highest indices that are within the location. If Feature must
extend Location, then its min & max should delegate to the min & max of its
Location delegate. These methods should never throw exceptions.
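
Roughly, the delegation could look like the sketch below (Python, hypothetical
names throughout). One assumption to flag: for a fuzzy location, min and max are
taken to be the outermost possible bounds. Bad input is rejected when the
location is built, so min and max themselves have nothing left to fail on:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyLocation:
    """A range whose ends are only known to lie within some interval."""
    outer_min: int
    inner_min: int
    inner_max: int
    outer_max: int

    def __post_init__(self):
        # Validate once, at construction time, not inside min/max.
        ordered = (self.outer_min <= self.inner_min
                   <= self.inner_max <= self.outer_max)
        if not ordered:
            raise ValueError("fuzzy bounds must be in ascending order")

    @property
    def min(self) -> int:
        # Assumption: the lowest index that could be within the location.
        return self.outer_min

    @property
    def max(self) -> int:
        # Assumption: the highest index that could be within the location.
        return self.outer_max


class Feature:
    """Exposes min/max for convenience, but only by asking its location."""

    def __init__(self, kind, location):
        self.kind = kind
        self.location = location

    @property
    def min(self) -> int:
        return self.location.min

    @property
    def max(self) -> int:
        return self.location.max


# Hypothetical feature whose start lies somewhere in 100..110
# and whose end lies somewhere in 200..215.
fuzzy_exon = Feature("exon", FuzzyLocation(100, 110, 200, 215))
bounds = (fuzzy_exon.min, fuzzy_exon.max)
```

Code that only cares about overall extent can use min and max on any feature
without knowing what kind of location sits behind it.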

If you go for the composition/delegation approach, then it feels wrong to me that
Feature extends Location - but there is no reason why the current
implementations of Feature shouldn't implement it, or the Feature interface may
choose to define min/max (or do you use start/end?) so that the legacy code
runs.

If you go for Location & Feature, the hierarchy of features should represent the
semantic knowledge about what you are annotating, and the (potential) location
hierarchy hanging off a feature should be shallow - it should pertain to that
feature only. Locations are stupid math objects. For example, if you have a gene
feature, its location should span the entire gene area, whereas the feature
may contain child exon features that each span only part of that region.
Otherwise, you end up with two hierarchies that look nearly exactly the same as
each other & life gets confusing.
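
As a rough illustration (Python, made-up coordinates and hypothetical class
names), the nesting lives in the features while every location stays a flat
range:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RangeLocation:
    """A stupid math object: just an inclusive min..max range."""
    min: int
    max: int

    def contains(self, other: "RangeLocation") -> bool:
        return self.min <= other.min and other.max <= self.max


@dataclass
class Feature:
    """The semantic hierarchy (gene -> exons) is held here, in children."""
    kind: str
    location: RangeLocation
    children: List["Feature"] = field(default_factory=list)


# The gene's location spans the whole gene area; it has no sub-locations.
gene = Feature(
    "gene",
    RangeLocation(1000, 5000),
    children=[
        Feature("exon", RangeLocation(1000, 1200)),
        Feature("exon", RangeLocation(3000, 3400)),
        Feature("exon", RangeLocation(4700, 5000)),
    ],
)

# Each exon covers only part of the region its parent spans.
all_inside = all(gene.location.contains(e.location) for e in gene.children)
```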

It works well for us putting strand info in features and leaving locations
a-directional. Strand stuff requires semantic knowledge (you need context), and
that belongs in features - they represent the biological information.
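
A small sketch of that split, again in Python with hypothetical names. The
location only knows min and max; the feature knows its strand and so is the
thing that can say where its 5' end is:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeLocation:
    """A-directional: no notion of strand, start or end, only min and max."""
    min: int
    max: int


@dataclass
class StrandedFeature:
    kind: str
    location: RangeLocation
    strand: int  # +1 for the forward strand, -1 for the reverse strand

    def five_prime(self) -> int:
        """Index of the 5' end, which depends on strand, not on the location."""
        return self.location.min if self.strand > 0 else self.location.max


# Two features sharing the same coordinates but on opposite strands.
fwd = StrandedFeature("exon", RangeLocation(100, 250), +1)
rev = StrandedFeature("exon", RangeLocation(100, 250), -1)
```

Both features carry equal locations; only the feature-level strand changes the
answer from five_prime().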

Horrible EMBL locations that reference other sequences could be handled with
complicated sequence/feature/location implementations/interfaces - or - you could
just build an assembly of the two entries and project the feature into
assembly-space to get out something that you can represent cleanly. I don't know
how well bioperl does assemblies...
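
To make the projection idea concrete, here is a toy sketch in Python. It assumes
the simplest possible assembly: the two entries laid end to end, same
orientation, no overlap, 1-based inclusive coordinates. The entry names,
lengths and coordinates are invented for the example:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RemoteRange:
    """One piece of a location, expressed in the coordinates of a named entry."""
    entry: str
    start: int
    end: int


def project(parts: List[RemoteRange],
            offsets: Dict[str, int]) -> List[Tuple[int, int]]:
    """Map entry-local ranges into assembly coordinates.

    offsets[name] is the number of assembly bases that precede the first
    base of that entry, so assembly position = offset + local position.
    """
    return [(offsets[p.entry] + p.start, offsets[p.entry] + p.end)
            for p in parts]


# Hypothetical assembly: entryA is 1000 bases long, entryB follows directly.
offsets = {"entryA": 0, "entryB": 1000}

# A feature whose location runs off the end of entryA and into entryB.
parts = [RemoteRange("entryA", 900, 1000), RemoteRange("entryB", 1, 50)]

projected = project(parts, offsets)
span = (min(s for s, _ in projected), max(e for _, e in projected))
```

In assembly-space the two pieces become adjacent ranges on a single sequence,
so the feature can carry one ordinary location. Real assemblies with overlaps
or reversed entries would need more than a single offset per entry.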

Anyway, that's it. These are the kind of details that give me the Hammer Horror
tingly spine every time I think about them. Eugh. EMBL locations suck.

Matthew