[Biojava-l] FW: Location Problems
Keith James
kdj@sanger.ac.uk
25 Oct 2001 10:31:02 +0100
>>>>> "Ewan" == Ewan Birney <birney@ebi.ac.uk> writes:
Ewan> On Wed, 24 Oct 2001, Forsch, Dan wrote:
>> I'm pretty sure this doesn't qualify as 'brilliant thought',
>> but...
>>
>> I'd like to see the solution to this problem move BioJava away
>> from having StrandedFeature as a sub-interface of Feature,
>> thereby eliminating those annoying (to me anyway) 'if
>> instanceof StrandedFeature' checks in the code. A Strand could
>> become an attribute of (and inner class within) something else,
>> possibly of Locations. If each Location has an associated
>> Strand then the components of a CompoundLocation could differ.
>> I'm not sure if this fixes the issue with RemoteFeatures but I
>> think the same principle would apply.
Ewan> This is like the bioperl approach. (bioperl takes locations
Ewan> into a whole tree-system to allow representation of
Ewan> FuzzyLocations)
Ewan> I know that Thomas likes strandness in BioJava being a
Ewan> property of the feature, not the location, which I think is
Ewan> quite a good principled stand: it just causes havoc wrt
Ewan> EMBL/GenBank.
Ewan> I think I made a similar stand against "complex" locations
Ewan> in Bioperl for a while before I was overruled by people
Ewan> wanting, understandably, to parse the *whole* of GenBank,
Ewan> and then round-trip it properly.
Ewan> It is going to be interesting to see BioJava's approach to
Ewan> this.
Ewan> But - just to say - I don't think there is a 100% clean
Ewan> solution here. Just different compromises.
I've had too much coffee this morning... it seems to me that
BioJava Locations represent nothing more than "this stretch of
sequence" with no biological interpretation.
EMBL/GenBank locations add some biological interpretation to the same
data.
BioJava Features often offer biological interpretation over their
Location (I think they do this implicitly wherever they have a
Strand).
What about formalising the implicit biological interpretation in
Feature and using it to store the extra info from the EMBL/GenBank
location?
Feature would need a way of presenting the raw information in the
Location to say how it is to be interpreted. Currently this is done by
having one Strand attribute per StrandedFeature.
Instead, any Feature (rather than just StrandedFeature, addressing
Dan's point) could have a rule for, in this case, adding Strand
information to its component pieces.
    Feature f;  // any Feature, not just a StrandedFeature
    Strand s = f.getStrand();

would return POSITIVE, NEGATIVE, UNKNOWN or MIXED, and

    for (Iterator li = f.getLocation().blockIterator(); li.hasNext();)
    {
        // ask the Feature how to interpret each of its component blocks
        Strand blockStrand = f.getStrand((Location) li.next());
    }

would return the strand of each block Location or barf "This location
isn't one of mine".
The rule would have to define what to do with recursive locations (and
could disallow them).
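
To make the idea a bit more concrete, here is a minimal, self-contained
sketch. It does not use the BioJava API: Block, SketchFeature and
addBlock are hypothetical names invented for illustration, and the rule
for the overall strand (the shared strand if all blocks agree, MIXED if
they differ, UNKNOWN if there are no blocks) is only one possible
choice. Recursive locations are disallowed here simply because a Block
is flat.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the BioJava types.
enum Strand { POSITIVE, NEGATIVE, UNKNOWN, MIXED }

// A contiguous block: "this stretch of sequence", no biological meaning.
final class Block {
    final int min;
    final int max;

    Block(int min, int max) {
        this.min = min;
        this.max = max;
    }
}

// A feature that owns the rule assigning a strand to each of its blocks.
final class SketchFeature {
    // Block does not override equals/hashCode, so lookup is by identity:
    // only the very objects added here count as "one of mine".
    private final Map<Block, Strand> strandByBlock = new LinkedHashMap<>();

    void addBlock(Block block, Strand strand) {
        if (strand == Strand.MIXED) {
            throw new IllegalArgumentException("a single block cannot be MIXED");
        }
        strandByBlock.put(block, strand);
    }

    // The component blocks, in insertion order.
    List<Block> blocks() {
        return new ArrayList<>(strandByBlock.keySet());
    }

    // Overall strand: assumed rule, see the note above the code.
    Strand getStrand() {
        Strand overall = null;
        for (Strand s : strandByBlock.values()) {
            if (overall == null) {
                overall = s;
            } else if (overall != s) {
                return Strand.MIXED;
            }
        }
        return overall == null ? Strand.UNKNOWN : overall;
    }

    // Strand of one component block, or an exception for a foreign block.
    Strand getStrand(Block block) {
        Strand s = strandByBlock.get(block);
        if (s == null) {
            throw new IllegalArgumentException("This location isn't one of mine");
        }
        return s;
    }
}
```

With something like this, client code never needs an instanceof check:
it asks the feature for the overall strand, or iterates over blocks()
and asks for the strand of each piece.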
My 0.022455 Euros
--
-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Cambridge, UK