[Bioperl-l] Locations

Heikki Lehvaslaiho heikki@ebi.ac.uk
Thu, 15 Feb 2001 17:04:32 +0000


Mark Wilkinson wrote:
> 
> Heikki Lehvaslaiho wrote:
> 
> > Are you sure this is a good approach? Are fuzzy locations really that
> > important? My feeling is that most people want to forget fuzziness as
> > long as they can.
> 
> I was under the impression that this is exactly what had been decided
> here amongst the group...???  Or did I grossly misunderstand?  In any
> case, I wholly agree with your sentiment that ->start and ->end should
> *never* have to be explicitly tested for returning a defined value.  But
> I honestly thought that was the final conclusion reached in this
> discussion.

Luckily nothing is cast in stone. 8-)
I know I was a bit late coming into this discussion. It was not until
I actually used the new classes that I fully understood the implications.

Unfortunately, the connection to the bioperl server was lost at the time I
sent my mail. The topic was discussed in private mails, and the end
result was mentioned by Jason in his last mail to the list:

2) Implement the CoordinatePolicy as per our discussion and fix
    fuzzy/split to always return a real number for start/end whenever
    possible. Maximal range, etc...


Below I collect the messages from Jason, Hilmar, Ewan and me to give
everyone a better picture of what has been going on.

	-Heikki


Subject: Re: Locations
Date: Tue, 13 Feb 2001 11:37:21 -0500 (EST)
From: Jason Stajich <jason@chg.mc.duke.edu>
To: Heikki Lehvaslaiho <heikki@ebi.ac.uk>
CC: Hilmar Lapp <hlapp@gmx.net>, lapp@gnf.org, Ewan Birney
<birney@ebi.ac.uk>

DNS servers are down on the East Coast somewhere, and Boston <-> NYC
connectivity is down or poor as far as I can tell, which Chris Dadigian
confirms. 8+ hrs and the DNS info is expiring on local DNS servers....

I think the majority of problems stem from having to comply with
backwards compatibility.

I guess my questions are:

1) When you have a fuzzy location <50..80,
   what do you want $location->start to return here?

   We didn't want to throw exceptions, so start must return something. I
   started with undef, then made it '0'; what else is appropriate? 50?
   What about for the location (30.40)..80?

2) How should a split location behave when it contains a fuzzy location?
   When getting the min_start/max_end for a split location, does it have
   to be sure to get the outermost values, or do we just ignore Fuzzies
   here? Every time we've tried to suggest corner cutting, I've gotten
   people saying "no, we have to handle that in the model too". So what I
   have put out there attempts to handle every type of location that I
   was aware of.

3) Currently a Fuzzy location is just a supercharged Simple location, i.e.
   you can use a fuzzy location object with exact start/end/span and it
   will behave just like a Simple location. Is this okay?

4) Does this mean we need to revisit the LocationI interface or the
   Location::SplitLocationI and Location::FuzzyLocationI interfaces?

I'm happy to change things to make them easier to use, but little of the
feedback I've gotten on the list has been constructive suggestions on
behavior. I spent a lot of time trying to put the ideas that were batted
around into software and am now paying the price with my own work
projects. So I really need concrete suggestions about the behavior now
before I can jump in and make changes. Sorry if that sounds trite; I'm
having to justify my time expenditures right now and try to revamp an
abstract for submission tomorrow.

-Jason


Subject: Re: Locations
Date: Tue, 13 Feb 2001 16:46:48 +0000 (GMT)
From: Ewan Birney <birney@ebi.ac.uk>
To: Jason Stajich <jason@chg.mc.duke.edu>
CC: Heikki Lehvaslaiho <heikki@ebi.ac.uk>, Hilmar Lapp
<hlapp@gmx.net>,lapp@gnf.org

On Tue, 13 Feb 2001, Jason Stajich wrote:

> I guess my questions are.
> 
> 1) When you have a fuzzy location <50..80
>    What do you want $location->start to return here?
> 
>    We didn't want to throw exceptions so start must return something, I
>    started with undef, then made it '0', what else is appropriate?  50?
>    What about for the location (30.40)..80?

I want this to return 50 ;) Inconsistent I know, but common usage.

> 
> 2) How should a split location behave when it contains a fuzzy location?
>    When getting the min_start/max_end for a split location it has to be 
>    sure and get the outermost values, or do we just ignore Fuzzies here?
>    Every time we've tried to suggest corner cutting I've gotten people
>    saying "no we have to handle that in the model too".  So what I have
>    put out there attempts to handle every type of location that I was
>    aware of.
> 

I couldn't give a monkeys ;)


> 3) Currently a Fuzzy location is just a supercharged Simple location, ie
>    you can use a fuzzy location object with exact start/end/span and it
>    will behave just like a Simple location.  Is this okay?
> 
> 4) Does this mean we need to revisit the LocationI interface or the 
> Location::SplitLocationI Location::FuzzyLocationI locations?
> 
> I'm happy to change things to make them easier to use, but little of the
> feedback I've gotten on the list has been constructive suggestions on
> behavior.  I spent a lot of time trying to put the ideas that were batted
> around into software and am now paying the price with my own work
> projects.  So I really need concrete suggestions about the behavior now
> before I can jump in and make changes.  Sorry if that sounds trite, I'm
> having to justify my time expenditures right now and try and revamp an
> abstract for submission tomorrow.
> 


Go for the best thing you can do. Don't forget: whoever codes it wins the
argument. If people object, let them code it.



-------- Original Message --------
Subject: Re: Locations
Date: Tue, 13 Feb 2001 11:58:18 -0500 (EST)
From: Jason Stajich <jason@chg.mc.duke.edu>
To: Ewan Birney <birney@ebi.ac.uk>
CC: Heikki Lehvaslaiho <heikki@ebi.ac.uk>, Hilmar Lapp
<hlapp@gmx.net>,lapp@gnf.org


On Tue, 13 Feb 2001, Ewan Birney wrote:

> On Tue, 13 Feb 2001, Jason Stajich wrote:
> 
> > I guess my questions are.
> > 
> > 1) When you have a fuzzy location <50..80
> >    What do you want $location->start to return here?
> > 
> >    We didn't want to throw exceptions so start must return something, I
> >    started with undef, then made it '0', what else is appropriate?  50?
> >    What about for the location (30.40)..80?
> 
> I want this to return 50 ;) Inconsistent I know, but common usage.

I'll go with the maximal bound behavior here. 

Location       start   end   min_start   max_start   min_end   max_end
<50..100       50      100   50          50          100       100
>50..100       50      100   50          50          100       100
(30.40)..100   30      100   30          40          100       100
50..(80.90)    50      90    50          50          80        90


If location is 40..(50.60), end will return 60.
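The maximal bound rule in the table can be sketched in Java, the language Hilmar uses later in the thread. This is a hypothetical illustration, not Bioperl code: the FuzzyRange class, its fields and its tiny parser are assumptions, and the parser only understands the endpoint forms that appear in the table (50, <50, >50, (30.40)). As in the table, '<' and '>' leave the printed number as both the min and the max bound, and start()/end() return the outermost values.

```java
// Hypothetical sketch of the "maximal bound" behaviour described above.
// Class, field and method names are illustrative, not the Bioperl API.
final class FuzzyRange {
    final int minStart, maxStart, minEnd, maxEnd;

    FuzzyRange(int minStart, int maxStart, int minEnd, int maxEnd) {
        this.minStart = minStart;
        this.maxStart = maxStart;
        this.minEnd = minEnd;
        this.maxEnd = maxEnd;
    }

    // Parses one endpoint: "50", "<50", ">50" or "(30.40)".
    // The '<' / '>' marker is simply dropped here (a fuller model would
    // record it as a position type), so the number is both min and max.
    private static int[] endpoint(String s) {
        if (s.startsWith("<") || s.startsWith(">")) {
            s = s.substring(1);
        }
        if (s.startsWith("(") && s.endsWith(")")) {
            String[] parts = s.substring(1, s.length() - 1).split("\\.");
            return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
        }
        int v = Integer.parseInt(s);
        return new int[] {v, v};
    }

    // Parses "start..end", e.g. "<50..100" or "50..(80.90)".
    static FuzzyRange parse(String loc) {
        String[] ends = loc.split("\\.\\.");
        int[] s = endpoint(ends[0]);
        int[] e = endpoint(ends[1]);
        return new FuzzyRange(s[0], s[1], e[0], e[1]);
    }

    // Maximal bound: earliest possible start, latest possible end.
    int start() { return minStart; }

    int end() { return maxEnd; }
}
```

Under these assumptions, FuzzyRange.parse("40..(50.60)").end() returns 60, as stated above.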




-------- Original Message --------
Subject: Re: Locations
Date: Tue, 13 Feb 2001 11:24:12 -0800
From: Hilmar Lapp <hilmarl@yahoo.com>
Organization: Nereis 4
To: Jason Stajich <jason@chg.mc.duke.edu>
CC: Heikki Lehvaslaiho <heikki@ebi.ac.uk>, lapp@gnf.org,Ewan Birney
<birney@ebi.ac.uk>
References:
<Pine.GSO.4.05.10102131122070.18862-100000@helix.mc.duke.edu>

Thanks, Heikki, for your feedback.

A few general comments first: the design is absolutely not focused
on fuzzy locations, even though it may seem so. Having
min_XXX()/max_XXX() already in LocationI is perhaps the result of
entangling two different goals:

	1) Be able to record the information of different kinds
	of location, of which FuzzyLocation is one.
	2) Make it easy for the client-side user to retrieve
	coordinates in a consistent and choosable way.

Goal 1) is pretty well achieved I think. Goal 2) is open for
debate.

{min,max}_XXX() enables the client to apply several policies for
coordinate computation on the fly, without having to bother about
different location types. At least, that was the idea, I think. So,
if the client wants to go for an always-widest-range policy, he/she
calls min_start()/max_end() instead of start()/end(), regardless
of location type. Changing the policy is easy (maybe it's not; we
have too little practical experience yet).

The last note in parentheses is not purely rhetorical. Imagine an
interactive application that wants to let the user change the
policy. This requires extensive client code, or he/she must
provide his/her own location classes overriding the default ones,
which I think should not be prohibitive.
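A minimal Java sketch of the client-side approach, assuming every location type exposes the four min/max accessors. The Loc, SimpleLoc, RangeLoc and ClientPolicies names are hypothetical. widest() and narrowest() never ask what kind of location they hold; switching policy means calling the other helper everywhere, which is the client code an interactive application would have to carry around.

```java
import java.util.List;

// Hypothetical sketch: every location type answers min/max queries, so
// client code can choose a coordinate policy without type checks.
// All names here are illustrative only.
interface Loc {
    int minStart();
    int maxStart();
    int minEnd();
    int maxEnd();
}

// An exact location: min and max coincide.
final class SimpleLoc implements Loc {
    private final int start, end;
    SimpleLoc(int start, int end) { this.start = start; this.end = end; }
    public int minStart() { return start; }
    public int maxStart() { return start; }
    public int minEnd() { return end; }
    public int maxEnd() { return end; }
}

// A location whose ends may each span a range, e.g. (30.40)..100.
final class RangeLoc implements Loc {
    private final int minS, maxS, minE, maxE;
    RangeLoc(int minS, int maxS, int minE, int maxE) {
        this.minS = minS; this.maxS = maxS; this.minE = minE; this.maxE = maxE;
    }
    public int minStart() { return minS; }
    public int maxStart() { return maxS; }
    public int minEnd() { return minE; }
    public int maxEnd() { return maxE; }
}

final class ClientPolicies {
    // Always-widest range: min_start()/max_end(), whatever the type.
    static int[] widest(Loc l) { return new int[] {l.minStart(), l.maxEnd()}; }

    // Always-narrowest range: the opposite choice, same calls for every type.
    static int[] narrowest(Loc l) { return new int[] {l.maxStart(), l.minEnd()}; }

    // Total number of positions covered under the widest policy.
    static int widestTotal(List<Loc> locs) {
        int total = 0;
        for (Loc l : locs) {
            int[] r = widest(l);
            total += r[1] - r[0] + 1;
        }
        return total;
    }
}
```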

It is clear that some users will want different coordinate-computation
behaviour than others. The question this poses for the design is how
best to make this possible for client-side code as well as for the
core classes.

In OO speak, you could delegate this to a CoordinatePolicy, which
could look like the following (assuming that we all know Java):

interface CoordinatePolicy {
	int start(LocationI loc);
	int end (LocationI loc);
}

LocationI would get a method

	void setCoordinatePolicy(CoordinatePolicy pol);

Several useful policies would be provided by default, and an
implementation would of course initialize every location with a
default policy. This lets you change the behaviour of an object
on the fly, with minimal code required on the client side, and in
a consistent way.

However, this would further increase the number of classes,
although in 99% of use cases not the number of objects, because
every location object would share essentially the same policy
instance (which the implementation must ensure; certainly feasible).
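The CoordinatePolicy delegation could look roughly like this in Java. Only CoordinatePolicy, its start()/end() methods and setCoordinatePolicy() come from the mail; the min/max accessors on LocationI, the WidestPolicy, NarrowestPolicy and BasicLocation classes, and the singleton INSTANCE fields are assumptions made for the sketch. Because the policies are stateless, each exists once and is shared by every location, so the class count grows while the object count stays flat.

```java
// Hypothetical sketch of the CoordinatePolicy idea. Beyond the names taken
// from the mail, everything here is an illustrative assumption.
interface CoordinatePolicy {
    int start(LocationI loc);
    int end(LocationI loc);
}

interface LocationI {
    int minStart();
    int maxStart();
    int minEnd();
    int maxEnd();
    int start();
    int end();
    void setCoordinatePolicy(CoordinatePolicy pol);
}

// Widest range; stateless, so one shared instance serves every location.
final class WidestPolicy implements CoordinatePolicy {
    static final WidestPolicy INSTANCE = new WidestPolicy();
    private WidestPolicy() {}
    public int start(LocationI loc) { return loc.minStart(); }
    public int end(LocationI loc) { return loc.maxEnd(); }
}

// Narrowest range, also a shared singleton.
final class NarrowestPolicy implements CoordinatePolicy {
    static final NarrowestPolicy INSTANCE = new NarrowestPolicy();
    private NarrowestPolicy() {}
    public int start(LocationI loc) { return loc.maxStart(); }
    public int end(LocationI loc) { return loc.minEnd(); }
}

final class BasicLocation implements LocationI {
    private final int minS, maxS, minE, maxE;
    // Every new location starts out with the shared default policy.
    private CoordinatePolicy policy = WidestPolicy.INSTANCE;

    BasicLocation(int minS, int maxS, int minE, int maxE) {
        this.minS = minS; this.maxS = maxS; this.minE = minE; this.maxE = maxE;
    }

    public int minStart() { return minS; }
    public int maxStart() { return maxS; }
    public int minEnd() { return minE; }
    public int maxEnd() { return maxE; }

    // start()/end() are delegated, so swapping the policy changes them on the fly.
    public int start() { return policy.start(this); }
    public int end() { return policy.end(this); }

    public void setCoordinatePolicy(CoordinatePolicy pol) { policy = pol; }
}
```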

Other than that, it will be difficult to achieve broad consensus
about what to do in situations like the ones Jason pointed out. That
is, if you have <5..20, what do you return for min_start()? If you
return undef, you break a default always-widest policy that relies on
valid numbers. If you return 5, you pretend it's not fuzzy,
because then min_start() == max_start() (you may claim here that
the client should inspect the start_pos_type()). If you return 0,
it's a valid number and != max_start(), but it has other
disadvantages.
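The "return 5" option can be sketched as follows, assuming a position-type flag along the lines of the start_pos_type() mentioned above. PosType, PosTypedLocation and startIsFuzzy() are hypothetical names. For <5..20 both start bounds are 5, so only the flag reveals that the start is fuzzy.

```java
// Hypothetical sketch: min/max start both report 5 for <5..20, and the
// fuzziness is carried by a separate position-type flag (modelled loosely
// on start_pos_type()). Names are illustrative, not a real API.
enum PosType { EXACT, BEFORE, AFTER, BETWEEN }

final class PosTypedLocation {
    final int minStart, maxStart, end;
    final PosType startPosType;

    PosTypedLocation(int minStart, int maxStart, int end, PosType startPosType) {
        this.minStart = minStart;
        this.maxStart = maxStart;
        this.end = end;
        this.startPosType = startPosType;
    }

    // Fuzzy if the type says so (BEFORE/AFTER, even when min == max)
    // or if the start genuinely spans a range.
    boolean startIsFuzzy() {
        return startPosType != PosType.EXACT || minStart != maxStart;
    }
}
```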

> 
> I guess my questions are.
> 
> 1) When you have a fuzzy location <50..80
>    What do you want $location->start to return here?
> 

I think the idea is to return something that is guaranteed to be a
valid number and makes, out of all possible choices, the most
general sense (even though it might not fit some user's specific
needs).

For the example, this means start() == 50.

>    We didn't want to throw exceptions so start must return something, I
>    started with undef, then made it '0', what else is appropriate?  50?
>    What about for the location (30.40)..80?
> 

Following that idea, start() == 30 here. (With the policy model
it is easy to replace this with an object that takes the mean for
'BETWEEN' fuzzy locations.)
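Swapping in a mean-based start for BETWEEN positions is essentially a one-line policy. The sketch below is hypothetical Java: StartPolicy and StartPolicies are illustrative names, and rounding the midpoint down with integer division is an arbitrary choice. For (30.40)..80 the outermost default gives 30 and the mean policy gives 35.

```java
// Hypothetical sketch: two interchangeable ways to turn a BETWEEN start
// such as (30.40) into a single number. Names are illustrative only.
interface StartPolicy {
    int start(int minStart, int maxStart);
}

final class StartPolicies {
    private StartPolicies() {}

    // Default discussed above: the smallest possible (outermost) start.
    static final StartPolicy OUTERMOST = (min, max) -> min;

    // Alternative: mean of the two bounds, rounded down for odd widths.
    static final StartPolicy MEAN = (min, max) -> min + (max - min) / 2;
}
```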

> 2) How should a split location behave when it contains a fuzzy location?
>    When getting the min_start/max_end for a split location it has to be
>    sure and get the outermost values, or do we just ignore Fuzzies here?

Outermost I think makes the 'most general sense'. Don't ignore.
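The "outermost, don't ignore fuzzies" rule for a split location amounts to taking the minimum of the parts' smallest possible starts and the maximum of their largest possible ends. A hypothetical Java sketch follows; the class names are illustrative, and <10 is treated as 10, as in Jason's table. The second case in the checks shows why fuzzy parts cannot be ignored: the outermost start comes from the fuzzy part.

```java
import java.util.List;

// Hypothetical sketch of outermost bounds for a split location such as
// join(<10..50,(80.90)..200). Names are illustrative, not the Bioperl API.
final class SubLocation {
    final int minStart, maxStart, minEnd, maxEnd;

    SubLocation(int minStart, int maxStart, int minEnd, int maxEnd) {
        this.minStart = minStart;
        this.maxStart = maxStart;
        this.minEnd = minEnd;
        this.maxEnd = maxEnd;
    }
}

final class SplitLocation {
    private final List<SubLocation> parts;

    SplitLocation(List<SubLocation> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("split location needs at least one part");
        }
        this.parts = parts;
    }

    // Smallest possible start over all parts, fuzzy bounds included.
    int minStart() {
        int m = Integer.MAX_VALUE;
        for (SubLocation p : parts) m = Math.min(m, p.minStart);
        return m;
    }

    // Largest possible end over all parts, fuzzy bounds included.
    int maxEnd() {
        int m = Integer.MIN_VALUE;
        for (SubLocation p : parts) m = Math.max(m, p.maxEnd);
        return m;
    }
}
```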

> 
> 3) Currently a Fuzzy location is just a supercharged Simple location, ie
>    you can use a fuzzy location object with exact start/end/span and it
>    will behave just like a Simple location.  Is this okay?
> 

That's how it should be in my opinion.
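Point 3 can be illustrated with a small hypothetical class pair: FuzzyLocation extends SimpleLocation, and when it is built with exact ends it answers start()/end()/length() exactly like its parent. The class names and the length() method (standing in for "span") are assumptions for the sketch, not the Bioperl classes.

```java
// Hypothetical sketch of a "supercharged" simple location.
// Names are illustrative only.
class SimpleLocation {
    protected final int start, end;

    SimpleLocation(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int start() { return start; }
    int end() { return end; }
    int length() { return end() - start() + 1; }
}

class FuzzyLocation extends SimpleLocation {
    private final int maxStart, minEnd;

    // Exact ends: behaves exactly like a SimpleLocation.
    FuzzyLocation(int start, int end) {
        this(start, start, end, end);
    }

    // start/end hold the outermost bounds; maxStart/minEnd the inner ones.
    FuzzyLocation(int minStart, int maxStart, int minEnd, int maxEnd) {
        super(minStart, maxEnd);
        this.maxStart = maxStart;
        this.minEnd = minEnd;
    }

    boolean isFuzzy() { return maxStart != start || minEnd != end; }
}
```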

> 4) Does this mean we need to revisit the LocationI interface or the
> Location::SplitLocationI Location::FuzzyLocationI locations?
> 

As mentioned at the beginning, I think the modeling of the various
types of locations is already achieved and doesn't need revisiting
(although other GenBank nightmares may require additional classes)
before we gain some experience (and complaints) with the model.

The coordinate computation is a different story and is open for
debate.

> I'm happy to change things to make them easier to use, but little of the
> feedback I've gotten on the list has been constructive suggestions on
> behavior.

Because 99% of people want the default behaviour and think they're
not concerned otherwise. The remaining 1% probably has 1000
different opinions.

What do you think about the coordinate policy model? Is it worth
it? (It wouldn't be much code anyway.)

	Hilmar



-------- Original Message --------
Subject: Re: Locations
Date: Wed, 14 Feb 2001 09:29:07 +0000
From: Heikki Lehvaslaiho <heikki@ebi.ac.uk>
Organization: EMBL - EBI
To: Hilmar Lapp <hilmarl@yahoo.com>
CC: Jason Stajich <jason@chg.mc.duke.edu>, lapp@gnf.org,Ewan Birney
<birney@ebi.ac.uk>
References:
<Pine.GSO.4.05.10102131122070.18862-100000@helix.mc.duke.edu>
<3A8989DC.D341BB23@yahoo.com>


Sorry for not being able to participate in the discussion yesterday.

I am quite happy about the expressed opinions, which seem to
converge nicely on a consensus.

Hilmar's idea about the CoordinatePolicy seems to me the way to go if you
really want to cover all the possibilities. Go for it!
The maximal range policy seems to be the best default.

Really pleased,

	-Heikki