[Bioperl-l] Refactoring Locations...

Chris Mungall cjm@fruitfly.org
Fri, 28 Jun 2002 09:54:54 -0700 (PDT)


I second this. gadfly works in space-oriented coordinates. You have to be
super-rigorous on import/export, but otherwise it's a much better system.
It's ridiculous having to import an awkward fuzzy system for representing
insertions, splice sites, etc.

Is it really too late to have us switch to this system? I can't see how it
would be done without extreme pain, but I think it'd be worth it in the
end. bioperl2.0?

On Fri, 28 Jun 2002, Lincoln Stein wrote:

> The suggested refactoring sounds correct.  I prefer IN-BETWEEN to TWEEN or
> TWIXT.
>
> As a meta comment, life would be much easier if positions were described
> (perhaps internally) as zero-based half open intervals, which is the way that
> all sensible graphics code does it (I first learned the concepts working with
> Apple's QuickDraw).  In half-open intervals, the coordinates refer to the
> spaces between the nucleotides, rather than to the nucleotides themselves.
> For the dinucleotide AG, the following mappings hold:
>
> 	coordinate		sequence
>
> 	(0,1)			A
> 	(0,2)			AG
> 	(1,1)			space between A & G
>
> Note that in half-open intervals, the length of the sequence is always end
> minus start, and that you can do coordinate arithmetic without adding and
> subtracting 1's.
>
> Lincoln
>
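
The mapping Lincoln describes above can be sketched in a few lines of
Python, whose slices also use zero-based half-open coordinates. The helper
names here (to_half_open, between_to_half_open, length) are hypothetical,
made up for illustration; they are not bioperl or gadfly API, and the
conversion from feature table notation is only my reading of the thread.

```python
def to_half_open(start, end):
    """Convert a 1-based inclusive range such as 34..55 to a
    zero-based half-open (start, end) pair."""
    return (start - 1, end)


def between_to_half_open(left, right):
    """Convert an insertion site such as 46^47 to a zero-length
    half-open interval sitting in the space between the two residues."""
    if right != left + 1:
        raise ValueError("insertion site must lie between adjacent residues")
    return (left, left)


def length(interval):
    """Length is always end minus start, with no +1/-1 correction."""
    start, end = interval
    return end - start


seq = "AG"
a = seq[0:1]       # (0,1) selects "A"
ag = seq[0:2]      # (0,2) selects "AG"
gap = seq[1:1]     # (1,1) is the empty space between A and G

span = to_half_open(34, 55)          # (33, 55)
site = between_to_half_open(46, 47)  # (46, 46)
```

With this representation a range and an insertion site are the same kind of
object; the insertion site is simply the interval whose length is zero.
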
>
> On Thursday 27 June 2002 12:34 pm, Heikki Lehvaslaiho wrote:
> > I ran into a small problem with Bio::Locations and would like to slightly
> > refactor them.
> >
> > From my point of view there are three types of exact sequence locations
> > which in feature table notation are: 23, 34..55 and 46^47. The first two
> > are handled by Bio::Location::Simple and have location_type('EXACT'). The
> > last one is lumped into location_type('BETWEEN') together with locations
> > like 46^78 and handled by Bio::Location::Fuzzy. The source for the
> > confusion is that the feature table definition allows for locations like
> > 46^78 which I do not think are used anywhere. To stress, notation 46^47 is
> > essential when you have clean insertions between residues.
> >
> >
> > Currently we have Bio::LocationI which defines the interface,
> > Bio::Location::Simple and two subclasses of Simple: Bio::Location::Fuzzy
> > and Bio::Location::Split.
> >
> > What I'd like to have is to rename the current Simple into Atomic to be a
> > common superclass and recreate Bio::Location::Simple so that it can have
> > two values for the method location_type(): 'EXACT' and  'IN-BETWEEN'
> > ('TWEEN', 'TWIXT' ?). The object will throw an error if location_type() is
> > 'TWEEN' and start() and end() are both defined and not adjacent. The length
> > of a 'TWIXT' location is always zero. The default value of location_type()
> > will be 'EXACT'.
> >
> >
> > In practice the code changes seem to be easy to make and there might even
> > be a slight speed increase: the current Simple does some things in a slightly
> > convoluted way because its methods are inherited by Fuzzy and Split.
> > Using Bio::Location::Simple in scripts and other modules is made more
> > complicated only if you are concerned about insertions (you should be!).
> > You can then test either location_type() or length().
> >
> >
> > The only other place in bioperl core outside Bio::Location that I have
> > found to be affected is FTHelper.pm where one more condition needs to be
> > added.
> >
> >
> > I have almost all the code changes ready for committing.
> >
> > Any comments?
> >
> > 	-Heikki
>
>
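
For reference, the behaviour Heikki proposes for the recreated Simple
location can be sketched roughly as follows. This is a Python illustration
under stated assumptions, not the bioperl code: the class name
SimpleLocation, its constructor arguments, and the to_ft_string() method are
hypothetical, and defaulting a missing end to start + 1 for an in-between
location is my own guess at one reasonable choice.

```python
class SimpleLocation:
    """Hypothetical sketch of an exact location in 1-based inclusive
    coordinates, with location_type 'EXACT' (default) or 'IN-BETWEEN'."""

    VALID_TYPES = ("EXACT", "IN-BETWEEN")

    def __init__(self, start, end=None, location_type="EXACT"):
        if location_type not in self.VALID_TYPES:
            raise ValueError("unknown location_type: %r" % (location_type,))
        if end is None:
            # Assumption: a single position for EXACT, the next residue
            # for IN-BETWEEN.
            end = start if location_type == "EXACT" else start + 1
        if location_type == "IN-BETWEEN" and end != start + 1:
            # Locations like 46^78 are rejected; only adjacent residues
            # are allowed.
            raise ValueError("IN-BETWEEN requires adjacent start and end")
        self.start = start
        self.end = end
        self.location_type = location_type

    def length(self):
        # An insertion site covers no residues, so its length is zero.
        if self.location_type == "IN-BETWEEN":
            return 0
        return self.end - self.start + 1

    def to_ft_string(self):
        # Feature table notation: 23, 34..55 or 46^47.
        if self.location_type == "IN-BETWEEN":
            return "%d^%d" % (self.start, self.end)
        if self.start == self.end:
            return str(self.start)
        return "%d..%d" % (self.start, self.end)


point = SimpleLocation(23)
span = SimpleLocation(34, 55)
site = SimpleLocation(46, 47, location_type="IN-BETWEEN")
```

Calling code that cares about insertions can then test either
site.location_type == "IN-BETWEEN" or site.length() == 0, as suggested in
the proposal.
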