[Bioperl-l] Refactoring Locations...

Jason Stajich jason@cgt.mc.duke.edu
Tue, 2 Jul 2002 10:57:00 -0400 (EDT)


On Tue, 2 Jul 2002, Heikki Lehvaslaiho wrote:

> It's done.
>
> If you have any errors being generated from location in the CVS HEAD, I'd be
> happy to have a look at them.
>
> Bio::Location::Fuzzy now complains if a location like 23^24 is assigned to it.
> You should use Bio::Location::Simple with location_type('IN-BETWEEN').
>
> Location.t tests failed overnight because I forgot to add and commit
> Bio::Location::Atomic. Fixed.
>
Good stuff, I am looking through it.  I think we need to revisit the speedups
which bypass the constructors in Simple to make sure they still work (see
FTHelper).  We also get a couple of errors when running SeqIO.t regarding
missing 'root location and seq_id' (this would be from split location
writing, I'm guessing).

>
> There are really quite a lot of errors and warnings when running tests in
> the HEAD. It is difficult to see which are important and which are caused
> by missing binaries.
>

All of the wrappers around binary apps should be migrated away from the
core code; these may be the source of some of the problems.  I've just
removed their tests from the core and added them to bioperl-run, so do a

% cvs update -dP

People writing run wrapper tests -- Make sure your code fails gracefully
when your exe is not present.
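
A minimal sketch of what failing gracefully could look like in a wrapper
test, assuming Test::More is available. The helper find_exe() and the
program name 'some_binary' are hypothetical placeholders, not part of
bioperl-run, and the PATH lookup assumes a Unix-like system where
executables carry no file extension:

```perl
use strict;
use warnings;
use File::Spec;
use Test::More tests => 1;

# Hypothetical helper: return the full path of $name if an executable
# file with that name is found on PATH, otherwise return undef.
sub find_exe {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# 'some_binary' stands in for whatever program the wrapper runs.
my $exe = find_exe('some_binary');

SKIP: {
    # Report a skip instead of a failure, so a missing binary is not
    # mistaken for a bug in the wrapper itself.
    skip "some_binary not found on PATH", 1 unless defined $exe;
    ok(-x $exe, "found $exe; the real wrapper tests would run here");
}
```

With this pattern the test script still exits cleanly on a machine
without the binary, and the harness output says why nothing was run.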

> 	-heikki
>
> Heikki Lehvaslaiho wrote:
> >
> > I ran into a small problem with Bio::Locations and would like to
> > slightly refactor them.
> >
> >  From my point of view there are three types of exact sequence locations
> > which in feature table notation are: 23, 34..55 and 46^47. The first two
> > are handled by Bio::Location::Simple and have location_type('EXACT').
> > The last one is lumped into location_type('BETWEEN') together with
> > locations like 46^78 and handled by Bio::Location::Fuzzy. The source for
> > the confusion is that the feature table definition allows for locations
> > like 46^78 which I do not think are used anywhere. To stress, notation
> > 46^47 is essential when you have clean insertions between residues.
> >
> >
> > Currently we have Bio::LocationI which defines the interface,
> > Bio::Location::Simple and two subclasses of Simple: Bio::Location::Fuzzy
> > and  Bio::Location::Split.
> >
> > What I'd like to have is to rename the current Simple into Atomic to be
> > a common superclass and recreate Bio::Location::Simple so that it can
> > have two  values for the method location_type(): 'EXACT' and
> > 'IN-BETWEEN' ('TWEEN', 'TWIXT' ?). The object will throw an error if
> > location_type() is 'TWEEN' and
> > start() and end() are both defined and not adjacent. The length of
> > 'TWIXT' location is always zero. The default value of location_type()
> > will be 'EXACT'.
> >
> >
> > In practice the code changes seem to be easy to make and there might
> > even be a slight speed increase: the current Simple does some things in a
> > slightly convoluted way because its methods are inherited by Fuzzy and Split.
> > Using Bio::Location::Simple in scripts and other modules is made more
> > complicated only if you are concerned about insertions (you should
> > be!). You can then test either location_type() or length().
> >
> >
> > The only other place in bioperl core outside Bio::Location that I have
> > found to be affected is FTHelper.pm where one more condition needs to be
> > added.
> >
> >
> > I have almost all the code changes ready for committing.
> >
> > Any comments?
> >
> >     -Heikki
> >
>
>
>
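
The rule in Heikki's proposal quoted above can be sketched in plain Perl.
This is only an illustration of the described behaviour, not the
Bio::Location::Simple code: the function location_length() is a
hypothetical stand-in, and it assumes start <= end on a single strand.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed behaviour; this is not the
# Bio::Location::Simple implementation.
sub location_length {
    my ($start, $end, $type) = @_;
    $type = 'EXACT' unless defined $type;    # proposed default type

    if ($type eq 'IN-BETWEEN') {
        # Both ends given but not neighbours (e.g. 46^78): reject.
        die "IN-BETWEEN location $start^$end is not adjacent\n"
            if defined $start && defined $end && $end != $start + 1;
        return 0;    # the location sits between two residues
    }

    die "unknown location_type '$type'\n" unless $type eq 'EXACT';
    return $end - $start + 1;
}

print location_length(23, 23), "\n";                  # 23     -> 1
print location_length(34, 55), "\n";                  # 34..55 -> 22
print location_length(46, 47, 'IN-BETWEEN'), "\n";    # 46^47  -> 0

# 46^78 is not a clean insertion point, so the call dies.
my $ok = eval { location_length(46, 78, 'IN-BETWEEN'); 1 };
print "46^78 rejected: $@" unless $ok;
```

A caller that cares about insertions can then check either the type or
a zero length, as the proposal suggests.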

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu