[Bioperl-l] WARNING INCOMING: collection consolidation

Lincoln Stein lstein at cshl.org
Thu Feb 27 15:30:11 EST 2003


Hi Paul,

No, the collection branch is intended for breaking things.  Please check in 
your work!!!!  Otherwise the rest of us can't see what you're doing.

Lincoln

On Wednesday 26 February 2003 10:18 pm, Paul Edlefsen wrote:
> Hilmar Lapp wrote:
> > Just as an aside, a little more communication about what's going on in
> > the freaky branch wouldn't hurt if this changes a lot of things (as
> > opposed to adding things) and is ever to go into the main trunk ...
>
> I agree, and I apologize if it seems mysterious.
>
> Mostly it's the collection consolidation.  I've been holding back from
> checking in the bulk of my work on that because the same branch has been
> used by Lincoln and others to test out some (basically unrelated) ideas
> (relative locations and I think GFF3), and I don't want to check stuff
> in until the tests pass for fear of breaking other people's ongoing work.
>
> The unique identifier stuff is also unrelated and was a quick answer to
> a short discussion that Lincoln and I had about the bulkiness of the
> existing IdentifierI interface and my desire to have a lighter-weight
> one that could unify the disparate concepts of 'unique identifier' that
> I find confusing in BioPerl.  It has so far remained sequestered on the
> freaky branch because we've all had so many other things to squabble about.
>
> The collection consolidation has been briefly mentioned on the list,
> mostly as a warning because it will affect users of feature collections,
> including DasI, GFF, and the gbrowse stuff.  The discussion brought up a
> lot of important issues that are still unresolved, particularly about a)
> handling relative ranges, b) the relationship between sequences and
> their annotations, and c) naming conventions.  I have had to trudge
> through with these things up in the air, so I've made some working
> decisions: a) I've added seq_id() to RangeI, but have documented that it
> can remain undef and that's okay; I've also created a RelRangeI (and an
> implementation, RelRange) that adds accessor methods for absolute start,
> end, and strand values, utility methods for conversion between absolute
> and relative range values, and an absolute() flag for forcing
> absoluteness (this all came from the Bio::DB::GFF::RelSegment class); my
> new interface Bio::SeqFeature::SegmentI isa RelRangeI and it is the only
> thing besides RelRange that presently extends/implements RelRangeI.  b)
> I'm just using the SeqFeatureI stuff as-is because I don't yet
> understand the proposed new model; I'm a bit wary about how that will
> work with the new Bio::SeqFeature::CollectionI stuff but I'm excited for
> the challenge.  c) I'm sticking with (the name)
> Bio::SeqFeature::CollectionI for now because I'm lazy and we can't seem
> to decide if it should be Bio::SeqFeatureCollectionI instead; this is a
> minor change downstream if necessary.
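
A rough illustration of the relative/absolute conversion described in (a)
above.  This is Python rather than Perl, and it is not the RelRangeI API: the
class layout, the field names, and the to_absolute()/to_relative() helpers are
all made up for the sketch.  It assumes 1-based inclusive coordinates and a
reference range whose strand is +1 or -1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Plain range. seq_id may stay None, echoing the 'undef is okay' note."""
    start: int                     # 1-based, inclusive
    end: int                       # 1-based, inclusive
    strand: int = 1                # +1 or -1
    seq_id: Optional[str] = None


@dataclass
class RelRange:
    """Range stored relative to a reference range (hypothetical layout)."""
    rel_start: int
    rel_end: int
    rel_strand: int
    reference: Range               # absolute position of the reference
    absolute: bool = False         # when True, start/end/strand report absolute values

    def to_absolute(self) -> Range:
        ref = self.reference
        if ref.strand > 0:
            lo = ref.start + self.rel_start - 1
            hi = ref.start + self.rel_end - 1
        else:
            # On a minus-strand reference, relative position 1 is the reference's end.
            lo = ref.end - self.rel_end + 1
            hi = ref.end - self.rel_start + 1
        return Range(lo, hi, self.rel_strand * ref.strand, ref.seq_id)

    @property
    def start(self) -> int:
        return self.to_absolute().start if self.absolute else self.rel_start

    @property
    def end(self) -> int:
        return self.to_absolute().end if self.absolute else self.rel_end

    @property
    def strand(self) -> int:
        return self.to_absolute().strand if self.absolute else self.rel_strand


def to_relative(abs_range: Range, reference: Range) -> RelRange:
    """Inverse of RelRange.to_absolute() for the same reference."""
    if reference.strand > 0:
        lo = abs_range.start - reference.start + 1
        hi = abs_range.end - reference.start + 1
    else:
        lo = reference.end - abs_range.end + 1
        hi = reference.end - abs_range.start + 1
    return RelRange(lo, hi, abs_range.strand * reference.strand, reference)
```

With a minus-strand reference the relative coordinates count back from the
reference's end, so start and end swap roles on conversion; setting the
absolute flag only changes what the accessors report, not what is stored.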
>
> On the whole the plan is to make sure that things remain
> backwards-compatible where possible.  The collection consolidation
> unites many existing classes that provide filtered access to feature
> lists, including Bio::SeqFeature::CollectionI,
> Bio::SeqFeature::Collection, Bio::Das, Bio::DasI, Bio::Das::Segment,
> Bio::DB::GFF, Bio::DB::GFF::Segment.  We've also made a new interface
> for _providers_ of collections, to unify access to databases and DAS
> servers and other things that store features.  The need for this is that
> gbrowse currently gets unified access to Das and GFF data sources via
> the DasI interface, which is poorly named and poorly placed for a
> generic data access interface.  The result is three new interfaces in
> Bio::DB: Bio::DB::FeatureProviderI, Bio::DB::SequenceProviderI, and
> Bio::DB::SegmentProviderI, where the latter is a simple extension of the
> two former interfaces.  SequenceProviderI isa Bio::DB::RandomAccessI and
> a Bio::DB::UpdateableSeqI.  All three interfaces provide a minimal core
> set of methods for adding, retrieving, updating, and deleting (features
> or sequences) from a data store.
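
To make the shape of the three provider interfaces concrete, here is a small
sketch, again in Python rather than Perl.  Only the relationship is taken from
the message above (a segment provider is a feature provider plus a sequence
provider, each offering add/retrieve/update/delete).  Every method name, the
dict-based feature records, and the in-memory store are assumptions for
illustration, not the Bio::DB interfaces themselves.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class FeatureProvider(ABC):
    """Stores features; method names are illustrative only."""

    @abstractmethod
    def add_feature(self, feature_id: str, feature: dict) -> None: ...

    @abstractmethod
    def get_feature(self, feature_id: str) -> Optional[dict]: ...

    @abstractmethod
    def update_feature(self, feature_id: str, feature: dict) -> None: ...

    @abstractmethod
    def delete_feature(self, feature_id: str) -> None: ...


class SequenceProvider(ABC):
    """Stores sequences; method names are illustrative only."""

    @abstractmethod
    def add_sequence(self, seq_id: str, residues: str) -> None: ...

    @abstractmethod
    def get_sequence(self, seq_id: str) -> Optional[str]: ...

    @abstractmethod
    def update_sequence(self, seq_id: str, residues: str) -> None: ...

    @abstractmethod
    def delete_sequence(self, seq_id: str) -> None: ...


class SegmentProvider(FeatureProvider, SequenceProvider):
    """Adds nothing of its own in this sketch; it only combines the two."""


class InMemorySegmentProvider(SegmentProvider):
    """Toy data store backed by two dicts (hypothetical, for demonstration)."""

    def __init__(self) -> None:
        self._features: Dict[str, dict] = {}
        self._sequences: Dict[str, str] = {}

    def add_feature(self, feature_id: str, feature: dict) -> None:
        self._features[feature_id] = dict(feature)

    def get_feature(self, feature_id: str) -> Optional[dict]:
        return self._features.get(feature_id)

    def update_feature(self, feature_id: str, feature: dict) -> None:
        if feature_id not in self._features:
            raise KeyError(feature_id)   # updating requires an existing record
        self._features[feature_id] = dict(feature)

    def delete_feature(self, feature_id: str) -> None:
        self._features.pop(feature_id, None)

    def add_sequence(self, seq_id: str, residues: str) -> None:
        self._sequences[seq_id] = residues

    def get_sequence(self, seq_id: str) -> Optional[str]:
        return self._sequences.get(seq_id)

    def update_sequence(self, seq_id: str, residues: str) -> None:
        if seq_id not in self._sequences:
            raise KeyError(seq_id)       # updating requires an existing record
        self._sequences[seq_id] = residues

    def delete_sequence(self, seq_id: str) -> None:
        self._sequences.pop(seq_id, None)
```

A caller that only needs features can accept any FeatureProvider, while one
that needs both (as a browser presumably would) can ask for a SegmentProvider
without caring whether a database, a DAS server, or a toy store sits behind it.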
>
> So far there's nothing (else) major here.  Some existing things will be
> deprecated, such as Bio::DB::GFF::RelSegment.  Some existing things will
> implement additional interfaces (e.g. those many collections will now
> implement the common Bio::SeqFeature::CollectionI interface).
>
> I do not think that this email will suffice as a request for comment,
> but comments are welcome.  When it gets closer to real (like when I can
> get the tests to succeed and can check it all in to the freaky branch) I
> will get back to this list with a real proposal and can refer people to
> its working implementation.  I hope that the initial investment will pay
> off.  This is all groundwork for an overhaul of gbrowse's data access
> methodology, with the goal of making gbrowse more component-based and
> allowing for multiple simultaneous data sources of more disparate types.
>
> Thanks for reading all the way through this long message.  Please accept
> my apology if it seems that we have failed to solicit sufficient input
> from the group; your comments will be appreciated.
>
> :Paul
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org                                Cold Spring Harbor, NY
========================================================================



