[Bioperl-l] WARNING INCOMING: collection consolidation

Ewan Birney birney at ebi.ac.uk
Thu Feb 27 12:07:16 EST 2003



> The unique identifier stuff is also unrelated and was a quick answer to
> a short discussion that Lincoln and I had about the bulkiness of the
> existing IdentifierI interface and my desire to have a lighter-weight
> one that could unify the disparate concepts of 'unique identifier' that
> I find confusing in BioPerl.  It has so far remained sequestered on the
> freak branch because we've all had so many other things to squabble about.
>

I quite like the Identifier stuff, but I think that if we do it we
should try to apply it as consistently as possible across the entire set.

  Am I right in thinking that one of your classes is:


Uniquely-Identifiable-Object-For-This-Implementation-but-not-exportable-ids

and the other one is

Uniquely-Identifiable-Object-For-Planet-Bioinformatics-and-so-exportable/queryable-ids


I certainly find these two concepts separable and useful to distinguish,
which is why, back in ancient history, on Bio::PrimarySeqI I had

  primary_id - the non world visible one, and a really stupid name

  accession_number - the world visible one
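
For illustration only, the distinction between the two kinds of id can
be sketched as below. This is Python rather than Perl, and SeqRecord,
Store and their methods are hypothetical names, not BioPerl classes;
only the field names primary_id and accession_number come from the text
above. The assumption is that the internal id must be unique within one
store, while the accession is optional and is the only id handed out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SeqRecord:
    # Hypothetical record type, not a BioPerl class.
    primary_id: str                          # unique only inside this implementation
    accession_number: Optional[str] = None   # world-visible, exportable id


class Store:
    """Toy in-memory store that indexes records under both kinds of id."""

    def __init__(self) -> None:
        self._by_primary: Dict[str, SeqRecord] = {}
        self._by_accession: Dict[str, SeqRecord] = {}

    def add(self, rec: SeqRecord) -> None:
        # The internal id must be unique within this store.
        if rec.primary_id in self._by_primary:
            raise ValueError(f"duplicate primary_id: {rec.primary_id}")
        self._by_primary[rec.primary_id] = rec
        # Only records that carry an accession become queryable from outside.
        if rec.accession_number is not None:
            self._by_accession[rec.accession_number] = rec

    def by_primary_id(self, primary_id: str) -> SeqRecord:
        return self._by_primary[primary_id]

    def by_accession(self, accession: str) -> SeqRecord:
        return self._by_accession[accession]

    def export_ids(self) -> List[str]:
        # Internal ids are never exported; only accessions are.
        return sorted(self._by_accession)
```

A record added without an accession is still retrievable by its
primary_id but does not appear in export_ids().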


If I am right, what are your object names? If I am wrong... can you
enlighten me...?


> The collection consolidation has been briefly mentioned on the list,
> mostly as a warning because it will affect users of feature collections,
> including DasI, GFF, and the gbrowse stuff.  The discussion brought up a
> lot of important issues that are still unresolved, particularly about a)
> handling relative ranges, b) the relationship between sequences and
> their annotations, and c) naming conventions.  I have had to trudge
> through with these things up in the air, so I've made some working
> decisions:
>
> a) I've added seq_id() to RangeI, but have documented that it can
>    remain undef and that's okay.  I've also created a RelRangeI (and
>    an implementation, RelRange) that adds accessor methods for
>    absolute start, end, and strand values, utility methods for
>    conversion between absolute and relative range values, and an
>    absolute() flag for forcing absoluteness (this all came from the
>    Bio::DB::GFF::RelSegment class).  My new interface
>    Bio::SeqFeature::SegmentI isa RelRangeI, and it is the only thing
>    besides RelRange that presently extends/implements RelRangeI.
>
> b) I'm just using the SeqFeatureI stuff as-is because I don't yet
>    understand the proposed new model; I'm a bit wary about how that
>    will work with the new Bio::SeqFeature::CollectionI stuff but I'm
>    excited for the challenge.
>
> c) I'm sticking with (the name) Bio::SeqFeature::CollectionI for now
>    because I'm lazy and we can't seem to decide if it should be
>    Bio::SeqFeatureCollectionI instead; this is a minor change
>    downstream if necessary.
>
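
To make the relative/absolute conversion concrete, here is a minimal
sketch in Python. It is not taken from RelRangeI, RelRange or
Bio::DB::GFF::RelSegment; the Range type and both functions are
hypothetical. It assumes 1-based inclusive coordinates, and that on a
minus-strand parent relative position 1 corresponds to the parent's
absolute end.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    # Hypothetical value type; coordinates are 1-based and inclusive.
    start: int
    end: int
    strand: int = 1                 # +1 or -1
    seq_id: Optional[str] = None    # allowed to stay unset


def rel_to_abs(rel: Range, parent: Range) -> Range:
    """Map a range given relative to `parent` onto parent's own coordinates."""
    if parent.strand >= 0:
        offset = parent.start - 1
        return Range(rel.start + offset, rel.end + offset,
                     rel.strand, parent.seq_id)
    # Minus-strand parent: positions run backwards from parent.end,
    # so start/end swap and the strand flips.
    return Range(parent.end - rel.end + 1, parent.end - rel.start + 1,
                 -rel.strand, parent.seq_id)


def abs_to_rel(abs_range: Range, parent: Range) -> Range:
    """Inverse of rel_to_abs; the result carries no seq_id of its own."""
    if parent.strand >= 0:
        offset = parent.start - 1
        return Range(abs_range.start - offset, abs_range.end - offset,
                     abs_range.strand)
    return Range(parent.end - abs_range.end + 1,
                 parent.end - abs_range.start + 1,
                 -abs_range.strand)
```

Under these assumptions the two functions are inverses of each other,
so a range converted to absolute coordinates and back is unchanged.
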
> On the whole the plan is to make sure that things remain
> backwards-compatible where possible.  The collection consolidation
> unites many existing classes that provide filtered access to feature
> lists, including Bio::SeqFeature::CollectionI,
> Bio::SeqFeature::Collection, Bio::Das, Bio::DasI, Bio::Das::Segment,
> Bio::DB::GFF, Bio::DB::GFF::Segment.  We've also made a new interface
> for _providers_ of collections, to unify access to databases and DAS
> servers and other things that store features.  The need for this is that
> gbrowse currently gets unified access to Das and GFF data sources via
> the DasI interface, which is poorly named and poorly placed for a
> generic data access interface.  The result is three new interfaces in
> Bio::DB, Bio::DB::FeatureProviderI, Bio::DB::SequenceProviderI, and
> Bio::DB::SegmentProviderI, where the latter is a simple extension of the
> two former interfaces.  SequenceProviderI isa Bio::DB::RandomAccessI and
> a Bio::DB::UpdateableSeqI.  All three interfaces provide a minimal core
> set of methods for adding, retrieving, updating, and deleting (features
> or sequences) from a data store.
>
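
The shape of the three provider interfaces described above can be
sketched roughly as follows. This is an illustration in Python, not the
Bio::DB interfaces themselves: the class and method names are
hypothetical, features are plain dicts, and only add, retrieve and
delete are shown (update is left out for brevity).

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FeatureProvider(ABC):
    """Hypothetical stand-in for a provider of features."""

    @abstractmethod
    def add_feature(self, seq_id: str, feature: dict) -> None: ...

    @abstractmethod
    def features(self, seq_id: str) -> List[dict]: ...

    @abstractmethod
    def delete_features(self, seq_id: str) -> None: ...


class SequenceProvider(ABC):
    """Hypothetical stand-in for a provider of sequences."""

    @abstractmethod
    def add_sequence(self, seq_id: str, residues: str) -> None: ...

    @abstractmethod
    def sequence(self, seq_id: str) -> str: ...

    @abstractmethod
    def delete_sequence(self, seq_id: str) -> None: ...


class SegmentProvider(FeatureProvider, SequenceProvider):
    """Simple extension of the two interfaces; adds no methods of its own."""


class InMemoryProvider(SegmentProvider):
    """Toy data store, so that callers depend only on the interfaces."""

    def __init__(self) -> None:
        self._seqs: Dict[str, str] = {}
        self._feats: Dict[str, List[dict]] = {}

    def add_feature(self, seq_id: str, feature: dict) -> None:
        self._feats.setdefault(seq_id, []).append(feature)

    def features(self, seq_id: str) -> List[dict]:
        return list(self._feats.get(seq_id, []))

    def delete_features(self, seq_id: str) -> None:
        self._feats.pop(seq_id, None)

    def add_sequence(self, seq_id: str, residues: str) -> None:
        self._seqs[seq_id] = residues

    def sequence(self, seq_id: str) -> str:
        return self._seqs[seq_id]

    def delete_sequence(self, seq_id: str) -> None:
        self._seqs.pop(seq_id, None)
```

A client written against FeatureProvider alone would work unchanged
with any backend that implements it, which is the point of separating
the provider interfaces from DasI.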

This jibes well with me. At Singapore I proposed a reordering of the
classes to deal with the "multiple coordinate system" problem (one feature
being on, say, 3 coordinate systems: genomic, contig and cDNA) whilst
neatly maintaining backward compatibility of SeqFeatures and, very
attractively in my view, unifying the objects that store annotation about
a feature with the objects that store annotation about a sequence.


Did my proposal make sense? I think your Bio::DB::FeatureProviderI is very
close to my proposed Bio::Seq::CoordinateManagerI and/or
Bio::Seq::FeatureCollectionI.


Aaron is planning to write some commentary about this. Realistically we
all need to get into the same room. Don't suppose you can fly
Seattle-->NY in the next couple of days?




> So far there's nothing (else) major here.  Some existing things will be
> deprecated, such as Bio::DB::GFF::RelSegment.  Some existing things will
> implement additional interfaces (eg. those many collections will now
> implement the common Bio::SeqFeature::CollectionI interface).
>
> I do not think that this email will suffice as a request for comment,
> but comments are welcome.  When it gets closer to real (like when I can
> get the tests to succeed and can check it all in to the freaky branch) I
> will get back to this list with a real proposal and can refer people to
> its working implementation.  I hope that the initial investment will pay
> off.  This is all groundwork for an overhaul of gbrowse's data access
> methodology, with the goal of making gbrowse more component-based and
> allowing for multiple simultaneous data sources of more disparate types.
>
> Thanks for reading all the way through this long message.  Please accept
> my apology if it seems that we have failed to solicit sufficient input
> from the group; your comments will be appreciated.


More communication.... good. We probably need a 3rd party (Aaron) to
produce the final insights....


>
> :Paul
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>


