[Bioperl-l] feature holder for testing overlaps, etc

Jason Stajich jason@cgt.mc.duke.edu
Wed, 15 May 2002 18:45:11 -0400 (EDT)


Here is the proposal for an in-memory SeqFeature collection interface
and object, tentatively called Bio::SeqFeature::FeatureCollectionI and
Bio::SeqFeature::Collection - which is analogous to ChrisM's described
IntersectionGraph (maybe it can inherit from an InterfaceGraphI if
you want to help abstract those methods out).

SeqFeatureCollectionI interface
methods:
add_features    -- add a set of features to the collection

features_in_range -- returns a list of features that are contained in
		     a specified start & end, range, or LocationI.
		     Optionally takes strand into account in the same
		     way the Range overlap/contains methods do.
		     Accepts a flag as to whether to test for features
		     that overlap or are completely contained.

get_features(-tag => $tag) -- returns a list of features that have the
		     requested tag (this will only be more efficient
		     than grepping on the list if the # of features is
		     large).
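
To make the three proposed methods concrete, here is a minimal sketch of
the interface backed by a plain list. It is written in Python purely for
illustration and is not the proposed BioPerl implementation. The names
Feature and SimpleCollection, and the `contain` and `strand` keyword
arguments, are assumptions; strand handling is reduced to an exact match,
which is simpler than what the Range overlap/contains methods do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a SeqFeature: inclusive coordinates."""
    start: int
    end: int
    strand: int = 0
    tags: frozenset = frozenset()


class SimpleCollection:
    """Naive list-backed collection; every query is a linear scan."""

    def __init__(self):
        self._features = []

    def add_features(self, features):
        # Add a set of features to the collection.
        self._features.extend(features)

    def features_in_range(self, start, end, strand=None, contain=False):
        # contain=False: return features that overlap [start, end].
        # contain=True:  return features lying completely inside it.
        # strand=None ignores strand (assumed simplification).
        hits = []
        for f in self._features:
            if strand is not None and f.strand != strand:
                continue
            if contain:
                ok = start <= f.start and f.end <= end
            else:
                ok = f.start <= end and f.end >= start
            if ok:
                hits.append(f)
        return hits

    def get_features(self, tag):
        # Return features carrying the requested tag.
        return [f for f in self._features if tag in f.tags]
```

The linear scans here are the baseline that an indexed implementation
would need to beat once the number of features is large.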

It could be reasonable to let Bio::Seq objects use a
SeqFeatureCollection to hold their features depending on the
efficiency here - but one thing at a time.

Bio::SeqFeature::Collection would be implemented with a BDB B-Tree and
use Lincoln's bin method from Bio::DB::GFF::Util::Binning.  I'm not
sure how to get things that fall within a range from the BDB B-Tree
interface - I'd have to pull from a sorted list somehow, and most of the
examples are for duplicate hash keys.  Hints appreciated.
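
One possible way to pull a range out of sorted keys is to seek to the
first key at or after a lower bound and walk forward until the keys pass
the end of the query. The sketch below shows that idea in Python with the
standard bisect module standing in for a B-Tree cursor. It does not use
the binning scheme; instead it assumes the index tracks the longest
feature seen so it knows how far back to seek. SortedRangeIndex and its
methods are hypothetical names.

```python
import bisect


class SortedRangeIndex:
    """Sorted (start, end) keys; a stand-in for an ordered B-Tree."""

    def __init__(self):
        self._keys = []      # kept sorted by (start, end)
        self._max_len = 0    # longest feature seen, in bases

    def add(self, start, end):
        bisect.insort(self._keys, (start, end))
        self._max_len = max(self._max_len, end - start + 1)

    def overlapping(self, start, end):
        # Any feature overlapping the query must start no earlier than
        # start - max_len + 1, so seek there first (the "cursor seek").
        lo = bisect.bisect_left(self._keys, (start - self._max_len + 1,))
        hits = []
        for s, e in self._keys[lo:]:
            if s > end:          # keys are sorted, nothing later can hit
                break
            if e >= start:       # skip features that end before the query
                hits.append((s, e))
        return hits
```

With a real B-Tree the seek step would be a cursor positioned at the
smallest key greater than or equal to the lower bound; Perl's DB_File
documents this kind of partial-key match for BTREE databases via seq()
with the R_CURSOR flag, though whether it fits this design is untested
here.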

-jason
-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu