[Bioperl-l] proposed additions to SeqFeatureI, RangeI and FeatureHolderI

Chris Mungall cjm at fruitfly.org
Wed Dec 3 14:46:29 EST 2003



On Thu, 20 Nov 2003, Lincoln Stein wrote:

> On Wednesday 19 November 2003 09:47 pm, Chris Mungall wrote:
> > I have some proposed changes I would like to commit to bioperl, mostly
> > for using GFF3.
> >
> > In both SeqFeatureI and SeqFeature::Generic I would like to add some
> > accessor methods. They would all map to tag-values.
> >
> >   ID         - synonym for tag_value('ID')[0]
> >   ParentIDs  - synonym for tag_value('Parent')
>
> I like this.
>
> >   add_ParentID
> >   remove_ParentID
> >   remove_ParentIDs
> >
> > Question - should the method be Parent or ParentID? In GFF3, the tag
> > is "Parent". But an accessor method called "Parents()" feels like it
> > should return objects, so I think ParentIDs() is better.
>
> Do the methods return IDs or objects?  If they're returning IDs, then the
> ParentID() name sounds right.

Ok, let's go for ParentID

> > Also, I realise it's contrary to bioperl convention to have method
> > names in caps, but it's nice to be consistent with the GFF3 tags.
>
> If you want to be completely consistent with convention, how about get_ID()
> and get_ParentIDs()?  I have a private convention that initial capitalized
> methods are autoloaded/autogenerated, but this is just me.

I had imagined these to be 'first-class' accessors, like primary_tag(),
seq(), etc (although they would be synonyms for get_tag_values('ID'),
set_tag_values('ID'), ...)

There seem to be 3 different kinds of attributes:

foo()                   foo($foo)
get_foo()               set_foo($foo)
get_tag_values('foo')   set_tag_values('foo', [$foo])

I'm not sure what the rules are for deciding which attributes have which
kinds of accessor
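
To make the "first-class accessor as a synonym for tag values" idea concrete, here is a minimal sketch. It is written in Python only so that it runs standalone; the Feature class and its internal storage are hypothetical stand-ins, not BioPerl code. The method names follow the ones proposed in this thread.

```python
class Feature:
    """Hypothetical stand-in for a feature object (not SeqFeature::Generic)."""

    def __init__(self):
        self._tags = {}  # tag name -> list of values

    def get_tag_values(self, tag):
        # Return a copy so callers cannot mutate internal storage.
        return list(self._tags.get(tag, []))

    def set_tag_values(self, tag, values):
        self._tags[tag] = list(values)

    def ID(self, value=None):
        # Combined getter/setter in the style of foo() / foo($foo);
        # it only delegates to the tag-value methods.
        if value is not None:
            self.set_tag_values("ID", [value])
        values = self.get_tag_values("ID")
        return values[0] if values else None

    def ParentIDs(self):
        return self.get_tag_values("Parent")

    def add_ParentID(self, parent_id):
        self.set_tag_values("Parent", self.ParentIDs() + [parent_id])

    def remove_ParentID(self, parent_id):
        self.set_tag_values(
            "Parent", [p for p in self.ParentIDs() if p != parent_id]
        )

    def remove_ParentIDs(self):
        self.set_tag_values("Parent", [])
```

Because every accessor goes through get_tag_values/set_tag_values, the same code could live in the interface and work for any implementation that provides those two methods.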

> > I also notice that in SeqFeatureI we have an accessor definition and
> > implementation for "primary_id". There is no definition for this.
> >
> > I propose either eliminating this, or making it a synonym of ID()
>
> Good with me.

Ok

> > I think we need clearly defined semantics for these fields. I think
> > the semantics should be such that the ID should uniquely identify the
> > feature. This is problematic, as most sources don't issue a unique
> > accession or identifier for features. For example, genbank files
> > provide a /gene for a lot of features, but this isn't even unique
> > e.g. with multicopy genes. In cases where the data source does not
> > provide a unique ID, we may want a way to generate them. So I think
> > there should also be a method:
> >
> >   generateID()
> >
> > which sets the ID field to something that's guaranteed unique. I'm not
> > sure how. Perhaps a combination of the timestamp and the object memory
> > reference?
>
> I think there was a proposal for globally_unique_ID() at some point.  Perhaps
> time to resurrect that thread?

This is a tricky one...
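
For what it's worth, the "timestamp plus object memory reference" idea from my original mail could look roughly like the sketch below. It is standalone Python, not BioPerl. The function name generate_id and the extra per-process counter are my own assumptions, added because memory addresses can be reused once an object is freed. This is only unique within one running process; it does nothing for global uniqueness across machines or runs.

```python
import itertools
import time

# Process-local counter (an assumption, not part of the original proposal):
# it guards against two calls landing on the same timestamp and address.
_counter = itertools.count()


def generate_id(obj):
    """Build an ID from the current time, the object's address and a counter."""
    timestamp = time.time_ns()   # nanoseconds since the epoch
    address = id(obj)            # CPython: the object's memory address
    return f"{timestamp:x}-{address:x}-{next(_counter)}"
```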

> > Because I'm lazy I'd rather do all this in SeqFeatureI - it all
> > delegates to existing methods. But I am unsure as to bioperl
> > conventions regarding when an 'interface' has implementation code.
>
> Happy to see it.

Ok

> >
> > ----
> >
> > I also want to add some code to FeatureHolderI, for dealing with the
> > "nesting hierarchy" in bioperl, i.e. features that contain other
> > features.
> >
> > The methods are:
> >
> >   nest_features()
> >
> > creates a feature nesting hierarchy based on the "ID" and "Parent"
> > tags. This is useful when parsing GFF3.
>
> Yes, I like this.
>
> >
> > Also:
> >
> >   flatten_features()
> >
> > for flattening the nesting hierarchy (so top_SeqFeatures and
> > get_SeqFeatures return the same thing)
>
> I like this too.
>
> >
> > Also:
> >
> >   set_ParentIDs_from_hierarchy()
> >
> > This will go through the FeatureHolder hierarchy; any time it sees a
> > feature with subfeatures, it will set the children's "Parent" tag
> > according to the "ID" tag of the parent. If the parent does not have
> > an ID, one will be generated.
>
> This sounds like an internal method that nobody should ever see in the API!

Ok
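
Here is a rough sketch of what nest_features() and flatten_features() would do. It is standalone Python with plain dicts standing in for features, so none of it is actual FeatureHolderI code. The dict keys ("ID", "Parent", "children") are assumptions made for illustration, and it assumes the Parent links contain no cycles.

```python
def nest_features(features):
    """Build a hierarchy from a flat list using the ID and Parent tags.

    Returns the top-level features; children hang off each feature's
    'children' list. A feature with several parents appears under each.
    """
    by_id = {f["ID"]: f for f in features if f.get("ID") is not None}
    for f in features:
        f["children"] = []
    top = []
    for f in features:
        parents = [by_id[p] for p in f.get("Parent", []) if p in by_id]
        if parents:
            for parent in parents:
                parent["children"].append(f)
        else:
            # No known parent: treat as a top-level feature.
            top.append(f)
    return top


def flatten_features(top):
    """Undo the nesting: return every feature once, depth-first, childless."""
    flat, seen = [], set()

    def visit(f):
        if id(f) in seen:  # already reached via another parent
            return
        seen.add(id(f))
        flat.append(f)
        for child in f["children"]:
            visit(child)

    for f in top:
        visit(f)
    for f in flat:
        f["children"] = []
    return flat
```

After flattening, the top-level list and the full list are the same thing, which is the behaviour described above for top_SeqFeatures and get_SeqFeatures.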

> > And nothing to do with the above code, I would like to add methods to
> > RangeI for interbase coordinates. Love em or hate em, these methods
> > will make some people's code easier at no cost to bioperl.
> >
> > First the interbase equivalent of start/end:
> >
> >   istart
> >   iend
> >
> > Of course, iend is just a synonym for end, but it's nice for
> > completeness.
> >
> > This is the equivalent of chado fmin/fmax.
> >
> > I would also like:
> >
> >   ifrom
> >   ito
> >
> > For interbase directional coordinates. This is equivalent to
> > istart,iend in the + strand, and the reverse of this in the - strand.
>
> I have no objection to these guys going into the Interface as the appropriate
> implemented methods.  That way they'd be available everywhere.

Ok
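
For anyone unfamiliar with interbase, the conversion is small enough to sketch. This is standalone Python, not RangeI code. The free functions are hypothetical, and they assume the usual 1-based inclusive start/end with strand given as +1 or -1.

```python
def istart(start):
    """Interbase start: the boundary just before the first base."""
    return start - 1


def iend(end):
    """Interbase end: numerically the same as the 1-based inclusive end."""
    return end


def ifrom_ito(start, end, strand):
    """Directional interbase pair: (istart, iend) on +, reversed on -."""
    lo, hi = istart(start), iend(end)
    return (hi, lo) if strand < 0 else (lo, hi)
```

One convenience of interbase is that the length is simply iend - istart, with no +1 correction. For example, start=10, end=20 covers 11 bases and gives istart=9, iend=20.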

> Lincoln

Chris


