[Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
Chris Fields
cjfields at illinois.edu
Thu Jun 25 17:02:48 UTC 2009
On Jun 25, 2009, at 9:46 AM, Chase Miller wrote:
> Hi all,
>
> Quick question I came across while writing the Bio::Nexml module.
>
> I'm trying to link taxon data to a Bio::LocatableSeq object inside a
> Bio::SimpleAlign object. Bio::SimpleAlign has the ability to add
> SeqFeatures, but according to this HowTo
> (http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
> considered to refer to a portion of a sequence, whereas something like
> taxon data would refer to the entire sequence and should be handled as
> an annotation. However, as far as I can tell Bio::LocatableSeq does not
> support annotation objects.
>
> What would be the best way to relate taxon data to a single sequence
> inside an alignment?
>
> Thanks,
> Chase

From working with feature- and annotation-rich alignment formats such as
Stockholm, I've found this is one of the areas of Align that needs some
rethinking. One way to work around this without major refactoring is to
add a full-length SeqFeature (pointing to the proper LocatableSeq) that
stores the Bio::Annotation. I don't necessarily like that approach as a
long-term solution, though, since it's a little hacky and indirect, but
it might get you started (just mark it as TODO so we can catch it at
some point).
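
A minimal sketch of that workaround follows. It is written in Python purely as a language-neutral model. The classes are hypothetical stand-ins for Bio::LocatableSeq, a generic SeqFeature and Bio::SimpleAlign, not BioPerl's actual API. The helper names (attach_whole_seq_annotation, annotations_for) and the "source" primary tag are illustrative assumptions. The sketch shows the indirection: the taxon value lives on a feature spanning the whole sequence, so reading it back means scanning the alignment's features.

```python
from dataclasses import dataclass, field


@dataclass
class LocatableSeq:
    """Hypothetical stand-in for Bio::LocatableSeq: sequence plus coordinates.
    Deliberately has no annotation slot, mirroring the problem in the thread."""
    id: str
    seq: str
    start: int
    end: int
    strand: int = 1


@dataclass
class SeqFeature:
    """Hypothetical stand-in for a generic feature; unlike LocatableSeq it
    can carry annotations (tag -> list of values)."""
    seq_id: str
    start: int
    end: int
    primary_tag: str
    annotations: dict = field(default_factory=dict)


@dataclass
class SimpleAlign:
    """Hypothetical stand-in for Bio::SimpleAlign: sequences plus features."""
    seqs: list = field(default_factory=list)
    features: list = field(default_factory=list)

    def add_seq(self, seq):
        self.seqs.append(seq)

    def add_feature(self, feat):
        self.features.append(feat)


def attach_whole_seq_annotation(aln, seq, tag, value):
    """Workaround: a feature covering the whole sequence holds the annotation.
    TODO: replace once aligned sequences can be annotated directly."""
    # "source" is an assumed tag for a whole-sequence feature
    feat = SeqFeature(seq_id=seq.id, start=seq.start, end=seq.end,
                      primary_tag="source")
    feat.annotations.setdefault(tag, []).append(value)
    aln.add_feature(feat)
    return feat


def annotations_for(aln, seq, tag):
    """Recover annotations indirectly by scanning full-length features."""
    values = []
    for feat in aln.features:
        if feat.seq_id == seq.id and (feat.start, feat.end) == (seq.start, seq.end):
            values.extend(feat.annotations.get(tag, []))
    return values


aln = SimpleAlign()
s1 = LocatableSeq(id="seq1", seq="ACG-T", start=1, end=4)
s2 = LocatableSeq(id="seq2", seq="ACGAT", start=1, end=5)
aln.add_seq(s1)
aln.add_seq(s2)
attach_whole_seq_annotation(aln, s1, "taxon", "Homo sapiens")
print(annotations_for(aln, s1, "taxon"))  # ['Homo sapiens']
print(annotations_for(aln, s2, "taxon"))  # []
```

The lookup has to match features to sequences by ID and coordinates. That matching is exactly the indirection that makes this a stopgap.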

For a long-term solution, I don't think the answer is as simple as
making LocatableSeq a Bio::AnnotatableI; that would not be congruent
with the PrimarySeq implementation (which is not AnnotatableI).
LocatableSeq is meant to represent a simple PrimarySeq that can be
mapped to other sequences via start/end/strand, and thus inherits from
both Bio::PrimarySeq (note the lack of 'I') and Bio::RangeI.

Three options:
1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and
Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI.
2) Bio::PrimarySeq could be made AnnotatableI (Bio::Seq would then
delegate to the PrimarySeq's AnnotationCollection).
3) All AnnotationI objects need to be linked back to the PrimarySeqI
somehow, e.g. via features.

I personally think option #2 is the easiest, as it means anything that
is-a PrimarySeq is also AnnotatableI, and it might not break existing
scripts. I'm not sure how this would affect overall performance, though.
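
A rough model of option #2 follows, again in Python with hypothetical stand-in classes rather than BioPerl's real modules. AnnotationCollection, Annotatable, Range and the method names are assumptions for illustration. The base PrimarySeq owns the annotation collection and Seq delegates to the PrimarySeq it wraps. LocatableSeq gets annotation support from its PrimarySeq parent and coordinates from its range parent.

```python
class AnnotationCollection:
    """Hypothetical stand-in for an annotation collection: tag -> values."""
    def __init__(self):
        self._by_tag = {}

    def add_annotation(self, tag, value):
        self._by_tag.setdefault(tag, []).append(value)

    def get_annotations(self, tag):
        return list(self._by_tag.get(tag, []))


class Annotatable:
    """Hypothetical stand-in for the AnnotatableI role."""
    def annotation(self):
        raise NotImplementedError


class PrimarySeq(Annotatable):
    """Option 2: the base sequence class itself is annotatable."""
    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        self._annotation = None

    def annotation(self):
        # Created on first use so unannotated sequences don't allocate one
        if self._annotation is None:
            self._annotation = AnnotationCollection()
        return self._annotation


class Range:
    """Hypothetical stand-in for a RangeI-style start/end/strand role."""
    def __init__(self, start, end, strand=1):
        self.start, self.end, self.strand = start, end, strand


class LocatableSeq(PrimarySeq, Range):
    """Inherits annotation from PrimarySeq and coordinates from Range."""
    def __init__(self, id, seq, start, end, strand=1):
        PrimarySeq.__init__(self, id, seq)
        Range.__init__(self, start, end, strand)


class Seq(Annotatable):
    """Rich sequence wrapper that delegates annotation to its PrimarySeq."""
    def __init__(self, primary_seq):
        self.primary_seq = primary_seq

    def annotation(self):
        return self.primary_seq.annotation()


ps = PrimarySeq("seq0", "ACGT")
rich = Seq(ps)
rich.annotation().add_annotation("taxon", "Mus musculus")
print(ps.annotation().get_annotations("taxon"))  # ['Mus musculus']

loc = LocatableSeq("seq1", "ACG-T", start=1, end=4)
loc.annotation().add_annotation("taxon", "Homo sapiens")
print(isinstance(loc, Annotatable), isinstance(loc, Range))  # True True
print(loc.annotation().get_annotations("taxon"))  # ['Homo sapiens']
```

Because the annotation lives on the base class, every subclass gets it without extra code. Creating the collection lazily is one way to keep the cost low for the many sequences that never carry annotations.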
chris