[Bioperl-l] Proposal for Meta data

Jason Stajich jason at bioperl.org
Fri Dec 15 14:28:13 UTC 2006


On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>
>> Hey Chris,
>>
>> My thoughts below.
>>
>>> [Chris]
>>> This could be used to annotate any
>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>> something like this may be of general use for any PrimarySeq
>>> (quality, structure), alignments like NEXUS and Stockholm,
>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>> etc.
>>>
>>> However, this also seems to fall into the category of sequence
>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>> classes used for this purpose?
>>
>>
>> To me, all meta data is equal. That is, your classic Genbank feature
>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>> kinase domain" aren't different in kind even if they are different in
>> content.
>>
>> As resequencing projects multiply, the ability to create arbitrary
>> meta tags, attach them to different types of objects, and use those
>> tags to link them together will become desirable, if not essential.
>>
>> Keeping a common interface to all of these meta data types would be
>> advantageous, plus new users won't have to determine whether they
>> need to use Bio::Meta objects or Bio::Annotation objects.
>>
>> So I would argue for all of the meta data types to live "under one
>> roof". Which roof isn't as important. Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>>
>> There, and no flames or jibes even. :)
>
> I guess what I want to know is whether there should be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation).  The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.

my stream-of-consciousness for right now:

I was thinking Bio::Annotation is where this should go - nothing about
that system makes it explicitly sequence-related. What we're trying to
hammer out here on the Alignment side - which fits with your RNA
example - is to have features, basically SeqFeatures, associated with
alignments so that columns can be annotated to cover things like
character sets and partitions for phylogenetic analyses.  As for data
that annotates non-contiguous things like RNA stems, we may have to be
more creative about that, or model it with a splitLocation.
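
To make the non-contiguous case concrete, here is a rough sketch of a
feature whose location is split across several column ranges. It is
written in Python for brevity and is not BioPerl code: SplitLocation,
ColumnFeature, and their methods are hypothetical names, and the
1-based inclusive coordinates are an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SplitLocation:
    """Hypothetical: an ordered set of 1-based, inclusive column ranges."""
    parts: Tuple[Tuple[int, int], ...]

    def __post_init__(self):
        if not self.parts:
            raise ValueError("a split location needs at least one range")
        for start, end in self.parts:
            if start < 1 or end < start:
                raise ValueError(f"bad range: {start}..{end}")

    @property
    def start(self) -> int:
        # Leftmost column covered by any sub-range.
        return min(s for s, _ in self.parts)

    @property
    def end(self) -> int:
        # Rightmost column covered by any sub-range.
        return max(e for _, e in self.parts)

    def columns(self) -> List[int]:
        """Every column covered, in the order the ranges were given."""
        cols: List[int] = []
        for start, end in self.parts:
            cols.extend(range(start, end + 1))
        return cols


@dataclass(frozen=True)
class ColumnFeature:
    """Hypothetical: a tagged annotation over alignment columns."""
    primary_tag: str
    location: SplitLocation


# An RNA stem: two paired arms separated by a loop (made-up coordinates).
stem = ColumnFeature("stem", SplitLocation(((3, 7), (20, 24))))
```

One feature then carries both arms of the stem, so the pairing is not
lost the way it would be with two unrelated contiguous features.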

So currently we've added code so that an Alignment is-a
Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
end, with the goal of being able to capture more of the data that can
be represented in a NEXUS file.
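
A minimal sketch of that arrangement, again in Python and not the
actual BioPerl code: Annotatable and FeatureHolder only stand in for
the roles of the two interfaces, and every class, method, and
parameter name below is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Annotatable(ABC):
    """Hypothetical role: holds free-form key/value annotation."""

    @abstractmethod
    def add_annotation(self, key, value): ...

    @abstractmethod
    def get_annotations(self, key): ...


class FeatureHolder(ABC):
    """Hypothetical role: holds position-specific features."""

    @abstractmethod
    def add_feature(self, tag, start, end, name=None): ...

    @abstractmethod
    def get_features(self, tag=None): ...


class Alignment(Annotatable, FeatureHolder):
    """An alignment that plays both roles (columns are 1-based, inclusive)."""

    def __init__(self, rows):
        lengths = {len(seq) for seq in rows.values()}
        if len(lengths) > 1:
            raise ValueError("all rows must have the same length")
        self.rows = dict(rows)
        self.length = lengths.pop() if lengths else 0
        self._annotations = defaultdict(list)
        self._features = []

    def add_annotation(self, key, value):
        self._annotations[key].append(value)

    def get_annotations(self, key):
        return list(self._annotations.get(key, []))

    def add_feature(self, tag, start, end, name=None):
        if not (1 <= start <= end <= self.length):
            raise ValueError(f"columns {start}..{end} are outside the alignment")
        feature = {"tag": tag, "start": start, "end": end, "name": name}
        self._features.append(feature)
        return feature

    def get_features(self, tag=None):
        return [f for f in self._features if tag is None or f["tag"] == tag]

    def columns_for(self, feature):
        """Slice every row down to the columns a feature covers."""
        lo, hi = feature["start"] - 1, feature["end"]
        return {name: seq[lo:hi] for name, seq in self.rows.items()}


aln = Alignment({"taxonA": "ACGTAC", "taxonB": "AC-TAC"})
aln.add_annotation("comment", "toy alignment")
# A character set in the NEXUS sense, modeled as a feature over columns.
charset = aln.add_feature("charset", 1, 4, name="gene1")
subset = aln.columns_for(charset)
```

Whole-alignment notes go in as annotations, while anything tied to
columns (character sets, partitions) goes in as a feature.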

It feels more like a hack than an elegant Meta-data solution, but I
am not totally sure what data you are thinking about handling at this
point; perhaps I need to spend more time thinking about it.
Or are you worried that the semantic mapping of the data into
features or annotations will confuse users?





