[Bioperl-l] Question about embl format

Lincoln Stein lstein at cshl.org
Fri Apr 18 17:07:25 EDT 2003


The SO (Sequence Ontology) terms tend to be very long, although the most
common ones have short synonyms that often (but not always) match the
GenBank/EMBL feature table tags.  What I *could* do is to replace the SO type
tags with their accession numbers (SO:XXXXXX) and place the full name in a
/note qualifier, as you suggest.
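Here is a minimal Python sketch of the accession idea. It assumes a caller-supplied table mapping names to accessions, standing in for a real Sequence Ontology lookup. The function names and the SO:1234567 accession are placeholders, not BioPerl API and not real SO identifiers. The accession goes in the feature key and the full name goes in a /note. On read, the accession is mapped back to the name:

```python
"""Hypothetical sketch: SO accession as the feature key, full name in /note.

`accessions` stands in for a real Sequence Ontology lookup (assumption);
function names and the accession used below are placeholders.
"""


def to_feature(so_name, accessions):
    """Return (feature_key, qualifiers) to write for an SO term name."""
    return accessions[so_name], {"note": so_name}


def from_feature(feature_key, accessions):
    """Recover the SO term name from the accession alone.

    Keys that are not known accessions (e.g. ordinary GenBank keys)
    are returned unchanged.
    """
    by_accession = {acc: name for name, acc in accessions.items()}
    return by_accession.get(feature_key, feature_key)


if __name__ == "__main__":
    # Placeholder accession, not a real SO identifier.
    table = {"some_very_long_so_term_name": "SO:1234567"}
    key, quals = to_feature("some_very_long_so_term_name", table)
    print(key, quals)
    print(from_feature(key, table))
```

The accession is itself unambiguous, so in this sketch round-tripping relies on the ontology lookup. The /note exists only for human readers.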

Replacing the type tags with ontology terms would be a deep change to the
API, since primary_tag could then be an ontology term object rather than a
string.  The best way to ensure backward compatibility with other people's
code would be to override the stringification method of the ontology term
object so that it returns the term label.

Or we could reserve this type of change for BioPerl 2.
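In BioPerl, that override would mean overloading stringification (`use overload '""' => ...`) in the term class. Python has no direct equivalent, so this hypothetical sketch gets a similar effect by subclassing `str`. Code written for plain-string primary_tags keeps working, while the term still carries its identifier. The class name, its attributes, and the placeholder accession are illustrative only:

```python
"""Hypothetical sketch: a primary_tag that is an ontology term object
but still behaves like the plain string label older code expects.

Perl would overload stringification; this Python analogue subclasses str.
"""


class OntologyTerm(str):
    """A term whose string value is its label; extra data rides along."""

    def __new__(cls, label, identifier, definition=""):
        obj = super().__new__(cls, label)
        obj.identifier = identifier  # e.g. an SO accession (placeholder)
        obj.definition = definition
        return obj


def legacy_is_exon(primary_tag):
    """Code written back when primary_tag was always a plain string."""
    return primary_tag == "exon"


if __name__ == "__main__":
    tag = OntologyTerm("exon", "SO:1234567")  # placeholder accession
    print(legacy_is_exon(tag))  # True
    print(f"type={tag}")        # type=exon
    print(tag.identifier)       # SO:1234567
```

Subclassing `str` also keeps hashing and string methods intact, so a term can still be used as a dictionary key alongside plain strings.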

Lincoln

On Friday 18 April 2003 03:54 am, Ewan Birney wrote:
> On Thu, 17 Apr 2003, Lincoln Stein wrote:
> > OK, so what to do about primary_tags that are >= 15 letters, since
> > BioPerl doesn't enforce a size limit on primary_tags?  If I implement
> > truncation at the write_seq level, then we'll lose round-tripping.
>
> What about coming up with a shorter tag in the database? Or is that a bad
> idea?
>
> Don't really know what to do. We could have a convention of truncating
> the key and then adding a
>
>   /note="key=very_long_key_string"
>
> in the qualifiers
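Here is a sketch of the quoted proposal. It assumes that feature keys of up to 15 characters fit the EMBL/GenBank key column, and it uses a hypothetical "key=" prefix inside the /note. The helper names are illustrative. Truncation alone maps different long tags to the same key, so the original tag can only be recovered from the note:

```python
"""Hypothetical sketch: why truncating long primary_tags at write time
loses round-tripping, and how a /note="key=..." convention restores it.
"""

MAX_KEY_LEN = 15     # assumed width of the feature-key column
NOTE_PREFIX = "key="  # hypothetical note convention


def write_key(primary_tag):
    """Return (feature_key, notes) to emit for a primary_tag."""
    if len(primary_tag) <= MAX_KEY_LEN:
        return primary_tag, []
    # Too long: truncate the key but keep the full tag in a note.
    return primary_tag[:MAX_KEY_LEN], [NOTE_PREFIX + primary_tag]


def read_key(feature_key, notes):
    """Return the original primary_tag when parsing the feature back."""
    for note in notes:
        if note.startswith(NOTE_PREFIX):
            return note[len(NOTE_PREFIX):]
    return feature_key


if __name__ == "__main__":
    for tag in ["long_feature_type_one", "long_feature_type_two"]:
        key, notes = write_key(tag)
        # Both tags truncate to "long_feature_ty"; only the note differs.
        print(key, read_key(key, notes) == tag, read_key(key, []) == tag)
```

Short tags pass through untouched, so ordinary files are unchanged. Only over-long keys pick up the extra note.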

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org                         Cold Spring Harbor, NY
========================================================================