[Bioperl-l] Question about embl format

Ewan Birney birney at ebi.ac.uk
Sat Apr 19 19:56:19 EDT 2003



On Fri, 18 Apr 2003, Lincoln Stein wrote:

> The SO (sequence ontology) terms tend to be very long, although the most
> common ones have short synonyms that often (but not always) match the
> GenBank/EMBL feature table tags.  What I *could* do is to replace the SO type
> tags with their accession numbers (SO:XXXXXX) and place the full name in a
> qualifiers /note as you suggest.

I suspect we should be very smart in the system with logic as follows:


  - if there is a shortname, use that

  - if not, use the SO-identifier, potentially with the * prefix which
the docs indicate is how to mark a user-defined case

  - alternatively, we walk back up the SO tree until we hit a shortname
(an EMBL-accepted shortname?) which we can use

  - in all cases we add the qualifiers

   /note="SO-term=SOxxxxxxx"
   /note="SO-descriptions=long description of SO term"


Though perhaps the description is too much of a denormalisation.


  - when we re-read EMBL/GenBank, if we spot a /note="SO-term=SOxxxxxx",
that overrides all other magic for the FT key ---> SO mapping
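
To make the logic above concrete, here is a rough sketch, written in Python only for brevity and not as proposed bioperl code. Everything in it is an assumption for illustration: the SO identifiers, shortnames and descriptions are made up, the function names are hypothetical, and the exact way the * prefix is written is a guess.

```python
import re

# Hypothetical miniature of SO: id -> (shortname or None, parent id or None, description).
# None of these identifiers or names are real SO entries.
TERMS = {
    "SO:9000001": ("gene", None, "hypothetical root term with a short name"),
    "SO:9000002": (None, "SO:9000001", "hypothetical long-named child term"),
    "SO:9000003": (None, None, "hypothetical orphan term without a short name"),
}


def feature_key(term_id, terms, walk_up=True):
    """Choose the feature-table key to write for an SO term."""
    short, parent, _ = terms[term_id]
    if short is not None:
        return short  # rule 1: use the shortname if there is one
    if walk_up:
        # alternative rule: climb the tree until an ancestor has a shortname
        current = parent
        while current is not None:
            short, next_parent, _ = terms[current]
            if short is not None:
                return short
            current = next_parent
    # fallback: the SO identifier, marked with the (assumed) '*' prefix
    return "*" + term_id


def so_notes(term_id, terms, include_description=True):
    """Build the /note qualifiers that are written in all cases."""
    notes = ['/note="SO-term=%s"' % term_id]
    if include_description:
        notes.append('/note="SO-descriptions=%s"' % terms[term_id][2])
    return notes


SO_NOTE = re.compile(r'/note="SO-term=([^"]+)"')


def term_for_feature(key, qualifiers, key_to_term):
    """On re-reading, an SO-term note overrides any key -> SO mapping."""
    for qualifier in qualifiers:
        match = SO_NOTE.fullmatch(qualifier.strip())
        if match:
            return match.group(1)
    return key_to_term.get(key)
```

The include_description switch is there because of the denormalisation worry: dropping the description note loses nothing that cannot be looked up again from the SO-term note.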


>
> This will make a deep change in the API where the primary_tag could be an
> ontology term object rather than a string.  The best way to ensure backward
> compatibility with other people's code would be to override the string
> method in the ontology term object in order to produce the term label.
>
> Or we could reserve this type of change to bioperl 2.
>

I think we should look at doing this now in 1.3 - the more magic we can
build in to make SO practical to use, the more SO will become a
reality...
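
As a rough illustration of the backward-compatibility idea quoted above (a term object that behaves like its label wherever a string is expected), here is a minimal sketch. It is in Python purely for illustration; in Perl the same effect would come from overloading the '""' operator. The class name, its fields and the identifier used are hypothetical, not an existing bioperl API.

```python
class OntologyTerm:
    """Hypothetical term object that can stand in for a plain-string tag."""

    def __init__(self, identifier, label):
        self.identifier = identifier
        self.label = label

    def __str__(self):
        # Old code that prints or interpolates the tag sees the label.
        return self.label

    def __eq__(self, other):
        if isinstance(other, OntologyTerm):
            return self.identifier == other.identifier
        if isinstance(other, str):
            # Old code comparing the tag against a string keeps working.
            return self.label == other
        return NotImplemented

    def __hash__(self):
        # Hash like the label so string-keyed lookups still find the term;
        # this assumes one identifier always carries the same label.
        return hash(self.label)
```

With this in place, code written against a string tag, such as a comparison with "gene" or a lookup in a hash keyed by tag name, should not need to change, while new code can reach the identifier through the object.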



I might have a look at this. Do we have a SO-lite checked into
bioperl-live to work with?




