[Biojava-dev] Annotation conversions

Keith James kdj at sanger.ac.uk
Tue Dec 16 07:10:36 EST 2003


>>>>> "Len" == Len Trigg <len at reeltwo.com> writes:

    Len> Hi folks,

    Len> I'd like to add support for BioSQL's comment table to our
    Len> binding, and am wondering about the best way to do it. It is
    Len> obvious that the comments should just be annotations, my
    Len> question is more about what key I should associate the
    Len> comments with (and should look for when storing sequences in
    Len> the database).

    Len> It seems that I could use either "CC" or "COMMENT" and the
    Len> right thing would happen most of the time. However, it seems
    Len> a bit silly to have to check for both types when persisting
    Len> comments to the database. Should there be a "canonical" key
    Len> that is used for comments? Then different I/O formats could
    Len> just check for this one key, rather than having to do
    Len> things like this:

    Len> GenbankFileFormer.java:322
    Len>   else if (key.equals("CC") || key.equals("COMMENT")) {
    Len>       ccb = new StringBuffer(sequenceBufferCreator("COMMENT ", value));
    Len>   }

Yeah - what we have now is not only really ugly, but also a
maintenance headache.

    Len> The same also applies to any of the annotations that are
    Len> shared between multiple output formats...

    Len> Suggestions?

Does anyone know of practical standards for this sort of biological
metadata?  It's that semantic heterogeneity problem again. Some of
these fields are present in many formats e.g. SwissProt, PDB, BSML
(but not always meaning the same thing). I wish there were a way of
finding which fields mean the same thing in different datasets.

I've been googling and looking on PubMed, but haven't seen anything
immediately helpful (i.e. simple, practical and applicable). I don't
think we should over-engineer. "Canonical" keys would be one way. I
would favour typed enums rather than, say, ints or strings, possibly
with a suitable toString() for UI presentation. There should also be a
method to get a definition (or a ResourceBundle of definitions) so we
don't slip right back into the problem of semantics.
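
To make the typed-enum idea concrete, here is a rough sketch only. Every
name in it (AnnotationKey, fromFormatTag, definition) is hypothetical
and not part of BioJava; the DESCRIPTION entry and the definition text
are illustrative assumptions. Each canonical key carries a UI label via
toString(), a definition, and the format-specific tags that map to it,
so a file former would test for one key instead of both "CC" and
"COMMENT".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical canonical annotation keys (illustrative, not BioJava API). */
enum AnnotationKey {
    // "CC" and "COMMENT" are the two tags from the snippet quoted above.
    COMMENT("Comment", "Free-text remarks attached to an entry.",
            "CC", "COMMENT"),
    // Illustrative second entry: EMBL-style "DE" and GenBank "DEFINITION".
    DESCRIPTION("Description", "Short description of the entry.",
            "DE", "DEFINITION");

    // Lookup table from format-specific tag to canonical key.
    private static final Map<String, AnnotationKey> BY_TAG = new HashMap<>();
    static {
        for (AnnotationKey key : values()) {
            for (String tag : key.formatTags) {
                BY_TAG.put(tag, key);
            }
        }
    }

    private final String label;
    private final String definition;
    private final String[] formatTags;

    AnnotationKey(String label, String definition, String... formatTags) {
        this.label = label;
        this.definition = definition;
        this.formatTags = formatTags;
    }

    /** Definition of the key, so its meaning is stated rather than guessed. */
    String definition() {
        return definition;
    }

    /** Resolves a format-specific tag; empty if the tag is not mapped. */
    static Optional<AnnotationKey> fromFormatTag(String tag) {
        return Optional.ofNullable(BY_TAG.get(tag));
    }

    /** Label suitable for UI presentation. */
    @Override
    public String toString() {
        return label;
    }
}

class AnnotationKeyDemo {
    public static void main(String[] args) {
        AnnotationKey key = AnnotationKey.fromFormatTag("CC")
                .orElseThrow(IllegalArgumentException::new);
        System.out.println(key + ": " + key.definition());
    }
}
```

The definitions are inlined here for brevity; they could equally be
loaded from a ResourceBundle keyed on the enum name.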

It would be great if the sets of "keys" and the annotation builder
itself were plugins. Then we could switch from flat lists to
ontologies without disturbing things too much. Ontologies would be
great, but pragmatically not a runner just yet.
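
One possible shape for the plugin idea, again only a sketch under
assumptions: KeyVocabulary, FlatKeyVocabulary and
PluggableAnnotationBuilder are made-up names, not existing BioJava
classes. The builder depends only on an interface that maps a
format-specific tag to a canonical key, so a flat list could later be
replaced by an ontology-backed implementation without touching the
builder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical plugin point: maps a format-specific tag to a canonical key. */
interface KeyVocabulary {
    Optional<String> canonicalKey(String formatTag);
}

/** Flat-list implementation; an ontology-backed one could replace it. */
final class FlatKeyVocabulary implements KeyVocabulary {
    private final Map<String, String> tagToKey = new LinkedHashMap<>();

    /** Registers a tag-to-key mapping and returns this for chaining. */
    FlatKeyVocabulary register(String formatTag, String canonicalKey) {
        tagToKey.put(formatTag, canonicalKey);
        return this;
    }

    @Override
    public Optional<String> canonicalKey(String formatTag) {
        return Optional.ofNullable(tagToKey.get(formatTag));
    }
}

/** Hypothetical annotation builder that sees only the interface. */
final class PluggableAnnotationBuilder {
    private final KeyVocabulary vocabulary;
    private final Map<String, String> annotations = new LinkedHashMap<>();

    PluggableAnnotationBuilder(KeyVocabulary vocabulary) {
        this.vocabulary = vocabulary;
    }

    /**
     * Stores the value under the canonical key, or under the raw tag when
     * the vocabulary has no mapping. Repeated keys are joined by newlines.
     */
    PluggableAnnotationBuilder add(String formatTag, String value) {
        String key = vocabulary.canonicalKey(formatTag).orElse(formatTag);
        annotations.merge(key, value, (older, newer) -> older + "\n" + newer);
        return this;
    }

    /** Returns a copy of the collected annotations. */
    Map<String, String> build() {
        return new LinkedHashMap<>(annotations);
    }
}
```

With a vocabulary that registers both "CC" and "COMMENT" against the
same canonical key, comment lines from either format end up under one
entry, which is what the database binding would then look for.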

My 2c.

Keith

-- 

- Keith James <kdj at sanger.ac.uk> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -

