[Bioperl-l] load_ontology and GO - progress!

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Fri Apr 23 09:30:45 EDT 2004


Sean Davis wrote:
 > I think you can load with --noobsolete (see perldoc for
 > load_ontology.pl). You may also want to use --safe so that if
 > there does happen to be a term already loaded, the entire load
 > does not fail (again, see perldoc).

Thanks Sean, that has been a successful workaround that has let me load 
the database.

Hilmar Lapp wrote:
> Did you read the load_ontology.pl POD, in particular the documentation  
> for the options that deal with obsolete terms?

Hi Hilmar,

Yes I had read the POD. Specifically, I used the options shown in the 
synopsis of that document "for loading the Gene Ontology". I had 
expected that to be a working example. If different options are needed, 
I would have expected them to be used, or at least mentioned, in the 
synopsis. Neither does the description of the --noobsolete option 
indicate that it is necessary to use it when loading GO, as opposed to 
something I might consider for reasons that aren't explained.

Remember, this is the first time I have used this database and loader 
and I am specifically using them now with the aim of learning what the 
issues are and how best to deal with them. Unless the documentation 
describes an issue, I'm not going to be aware of it until I trip over it :(

> Obsolete terms is not a trivial thing to deal with, and in the end you  
> need to make some decisions for yourself. load_ontology.pl offers  
> several choices but it's up to you what works best for you. There have  
> been prior threads on this; e.g. reading  
> http://bioperl.org/pipermail/bioperl-l/2004-February/014846.html may  
> give you some additional information.

I agree terms that become obsolete are complex to deal with.

I think there are two different issues associated with the two examples 
I gave (elastin and collagen).

Collagen appears to be an example of the case you discuss in the thread 
to which you refer.  Incidentally, adding the obsolete flag to the key 
will only ensure uniqueness through one obsolescence event. If a second 
term with the same name were later made obsolete too, the key would no 
longer be unique. Hopefully that is an unlikely scenario :)  But if I 
wanted to handle that case, I would probably look to adding some form 
of versioning, rather than a boolean flag.
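
To make the flag-versus-versioning point concrete, here is a minimal 
sketch in Python with an in-memory SQLite database. The table layout, 
column names, identifiers and ontology name are all hypothetical and 
far simpler than the real biosql schema; the sketch only illustrates 
why a boolean in the key survives one obsolescence event but not two, 
and how a version counter might avoid that.

```python
import sqlite3


def second_obsoletion_fails_with_flag():
    """Unique key includes a boolean flag (hypothetical simplified table)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE term (
               identifier  TEXT PRIMARY KEY,
               name        TEXT NOT NULL,
               ontology    TEXT NOT NULL,
               is_obsolete INTEGER NOT NULL DEFAULT 0,
               UNIQUE (name, ontology, is_obsolete))"""
    )
    # First term is already obsolete; a replacement with the same name is live.
    db.execute("INSERT INTO term VALUES ('EX:1', 'collagen', 'ontology_a', 1)")
    db.execute("INSERT INTO term VALUES ('EX:2', 'collagen', 'ontology_a', 0)")
    try:
        # Retiring the replacement yields a second (name, ontology, 1) row.
        db.execute("UPDATE term SET is_obsolete = 1 WHERE identifier = 'EX:2'")
    except sqlite3.IntegrityError:
        return True
    return False


def second_obsoletion_ok_with_version():
    """Unique key uses a version counter; the flag is an ordinary column."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE term (
               identifier  TEXT PRIMARY KEY,
               name        TEXT NOT NULL,
               ontology    TEXT NOT NULL,
               version     INTEGER NOT NULL,
               is_obsolete INTEGER NOT NULL DEFAULT 0,
               UNIQUE (name, ontology, version))"""
    )
    db.execute("INSERT INTO term VALUES ('EX:1', 'collagen', 'ontology_a', 1, 1)")
    db.execute("INSERT INTO term VALUES ('EX:2', 'collagen', 'ontology_a', 2, 0)")
    # The second retirement no longer touches the unique key.
    db.execute("UPDATE term SET is_obsolete = 1 WHERE identifier = 'EX:2'")
    row = db.execute("SELECT COUNT(*) FROM term WHERE is_obsolete = 1").fetchone()
    return row[0]
```

In this sketch the first function hits the uniqueness constraint on the 
second retirement, while the second one ends up with two obsolete rows 
that differ only in their version number. How such a version would be 
assigned by a real loader is left open here.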

But the issue with elastin is not about terms that become obsolete, it's 
about two terms with the same name. The obsolescence appears to me to be 
incidental.  The GO files use two different terms (different GO IDs, 
different ontologies) with the same name. They happen to be obsolete but 
it looks like they both existed at the same time (the GO IDs differ only 
by 1 and the terms are used in separate ontology files), not that a term 
was made obsolete and another independent term was later created that 
happened to use the same name.
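
One way to check for this kind of collision before loading is sketched 
below in Python. It assumes the terms have already been parsed into 
(identifier, name, ontology) tuples; the parsing step, the identifiers 
and the ontology labels shown are placeholders, not real GO data.

```python
from collections import defaultdict


def names_with_multiple_ids(terms):
    """Return each name that is used by more than one (identifier, ontology)."""
    ids_by_name = defaultdict(set)
    for identifier, name, ontology in terms:
        ids_by_name[name].add((identifier, ontology))
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }


# Placeholder records standing in for parsed ontology files.
example_terms = [
    ("EX:0000001", "elastin", "ontology_a"),
    ("EX:0000002", "elastin", "ontology_b"),
    ("EX:0000003", "collagen", "ontology_a"),
]

duplicates = names_with_multiple_ids(example_terms)
```

With the placeholder records above, only "elastin" is reported, because 
it is the only name attached to two different identifiers.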

If that scenario ever occurs again, it will break the schema.  I 
surmise from the low numbered GO IDs that perhaps this was something 
that happened in the history of GO that will not be permitted to happen 
again?  But you appeared to think it is possible when you said in Feb:
  "This is not atypical for annotation being a work in progress" 
<http://bioperl.org/pipermail/bioperl-l/2004-February/014908.html>. 
Hence my interest in whether such an event would constitute a data error 
to report to GO or a schema error in biosql and my consequent curiosity 
about the exact rules for GO.

Cheers, Dave
-- 
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960


