[Bioperl-l] OntologyTermI

Hilmar Lapp hlapp@gnf.org
Wed, 28 Aug 2002 09:53:55 -0700


(Sorry if you get this twice. Somewhere in the chain of smtp exchanges, this disappeared.)

Hi all,

we're going to need an Ontology interface and parsers for different 
formats pretty soon as we want to bring GO and other ontologies into 
Biosql. Ewan even put Ontology support on the road map for 1.2, so 
it may be the right time to join forces here.

Our preliminary picture here so far is that we are going to need a 
basic interface describing an ontology entry conceptually, which is 
then realized by different implementations. To give it a name, say 
Bio::OntologyTermI, with implementations living in Bio::Ontology:

	Bio::Ontology::OntologyTerm  # base implementation,
	                             # is-a Bio::OntologyTermI
	Bio::Ontology::GOTerm        # is-a Bio::Ontology::OntologyTerm
	... etc.
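
To make the hierarchy above concrete, here is a minimal sketch in plain Perl with no Bioperl dependency. The method names (identifier, name, definition, go_id) and the -key constructor arguments are assumptions made up for illustration; nothing here is an agreed API, and the real method set could just as well be derived from GO::Model::Term.

```perl
use strict;
use warnings;

{
    # Interface: every method dies unless an implementation overrides it.
    # Method names are hypothetical, chosen only for this sketch.
    package Bio::OntologyTermI;

    sub _abstract {
        my ($self, $method) = @_;
        die ref($self) . " must implement $method()\n";
    }
    sub identifier { $_[0]->_abstract('identifier') }
    sub name       { $_[0]->_abstract('name') }
    sub definition { $_[0]->_abstract('definition') }
}

{
    # Base implementation, is-a Bio::OntologyTermI
    package Bio::Ontology::OntologyTerm;
    our @ISA = ('Bio::OntologyTermI');

    sub new {
        my ($class, %args) = @_;
        my $self = bless {}, $class;
        # Copy the recognized -key arguments into the object
        $self->{$_} = $args{"-$_"} for qw(identifier name definition);
        return $self;
    }
    sub identifier { $_[0]{identifier} }
    sub name       { $_[0]{name} }
    sub definition { $_[0]{definition} }
}

{
    # GO-specific term, is-a Bio::Ontology::OntologyTerm
    package Bio::Ontology::GOTerm;
    our @ISA = ('Bio::Ontology::OntologyTerm');

    # Hypothetical convenience accessor: a GO accession is just the identifier
    sub go_id { $_[0]->identifier }
}

package main;

my $term = Bio::Ontology::GOTerm->new(
    -identifier => 'GO:0008150',
    -name       => 'biological_process',
);
print $term->go_id, " ", $term->name, "\n";
```

Code that only needs a term can then test for isa('Bio::OntologyTermI') and never care which concrete class it was handed.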

We are looking at InterPro as in fact being another ontology, so in 
this scheme there would also be

	Bio::Ontology::InterPro

Now this sketchy picture doesn't pay a lot of attention to 
ontologies being graphs, and looks at them from the use-case point 
of view rather than the computer science abstraction view point.

The GO perl API in GO::Model::* in contrast lays out and implements 
the graph model. (Cool!)

Does the simple sketch above make any sense? Is it going to be 
useful and appropriate? Would copying all methods from 
GO::Model::Term into OntologyTermI provide for a good start?

To me it seems porting GO::Model over to Bioperl should be a pretty 
straightforward process. Or should we prefer not to port it over 
but instead keep an external dependency on the GO perl API?

We'll also need a streaming IO. Again, the GO parser already exists 
(for the XML version of the dump too?). Peter on our end is going to 
add one for InterPro unless someone can point us at something we can 
steal for that purpose (which would be great). The interface I'd 
suggest should resemble the other streaming interfaces in Bioperl, 
e.g.

	package Bio::OntologyIO;

	# returns a Bio::OntologyTermI object
	sub next_term {
	}

	# serializes one or more Bio::OntologyTermI objects
	sub write_term {
	}

and drivers in Bio::OntologyIO::*.
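
As a rough illustration of that streaming shape, the following self-contained sketch reads and writes terms one at a time. Everything beyond the names next_term and write_term is an assumption: the tab-separated "identifier, name" line format is invented for the example (it is not the GO flat-file format), terms are plain hash refs rather than Bio::OntologyTermI objects, and the -fh argument is hypothetical.

```perl
use strict;
use warnings;

{
    package Bio::OntologyIO;

    # -fh is a hypothetical argument: an already opened filehandle
    sub new {
        my ($class, %args) = @_;
        return bless { fh => $args{-fh} }, $class;
    }

    # Returns the next term (a plain hash ref in this sketch),
    # or nothing at end of input. Reads only as far as one record.
    sub next_term {
        my ($self) = @_;
        my $fh = $self->{fh};
        while (defined(my $line = <$fh>)) {
            chomp $line;
            next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
            my ($id, $name) = split /\t/, $line, 2;
            return { identifier => $id, name => $name };
        }
        return;
    }

    # Serializes one or more terms, one line each; returns the count written
    sub write_term {
        my ($self, @terms) = @_;
        my $fh = $self->{fh};
        print {$fh} join("\t", $_->{identifier}, $_->{name}), "\n" for @terms;
        return scalar @terms;
    }
}

package main;

# In-memory filehandles keep the example runnable without any files
my $input = "# id\tname\n"
          . "GO:0008150\tbiological_process\n"
          . "GO:0003674\tmolecular_function\n";
open my $in, '<', \$input or die "cannot open in-memory input: $!";

my $out_buf = '';
open my $out, '>', \$out_buf or die "cannot open in-memory output: $!";

my $reader = Bio::OntologyIO->new(-fh => $in);
my $writer = Bio::OntologyIO->new(-fh => $out);

my $count = 0;
while (my $term = $reader->next_term) {
    $writer->write_term($term);
    $count++;
}
close $out;
```

Format-specific drivers would then override next_term and write_term, in the same way the other Bioperl streaming modules delegate to their per-format classes.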

Again, does this make any sense? I'm unsure how compatible the input 
being a graph is with a streaming next_XXX() kind of thing. I'm also 
wondering how a streaming interface can be plugged into the current 
GO::Parser/GO::Builder framework, without reading the entire file 
up-front. Would flatfile parsing need the XS extension in C as 
stated in GO::AppHandle? Any advice from the experts much 
appreciated ...
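
One conceivable way to reconcile the two, sketched under heavy assumptions: if every streamed term names its parents by identifier, the caller can index terms as they arrive and resolve the graph only once the stream ends, so the parser itself never holds the whole file. The record layout and the T:n identifiers below are made up, and whether the actual GO formats allow this is exactly the open question.

```perl
use strict;
use warnings;

# Hypothetical stream: each record lists its parents by identifier,
# and records may arrive in any order.
my @stream = (
    { identifier => 'T:3', name => 'leaf',  parents => ['T:2'] },
    { identifier => 'T:1', name => 'root',  parents => []      },
    { identifier => 'T:2', name => 'inner', parents => ['T:1'] },
);

my (%term_by_id, %children_of);

# Stand-in for: while (my $term = $io->next_term) { ... }
for my $term (@stream) {
    $term_by_id{ $term->{identifier} } = $term;
    push @{ $children_of{$_} }, $term->{identifier} for @{ $term->{parents} };
}

# Ancestors are resolved lazily from the index, after streaming is done.
# $seen guards against visiting a shared ancestor twice.
sub ancestors {
    my ($id, $seen) = @_;
    $seen ||= {};
    my $term = $term_by_id{$id} or return;
    my @found;
    for my $parent (@{ $term->{parents} }) {
        next if $seen->{$parent}++;
        push @found, $parent, ancestors($parent, $seen);
    }
    return @found;
}

print join(', ', ancestors('T:3')), "\n";
```

The cost is that the index still ends up holding every term in memory; the gain is only that terms can be consumed one by one as they are parsed.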

Ideally the groundwork for this can be steered by someone other than 
us, as we are clearly only beginners in this field. Chris? We'll 
just need something working pretty soon ...

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------