[Bioperl-l] GO categories and load_ontology.pl

Hilmar Lapp hlapp at gnf.org
Tue Apr 27 18:18:05 EDT 2004


Hi Annie,

you do need to look at the path in order to find out which terms are 
descendants of which of the three GO categories.

You can do this either

	- in bioperl, by populating an ontology instance with all relationships 
from the database (see t/ontology.t in the bioperl-db test suite for an 
example), then querying the ontology object for all ancestors of your 
query term and checking which category term is among them, or

	- as before, but instead of the high-level query for all ancestors, 
use the graph that backs the engine ($graph = $ontology->engine->graph 
for GO) and query it directly using its path methods. The backing graph 
implementation is Graph::Directed; Nat Goodman said there are issues 
with some of the flow algorithms, but I don't remember the specifics 
(if you email Nat I'm sure he can tell you). Or,

	- query the term_path table in SQL, joining it to term in the same way 
as you would join term_relationship. You can automatically populate the 
transitive closure when you load or update GO by supplying --computetc 
to load_ontology.pl. See the POD for details.
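
To illustrate the SQL option, here is a rough sketch in Python against an in-memory SQLite database. It is not taken from bioperl-db. The table layout is a simplified assumption about the BioSQL term and term_path tables (only the columns needed here), the direction of the path (subject = descendant, object = ancestor) is likewise assumed, and go_category and demo_connection are hypothetical helper names. The three root identifiers are the GO accessions for the category terms.

```python
import sqlite3

# Assumed, simplified subset of the BioSQL term and term_path tables.
# A real BioSQL schema has more columns (ontology_id, predicate_term_id, ...).
SCHEMA = """
CREATE TABLE term (
    term_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    identifier TEXT
);
CREATE TABLE term_path (
    subject_term_id INTEGER NOT NULL,  -- assumed: the descendant term
    object_term_id  INTEGER NOT NULL,  -- assumed: the ancestor term
    distance        INTEGER
);
"""

# Find the category root among the ancestors of the query term by joining
# term_path to term on both ends of the path.
CATEGORY_QUERY = """
SELECT cat.name
FROM term AS t
JOIN term_path AS tp ON tp.subject_term_id = t.term_id
JOIN term AS cat     ON cat.term_id = tp.object_term_id
WHERE t.identifier = ?
  AND cat.identifier IN ('GO:0003674', 'GO:0008150', 'GO:0005575')
"""


def go_category(conn, go_id):
    """Return the name of the category root above go_id, or None."""
    row = conn.execute(CATEGORY_QUERY, (go_id,)).fetchone()
    return row[0] if row else None


def demo_connection():
    """Build a toy database with three terms and their transitive closure."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO term VALUES (?, ?, ?)",
        [
            (1, "molecular_function", "GO:0003674"),
            (2, "binding", "GO:0005488"),
            (3, "protein binding", "GO:0005515"),
        ],
    )
    # Closure rows: binding -> root, protein binding -> binding -> root.
    conn.executemany(
        "INSERT INTO term_path VALUES (?, ?, ?)",
        [(2, 1, 1), (3, 2, 1), (3, 1, 2)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(go_category(conn, "GO:0005515"))
```

Against a real database you would run the same kind of join through your usual client. Note that in this toy data the root term has no path to itself, so looking up a category term directly returns nothing.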

Hth,

	-hilmar

On Tuesday, April 27, 2004, at 07:41  AM, Law, Annie wrote:

> Hi Hilmar and other bioperl enthusiasts,
>
> Thanks for your response.  I now have a better understanding of the
> reason for the 200 entries which are not in GO.  My focus now is the
> items that are in GO and are not obsolete.
>
> My main concern now is, given a GO id, how do I find out what GO
> category it belongs to?  I want to know if it belongs to molecular
> function, biological process, or cellular component by the simplest
> method available with bioperl.  I don't see any obvious way of doing
> this since all of the entries in the term table are of
> Ontology Id = 1 (Gene ontology).
>
> It was suggested to me by another helpful bioperler that I could
> somehow look to see if my term is a child of the GO id for molecular
> function, biological process, or cellular component.  This could be
> done by looking at the path table if transitive closure was used.  I
> am not too sure about how to go about doing this and would be open to
> other solutions.
>
> The other alternative I can think of is to load a text file from GO
> into my database, namely the GO.terms_and_ids file which is provided
> by Gene Ontology.  This is basically a tab-delimited text file which
> includes the GO id and a letter such as F, P, or C next to the ID to
> indicate F = molecular function, P = biological process, C = cellular
> component.  This is not really my first choice as a solution if I can
> access this information and it is somehow made available by the
> load_ontology.pl script which I have already used.
>
> Thanks very much,
> Annie.
>
>
>
>
> -----------------------
> To: Law, Annie
> Cc: 'bioperl-l at bioperl.org'
> Subject: Re: [Bioperl-l] GO categories and load_ontology.pl
>
>
> Annie, I still owe you an answer for your earlier email. I haven't
> managed to get to that yet. See below for my response to this one.
>
> On Wednesday, March 17, 2004, at 08:50  AM, Law, Annie wrote:
>
>> It seems that most of the entries in the term table are of Ontology Id
>> = 1 (Gene ontology) and only around 200 entries are molecular function,
>> biological process, and cellular component put together, when there
>> are about 16000 entries in the term table.
>> This is only true if I load locuslink into the database.
>
> This is because LocusLink lags behind the latest version of GO in terms
> of the release that they use for annotating sequences. I.e., LocusLink
> uses some terms which have meanwhile been retired or obsoleted from GO.
> Depending on whether they are still in GO's .defs file, they won't be
> in your database if you chose to ignore obsoleted entries (which is not
> a bad choice at all per se), or they aren't part of GO anymore at all.
>
> LocusLink doesn't give the ontology of GO terms (which would be 'Gene
> Ontology'); rather it gives the category. Because a term must have an
> ontology associated, the SeqIO LL parser interprets as the ontology
> what NCBI really meant to be the category.
>
> You'd have the following choices to proceed.
>
> 	- Ignore the 200 entries which aren't in Gene Ontology. You're not
> going to miss a significant amount of your annotation, and it's
> annotation with obsoleted terms anyway.
>
> 	- Load GO including obsoleted terms, and see with how many non-Gene
> Ontology terms that would leave you. If it's a lot less than 200, you
> may just want to ignore the rest.
>
> 	- Build a SeqProcessor module (see Bio::Factory::SeqProcessorI and
> Bio::Seq::BaseSeqProcessor) which takes the seq objects as the LL
> parser returns them, goes in and retrieves all GO term annotations, and
> replaces the ontology for those with 'Gene Ontology.' Then you pass
> your SeqProcessor to load_seqdatabase.pl using the --pipeline
> command-line option (see the script's POD).
>
> The last option may sound like a lot of work, but it really is not if
> you can program Perl. Note, however, that you still wouldn't have any
> relationships for those terms - they simply have been retired.
>
> Depending on what your project is, just ignoring those 200 may be the
> most reasonable way to go.
>
> 	-hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



