[Bioperl-l] mapping UniGene IDs to GO ids

Paul Boutros pcboutro@engmail.uwaterloo.ca
Fri, 27 Dec 2002 22:42:20 -0500 (EST)


Hi,

I don't think the UniGene data files (e.g. Hs.data) include the GO
information in them.  In fact, I would be a little surprised if they
reassociated GO annotation with UniGene clusters for each new build.  I
think your best bet is to:

1. Use ClusterIO to parse out the LocusLink corresponding to each
cluster from the Xx.data file
2. Use the LocusLink parser (SeqIO::LocusLink) to parse out the GO
annotation from that file
3. Associate the two data sets, probably in a database (e.g. one table
for UniGene clusters, another for LocusLinks, joined on the LocusLink ID)
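
A minimal sketch of step 3, written in Python with the standard sqlite3
module purely for illustration (the same two tables and join would work
from Perl via DBI).  The table names, column names, and all IDs below
are made-up placeholders, not real UniGene, LocusLink, or GO data; the
rows stand in for whatever steps 1 and 2 produce.

```python
import sqlite3

# Hypothetical output of step 1: (UniGene cluster ID, LocusLink ID).
unigene_rows = [("Hs.1", 101), ("Hs.2", 102), ("Hs.3", 103)]

# Hypothetical output of step 2: (LocusLink ID, GO ID).
locuslink_go_rows = [
    (101, "GO:0000001"),
    (101, "GO:0000002"),
    (102, "GO:0000003"),
]

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unigene (cluster_id TEXT, locuslink_id INTEGER)")
conn.execute("CREATE TABLE locuslink_go (locuslink_id INTEGER, go_id TEXT)")
conn.executemany("INSERT INTO unigene VALUES (?, ?)", unigene_rows)
conn.executemany("INSERT INTO locuslink_go VALUES (?, ?)", locuslink_go_rows)


def go_ids_for_cluster(conn, cluster_id):
    """Return the sorted GO IDs reachable from a cluster via its LocusLink ID."""
    rows = conn.execute(
        """
        SELECT DISTINCT g.go_id
        FROM unigene AS u
        JOIN locuslink_go AS g ON g.locuslink_id = u.locuslink_id
        WHERE u.cluster_id = ?
        ORDER BY g.go_id
        """,
        (cluster_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

With the placeholder rows above, go_ids_for_cluster(conn, "Hs.1") returns
both GO IDs attached to LocusLink 101, while a cluster whose LocusLink
has no GO annotation simply returns an empty list.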

If you want to be able to traverse the GO graphs, you may also have to
download the GO ontology files separately and parse them.

Finally, there are also GO associations at www.geneontology.org for things
like MGD.  The MGD numbers are present in the UniGene data files, so that
is an alternate way of getting the information.  I think the two
annotation projects (MGD and NIH) are separate, so that you would get more
(better?) information by using both sources and merging the results.
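
Merging the two sources could be as simple as taking the union of the GO
IDs per cluster.  A rough Python sketch, assuming each source has
already been reduced to a mapping from UniGene cluster ID to a set of GO
IDs (the function name and that input shape are my assumptions):

```python
def merge_annotations(*sources):
    """Union the GO ID sets from several {cluster_id: set_of_go_ids} mappings."""
    merged = {}
    for source in sources:
        for cluster_id, go_ids in source.items():
            # setdefault creates an empty set the first time a cluster is seen.
            merged.setdefault(cluster_id, set()).update(go_ids)
    return merged
```

A cluster annotated in only one source keeps that source's GO IDs, and
duplicates across sources collapse because the values are sets.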

Hope this helps,
Paul

> Hi,
>
> I would like to implement something along the lines of FatiGO:
>   http://bioinfo.cnio.es/cgi-bin/tools/FatiGO/FatiGO.cgi
> locally using Perl snippets.
> 
> Can someone point me to a code snippet to map a UniGene ID to the
> corresponding Gene Ontology IDs?  Using both BioPerl and the GO Perl
> API is fair game for my purposes.
> 
> TIA,
> 
> Jonathan
>