[BioRuby] Re: KEGG track in gbrowse

Tue Jan 25 08:36:41 EST 2005

Hi Venky,

In KEGG DAS (das.hgc.jp), I'm using the following configuration for GBrowse:

--------------------------------------------------
link         = sub {
                 my $feature = shift;
                 my $name = $feature->display_name;
                 my $gene = $feature->attributes("Gene");
                 my $cgi = "http://www.genome.jp/dbget-bin/show_pathway";
                 my $url = "$cgi?$name+$gene";
                 return $url;
         }
--------------------------------------------------

and the GFF lines corresponding to this feature look something like
(this sample is taken from yeast)

--------------------------------------------------
I       KEGG    pathway 31568   32941   .       +       .       path "sce00251" ; Gene "YAL062W"
I       KEGG    pathway 42881   45022   .       -       .       path "sce00010" ; Gene "YAL054C"
I       KEGG    pathway 42881   45022   .       -       .       path "sce00620" ; Gene "YAL054C"
   :
--------------------------------------------------
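
To see how the link sub and the GFF attributes fit together, here is a
minimal Ruby sketch that builds the same "$cgi?$name+$gene" URL from one
GFF line. It assumes that the feature's display name is the value of the
"path" attribute; the helper names parse_gff_attributes and pathway_url
are made up for this illustration and are not part of GBrowse or BioRuby.

```ruby
# Base URL taken from the link configuration above.
SHOW_PATHWAY_CGI = "http://www.genome.jp/dbget-bin/show_pathway"

# Parse the 9th GFF column (e.g. 'path "sce00251" ; Gene "YAL062W"')
# into a hash of attribute name => unquoted value.
def parse_gff_attributes(column)
  column.split(";").each_with_object({}) do |pair, attrs|
    key, value = pair.strip.split(/\s+/, 2)
    next if key.nil? || value.nil?
    attrs[key] = value.delete('"')
  end
end

# Build the URL the same way the link sub does: "$cgi?$name+$gene".
# Assumption: the display name equals the "path" attribute value.
def pathway_url(gff_line)
  fields = gff_line.chomp.split("\t")
  attrs = parse_gff_attributes(fields[8])
  "#{SHOW_PATHWAY_CGI}?#{attrs['path']}+#{attrs['Gene']}"
end

line = ["I", "KEGG", "pathway", "31568", "32941", ".", "+", ".",
        'path "sce00251" ; Gene "YAL062W"'].join("\t")
puts pathway_url(line)
```

For the first sample line this prints a URL ending in
show_pathway?sce00251+YAL062W.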

Unfortunately, KEGG DAS doesn't support human yet, because
the KEGG GENES entries for human don't contain gene
coordinates on the chromosome for now.

In your case, you just need gene_id to pathway_id mappings.
These can be obtained from the raw KEGG GENES entries (strategy 1)
or by using the KEGG API (strategy 2).

[Strategy 1]

The KEGG GENES flat file for human is available at

   ftp://ftp.genome.jp/pub/kegg/genomes/genes/H.sapiens.ent

and the entry looks like

--------------------------------------------------
ENTRY       2                 CDS       H.sapiens
NAME        A2M
DEFINITION  alpha-2-macroglobulin
ORTHOLOG    KO: K03910  alpha-2-macroglobulin
CLASS       Environmental Information Processing; Immune System; Complement and
             coagulation cascades [PATH:hsa04610]
             Human Diseases; Neurodegenerative Disorders; Alzheimer's disease
             [PATH:hsa05010]
POSITION    12p13.3-p12.3
DBLINKS     LocusLink: 2
             GDB: 119639
             OMIM: 103950
   :
--------------------------------------------------

You can extract the entry_id from the ENTRY field (for human it is
the same as the LocusLink ID) and a list of pathway_ids
from the CLASS field.
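
As a rough illustration of what that extraction involves, here is a
plain-Ruby sketch that does not use BioRuby. It is a simplification: it
takes the first token after ENTRY and scans the whole entry text for
[PATH:...] tags rather than parsing the CLASS field properly. The method
name entry_id_and_pathways is hypothetical.

```ruby
# The sample entry shown above (truncated after POSITION).
SAMPLE_ENTRY = <<~ENTRY
  ENTRY       2                 CDS       H.sapiens
  NAME        A2M
  DEFINITION  alpha-2-macroglobulin
  ORTHOLOG    KO: K03910  alpha-2-macroglobulin
  CLASS       Environmental Information Processing; Immune System; Complement and
               coagulation cascades [PATH:hsa04610]
               Human Diseases; Neurodegenerative Disorders; Alzheimer's disease
               [PATH:hsa05010]
  POSITION    12p13.3-p12.3
ENTRY

# Return [entry_id, [pathway_id, ...]] for one flat-file entry.
# Simplification: every [PATH:xxx] tag in the text is collected,
# on the assumption that such tags only occur in the CLASS field.
def entry_id_and_pathways(text)
  entry_id = text[/^ENTRY\s+(\S+)/, 1]
  pathways = text.scan(/\[PATH:(\w+)\]/).flatten
  [entry_id, pathways]
end

entry_id, pathways = entry_id_and_pathways(SAMPLE_ENTRY)
pathways.each { |pathway_id| puts "#{entry_id}\t#{pathway_id}" }
```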

With BioRuby, you can do this with the following code.

gene2path.rb:
--------------------------------------------------
#!/usr/bin/env ruby

require 'bio'

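# auto-detect the flat file format of the input and parse each entry
Bio::FlatFile.auto(ARGF) do |flatfile|
  flatfile.each do |entry|
    # pathway IDs listed in the entry
    pathways = entry.pathways
    # print one "gene_id<TAB>pathway_id" line per pathway
    pathways.each do |pathway_id|
      puts "#{entry.entry_id}\t#{pathway_id}"
    end
  end
end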
--------------------------------------------------

You can run this script as

--------------------------------------------------
% ruby gene2path.rb H.sapiens.ent
2       hsa04610
2       hsa05010
13      hsa00623
13      hsa00650
13      hsa00960
15      hsa00380
   :
--------------------------------------------------

and then integrate the result with your GFF.
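
How to integrate depends on your own GFF, so the following is only a
sketch under assumptions: it supposes that your gene features carry a
Gene "<id>" attribute whose value matches the entry_id printed by
gene2path.rb, and it writes one KEGG pathway line per gene/pathway pair
in the same layout as the yeast sample above. The method names and the
coordinates in the sample gene line are made up for illustration.

```ruby
# Read "gene_id<TAB>pathway_id" lines into a hash: gene_id => [pathway_ids].
def load_gene2path(lines)
  lines.each_with_object(Hash.new { |h, k| h[k] = [] }) do |line, map|
    gene_id, pathway_id = line.split
    map[gene_id] << pathway_id if gene_id && pathway_id
  end
end

# For one gene GFF line, build one KEGG pathway line per mapped pathway,
# reusing the gene's sequence name, coordinates, score, strand and phase.
# Assumption: the 9th column contains Gene "<id>".
def pathway_gff_lines(gene_gff_line, gene2path)
  fields = gene_gff_line.chomp.split("\t")
  gene_id = fields[8][/Gene "([^"]+)"/, 1]
  gene2path.fetch(gene_id, []).map do |pathway_id|
    (fields[0, 1] + ["KEGG", "pathway"] + fields[3, 5] +
      [%(path "#{pathway_id}" ; Gene "#{gene_id}")]).join("\t")
  end
end

# Mapping lines as printed by gene2path.rb.
gene2path = load_gene2path(["2\thsa04610", "2\thsa05010"])

# Hypothetical gene feature; the coordinates are placeholders, not real data.
gene_line = ["12", "example", "gene", "1000", "2000", ".", "-", ".",
             'Gene "2"'].join("\t")

puts pathway_gff_lines(gene_line, gene2path)
```

Genes without any mapped pathway simply produce no output lines.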

[Strategy 2]

You can obtain the genes on each KEGG PATHWAY using the KEGG API,
which is a SOAP/WSDL-based web service.

The following code will do the job.

human_genes_on_pathways.rb:
--------------------------------------------------
#!/usr/bin/env ruby

require 'bio'

serv = Bio::KEGG::API.new

# obtain a list of pathways for human
list = serv.list_pathways("hsa")

list.each do |pathway|
   pathway_id = pathway.entry_id

   # display current status on standard error
   STDERR.puts "Now processing... #{pathway_id} : #{pathway.definition}"

   # obtain a list of gene_ids on the pathway_id
   genes = serv.get_genes_by_pathway(pathway_id)

   genes.each do |gene|
     puts "#{gene}\t#{pathway_id}"
   end
end
--------------------------------------------------

Run it with

--------------------------------------------------
% ruby human_genes_on_pathways.rb > result.txt
Now processing... path:hsa00010 : Glycolysis / Gluconeogenesis - Homo sapiens
Now processing... path:hsa00020 : Citrate cycle (TCA cycle) - Homo sapiens
Now processing... path:hsa00030 : Pentose phosphate pathway - Homo sapiens
   :
--------------------------------------------------

The contents of result.txt will be

--------------------------------------------------
hsa:10327       path:hsa00010
hsa:124         path:hsa00010
hsa:125         path:hsa00010
hsa:126         path:hsa00010
   :
--------------------------------------------------
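
Note that strategy 1 prints bare IDs (2, hsa04610) while strategy 2
returns prefixed ones (hsa:10327, path:hsa00010). If you want both
outputs in the same form, a small sketch like the following could strip
the prefixes; the method name strip_kegg_prefixes is made up for this
example.

```ruby
# Turn "hsa:10327<whitespace>path:hsa00010" into "10327<TAB>hsa00010".
def strip_kegg_prefixes(line)
  gene, pathway = line.split
  [gene.sub(/\A\w+:/, ""), pathway.sub(/\Apath:/, "")].join("\t")
end

puts strip_kegg_prefixes("hsa:10327       path:hsa00010")
puts strip_kegg_prefixes("hsa:124         path:hsa00010")
```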

Hope this helps.

Regards,
Toshiaki Katayama
--
Human Genome Center, Institute of Medical Science, University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-0071, Japan
tel://+81-3-5449-5614, fax://+81-3-5449-5434
BioRuby project     http://bioruby.org/~k/
GenomeNet/KEGG      http://www.genome.jp/
Human Genome Center http://www.hgc.jp/

On 2005/01/24, at 22:36, B R Venkatesh wrote:

> Hello Folks,
>
>   I am using *gbrowse* from GMOD to view human gene
> info but I need to connect genes
> to their pathways like KEGG. Is there a plugin of some
> sort to achieve this?
>
> Apparently somebody has added pathways as a TRACK in
> gbrowse:
> http://das.hgc.jp/cgi-bin/gbrowse/cpv
>
>
> Hope to hear from you.
>
>
> Thanks in advance.
> Venky.
>