[Bioperl-l] Parsing entrezgene with Bio::SeqIO

Liisa Koski koski at cenix-bioscience.com
Fri Mar 17 09:10:57 UTC 2006


Thanks Stefan,
Unfortunately, that only gives me the URL; primary_id and comment come back undefined.


print "\n\nDBlinks for geneid: ", $gene->id, "\t",
      "acc: ", $gene->accession_number, "\n";
my @dblinks = $ann->get_Annotations('dblink');
foreach my $dblink (@dblinks) {
    next unless ($dblink->database eq "KEGG");
    print "primary_id:", "\t", $dblink->primary_id, "\n";
    print "url:", "\t", $dblink->url, "\n";
    print "as_text:", "\t", $dblink->as_text, "\n";
    print "optional_id:", "\t", $dblink->optional_id, "\n";
    print "comment:", "\t", $dblink->comment, "\n";
    print "object_id:", "\t", $dblink->object_id, "\n";
    print "namespase:", "\t", $dblink->namespace, "\n";
    print "authority:", "\t", $dblink->authority, "\n";

    print "\nhash_tree\n";
    my $hash_ref = $dblink->hash_tree;
    for my $key (keys %{$hash_ref}) {
        print $key, ": ", $hash_ref->{$key}, "\n";
    }
}

Output:
------------------------------------
DBlinks for geneid: ABAT        acc: 18
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 42.
primary_id:
url:    http://www.genome.jp/dbget-bin/www_bget?hsa:18
Use of uninitialized value in concatenation (.) or string 
at /netshare/home/koski/perl_modules/Bio/Annotation/DBLink.pm line 146.
as_text: Direct database link to  in database KEGG
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 48.
optional_id:
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 50.
comment:
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 52.
object_id:
namespase:       KEGG
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 56.
authority:

printing hash_tree
database: KEGG
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 62.
primary_id:
-----------------------------------------
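
The "Use of uninitialized value" warnings in the output above appear because several accessors return undef for this dblink and the result is passed straight to print. A minimal sketch of a guard follows. The helper name field_line and the %dblink hash are hypothetical stand-ins so the snippet runs without BioPerl; with a real object you would pass in $dblink->primary_id and so on. It only makes the missing fields explicit and does not recover the KEGG data itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one "label:<TAB>value" line, substituting a
# visible placeholder when the value is undef, so print never sees undef.
sub field_line {
    my ($label, $value) = @_;
    return sprintf "%s:\t%s\n", $label, defined($value) ? $value : '(undef)';
}

# Hypothetical stand-in for one KEGG dblink, mirroring the output above:
# only database and url were populated.
my %dblink = (
    database   => 'KEGG',
    url        => 'http://www.genome.jp/dbget-bin/www_bget?hsa:18',
    primary_id => undef,
    comment    => undef,
);

for my $field (qw(database primary_id url comment)) {
    print field_line($field, $dblink{$field});
}
```

With the real annotation object the loop body would become something like print field_line('primary_id', $dblink->primary_id); which keeps the report readable while it is still unclear where the parser stores the remaining values.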

I see that on the gene page for ABAT (acc: 18) there are KEGG pathways:
KEGG pathway: Alanine and aspartate metabolism 00252
KEGG pathway: Butanoate metabolism 00650
KEGG pathway: Glutamate metabolism 00251
KEGG pathway: Propanoate metabolism 00640
KEGG pathway: Valine, leucine and isoleucine degradation 00280
KEGG pathway: beta-Alanine metabolism 00410 

Is it possible to pull out these pathway names? 

Thanks,
Liisa


On Thursday 16 March 2006 17:29, Stefan Kirov wrote:
> Do this:
> my @dblinks=$ann->get_Annotations('dblink');
> foreach my $dblink (@dblinks) {
>     next unless ($dblink->database eq 'KEGG');
>     print $dblink->primary_id,"\t",$dblink->url,"\n";
> }
> This works for me, hopefully it will for you too. Let me know if
> something is not right.
> Stefan
>
> Liisa Koski wrote:
> >Hi,
> >I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from
> >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz).
> >
> >I'm using bioperl-1.5.1.
> >
> >I want to extract the KEGG annotations.
> >See code below.
> >
> >use Bio::SeqIO;
> >use Bio::ASN1::EntrezGene;
> >
> >my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
> >                                             -file => 'Homo_sapiens');
> >while (my $gene = $seqio->next_seq){
> >    print "\n",$gene->id, "\t", $gene->accession_number, "\n";
> >    my $ann = $gene->annotation();
> >    foreach my $key ( $ann->get_all_annotation_keys() ) {
> >        my @values = $ann->get_Annotations($key);
> >        foreach my $value ( @values ) {
> >            print $key, "\t", "=", "\t", $value->as_text,"\n";
> >        }
> >    }
> >}
> >
> >Unfortunately the only KEGG annotation I see in the results looks like:
> >dblink  =       Direct database link to  in database KEGG
> >(Notice the space between 'to  in')
> >
> >Anyone have any ideas how to get the KEGG annotation results?
> >
> >Note: I also tried parsing the file
> >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
> >but I got the error below:
> >
> >./entrez_gene_seqio.pl Homo_sapiens.ags
> >Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
> >first 20 (or till end of input) characters including the non-conforming
> > data: 00
> > at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm
> >line 138
> >
> >
> >Thanks,
> >Liisa
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Liisa Koski
Bioinformatics Software Engineer
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Phone: +49(351)4173-149



