[Bioperl-l] Parsing entrezgene with Bio::SeqIO

Liisa Koski koski at cenix-bioscience.com
Thu Mar 16 15:14:24 UTC 2006

I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from 

I'm using bioperl-1.5.1.

I want to extract the KEGG annotations.
See code below.

use strict;
use warnings;
use Bio::SeqIO;
use Bio::ASN1::EntrezGene;

# Open the EntrezGene ASN.1 file with the entrezgene SeqIO parser.
my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                            -file   => 'Homo_sapiens');
while (my $gene = $seqio->next_seq) {
    print "\n", $gene->id, "\t", $gene->accession_number, "\n";
    # Dump every annotation attached to the gene, grouped by annotation key.
    my $ann = $gene->annotation();
    foreach my $key ( $ann->get_all_annotation_keys() ) {
        my @values = $ann->get_Annotations($key);
        foreach my $value ( @values ) {
            print $key, "\t", "=", "\t", $value->as_text, "\n";
        }
    }
}

Unfortunately the only KEGG annotation I see in the results looks like:
dblink  =       Direct database link to  in database KEGG 
(Notice the double space between 'to' and 'in', where I would expect an identifier to appear.)
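
To narrow this down, the individual fields of each KEGG link could be printed instead of relying on as_text. This is only a rough sketch: it assumes the 'dblink' values are Bio::Annotation::DBLink objects (which the as_text output above suggests) and that the KEGG information, if the parser captured it at all, sits in primary_id, optional_id or comment. That last part is a guess, not something I have confirmed.

use strict;
use warnings;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                            -file   => 'Homo_sapiens');
while (my $gene = $seqio->next_seq) {
    foreach my $link ( $gene->annotation->get_Annotations('dblink') ) {
        # Assumption: dblink annotations are Bio::Annotation::DBLink objects.
        next unless $link->isa('Bio::Annotation::DBLink');
        next unless defined $link->database && $link->database eq 'KEGG';
        # Print each field separately so that empty or undefined values are visible.
        foreach my $field ( qw(primary_id optional_id comment) ) {
            my $val = $link->$field;
            print $gene->id, "\t", $field, "\t",
                  ( defined $val ? "'$val'" : 'undef' ), "\n";
        }
    }
}

If all three fields come back empty or undef, the KEGG data is presumably not being stored on the DBLink object at all.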

Does anyone have any ideas on how to get the KEGG annotation results?

Note: I also tried parsing the file 
but I got the error below:

./entrez_gene_seqio.pl Homo_sapiens.ags
Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
first 20 (or till end of input) characters including the non-conforming data:
 at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm line 138

