[Bioperl-l] retrieve refseq ids from UIDs

Tue Jun 28 01:46:26 UTC 2011

Hi

I've been having some trouble with BioPerl and I was hoping that
someone could help me. I'm trying to obtain the RefSeq transcript and
protein IDs for a given gene UID. For example, for the gene with UID
9555 (http://www.ncbi.nlm.nih.gov/gene/9555) I'd like to get the
transcript and protein IDs listed in the "NCBI Reference Sequence
(RefSeq)" section of that page, the ones like:

NM_001040158.1 → NP_001035248.1  core histone macro-H2A.1 isoform 2
NM_004893.2 → NP_004884.1  core histone macro-H2A.1 isoform 2

I believe I'm fairly good with Perl, just not with BioPerl yet.
Currently, after getting the UIDs, I use EUtilities esummary to get
the genomic coordinates, then use efetch to extract that sequence and
parse it to obtain the protein_id and transcript_id. Until now that
was fine because I actually wanted all of those things, but now I'd
like to skip the first steps and get only the transcripts or proteins.
I'm sure there must be a "smarter" way to do this.
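
To make the parsing step above concrete, here is a minimal sketch of
what it might look like, assuming the fetched record is GenBank
flat-file text carrying /transcript_id and /protein_id qualifiers. The
sub name extract_refseq_ids and the sample record fragment are
hypothetical; only the accession pair is taken from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect the values of the
# /transcript_id and /protein_id qualifiers found in GenBank flat-file text.
sub extract_refseq_ids {
    my ($text) = @_;
    my %ids = (transcript_id => [], protein_id => []);
    my %seen;
    while ($text =~ m{/(transcript_id|protein_id)="([^"]+)"}g) {
        my ($kind, $acc) = ($1, $2);
        # keep each accession once, in order of first appearance
        push @{ $ids{$kind} }, $acc unless $seen{$kind}{$acc}++;
    }
    return \%ids;
}

# Made-up record fragment for illustration only; the coordinates are not real.
my $sample = <<'END';
     mRNA            join(1..100,200..300)
                     /transcript_id="NM_004893.2"
     CDS             join(50..100,200..250)
                     /protein_id="NP_004884.1"
END

my $ids = extract_refseq_ids($sample);
print "transcripts: @{ $ids->{transcript_id} }\n";
print "proteins:    @{ $ids->{protein_id} }\n";
```

A plain regex is used here only to keep the sketch free of
dependencies; a real record would more likely be read with a proper
GenBank parser.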

I've tried to use efetch, supplying the UID as the id, and calling
get_Response->content, which gives me some kind of structure with the
info that I want, but how do I access it properly? Like this:

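use feature 'say';
use Bio::DB::EUtilities;

my @ids = qw(9555);
# efetch against the gene database, one request for all UIDs
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'gene',
                                       -id      => \@ids,
                                       );
# print the raw response body
say $factory->get_Response->content;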

Also, when I run the einfo script that comes with BioPerl to see what
is available when searching the gene database (running it as
einfo -d=gene), the output suggests that I should be able to get this
information. At least one of the fields is the following:

Field Code          :ACCN
Field Name          :Nucleotide/Protein Accession
Description         :Nucleotide or protein accession(s) associated with this gene
Term Count          :49104652
Attributes          :is_singletoken

How do I get to this field?
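
For context, the fields that einfo lists are search fields, so one way
ACCN could be used is as a qualifier in an esearch term, not as a
value read out of a fetched record. Note that this goes from an
accession to a gene, which is the reverse of what is asked above. A
minimal sketch that only builds the query URL, assuming the public
E-utilities esearch endpoint; the sub name esearch_url is
hypothetical, and no request is actually sent.

```perl
use strict;
use warnings;

# Hypothetical helper: build an esearch URL for a term restricted to one
# search field, e.g. NM_004893[ACCN] in the gene database.
sub esearch_url {
    my ($db, $term, $field) = @_;
    my $query = "$term\[$field\]";
    # percent-encode everything except URL-unreserved characters
    $query =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
         . "?db=$db&term=$query";
}

print esearch_url('gene', 'NM_004893', 'ACCN'), "\n";
```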

When using esummary to get a docsum and then calling its to_string
method, I still can't see anything useful.

  my $summaries = Bio::DB::EUtilities->new(
                                            -eutil  => 'esummary',
                                            -db     => 'gene',
                                            -id     => \@ids,
                                          );

  while (my $docsum = $summaries->next_DocSum) {
    say $docsum->to_string();
  }

Any help would be greatly appreciated. Thanks in advance,
Carnë Draug