[Bioperl-l] retrieve refseq ids from UIDs

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue Jun 28 03:20:48 UTC 2011


I assume you've had a look at the cookbook: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
Also take a look at elink, which might do what you are after: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#I_want_a_list_of_database_.27x.27_UIDs_that_are_linked_from_a_list_of_database_.27y.27_UIDs
The Scrapbook is a good place to get ideas as well: http://www.bioperl.org/wiki/Category:Scrapbook
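
As a rough sketch of the elink approach (not a tested recipe), something along these lines might work. The helper names elink_args and linked_uids are made up for illustration, the email address is a placeholder, and the link names gene_nuccore_refseqrna and gene_protein_refseq are my assumption of the relevant Gene links, so please check them against the link list that einfo reports for the gene database.

```perl
use strict;
use warnings;

# Build the Bio::DB::EUtilities arguments for an elink query that goes from
# Gene UIDs to linked records in another database.
# (Hypothetical helper; the email address is a placeholder.)
sub elink_args {
    my ($linkname, $db, @gene_ids) = @_;
    return (
        -eutil    => 'elink',
        -dbfrom   => 'gene',
        -db       => $db,
        -linkname => $linkname,          # assumed link name, verify with einfo
        -email    => 'you@example.org',
        -id       => [@gene_ids],
    );
}

# Run the query and collect the linked UIDs from every link set returned.
# BioPerl is loaded here, so the rest of the file compiles without it.
sub linked_uids {
    my (%args) = @_;
    require Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(%args);
    my @uids;
    while (my $linkset = $factory->next_LinkSet) {
        push @uids, $linkset->get_ids;
    }
    return @uids;
}

# Gene UIDs given on the command line, e.g. "perl refseq_links.pl 9555".
if (@ARGV) {
    print "$_\n"
        for linked_uids(elink_args('gene_nuccore_refseqrna', 'nuccore', @ARGV));
}
```

Swapping in 'gene_protein_refseq' and 'protein' should give the protein side. Note that elink returns UIDs rather than accessions, so a follow-up efetch on those UIDs (for example with -rettype => 'acc') is probably still needed to get the NM_/NP_ identifiers.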



--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Carnë Draug
> Sent: Tuesday, 28 June 2011 1:46 p.m.
> To: bioperl mailing list
> Subject: [Bioperl-l] retrieve refseq ids from UIDs
> 
> Hi
> 
> I've been having some trouble with BioPerl and I was hoping that
> someone could help me. I'm trying to obtain the RefSeq transcript and
> protein IDs for a given gene UID. For example, for the gene with UID 9555
> http://www.ncbi.nlm.nih.gov/gene/9555 I'd like to get the transcript
> and protein IDs listed in the "NCBI Reference Sequence (RefSeq)" section
> of that page, the ones like:
> 
> NM_001040158.1 → NP_001035248.1  core histone macro-H2A.1 isoform 2
> NM_004893.2 → NP_004884.1  core histone macro-H2A.1 isoform 2
> 
> I believe I'm fairly good with Perl, just not with BioPerl yet.
> Currently, after getting the UIDs, I use EUtilities esummary to get the
> genomic coordinates, then use efetch to extract that sequence and parse
> it to obtain the protein_id and transcript_id. Until now that was fine
> because I actually wanted all those things, but now I'd like to skip the
> first steps and get only the transcripts or proteins. I'm sure there
> must be a "smarter" way to do this.
> 
> I've tried using efetch, supplying the UID as the id and calling
> get_Response->content, which gives me some kind of structure with the
> info that I want, but how do I access it properly? Like this:
> 
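> use 5.010;                 # enables say
> use Bio::DB::EUtilities;
> 
> my @ids = qw(9555);        # Gene UID(s) to look up
> my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
>                                        -db      => 'gene',
>                                        -id      => \@ids,
>                                        );
> # Print the raw content of the efetch response
> say $factory->get_Response->content;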
> 
> Also, when I use the einfo script that comes with BioPerl to list the
> fields available when searching the gene database (running it as
> einfo -d=gene), it suggests that this information should be available.
> At least one of the fields is the following:
> 
> Field Code          :ACCN
> Field Name          :Nucleotide/Protein Accession
> Description         :Nucleotide or protein accession(s) associated with this gene
> Term Count          :49104652
> Attributes          :is_singletoken
> 
> How do I get to this field?
> 
> When using esummary to get a docsum and then calling its to_string method,
> I still can't see anything useful.
> 
>   my $summaries = Bio::DB::EUtilities->new(
>                                             -eutil  => 'esummary',
>                                             -db     => 'gene',
>                                             -id     => \@ids,
>                                           );
> 
>   while (my $docsum = $summaries->next_DocSum) {
>     say $docsum->to_string();
>   }
> 
> Any help would be greatly appreciated. Thanks in advance,
> Carnë Draug
> 




