[Bioperl-l] how to retrieve organism name from accession number?
Smithies, Russell
Russell.Smithies at agresearch.co.nz
Sun Jan 10 20:34:39 UTC 2010
An alternative, non-BioPerl way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. Note that these files are keyed by GI number rather than by accession.
In that same dir, taxdump.tar.gz contains a file called names.dmp, which lists taxids and their names (scientific names as well as synonyms).
If it were me, I'd load gi_taxid_nucl and names.dmp into hashes so I could do this:
my $taxid = $gi_taxid_nucl{$gi};   # keyed by GI number
my $org_name = $names{$taxid};
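
A minimal sketch of how those two hashes might be built, assuming the usual layout of the dump files: gi_taxid_nucl is two tab-separated columns (GI, taxid), and names.dmp uses "<tab>|<tab>" between fields with the name class in the fourth field. The sub names load_gi_taxid and load_names are made up for illustration, and the GI numbers in the sample data are invented. Only the GIs of interest are kept, since the full GI file is likely too big to hold comfortably in memory.

```perl
use strict;
use warnings;

# Read "gi<TAB>taxid" lines, keeping only the GIs we care about.
# $wanted is a hash ref of GI => 1 (hypothetical helper, not part of BioPerl).
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $wanted->{$gi};
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Read names.dmp, keeping only the 'scientific name' row for each taxid
# (a taxid has several rows: synonyms, common names, and so on).
sub load_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # strip the trailing field terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}

# Tiny in-memory stand-ins for the real files (GI numbers are invented).
my $gi_data    = "100\t9606\n200\t10090\n300\t9606\n";
my $names_data = join '',
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "10090\t|\tMus musculus\t|\t\t|\tscientific name\t|\n";

open my $gi_fh,    '<', \$gi_data    or die "cannot open GI data: $!";
open my $names_fh, '<', \$names_data or die "cannot open names data: $!";

my $gi_taxid = load_gi_taxid($gi_fh, { 100 => 1, 200 => 1 });
my $names    = load_names($names_fh);

for my $gi (sort { $a <=> $b } keys %$gi_taxid) {
    my $org_name = $names->{ $gi_taxid->{$gi} };
    print "$gi\t$org_name\n";
}
```

With real data, the in-memory strings would be replaced by filehandles opened on the unpacked dump files.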
--Russell
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Saturday, 26 December 2009 4:52 p.m.
> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
>
> Bhakti,
> The following example (using EUtilities) may serve your purpose:
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing
> # -dbfrom => 'nucleotide' (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
>                                        -db             => 'taxonomy',
>                                        -dbfrom         => 'protein',
>                                        -correspondence => 1,
>                                        -id             => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
> }
>
> # keep only the taxids that were actually found
> @taxa = grep { defined } @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db    => 'taxonomy',
>                                     -id    => \@taxa);
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
>
> sleep 3;
>
> or so separating the queries.
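
The chunking advice above could be sketched roughly as follows. The sub names chunk_ids and lookup_in_batches are hypothetical, and $lookup stands in for whatever code performs the elink + esummary steps on one batch and returns a hash ref of id => organism name; here it is replaced by a stub so the sketch runs without touching NCBI.

```perl
use strict;
use warnings;

# Split a list of ids into batches of at most $size ids each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    push @batches, [ splice @ids, 0, $size ] while @ids;
    return @batches;
}

# Run $lookup on each batch and merge the results, sleeping $pause
# seconds between batches (but not after the last one).
sub lookup_in_batches {
    my ($lookup, $size, $pause, @ids) = @_;
    my %idmap;
    my @batches = chunk_ids($size, @ids);
    for my $i (0 .. $#batches) {
        my $result = $lookup->($batches[$i]);
        @idmap{ keys %$result } = values %$result;
        sleep $pause if $pause && $i < $#batches;
    }
    return \%idmap;
}

# Stub lookup for demonstration only; a real one would query EUtilities.
my $calls = 0;
my $stub  = sub {
    my ($batch) = @_;
    $calls++;
    my %result = map { ( $_ => "organism for $_" ) } @$batch;
    return \%result;
};

my @ids   = qw(1621261 89318838 68536103 20807972 730439);
my $idmap = lookup_in_batches($stub, 2, 0, @ids);
print scalar(keys %$idmap), " ids resolved in $calls batches\n";
```

For the real 30,000 ids, the batch size would be somewhere in the suggested 1000-3000 range and the pause around 3 seconds.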
> MAJ
> ----- Original Message -----
> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, December 25, 2009 9:46 PM
> Subject: [Bioperl-l] how to retrieve organism name from accession number?
>
>
> > Hi,
> >
> > Does anyone know how to retrieve the "Source" or the "Species name" given
> > the accession number using Bioperl? I have 30,000 accession numbers
> > for which I need to get the source organisms. Any kind of help will be
> > appreciated.
> >
> > Thanks
> >
> > BD
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l