[Bioperl-l] how to retrieve organism name from accession number?

Mark A. Jensen maj at fortinbras.us
Fri Dec 25 22:52:10 EST 2009


Bhakti,
The following example (using EUtilities) may serve your purpose:

use strict;
use warnings;
use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nucleotide ids should (probably) work too
# if you change -dbfrom to 'nucleotide'

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
}

# removed records have no taxonomy link, so drop undefined entries
@taxa = grep { defined } @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
        -db    => 'taxonomy',
        -id    => \@taxa );

while (my $docsum = $factory->next_DocSum) {
    my ($taxid) = $docsum->get_contents_by_name('TaxId');
    my ($name)  = $docsum->get_contents_by_name('ScientificName');
    $names{$taxid} = $name;
}

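foreach my $id (@ids) {
    my $taxid = $taxa{$id};
    # ids with no taxonomy link (e.g. removed records) map to undef
    $idmap{$id} = defined $taxid ? $names{$taxid} : undef;
}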

# %idmap is
#    1621261 => 'Mycobacterium tuberculosis H37Rv'
#    20807972 => 'Thermoanaerobacter tengcongensis MB4'
#    68536103 => 'Corynebacterium jeikeium K411'
#    730439 => 'Bacillus caldolyticus'
#    89318838 => undef    (this record has been removed from the db)

1;

You will probably need to break up your 30,000 accession numbers into chunks
(say, 1000-3000 each), and run the above on each chunk with a

sleep 3;

or so between the queries.
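
For what it's worth, the chunking could look roughly like the sketch below. Both chunk_ids and map_in_chunks are made-up helper names (they are not part of BioPerl), and $lookup is assumed to be a code ref that wraps the elink/esummary steps above and returns an id => name hash for the ids it is given. Treat it as a starting point, not a tested pipeline.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a list of ids into
# array refs of at most $size ids each. Works on a copy of the list.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be a positive integer\n" unless $size && $size > 0;
    my @chunks;
    push @chunks, [ splice(@ids, 0, $size) ] while @ids;
    return @chunks;
}

# Hypothetical driver: call $lookup on each chunk and merge the results,
# sleeping $pause seconds between chunks (but not after the last one).
# $lookup is assumed to return a flat id => name list for its arguments.
sub map_in_chunks {
    my ($lookup, $size, $pause, @ids) = @_;
    my %idmap;
    my @chunks = chunk_ids($size, @ids);
    for my $i (0 .. $#chunks) {
        my %part = $lookup->(@{ $chunks[$i] });
        @idmap{ keys %part } = values %part;
        sleep $pause if $pause && $i < $#chunks;
    }
    return %idmap;
}
```

You would then call something like map_in_chunks(\&lookup_names, 1000, 3, @all_ids), where lookup_names is your own sub containing the two EUtilities queries.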
MAJ
----- Original Message ----- 
From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, December 25, 2009 9:46 PM
Subject: [Bioperl-l] how to retrieve organism name from accession number?


> Hi,
>
> Does anyone know how to retrieve the "Source" or the "Species name" given
> the accession number using Bioperl.   I have these 30,000 accession numbers
> for which I need to get the source organisms.  Any kind of help will be
> appreciated.
>
> Thanks
>
> BD