[Bioperl-l] Bio::DB::Taxonomy and each_Descendent

Paul Cantalupo pcantalupo at gmail.com
Mon Sep 20 14:46:32 UTC 2010


Jelle,

Below is my subroutine that returns the lineage for a given NCBI
Taxonomy ID. For example, if you use 10633 as the taxid, the
subroutine will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck.

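use Bio::DB::EUtilities;   # provides the efetch factory
use XML::Simple;           # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   # bare return: undef in scalar context, empty list in list context
   return unless ($id);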

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );

   my $res = $factory->get_Response->content;
   my $data = XMLin($res);

   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();

   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }

   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list in list context, "; "-joined string in scalar context
   return wantarray ? @lineage : join("; ", @lineage);
}
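
The comment inside the loop mentions returning the TaxId and Rank as well. A minimal sketch of that extension follows, assuming the same structure that XMLin() produces for the efetch response. The helper name lineage_records is hypothetical, and the sample data is hand-built for illustration: only the names and taxid 10633 come from the example above, while the other TaxId values and the Rank strings are placeholders rather than values looked up at NCBI. The sketch also covers one XML::Simple behaviour: when LineageEx holds a single Taxon, XMLin() returns a hash ref rather than an array ref.

```perl
use strict;
use warnings;

# Hypothetical helper: given the structure returned by XMLin(), return one
# hash ref per lineage entry (ancestors first, the taxon itself last).
sub lineage_records {
   my ($data) = @_;
   return unless ref($data) eq 'HASH' && ref($data->{Taxon}) eq 'HASH';

   my $taxon   = $data->{Taxon};
   my $parents = ref($taxon->{LineageEx}) eq 'HASH'
               ? $taxon->{LineageEx}{Taxon}
               : undef;

   # Normalize: array ref for several ancestors, hash ref for exactly one.
   my @entries = ref($parents) eq 'ARRAY' ? @$parents
               : ref($parents) eq 'HASH'  ? ($parents)
               :                            ();

   my @records;
   foreach my $t (@entries, $taxon) {
      push @records, {
         ScientificName => $t->{ScientificName},
         TaxId          => $t->{TaxId},
         Rank           => $t->{Rank},
      };
   }
   return @records;
}

# Hand-built sample; TaxIds 1001/1002 and all Rank values are placeholders.
my $sample = {
   Taxon => {
      ScientificName => 'Simian virus 40',
      TaxId          => 10633,
      Rank           => 'placeholder-rank',
      LineageEx      => {
         Taxon => [
            { ScientificName => 'Viruses',
              TaxId => 1001, Rank => 'placeholder-rank' },
            { ScientificName => 'Polyomaviridae',
              TaxId => 1002, Rank => 'placeholder-rank' },
         ],
      },
   },
};

foreach my $rec (lineage_records($sample)) {
   print join("\t", @{$rec}{qw(TaxId Rank ScientificName)}), "\n";
}
```

Inside taxid2lineage this could be called as lineage_records($data) after the ref($data) check; call it in list context, since in scalar context it returns only the number of records.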

Paul Cantalupo
University of Pittsburgh


On Mon, Sep 20, 2010 at 4:04 AM, Jelle Scholtalbers
<j.scholtalbers at gmail.com> wrote:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
> > Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> > Eric
> >
> > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> > wrote:
> > > Sounds like this is going through an initial indexing step (for
> > > flatfiles).  I would expect the initial indexing of the tables to
> > > take time as you have to create the DB, but subsequent lookups
> > > post-indexing should be much faster if the index is already
> > > present.  Maybe Jason could answer in more detail?
> > >
> > > chris
> > >
> > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> > >
> > >> Hello,
> > >>
> > >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> > >> 5.8.5 with BioPerl 1.6.0
> > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> > >>
> > >> It ran for 100 cpu seconds and output:
> > >>
> > >> 33090 Viridiplantae kingdom
> > >>
> > >> I was expecting it to also output the descendents. Some questions:
> > >>
> > >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> > >> implemented? It looks to be in Taxon.pm but it is not documented and
> > >> when I ran Data::Dumper on $node the value '_desc' was empty.
> > >>
> > >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> > >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> > >> up with the same result.
> > >>
> > >> thanks,
> > >> Eric
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> >



