[Bioperl-l] Taxonomy hierarchy extraction

George Heller george.heller at yahoo.com
Tue Jun 19 00:29:31 UTC 2007

But the problem is that I don't get any output on the screen. In the /tmp directory I get 4 files, namely parents, nodes, id2names and names2id, but I don't know what to make of them. This is what my script looks like:
use strict;
#use warnings;
use DBI;
use Bio::Tree::Node;
use Bio::DB::Taxonomy;
use Bio::DB::Taxonomy::flatfile;
my $idx_dir = '/tmp';

my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
                               -nodesfile => $nodefile,
                               -namesfile => $namesfile,
                               -directory => $idx_dir);
my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
for my $child ( @extant_children ) {
  print "id is ", $child->id, "\n"; # NCBI taxa id
  print "rank is ", $child->rank, "\n"; # e.g. species
  print "scientific name is ", $child->scientific_name, "\n"; # scientific name
}

Jason Stajich <jason at bioperl.org> wrote:
  All the children are in this array.

  You get to decide what you want to do with them. In the following example I print the id, rank, and scientific name to the screen.
  Because this is a taxonomy db query you are getting back Bio::Taxonomy::Taxon objects, so read the documentation for that module to see what you can do with the object.
  I would also suggest spending a little time with the Getting Started and HOWTO:Trees documentation on the website to get familiar with the objects and nomenclature.


  my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

  for my $child ( @extant_children ) {
    print "id is ", $child->id, "\n"; # NCBI taxa id
    print "rank is ", $child->rank, "\n"; # e.g. species
    print "scientific name is ", $child->scientific_name, "\n"; # scientific name
  }

    On Jun 18, 2007, at 5:04 PM, George Heller wrote:

    Ok, I installed the latest Scalar::Util and the script seems to be working. But I am confused about where exactly I need to look for the descendant taxon ids once the script is run. I did look into the /tmp/ directory, but I couldn't understand much.

    Sorry to bother you, I really appreciate your patience.


  Jason Stajich <jason at bioperl.org> wrote:
    Try installing the latest Scalar::Util  
      On Jun 18, 2007, at 4:05 PM, George Heller wrote:

      This is the output of /usr/bin/perl -V


    Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
        osname=linux, osvers=2.6.9-22.18.bz155725.elsmp, archname=i386-linux-thread-multi
        uname='linux hs20-bc1-4.build.redhat.com 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686 i686 i386 gnulinux '
        config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost -Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1 5.8.0'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
        cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
        optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
        cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
        ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='gcc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
        perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
        libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
        cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'


    Characteristics of this binary (from libperl):
      Built under linux
      Compiled at Jul 24 2006 18:28:10




    Hilmar Lapp <hlapp at gmx.net> wrote:
      The perl version appears to be 5.8.5 though, so something strange 
    appears to be going on too.


    George, can you please post the output of


    $ /usr/bin/perl -V




    On Jun 18, 2007, at 6:33 PM, Chris Fields wrote:


      As the error implies, your local version of perl doesn't seem to support
    weak references, which means it doesn't have Scalar::Util (which was
    added to core after perl 5.6.1, I think). Try installing
    Scalar::Util to see what happens.
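One way to confirm whether the installed Scalar::Util actually provides weak references is a small standalone check like the sketch below. It assumes only that perl itself runs; the variable names are made up for illustration. Older pure-Perl builds of Scalar::Util refuse to export weaken(), which is what produces the "Weak references are not implemented" message, so the sketch first tries the import and then checks that a weakened reference really stops keeping its target alive.

```perl
use strict;
use warnings;

my $has_weaken = 0;
my $su_version = 'not installed';

if ( eval { require Scalar::Util; 1 } ) {
    $su_version = Scalar::Util->VERSION;
    $su_version = 'unknown' unless defined $su_version;
    # Importing weaken() dies if this build of Scalar::Util cannot provide it.
    $has_weaken = eval { Scalar::Util->import('weaken'); 1 } ? 1 : 0;
}

# Functional check: a weakened copy must not keep the referent alive.
my $weak_works = 0;
if ($has_weaken) {
    my $strong = { name => 'node' };
    my $copy   = $strong;
    Scalar::Util::weaken($copy);
    undef $strong;                       # drop the only strong reference
    $weak_works = defined $copy ? 0 : 1; # the weak copy should now be undef
}

print "Scalar::Util version: $su_version\n";
print "weaken importable:    ", ( $has_weaken ? 'yes' : 'no' ), "\n";
print "weaken works:         ", ( $weak_works ? 'yes' : 'no' ), "\n";
```

If either of the last two lines reports "no", reinstalling Scalar::Util (the List-Util distribution on CPAN) with a working compiler is the likely remedy.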




    On Jun 18, 2007, at 5:18 PM, George Heller wrote:


      I tried running the script below and I get
    the following error:


    Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76
    BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76.
    Compilation failed in require at my.pl line 7.
    BEGIN failed--compilation aborted at my.pl line 7.


    My script looks something like,


    use strict;
    #use warnings;
    use DBI;
    use Bio::Tree::Node;
    use Bio::DB::Taxonomy;
    use Bio::DB::Taxonomy::flatfile;
    my $idx_dir = '/tmp';

    my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
    my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                                   -nodesfile => $nodefile,
                                   -namesfile => $namesfile,
                                   -directory => $idx_dir);
    my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

    foreach my $field (@extant_children) {
        print "$field";
        print "|";
        print "\n";
    }


    And I am running the script using the command,


    perl myscript.pl -v --names names.dmp --nodes nodes.dmp


    and I have the nodes.dmp and names.dmp files in the current
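A side note on the command line above: the script as posted hard-codes 'nodes.dmp' and 'names.dmp' and never reads its arguments, so -v, --names and --nodes have no effect. If the intent is to pass the file names in, a minimal sketch using the core Getopt::Long module could look like this. The sub name parse_taxonomy_args and the default values are assumptions made for illustration, not part of BioPerl.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse -v, --nodes and --names from a list of arguments.
# Returns a hash with the keys nodes, names and verbose.
sub parse_taxonomy_args {
    my @args = @_;
    my %opt = ( nodes => 'nodes.dmp', names => 'names.dmp', verbose => 0 );
    GetOptionsFromArray(
        \@args,
        'nodes=s'   => \$opt{nodes},
        'names=s'   => \$opt{names},
        'v|verbose' => \$opt{verbose},
    ) or die "usage: $0 [-v] [--nodes FILE] [--names FILE]\n";
    return %opt;
}

my %opt = parse_taxonomy_args(@ARGV);
print "nodes file: $opt{nodes}\n";
print "names file: $opt{names}\n";
print "verbose:    $opt{verbose}\n";
```

The parsed values would then be handed to the constructor as -nodesfile => $opt{nodes} and -namesfile => $opt{names} in place of the hard-coded names.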






    Jason Stajich wrote:
    It is implemented in the implementing class - DB::Taxonomy is
    just the base class. For example, see the flatfile implementation.

    See scripts/taxa/local_taxonomydb_query.PLS for an example; the
    nodes and names files are from the NCBI taxonomy database.




    Here is an un-debugged copy+paste for your question that *should*




    use Bio::DB::Taxonomy;
    my $idx_dir = '/tmp';

    my ($nodefile,$namesfile) = ('nodes.dmp','names.dmp');
    my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                                   -nodesfile => $nodefile,
                                   -namesfile => $namesfile,
                                   -directory => $idx_dir);
    my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;










    On Jun 18, 2007, at 10:07 AM, George Heller wrote:


    What exactly is the "node n" in the query below? When I issue
    this query, it says:

    relation "node" does not exist.




    I tried to use the get_all_Descendents method but it looks like
    in order to do a recursive call it calls the method
    each_Descendent. This method is not implemented in
    Bio::DB::Taxonomy. It just has a single line,












    Hilmar Lapp wrote:
    I'm a bit confused - it sounds like you have set up a local 
    database and loaded the NCBI taxonomy into the database. You can 
    use simple SQL to retrieve all descendants of a node in the tree
    given its NCBI taxonID such as




    SELECT tn.*, tnm.name FROM taxon tn, taxon_name tnm, node n
    WHERE n.ncbi_taxon_id = :taxonID
    AND tn.left_value > n.left_value
    AND tn.right_value < n.right_value
    AND tn.taxon_id = tnm.taxon_id
    AND tnm.name_class = 'scientific name'
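The left_value/right_value comparison in the query is the usual nested-set trick: every descendant's interval lies strictly inside its ancestor's interval. As a rough illustration in Perl, with an in-memory table whose ids and numbers are invented (this is not the BioSQL schema itself, and the sub names are hypothetical), the same test can be sketched as:

```perl
use strict;
use warnings;

# Toy nested-set table; ids and left/right values are made up for illustration.
my @taxon = (
    { id => 1, left => 1, right => 10 },   # root
    { id => 2, left => 2, right => 7  },   # child of 1
    { id => 3, left => 3, right => 4  },   # leaf under 2
    { id => 4, left => 5, right => 6  },   # leaf under 2
    { id => 5, left => 8, right => 9  },   # leaf under 1
);

# Return the ids of all rows nested strictly inside the row with id $id.
sub descendant_ids {
    my ($id, @rows) = @_;
    my ($n) = grep { $_->{id} == $id } @rows;
    return () unless $n;
    return map  { $_->{id} }
           grep { $_->{left} > $n->{left} && $_->{right} < $n->{right} }
           @rows;
}

# In a nested set, a leaf has no room between its left and right values.
sub is_leaf {
    my ($row) = @_;
    return $row->{right} - $row->{left} == 1;
}

my @desc   = descendant_ids(1, @taxon);
my @leaves = map { $_->{id} } grep { is_leaf($_) } @taxon;
print "descendants of 1: @desc\n";
print "leaf ids:         @leaves\n";
```

Because the whole subtree falls out of a single range comparison, no recursion is needed, which is why the SQL route can be faster than walking the tree node by node.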




    BioPerl doesn't have a Taxonomy::biosql module yet (though this
    seems like a worthwhile thing to add), so you can't use the
    Bio::DB::Taxonomy interface to do this against a BioSQL instance.

    However, BioPerl does support the flat-file download of the
    NCBI taxonomy database and indexes it, so you can simply use
    Taxonomy::{get_taxon,get_all_Descendents} using the flatfile
    to achieve what you wanted to do in less than 5 lines of perl.




    Although the recursive implementation of
    () won't be lightning fast, it may still be perfectly fine for your
    application - are you sure it is not?








    On Jun 18, 2007, at 12:21 AM, George Heller wrote:




    Thanks. And how can I assign $node in the code below so that
    it refers to a particular taxon id record? I want to
    retrieve all the descendants from the taxonomy hierarchy, given a
    particular taxon id.




    I have a local db setup, in which I have uploaded data using the
    load_ncbi_taxonomy.pl script.








    Jason Stajich wrote:
    I assume you already figured out how to setup a local taxonomydb?








    You just want the extant species/leaves of the tree








    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;












    On Jun 17, 2007, at 11:41 AM, George Heller wrote:




    Hi all,








    Can anyone point me to some example that uses the
    get_all_Descendents method from Bio::DB::Taxonomy? I am a newbie at
    this, and I am not quite sure how to implement it.
















    Sendu Bala wrote:
    George Heller wrote:
    Hi all,








    I am looking at extracting the taxonomy hierarchy for some taxon
    What I plan to do is, for a given taxon id, say 33090, I want to
    extract all taxon ids that are children of this species. I do not
    just want the immediate children, but the children's children 
    and so








    Any ideas on the way I can go about doing this?








    Well, you'll use Bio::DB::Taxonomy presumably, and each_Descendent in
    some kind of looping structure. Most easily a recursing sub.
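To make the "recursing sub" idea concrete, here is a minimal sketch that runs without BioPerl. It uses a plain hash of parent => children with invented ids as a stand-in for the taxonomy; with a real Bio::DB::Taxonomy handle, the hash lookup would presumably be replaced by a call to each_Descendent. The sub name all_descendants is hypothetical.

```perl
use strict;
use warnings;

# Toy parent => [children] map standing in for a taxonomy; ids are invented.
my %children_of = (
    A => [ 'B', 'C' ],
    B => [ 'D', 'E' ],
    C => [],
    D => [],
    E => [ 'F' ],
    F => [],
);

# Depth-first recursion: record each child, then that child's own descendants.
sub all_descendants {
    my ($id, $tree) = @_;
    my @found;
    for my $child ( @{ $tree->{$id} || [] } ) {
        push @found, $child, all_descendants($child, $tree);
    }
    return @found;
}

my @all    = all_descendants('A', \%children_of);
# A leaf is any descendant that has no children of its own.
my @leaves = grep { !@{ $children_of{$_} || [] } } @all;

print "all descendants of A: @all\n";
print "leaves under A:       @leaves\n";
```

The recursion visits each node once, so the cost grows with the size of the subtree; for a large clade each step may also involve an index lookup, which is where most of the time would go.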








    If you happen to code up something neat and efficient, why not
    share it with us and we could add it to the Taxonomy module(s).
























    Bioperl-l mailing list
    Bioperl-l at lists.open-bio.org








    Jason Stajich
    jason at bioperl.org




    : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
















    Christopher Fields
    Postdoctoral Researcher
    Lab of Dr. Robert Switzer
    Dept of Biochemistry
    University of Illinois Urbana-Champaign
















