[Bioperl-l] Taxonomy hierarchy extraction

George Heller george.heller at yahoo.com
Tue Jun 19 00:29:31 UTC 2007


But the problem is that I don't really get any output on the screen. In the /tmp directory I get four files, namely parents, nodes, id2names and names2id, but I don't know what to make of them. This is what my script looks like:
   
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Taxonomy;

my $idx_dir = '/tmp';    # directory where the flatfile index files are written
my ($nodefile, $namesfile) = ('nodes.dmp', 'names.dmp');
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodefile,
                                -namesfile => $namesfile,
                                -directory => $idx_dir);
my $node = $db->get_Taxonomy_Node(-taxonid => '33090')
    or die "taxon 33090 not found\n";
my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
print scalar(@extant_children), " leaf taxa found\n";

for my $child ( @extant_children ) {
  print "id is ", $child->id, "\n";                            # NCBI taxa id
  print "rank is ", $child->rank, "\n";                        # e.g. species
  print "scientific name is ", $child->scientific_name, "\n";  # scientific name
}

Thanks.
  George
  
Jason Stajich <jason at bioperl.org> wrote:
    All the children are in this array.  
  

  You get to decide what you want to do with them. In the following example I print the id, rank, and scientific name out to the screen.
  Because this is a taxonomy DB query, you are getting back Bio::Taxonomy::Taxon objects, so read the documentation for this module to see what you can do with the objects.
  I would also suggest spending a little time with the Getting Started and HOWTO:Trees documentation on the website to get familiar with the objects and nomenclature.
  
  my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

  for my $child ( @extant_children ) {
    print "id is ", $child->id, "\n";                            # NCBI taxa id
    print "rank is ", $child->rank, "\n";                        # e.g. species
    print "scientific name is ", $child->scientific_name, "\n";  # scientific name
  }


    On Jun 18, 2007, at 5:04 PM, George Heller wrote:

    OK, I installed the latest version of Scalar::Util and the script seems to be working. But I am confused about where exactly to look for the descendant taxon IDs once the script has run. I did look in the /tmp directory, but I couldn't make much sense of it.
  

    Sorry to keep bothering you; I really appreciate your patience.
  

    Thanks.
    George
  

  Jason Stajich <jason at bioperl.org> wrote:
    Try installing the latest Scalar::Util  
      On Jun 18, 2007, at 4:05 PM, George Heller wrote:
  

      This is the output of /usr/bin/perl -V
  
    Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
      Platform:
        osname=linux, osvers=2.6.9-22.18.bz155725.elsmp, archname=i386-linux-thread-multi
        uname='linux hs20-bc1-4.build.redhat.com 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686 i686 i386 gnulinux '
        config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost -Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1 5.8.0'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
        optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
        cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
        ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='gcc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
        perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
        libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
        gnulibc_version='2.3.4'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
        cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
  
    Characteristics of this binary (from libperl):
      Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
      Built under linux
      Compiled at Jul 24 2006 18:28:10
      @INC:
        /usr/lib/perl5/5.8.5/i386-linux-thread-multi
        /usr/lib/perl5/5.8.5
        /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
        /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
        /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
        /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
        /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
        /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
        /usr/lib/perl5/site_perl/5.8.5
        /usr/lib/perl5/site_perl/5.8.4
        /usr/lib/perl5/site_perl/5.8.3
        /usr/lib/perl5/site_perl/5.8.2
        /usr/lib/perl5/site_perl/5.8.1
        /usr/lib/perl5/site_perl/5.8.0
        /usr/lib/perl5/site_perl
        /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
        /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
        /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
        /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
        /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
        /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
        /usr/lib/perl5/vendor_perl/5.8.5
        /usr/lib/perl5/vendor_perl/5.8.4
        /usr/lib/perl5/vendor_perl/5.8.3
        /usr/lib/perl5/vendor_perl/5.8.2
        /usr/lib/perl5/vendor_perl/5.8.1
        /usr/lib/perl5/vendor_perl/5.8.0
        /usr/lib/perl5/vendor_perl
  
      Thanks.
      George

    Hilmar Lapp <hlapp at gmx.net> wrote:
    The perl version appears to be 5.8.5, though, so something strange seems to be going on too.
  
    George, can you please post the output of
  
    $ /usr/bin/perl -V
  
    -hilmar
  
    On Jun 18, 2007, at 6:33 PM, Chris Fields wrote:
  
    As the error implies, your local version of perl doesn't seem to support
    weak references, which means it doesn't have Scalar::Util (which was
    added to core after perl 5.6.1, I think). Try installing
    Scalar::Util to see what happens.
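A quick way to test Chris's suggestion using only core Perl: the sketch below checks whether Scalar::Util loads and whether weak references actually work. The helper name `weaken_works` is hypothetical. The exact error text, and the point at which Scalar::Util raises it, may differ between versions, so the check simply traps any failure inside `eval`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if Scalar::Util is installed and weak
# references work, 0 otherwise. Any failure (module missing, weaken not
# exportable, weaken not working) is trapped by the eval.
sub weaken_works {
    my $ok = eval {
        require Scalar::Util;
        Scalar::Util->import(qw(weaken isweak));   # may croak without weak-ref support
        my $target = {};
        my $ref    = $target;
        Scalar::Util::weaken($ref);
        Scalar::Util::isweak($ref) ? 1 : 0;
    };
    return $ok ? 1 : 0;
}

my $version = eval { require Scalar::Util; Scalar::Util->VERSION } || 'not installed';
print "Scalar::Util version: $version\n";
print 'Weak references: ', (weaken_works() ? 'available' : 'NOT available'), "\n";
```

Running it with the same `/usr/bin/perl` that runs the BioPerl script shows whether that interpreter is the one lacking weak-reference support.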
  
    chris
  
    On Jun 18, 2007, at 5:18 PM, George Heller wrote:
  
    I tried running the script below and I get the following error:
  
    Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76
    BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tree/Node.pm line 76.
    Compilation failed in require at my.pl line 7.
    BEGIN failed--compilation aborted at my.pl line 7.
  
    My script looks something like this:
  
    #!/usr/bin/perl
    use strict;
    #use warnings;
    use DBI;
    use Bio::Tree::Node;
    use Bio::DB::Taxonomy;
    use Bio::DB::Taxonomy::flatfile;
    my $idx_dir = '/tmp';

    my ($nodefile, $namesfile) = ('nodes.dmp', 'names.dmp');
    my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                    -nodesfile => $nodefile,
                                    -namesfile => $namesfile,
                                    -directory => $idx_dir);
    my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;

    foreach my $field (@extant_children) {
        # print each taxon's NCBI id followed by a "|" separator
        print $field->id, "|\n";
    }
  
    And I am running the script using the command:
  
    perl myscript.pl -v --names names.dmp --nodes nodes.dmp
  
    and I have the nodes.dmp and names.dmp files in the current
    directory.
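Note that the script above never reads its command-line arguments, so `-v --names names.dmp --nodes nodes.dmp` has no effect and the hard-coded file names are used instead. Below is a minimal sketch of wiring those options up with the core Getopt::Long module. The option names come from the command line above. The helper name `parse_args`, the defaults, and the reading of `-v` as a verbosity flag are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical helper: parse the options used on the command line above.
# File names default to nodes.dmp / names.dmp in the current directory (assumed).
sub parse_args {
    my @args = @_;
    my %opt = (nodes => 'nodes.dmp', names => 'names.dmp', verbose => 0);
    local @ARGV = @args;    # GetOptions reads @ARGV
    GetOptions(
        'nodes=s'   => \$opt{nodes},
        'names=s'   => \$opt{names},
        'v|verbose' => \$opt{verbose},
    ) or die "usage: $0 [-v] [--nodes FILE] [--names FILE]\n";
    return \%opt;
}

my $opt = parse_args('-v', '--names', 'names.dmp', '--nodes', 'nodes.dmp');
print "nodes=$opt->{nodes} names=$opt->{names} verbose=$opt->{verbose}\n";
```

In the real script one would call `parse_args(@ARGV)` and pass `$opt->{nodes}` and `$opt->{names}` as the `-nodesfile` and `-namesfile` arguments.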
  
    Thanks,
    George
  
    Jason Stajich wrote:
    It is implemented in the implementing class; Bio::DB::Taxonomy is
    just the base class. For example, see the flatfile implementation,
    Bio::DB::Taxonomy::flatfile.
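This explains why Bio::DB::Taxonomy itself only contains `throw_not_implemented`. The base class declares the method, and the object you actually get back belongs to the subclass picked by `-source`. Below is a toy, pure-Perl sketch of that pattern. The `My::Taxonomy` packages are hypothetical, and the constructor only approximates what Bio::DB::Taxonomy does with `-source`; the real constructor does more work than this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class: the constructor dispatches on -source, and
# each_Descendent is only declared, so calling it on the base class dies.
package My::Taxonomy;
sub new {
    my ($class, %args) = @_;
    if (my $source = delete $args{-source}) {
        $class = "My::Taxonomy::$source";   # pick the implementing subclass
    }
    return bless {%args}, $class;
}
sub each_Descendent {
    my ($self) = @_;
    die ref($self) . " does not implement each_Descendent\n";
}

# Hypothetical implementing class that overrides each_Descendent.
package My::Taxonomy::flatfile;
our @ISA = ('My::Taxonomy');
sub each_Descendent {
    my ($self, $id) = @_;
    return @{ $self->{-children}{$id} || [] };   # immediate children only
}

package main;

my $db = My::Taxonomy->new(-source => 'flatfile', -children => { 1 => [10, 20] });
print ref($db), ': ', join(' ', $db->each_Descendent(1)), "\n";   # My::Taxonomy::flatfile: 10 20

eval { My::Taxonomy->new->each_Descendent(1) };
print "base class: $@";
```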
  
    See scripts/taxa/local_taxonomydb_query.PLS for an example of using
    it. The nodes and names files (nodes.dmp and names.dmp) come from the
    NCBI taxonomy database.
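The NCBI dump files are plain text. In both files, fields are separated by `\t|\t` and every line ends with `\t|`. In names.dmp the columns are tax_id, name, unique name and name class. The minimal core-Perl sketch below pulls out the scientific name for each tax_id, taking it from the row whose name class is `scientific name`. The helper name and the sample lines are hypothetical, and the sketch is meant to show the format, not to replace the BioPerl index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map tax_id => scientific name from names.dmp-style lines.
sub read_scientific_names {
    my ($fh) = @_;
    my %name_of;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the trailing field terminator
        my ($id, $name, undef, $class) = split /\t\|\t/, $line, -1;
        $name_of{$id} = $name if defined $class && $class eq 'scientific name';
    }
    return \%name_of;
}

# Hypothetical toy lines, not real NCBI records.
my $toy = "10\t|\tToyus\t|\t\t|\tscientific name\t|\n"
        . "10\t|\ttoy plants\t|\t\t|\tcommon name\t|\n"
        . "11\t|\tToyus primus\t|\t\t|\tscientific name\t|\n";
open my $fh, '<', \$toy or die "cannot open in-memory data: $!";
my $names = read_scientific_names($fh);
close $fh;

print "$_\t$names->{$_}\n" for sort { $a <=> $b } keys %$names;
```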
  
    Here is an un-debugged copy+paste for your question that *should* work.
  
    use Bio::DB::Taxonomy;
    my $idx_dir = '/tmp';

    my ($nodefile, $namesfile) = ('nodes.dmp', 'names.dmp');
    my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                    -nodesfile => $nodefile,
                                    -namesfile => $namesfile,
                                    -directory => $idx_dir);
    my $node = $db->get_Taxonomy_Node(-taxonid => '33090');
    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
  
    -jason
  
    On Jun 18, 2007, at 10:07 AM, George Heller wrote:
  
    What exactly is the "node n" in the query below? When I issue
    this query, it says:
  
    relation "node" does not exist.
  
    I tried to use the get_all_Descendents method, but it looks like
    it calls each_Descendent to do the recursion. That method is not
    implemented in Bio::DB::Taxonomy; it just has a single line:
  
    shift->throw_not_implemented();
  
    Thanks.
    George.
  
    Hilmar Lapp wrote:
    I'm a bit confused; it sounds like you have set up a local BioSQL
    database and loaded the NCBI taxonomy into it. You can now use
    simple SQL to retrieve all descendants of a node in the tree,
    given its NCBI taxon ID, such as:
  
    SELECT tn.*, tnm.name FROM taxon tn, taxon_name tnm, taxon n
    WHERE
    n.ncbi_taxon_id = :taxonID
    AND tn.left_value > n.left_value
    AND tn.right_value < n.right_value
    AND tn.taxon_id = tnm.taxon_id
    AND tnm.name_class = 'scientific name'
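The query relies on BioSQL's nested-set columns. Each taxon row carries a left_value and a right_value, and a node's descendants are exactly the rows whose interval lies strictly inside its own. The small pure-Perl sketch below shows how such numbers can be assigned with a depth-first walk and then used for the same test as the WHERE clause. The toy tree is hypothetical, and load_ncbi_taxonomy.pl may number nodes differently; only the nesting property matters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tree as parent => [children], not real NCBI data.
my %children = (1 => [10, 20], 10 => [11, 12], 11 => [13]);

# Assign nested-set numbers: the left value is taken on the way down and the
# right value on the way back up, so each descendant's interval nests inside
# its ancestor's interval.
my (%left, %right);
my $counter = 0;
sub number_tree {
    my ($id) = @_;
    $left{$id} = ++$counter;
    number_tree($_) for @{ $children{$id} || [] };
    $right{$id} = ++$counter;
}
number_tree(1);

# Same test as the SQL: left > node's left AND right < node's right.
sub descendants_by_interval {
    my ($id) = @_;
    return sort { $left{$a} <=> $left{$b} }
           grep { $left{$_} > $left{$id} && $right{$_} < $right{$id} } keys %left;
}

print join(' ', descendants_by_interval(10)), "\n";   # 11 13 12
```

The walk is done once; after that, every descendant query is a simple range comparison, which is why the SQL needs no recursion.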
  
    BioPerl doesn't have a Taxonomy::biosql module yet (though this would
    seem like a worthwhile thing to add), so you can't use the
    Bio::DB::Taxonomy interface to do this against a BioSQL instance.
  
    However, BioPerl does support the flat-file download of the NCBI
    taxonomy database and indexes it, so with that download you can
    simply use Taxonomy::{get_taxon,get_all_Descendents} to achieve what
    you wanted in fewer than 5 lines of Perl.
  
    Although the recursive implementation of
    Taxonomy::get_all_Descendents() won't be lightning fast, it may still
    be perfectly fine for your application; are you sure it is not?
  
    -hilmar
  
    On Jun 18, 2007, at 12:21 AM, George Heller wrote:
  
    Thanks. And how can I assign $node in the code below so that it
    refers to a particular taxon ID record? I want to retrieve all the
    descendants in the taxonomy hierarchy of a given taxon ID.
  
    I have set up a local database, into which I loaded the data using
    the load_ncbi_taxonomy.pl script.
  
    Thanks.
    George
  
    Jason Stajich wrote:
    I assume you have already figured out how to set up a local taxonomy DB?
  
    You just want the extant species, i.e. the leaves of the tree:
  
    my @extant_children = grep { $_->is_Leaf } $node->get_all_Descendents;
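`is_Leaf` keeps only the taxa that have no children of their own, so the grep reduces the full descendant list to the tips of the tree. Below is a plain-Perl sketch of the same selection over a hypothetical toy tree; it does not use BioPerl, and the helper name `leaf_descendants` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tree given as child => parent pairs (not real NCBI data).
my %parent_of = (10 => 1, 11 => 10, 12 => 10, 13 => 11, 20 => 1);

# Invert it into parent => [children], children in ascending id order.
my %children;
push @{ $children{ $parent_of{$_} } }, $_ for sort { $a <=> $b } keys %parent_of;

# Depth-first walk that keeps only leaves (nodes with no children),
# mirroring grep { $_->is_Leaf } over the full descendant list.
sub leaf_descendants {
    my ($id) = @_;
    my @leaves;
    for my $child (@{ $children{$id} || [] }) {
        if ($children{$child}) { push @leaves, leaf_descendants($child) }
        else                   { push @leaves, $child }
    }
    return @leaves;
}

print join(' ', leaf_descendants(10)), "\n";   # 13 12
print join(' ', leaf_descendants(1)), "\n";    # 13 12 20
```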
  
    -jason
    On Jun 17, 2007, at 11:41 AM, George Heller wrote:
  
    Hi all,
  
    Can anyone point me to an example that uses the
    get_all_Descendents method from Bio::DB::Taxonomy? I am a newbie at
    this, and I am not quite sure how to use it.
  
    Thanks.
    George
  
    Sendu Bala wrote:
    George Heller wrote:
    Hi all,
  
    I am looking at extracting the taxonomy hierarchy for some taxon IDs.
    For a given taxon ID, say 33090, I want to extract all taxon IDs that
    descend from it. I do not just want the immediate children, but the
    children's children and so on.
  
    Any ideas on the way I can go about doing this?
  
    Well, presumably you'll use Bio::DB::Taxonomy, and each_Descendent in
    some kind of looping structure, most easily a recursive sub.
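A minimal sketch of such a recursive sub follows. So that it runs without BioPerl, it swaps the Bio::DB::Taxonomy calls for a plain parent-to-children hash built from nodes.dmp-style lines (tax_id, parent tax_id, rank). With Bio::DB::Taxonomy, the inner loop would iterate over each_Descendent instead. The helper names and the toy data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a parent => [children] map from nodes.dmp-style lines.
# Fields are separated by "\t|\t" and each line ends with "\t|".
sub read_children {
    my ($fh) = @_;
    my %children;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                              # drop the trailing field terminator
        my ($id, $parent) = split /\t\|\t/, $line;
        next if !defined($parent) || $id eq $parent;     # the root is its own parent
        push @{ $children{$parent} }, $id;
    }
    return \%children;
}

# Recursively collect every descendant (children, grandchildren, ...) of $id.
sub all_descendants {
    my ($children, $id) = @_;
    my @found;
    for my $child (@{ $children->{$id} || [] }) {
        push @found, $child, all_descendants($children, $child);
    }
    return @found;
}

# Hypothetical toy data in the nodes.dmp layout, not real NCBI records.
my $toy = "1\t|\t1\t|\tno rank\t|\n"
        . "10\t|\t1\t|\tgenus\t|\n"
        . "11\t|\t10\t|\tspecies\t|\n"
        . "12\t|\t10\t|\tspecies\t|\n"
        . "13\t|\t11\t|\tsubspecies\t|\n"
        . "20\t|\t1\t|\tgenus\t|\n";
open my $fh, '<', \$toy or die "cannot open in-memory data: $!";
my $children = read_children($fh);
close $fh;

print join(' ', all_descendants($children, 10)), "\n";   # 11 13 12
```

Loading the parent links into memory once means each recursive step is a hash lookup rather than a separate query.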
  
    If you happen to code up something neat and efficient, why not share
    it with us? We could add it to the Taxonomy module(s).
  
    _______________________________________________
    Bioperl-l mailing list
    Bioperl-l at lists.open-bio.org
    http://lists.open-bio.org/mailman/listinfo/bioperl-l
  
    --
    Jason Stajich
    jason at bioperl.org
    http://jason.open-bio.org/
  
    --
    ===========================================================
    : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
    ===========================================================
  
    --
    Jason Stajich
    jason at bioperl.org
    http://jason.open-bio.org/
  
    Christopher Fields
    Postdoctoral Researcher
    Lab of Dr. Robert Switzer
    Dept of Biochemistry
    University of Illinois Urbana-Champaign
  
    -- 
    ===========================================================
    : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
    ===========================================================
  
      --
    Jason Stajich
    jason at bioperl.org
    http://jason.open-bio.org/
  
    --
  Jason Stajich
  jason at bioperl.org
  http://jason.open-bio.org/
