[Bioperl-l] Bio::Species / Bio::Taxonomy::Node

Jason Stajich jason at cgt.duhs.duke.edu
Fri Feb 6 13:09:52 EST 2004


hmm - I was thinking that it is possible to create a Taxonomy::Node which
behaves just like a Bio::Species object if we feed it all the necessary
information up front (the classification array essentially).  It is only
necessary to have a Bio::DB::Taxonomy handle if you want to do more
sophisticated things [get all the sibling nodes at this level, etc].

Basically, I would expect Taxonomy::Node to be able to do everything that
Bio::Species can do, AND also be db aware.  The only difference is an object
pre-loaded with data versus a fully DB-dependent one.

This differs from the way I built Taxonomy::Node at first, where if you
wanted the Kingdom for a species, you had to walk up the hierarchy - now
you push that all down at object creation time via the classification
array.
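
A minimal sketch of the pre-loaded versus db-aware idea, in plain Perl.
The package name Sketch::TaxonNode, its constructor arguments, and the
children_of db method are all made up for illustration and are not the
actual Bio::Taxonomy::Node API.  The sketch assumes a species-first
classification array and uses an abbreviated lineage.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real Bio::Taxonomy::Node API.
package Sketch::TaxonNode;

sub new {
    my ($class, %args) = @_;
    my $self = {
        # copy the pre-loaded lineage, species first (assumed order)
        classification => [ @{ $args{classification} || [] } ],
        dbh            => $args{dbh},    # optional taxonomy db handle
    };
    return bless $self, $class;
}

# Species-level accessors are answered from the pre-loaded array alone.
sub classification { return @{ $_[0]{classification} } }
sub species        { return $_[0]{classification}[0] }
sub genus          { return $_[0]{classification}[1] }
sub binomial       { return join ' ', $_[0]->genus, $_[0]->species }

# Anything that involves other nodes needs the db handle.
sub children {
    my ($self) = @_;
    die "no taxonomy db handle; cannot look up children\n"
        unless defined $self->{dbh};
    return $self->{dbh}->children_of($self);    # hypothetical db method
}

package main;

my $node = Sketch::TaxonNode->new(
    classification => [ qw(elegans Caenorhabditis Rhabditidae Rhabditida
                           Chromadorea Nematoda Metazoa Eukaryota) ],
);

print $node->binomial, "\n";    # prints "Caenorhabditis elegans"

# Without a db handle the hierarchy-walking call fails cleanly.
my $ok = eval { $node->children; 1 };
print "children lookup needs a db handle\n" unless $ok;
```

The point of the design is that a parser can build such an object with no
taxonomy files installed, and only callers that walk the hierarchy pay for
a db.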

So for the simple case of GenBank/Swiss-Prot/EMBL parsing, we would operate
as normal and create Bio::Species (really Bio::Taxonomy::Node) objects.
Only if someone wanted to do fun Bio::Taxonomy stuff would they need to
instantiate a Bio::Taxonomy::Node from a taxonomy db (it needs to get the
ncbi_taxid and a db handle).

-jason

On Tue, 3 Feb 2004, Brian Osborne wrote:

> Jason,
>
> So you'd automatically create the Node object without knowing if the
> underlying names and nodes files are present? I agree with you, that could
> be confusing.
>
> Test for the existence of an environment variable that specifies the
> directory that contains these indexed files?
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Jason Stajich
> Sent: Tuesday, February 03, 2004 4:28 PM
> To: Hilmar Lapp
> Cc: Bioperl
> Subject: Re: [Bioperl-l] Bio::Species / Bio::Taxonomy::Node
>
> We can start making things create Taxonomy::Node objects - I know there is
> code floating out there which does
> if( $sp->isa('Bio::Species') ) { }
>
> so presumably we could make Bio::Species an interface such that
> Taxonomy::Node isa Bio::Species...?  I don't want to confuse people either.
>
> There may still be a little more functionality that is needed in the
> Taxonomy::Node objects and in the db - specifically how to deal with
> some of the methods which are really specific to the species level of
> the taxonomy (tips), such as the classification/binomial/etc. methods.
>
> -jason
>
> On Sat, 31 Jan 2004, Hilmar Lapp wrote:
>
> > Very cool Jason!!
> >
> > Now we can start hooking this into bioperl-db.
> >
> > And what about porting the SeqIO parsers, the target being to be able
> > to deprecate Bio::Species altogether? Alternatively, change the
> > SeqI/RichSeqI implementations to silently convert a Bio::Species
> > instance on set to a Bio::Taxonomy::Node instance?
> >
> >       -hilmar
> >
> > On Friday, January 30, 2004, at 02:07  PM, Jason Stajich wrote:
> >
> > > I think I've finally committed code which will allow Bio::Taxonomy::Node
> > > to act like Bio::Species while supporting the notion of being a node in a
> > > taxonomy hierarchy.  Added tests in t/Species.t to this effect.
> > >
> > > For Bio::DB::Taxonomy::flatfile I've added indexing by parent id so it is
> > > quite fast to grab all the children for a given node.  So you can walk up
> > > and down the classification system now.  Practically speaking, this means
> > > you can get all the taxon ids of species in the same genus with a few
> > > simple lines like those below.
> > >
> > > Unfortunately the NCBI taxonomy API that is part of E-Utils doesn't quite
> > > provide the information we need, so the whole API can't be used without
> > > downloading the taxonomy db locally.
> > >
> > > nodesfile and namesfile are the files from the NCBI taxdump; see
> > > Bio::DB::Taxonomy::flatfile for more info.
> > >
> > > #!/usr/bin/perl
> > > use strict;
> > > use warnings;
> > >
> > > use Bio::DB::Taxonomy;
> > > my $db = Bio::DB::Taxonomy->new
> > >     (-source => 'flatfile',
> > >      -nodesfile=> '/home/jason/taxonomy/nodes.dmp',
> > >      -namesfile=> '/home/jason/taxonomy/names.dmp');
> > >
> > > my $node = $db->get_Taxonomy_Node(-name => 'Caenorhabditis elegans');
> > >
> > > my $parent = $node->get_Parent_Node();
> > > for my $n ( $parent->get_Children_Nodes() ) {
> > >     print $n->binomial, "\t", $n->ncbi_taxid,"\n";
> > > }
> > >
> > > Someday I'll get around to making a HOWTO unless someone else wants to do
> > > it... =)
> > >
> > > -jason
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> >
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

