[Bioperl-l] Taxonomy hierarchy extraction

Sendu Bala bix at sendu.me.uk
Tue Jun 19 19:48:50 UTC 2007


Hilmar Lapp wrote:
> Here's what I understand of your description of the problem:
> 
> - We would like nodes returned from Bio::DB::Taxonomy to use the  
> database for all hierarchical queries.
> 
> - We would like nodes used in a Bio::Tree::Tree not to use the  
> database for any hierarchical query.

Correct.


> What I understand that we have is
> 
> - Taxon node objects that have a db_handle set will use the database  
> for ancestor(), unless it has been set manually (?), but not for  
> each_Descendent().
> 
> - Taxon node objects that don't have a db_handle set won't use a  
> database but will function normally otherwise.
> 
> - This is needed to prevent Bio::Tree::Tree methods from pulling the  
> entire tree into memory.

Correct.


> If this is correct (I'm not sure it is), it sounds like we want to  
> temporarily divorce taxonomy nodes from their database capabilities  
> while they are being queried in a tree context?

Yes.


> I'm still trying to understand - if I create a Bio::Tree::Tree from a  
> single node, will the tree automatically contain all nodes along the  
> lineage of ancestors up to the root? So, even if extracting this  
> lineage involved querying a database it would be acceptable, but not  
> for querying descendants?

Yes. Asking the database for all the ancestors up to the root only pulls a 
handful of nodes (one per ancestor) into the tree, which is exactly what the 
user would want. But if nodes are allowed to get their descendants from the 
database, then when we get the root node from the database we'd get all the 
root's descendants, and then for each of those we'd get all /their/ 
descendants... that's when the whole db gets sucked in.
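The asymmetry can be illustrated with a small self-contained Perl sketch. It deliberately avoids BioPerl: a plain hash (hypothetical, standing in for the taxonomy database) holds the two lineages from the test below, and two helper subs (also hypothetical) count how many nodes an upward walk and a full downward expansion touch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the taxonomy database: child => parent.
my %parent_of = (
    'Mammalia'     => 'Eukaryota',
    'Primates'     => 'Mammalia',
    'Homo'         => 'Primates',
    'Homo sapiens' => 'Homo',
    'Rodentia'     => 'Mammalia',
    'Mus'          => 'Rodentia',
    'Mus musculus' => 'Mus',
);

# Reverse index built from the same data: parent => [children].
my %children_of;
push @{ $children_of{ $parent_of{$_} } }, $_ for sort keys %parent_of;

# Walking up follows exactly one parent link per step.
sub ancestors_of {
    my ($name) = @_;
    my @lineage;
    while (defined(my $up = $parent_of{$name})) {
        push @lineage, $up;
        $name = $up;
    }
    return @lineage;
}

# Recursively expanding descendants visits the entire subtree.
sub all_descendants_of {
    my ($name) = @_;
    return map { ($_, all_descendants_of($_)) } @{ $children_of{$name} || [] };
}

my @up   = ancestors_of('Homo');
my @down = all_descendants_of('Eukaryota');
printf "ancestors of Homo: %d (%s)\n", scalar(@up), join(', ', @up);
printf "descendants of Eukaryota: %d\n", scalar(@down);
```

Starting from Homo, the upward walk touches 3 nodes however large the database is, while recursively expanding from the root visits all 7 non-root nodes; with a full NCBI taxonomy that second number is the whole database.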


> It sounds to me like what is needed is that nodes that get added to a  
> tree need to be stripped of their database capabilities. This could  
> be achieved by creating a wrapper class that delegates all non- 
> hierarchical methods to the wrapped Taxon object, and overriding all  
> hierarchical queries to not use a database. I'm not sure I fully  
> understand yet, though, and the inconsistent behavior is sure to  
> throw people off track.

When we're making a tree from a db Taxon, we need db access to find all 
the ancestors; we just don't want to get any descendants outside our 
initiating Taxon's direct lineage.
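Combining Hilmar's wrapper idea with this requirement might look roughly like the following plain-Perl sketch. All class and sub names are hypothetical: My::Taxon stands in for a db-backed Bio::Taxon, a hash stands in for the database, and My::TreeNode is an illustrative wrapper rather than any existing BioPerl class. The database is used only to walk upward while building the lineage; after that, the wrapper answers ancestor() and each_Descendent() from its own links and forwards every other method to the wrapped taxon through AUTOLOAD.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a db-backed taxon (NOT Bio::Taxon): both
# hierarchy methods query the "database" hashref stored in {db}.
package My::Taxon;

sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub scientific_name { $_[0]{name} }

sub ancestor {
    my $self = shift;
    my $up = $self->{db}{parent}{ $self->{name} };
    return defined $up ? My::Taxon->new(name => $up, db => $self->{db}) : undef;
}

sub each_Descendent {
    my $self = shift;
    my $parent_of = $self->{db}{parent};
    return map  { My::Taxon->new(name => $_, db => $self->{db}) }
           grep { $parent_of->{$_} eq $self->{name} } sort keys %$parent_of;
}

# Hypothetical wrapper: hierarchy queries use only local links; any other
# method call is forwarded to the wrapped taxon.
package My::TreeNode;
use Scalar::Util qw(weaken);
our $AUTOLOAD;

sub new { my ($class, $taxon) = @_; return bless { taxon => $taxon, children => [] }, $class }

sub add_Descendent {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;
    $child->{parent} = $self;
    weaken($child->{parent});    # avoid a parent <-> child reference cycle
    return $child;
}

sub ancestor        { $_[0]{parent} }           # never touches the db
sub each_Descendent { @{ $_[0]{children} } }    # never touches the db

sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    return $self->{taxon}->$method(@_);
}

package main;

# Walk *up* through the db-backed taxon, wrapping and linking each node.
sub build_lineage {
    my ($taxon) = @_;
    my $node = My::TreeNode->new($taxon);
    while (my $up = $taxon->ancestor) {
        my $parent = My::TreeNode->new($up);
        $parent->add_Descendent($node);
        ($taxon, $node) = ($up, $parent);
    }
    return $node;    # the root of the lineage
}

# Count nodes using only the wrapper's local child links.
sub count_nodes {
    my ($node) = @_;
    my $count = 1;
    $count += count_nodes($_) for $node->each_Descendent;
    return $count;
}

# Hypothetical in-memory "database" holding the two lineages from the test.
my %db = (parent => {
    'Mammalia'     => 'Eukaryota', 'Primates' => 'Mammalia', 'Homo' => 'Primates',
    'Homo sapiens' => 'Homo',      'Rodentia' => 'Mammalia', 'Mus'  => 'Rodentia',
    'Mus musculus' => 'Mus',
});

my $homo    = My::Taxon->new(name => 'Homo', db => \%db);
my $root    = build_lineage($homo);
my @db_kids = $homo->each_Descendent;    # the raw taxon still asks the db

print $root->scientific_name, "\n";      # delegated via AUTOLOAD
print count_nodes($root), " nodes in the lineage tree\n";
print scalar(@db_kids), " child of Homo in the db\n";
```

In this toy model the lineage tree for Homo has 4 nodes (Rodentia and its branch are never fetched), while the unwrapped Homo taxon still reports Homo sapiens as a child from the db. The design cost is that the wrapped tree only knows about relatives that were explicitly linked in; whether the same split can be retrofitted onto Bio::Taxon and Bio::Tree::Tree is the open question.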


use strict;
use warnings;
use Test::More;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

my @names = ('Eukaryota', 'Mammalia', 'Primates', 'Homo', 'Homo sapiens');
my @ranks = qw(superkingdom class order genus species);
my $db = Bio::DB::Taxonomy->new(-source => 'list', -names => \@names,
                                                    -ranks => \@ranks);

@names = ('Eukaryota', 'Mammalia', 'Rodentia', 'Mus', 'Mus musculus');
$db->add_lineage(-names => \@names, -ranks => \@ranks);


my $homo = $db->get_taxon(-name => 'Homo');
isa_ok($homo, 'Bio::Taxon'); # PASS

is $homo->ancestor->scientific_name, 'Primates'; # PASS
my @descs = $homo->each_Descendent;
is scalar(@descs), 1; # FAIL, we wanted it to contain the 'Homo sapiens' node


my $lineage = Bio::Tree::Tree->new(-node => $homo);
is $lineage->get_root_node->scientific_name, 'Eukaryota'; # PASS
my @nodes = $lineage->get_nodes;
is scalar(@nodes), 4; # PASS: we didn't pull in the Rodentia branch (all 8 nodes)

done_testing();

(On that last test, I can't remember whether the answer should actually be 5, 
since our lineage does contain 'Homo sapiens'.)


If anyone can figure out how to get all those to pass, please let me know.


