[Bioperl-l] proposed additions to Tree and cladogram
Georgii A Bazykin
gbazykin at Princeton.EDU
Fri Feb 3 15:38:04 EST 2006
Hi all,
a while ago, I mailed to bioperl-l some proposed additions to
phylogeny-related modules (see below). I am doing a project on hiv
phylogeny now, and rely on these additions heavily. They expand on
what was already present in the corresponding modules. I expected them
to be also of general usage (at least the first one).
However, I never got any answer, so I assumed that these additions
were considered superfluous by most.
I am now working on an addition to Tree::Draw::Cladogram module. For
my project, I need to color individual tree edges (including internal)
into colors from red to blue (according to the nosynonymous/synonymous
ratios of these branches). This should be technically easy (I guess I
will add -Rcolor, -Gcolor and -Bcolor tags to nodes and use them in
Cladogram to color preceding edges), but I have two questions:
- will this add-on be of general interest - should I try to do it
"the right way", updating the pods etc.;
- in general, are there any guidelines about how specific an issue
a method should address to be included in bioperl distribution?
Thanks,
Yegor Bazykin
This is a forwarded message
From: Georgii Bazykin <gbazykin at princeton.edu>
To: bioperl-l at bioperl.org
Date: Wednesday, October 26, 2005, 4:27:07 PM
Subject: suggestions for additions to Tree
===8<==============Original message text===============
Hi,
here are some tree-related methods I needed and added to my bioperl.
Hope someone else finds any of them useful as well.
Yegor Bazykin
=============================================
To NodeI:
# modified from total_branch_length in Tree:Tree module
# gets sum of branches in the subtree - descendents of given node
=head2 children_branch_length
Title : children_branch_length
Usage : my $size = $node->children_branch_length
Function: Returns the sum of the length of all branches of the subtree which starts at given node
Returns : integer
Args : none
=cut
sub children_branch_length {
my ($self) = @_;
return 0 if($self -> is_Leaf) ;
my $sum = 0;
for ($self -> get_all_Descendents) {
$sum += $_->branch_length || 0;
}
return $sum;
}
-----------------------------------
=head2 height_nodes
Title : height_nodes
Usage : my $len = $node->height_nodes
Function: Returns the height of the tree starting at this
node. Height is the maximum branchlength to get to the tip.
Returns : The longest length to a leaf, in nodes
Args : none
=cut
sub height_nodes{
my ($self) = @_;
return 0 if( $self->is_Leaf );
my $max = 0;
foreach my $subnode ( $self->each_Descendent ) {
my $s = $subnode->height_nodes + 1;
if( $s > $max ) { $max = $s; }
}
return $max;
}
----------------------------------
=head2 get_all_Descendent_Leaves
Title : get_all_Descendent_Leaves($sortby)
Usage : my @nodes = $node->get_all_Descendent_Leaves;
Function: Recursively fetch all the nodes and their descendents, only selecting leaves
*NOTE* This is different from each_Descendent
Returns : Array or Bio::Tree::NodeI objects
Args : $sortby [optional] "height", "creation" or coderef to be used
to sort the order of children nodes.
=cut
sub get_all_Descendent_Leaves{
my ($self, $sortby) = @_;
$sortby ||= 'height';
my @nodes;
foreach my $node ( $self->each_Descendent($sortby) ) {
if ($node->is_Leaf) {
push @nodes, $node;
}
else {
push @nodes, ($node->get_all_Descendents($sortby));
}
}
return @nodes;
}
=====================================================
To Tree:
=head2 total_internal_branch_length
Title : total_internal_branch_length
Usage : my $size = $tree->total_internal_branch_length
Function: Returns the sum of the length of all branches, excluding branches leading to leaves
Returns : integer
Args : none
=cut
sub total_internal_branch_length {
my ($self) = @_;
my $sum = 0;
if( defined $self->get_root_node ) {
for ( $self->get_root_node->get_Descendents() ) {
unless ($_->is_Leaf) { # YB: THIS IS ALL I ADDED
$sum += $_->branch_length || 0;
}
}
}
return $sum;
}
=================================================
To TreeFunctionsI:
=head2 distance_nodes
Title : distance_nodes
Usage : distance_nodes(-nodes => \@nodes )
Function: returns the distance between two given nodes in numbers of nodes
Returns : numerical distance
Args : -nodes => arrayref of nodes to test
=cut
# YB: distance_nodes is very similar to distance method in TreeFunctionsI except that
# it estimates distances between nodes in numbers of nodes (e.g., 1 between mother and
# daughter, 2 between two sisters, etc.)
sub distance_nodes {
my ($self, at args) = @_;
my ($nodes) = $self->_rearrange([qw(NODES)], at args);
if( ! defined $nodes ) {
$self->warn("Must supply -nodes parameter to distance_nodes() method");
return undef;
}
my ($node1,$node2) = $self->_check_two_nodes($nodes);
# algorithm:
# Find lca: Start with first node, find and save every node from it
# to root, saving cumulative distance. Then start with second node;
# for it and each of its ancestor nodes, check to see if it's in
# the first node's ancestor list - if so it is the lca. Return sum
# of (cumul. distance from node1 to lca) and (cumul. distance from
# node2 to lca)
# find and save every ancestor of node1 (including itself)
my %node1_ancestors; # keys are internal ids, values are objects
my %node1_cumul_dist; # keys are internal ids, values
# are cumulative distance from node1 to given node
my $place = $node1; # start at node1
my $cumul_dist = 0;
while ( $place ){
$node1_ancestors{$place->internal_id} = $place;
$node1_cumul_dist{$place->internal_id} = $cumul_dist;
$cumul_dist++; # YB
#YB if ($place->branch_length) {
#YB $cumul_dist += $place->branch_length; # include current branch
#YB # length in next iteration
#YB }
$place = $place->ancestor;
}
# now climb up node2, for each node checking whether
# it's in node1_ancestors
$place = $node2; # start at node2
$cumul_dist = 0;
while ( $place ){
foreach my $key ( keys %node1_ancestors ){ # ugh
if ( $place->internal_id == $key){ # we're at lca
return $node1_cumul_dist{$key} + $cumul_dist;
}
}
# include current branch length in next iteration
#YB $cumul_dist += $place->branch_length || 0;
$cumul_dist++; # YB
$place = $place->ancestor;
}
$self->warn("Could not find distance!"); # should never execute,
# if so, there's a problem
return undef;
}
===8<===========End of original message text===========
More information about the Bioperl-l
mailing list