[Bioperl-l] Tree Path

barry.m.dancis at gsk.com
Fri Nov 18 12:47:36 EST 2005


Hi-- 

I would like a string that represents the node path of each leaf in a
phylogenetic tree, such that when the paths are sorted, the leaves appear
in the same order as displayed by Forester or TreeView. The code below
produces this output:

leaf          creation  path
iiiiii,         14,     11:
gggggg,         11,     121:
hhhhhh,         12,     122:
aaaaaa,         0,      21111:
bbbbbb,         1,      21112:
cccccc,         3,      2112:
dddddd,         5,      212:
eeeeee,         7,      221:
ffffff,         8,      222:

for the tree:

(((((aaaaaa:0.03000,bbbbbb:0.06091):0.04740,cccccc:0.12143):0.23166,dddddd:0.36034):0.00914,(eeeeee:0.30561,ffffff:0.36105):0.01494):0.01961,((gggggg:0.30365,hhhhhh:0.33271):0.02358,iiiiii:0.32490):0.02788);

Notice that the order of the output is the same as the order in the .dnd
file, except that the last major branch (the three leaves gggggg, hhhhhh
and iiiiii) is shown first, with iiiiii ahead of the other two.

If I create the paths with the children sorted by creation ID

  @nodes = sort { $a->_creation_id <=> $b->_creation_id } $tree->each_Descendent;

I get:

leaf          creation  path
aaaaaa,         0,      11111:
bbbbbb,         1,      11112:
cccccc,         3,      1112:
dddddd,         5,      112:
eeeeee,         7,      121:
ffffff,         8,      122:
gggggg,         11,     211:
hhhhhh,         12,     212:
iiiiii,         14,     22:

which is now the same as the order in the dnd file.
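
A minimal, self-contained Perl sketch of the path scheme described above, so the claim can be checked without BioPerl. It assumes a hypothetical nested-array stand-in for the tree (a leaf is a string, an internal node is an array ref of its children in .dnd order) in place of Bio::Tree::Node objects; the helper leaf_paths is likewise hypothetical. Numbering the children in file order and sorting the path strings with cmp should reproduce the listing just shown.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the BioPerl tree: a leaf is a plain string and
# an internal node is an array ref of its children, in .dnd file order.
my $tree = [
    [ [ [ [ 'aaaaaa', 'bbbbbb' ], 'cccccc' ], 'dddddd' ], [ 'eeeeee', 'ffffff' ] ],
    [ [ 'gggggg', 'hhhhhh' ], 'iiiiii' ],
];

# Number each node's children 1, 2, ... in the order returned by $order
# and record "leaf => path", with the trailing ':' used in the post.
sub leaf_paths {
    my ($node, $order, $prefix, $out) = @_;
    $prefix //= '';
    $out    //= {};
    if (!ref($node)) {                     # a plain string is a leaf
        $out->{$node} = "$prefix:";
        return $out;
    }
    my $i = 1;
    for my $child ($order->(@$node)) {
        leaf_paths($child, $order, $prefix . $i++, $out);
    }
    return $out;
}

# Keep the children in file (creation) order.
my $paths = leaf_paths($tree, sub { @_ });

# String comparison (cmp), not numeric (<=>), restores the file order.
my @display = sort { $paths->{$a} cmp $paths->{$b} } keys %$paths;
print "$_, $paths->{$_}\n" for @display;
```

Because cmp compares character by character, this only holds while no node has more than nine children, which is true of this binary tree. A numeric sort would instead put iiiiii (22) ahead of aaaaaa (11111), the same problem the trailing ':' guards against in Excel and Spotfire.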

Unfortunately, Forester gives the order as:

bbbbbb, 1, 11112:
aaaaaa, 0, 11111:
cccccc, 3, 1112:
dddddd, 5, 112:
ffffff, 8, 122:
eeeeee, 7, 121:
hhhhhh, 12, 212:
gggggg, 11, 211:
iiiiii, 14, 22:

and TreeView gives the order as:

dddddd, 5, 112:
cccccc, 3, 1112:
aaaaaa, 0, 11111:
bbbbbb, 1, 11112:
eeeeee, 7, 121:
ffffff, 8, 122:
iiiiii, 14, 22:
gggggg, 11, 211:
hhhhhh, 12, 212:
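
The TreeView listing above is consistent with one simple rule: at each internal node, leaf children are drawn before internal children, and otherwise the .dnd order is kept. That rule is only a hypothesis inferred from this single tree, not taken from TreeView's documentation. The sketch below uses the same hypothetical nested-array tree and leaf_paths helper (not BioPerl API) and checks that the rule yields paths whose string sort matches the listing.

```perl
use strict;
use warnings;

# Hypothetical nested-array form of the tree: strings are leaves,
# array refs are internal nodes with children in .dnd file order.
my $tree = [
    [ [ [ [ 'aaaaaa', 'bbbbbb' ], 'cccccc' ], 'dddddd' ], [ 'eeeeee', 'ffffff' ] ],
    [ [ 'gggggg', 'hhhhhh' ], 'iiiiii' ],
];

# Number children 1, 2, ... in the order returned by $order.
sub leaf_paths {
    my ($node, $order, $prefix, $out) = @_;
    $prefix //= '';
    $out    //= {};
    if (!ref($node)) {                     # leaf
        $out->{$node} = "$prefix:";
        return $out;
    }
    my $i = 1;
    for my $child ($order->(@$node)) {
        leaf_paths($child, $order, $prefix . $i++, $out);
    }
    return $out;
}

# Assumed TreeView-like ordering: leaves first, then internal nodes,
# each group keeping its original (file) order.
my $treeview_like = sub {
    my @kids = @_;
    return ((grep { !ref($_) } @kids), (grep { ref($_) } @kids));
};

my $paths   = leaf_paths($tree, $treeview_like);
my @display = sort { $paths->{$a} cmp $paths->{$b} } keys %$paths;
print "$_, $paths->{$_}\n" for @display;
```

In BioPerl terms the leaf test would be $node->is_Leaf and the children would come from $node->each_Descendent sorted by _creation_id, as in the post. The rule should be checked against more trees before relying on it.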


As expected, the orders differ only because branches are flipped, not
because of any fundamental difference between the trees. For some other
trees, TreeView gives the same order as the path. When the path order and
the displayed order differ, it is difficult to find a leaf on a large tree
diagram from its node path.

The following, applied to the unsorted children, almost reproduces the
Forester order (the igh branch appears at the top instead of the bottom):

@nodes = $tree->each_Descendent;   # unsorted
if ($nodes[0]->is_Leaf) {
  @nodes = reverse @nodes;
}
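
For binary nodes, the reversal test in this snippet amounts to putting internal children before leaf children and flipping each pair of sibling leaves. Applied to children already in creation order instead of the unsorted order, it appears to reproduce the Forester listing for this tree. This is again a hypothesis drawn from one example, sketched with the same hypothetical nested-array tree and leaf_paths helper rather than BioPerl objects.

```perl
use strict;
use warnings;

# Hypothetical nested-array form of the tree: strings are leaves,
# array refs are internal nodes with children in .dnd (creation) order.
my $tree = [
    [ [ [ [ 'aaaaaa', 'bbbbbb' ], 'cccccc' ], 'dddddd' ], [ 'eeeeee', 'ffffff' ] ],
    [ [ 'gggggg', 'hhhhhh' ], 'iiiiii' ],
];

# Number children 1, 2, ... in the order returned by $order.
sub leaf_paths {
    my ($node, $order, $prefix, $out) = @_;
    $prefix //= '';
    $out    //= {};
    if (!ref($node)) {                     # leaf
        $out->{$node} = "$prefix:";
        return $out;
    }
    my $i = 1;
    for my $child ($order->(@$node)) {
        leaf_paths($child, $order, $prefix . $i++, $out);
    }
    return $out;
}

# Same test as the snippet: reverse the children when the first one is a
# leaf; otherwise keep creation order.
my $forester_like = sub {
    my @kids = @_;
    return ref($kids[0]) ? @kids : reverse(@kids);
};

my $paths   = leaf_paths($tree, $forester_like);
my @display = sort { $paths->{$a} cmp $paths->{$b} } keys %$paths;
print "$_, $paths->{$_}\n" for @display;
```

The difference from the snippet is only the starting order: sorting by _creation_id first removes the dependence on the order in which each_Descendent happens to return the children.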

My questions are:
How do I need to change my sorting so that the path order matches the
displayed order in Forester and/or TreeView?
Has anyone else done something similar? Are there BioPerl routines to do
this?

Thanks,

Barry

==========================================================================================================
use strict;
use warnings;
use Bio::TreeIO;

# Joins the child indices of a path; the output above uses no separator.
our $NODE_SEPARATOR = '';

sub get_phylo_paths {
  my ($treefile) = @_;

  my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $treefile);
  my $tree   = $treeio->next_tree;             # get the tree
  my $path   = [];
  get_phylo_path($tree->get_root_node, $path);
} # end get_phylo_paths


sub get_phylo_path {
  my ($node, $ancestor_path) = @_;
  if ($node->is_Leaf) {
    # Include ':' at end so that the path is treated as a string, not a
    # number, by Excel, Spotfire, etc.
    print $node->id . ', ' . $node->_creation_id . ', '
        . join($NODE_SEPARATOR, @$ancestor_path) . ":\n";
  }
  else {
    my @nodes = $node->each_Descendent;       # unsorted
    my $i = 1;
    foreach my $child (@nodes) {
      # Append this child's index to the path: 1 or 2, except at the
      # root node, where there may be a 3 as well
      my @path = (@$ancestor_path, $i++);
      get_phylo_path($child, \@path);
    }
  }
}

