[Bioperl-l] Bootstrap, root, reroot...

Tristan Lefebure tristan.lefebure at gmail.com
Thu Jul 9 15:50:20 UTC 2009


Hello,

I have been running into problems while rerooting trees that 
contain bootstrap scores. Basically, after re-rooting the 
tree, some scores end up in the wrong place (i.e. on the 
wrong node) and some nodes lose their score. I found this 
thread from Bank Beszteri, back in 2007, that describes 
exactly the same problem:

http://lists.open-bio.org/pipermail/bioperl-l/2007-May/025599.html

I attach a script that reproduces the bug and implements the 
fix that Bank described (at least this is my understanding, 
and it works on this example):


#!/usr/bin/perl

use strict;
use warnings;
use Bio::TreeIO;

# Read Newick from the DATA section, treating internal node labels
# as bootstrap scores.
my $in = Bio::TreeIO->new(-format           => 'newick',
                          -fh               => \*DATA,
                          -internal_node_id => 'bootstrap');

my $out = Bio::TreeIO->new(-format => 'newick',
                           -file   => '>out.tree');

while (my $t = $in->next_tree) {
    my $old_root   = $t->get_root_node();
    my ($node_b)   = $t->find_node(-id => 'B');
    my $node_b_anc = $node_b->ancestor;
    $out->write_tree($t);    # tree #1: the input tree

    # reroot with B -> wrong, and the tree is kind of weird
    $t->reroot($node_b);
    $out->write_tree($t);    # tree #2

    # reroot with B's ancestor -> wrong
    $t->reroot($node_b_anc);
    $out->write_tree($t);    # tree #3

    # a fix, following Bank Beszteri's description: walk from the old
    # root up to the new root, moving each bootstrap score from the
    # ancestor to its descendant on that path
    my $node = $old_root;
    while (my $anc_node = $node->ancestor) {
        $node->bootstrap($anc_node->bootstrap());
        $anc_node->bootstrap('');
        $node = $anc_node;
    }
    $out->write_tree($t);    # tree #4 -> good this time
}


__DATA__
(A:52,(B:46,C:50)68:11,D:70);


Here is the output:

(A:52,(B:46,C:50)68:11,D:70);
((C:50,(A:52,D:70):11)68:46)B;
(B:46,C:50,(A:52,D:70):11)68;
(B:46,C:50,(A:52,D:70)68:11);


Trees #2 and #3 have the score 68 moved to the wrong node, 
while tree #4 is OK. (BTW, tree #2 is really weird: unless 
B is the real ancestor (a fossil?), it does not make much 
sense to me.) 

My understanding is that the problem is linked to the 
well-known difficulty of distinguishing node labels from 
branch labels in Newick trees. Bootstrap scores are branch 
attributes, not node attributes, but since Bio::TreeI has no 
branch/edge/bipartition object they are attached to a node, 
and in fact reflect the bootstrap score of the ancestral 
branch leading to that node. Trouble naturally arises when 
you deal with an unrooted tree or reroot a tree: a child can 
become an ancestor, and if the bootstrap score is not moved 
from the old child to the new child, it ends up attached to 
the wrong node. 
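
To make the mechanism concrete, here is a minimal sketch using a 
toy tree of plain hashes, not BioPerl objects. The names 
new_node, reroot and holder_after_reroot are made up for this 
illustration, and 'support' stands in for the bootstrap score of 
the branch between a node and its parent. It assumes the same 
topology as the example above, with X as the ancestor of B and C.

```perl
use strict;
use warnings;

# Toy node: a name, a parent pointer, and 'support', the score of
# the branch joining this node to its parent (undef if none).
sub new_node {
    my ($name, $parent, $support) = @_;
    return { name => $name, parent => $parent, support => $support };
}

# Reroot at $new_root by reversing the parent pointers on the path
# to the old root. If $move_support is true, each score follows its
# branch, i.e. it moves to the node that is now the child end.
sub reroot {
    my ($new_root, $move_support) = @_;
    my @path = ($new_root);
    push @path, $path[-1]{parent} while $path[-1]{parent};
    for my $i (reverse 1 .. $#path) {    # from the old root downwards
        $path[$i]{parent}  = $path[$i - 1];
        $path[$i]{support} = $path[$i - 1]{support} if $move_support;
    }
    $new_root->{parent}  = undef;
    $new_root->{support} = undef if $move_support;
    return $new_root;
}

# Build (A,(B,C)68,D), reroot at X, and report which node carries
# the score afterwards.
sub holder_after_reroot {
    my ($move_support) = @_;
    my $root = new_node('root');
    my $x    = new_node('X', $root, 68);
    my %n    = (root => $root, X => $x);
    $n{$_} = new_node($_, $root) for qw(A D);
    $n{$_} = new_node($_, $x)    for qw(B C);
    reroot($x, $move_support);
    my ($holder) = grep { defined $n{$_}{support} } sort keys %n;
    return $holder;
}

print "naive reroot: 68 is on ", holder_after_reroot(0), "\n";
print "with the fix: 68 is on ", holder_after_reroot(1), "\n";
```

In the naive case the score stays on X, which is now the root and 
has no ancestral branch (as in tree #3). With the move, it ends up 
on the old root, which is now the (A,D) clade (as in tree #4).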

I see two possible fixes:

1- Incorporate Bank's fix into the reroot() method, i.e. if 
there are bootstrap scores, those on the path from the old 
root to the new root should be moved to the right nodes 
after re-rooting. 

2- Modify the way trees are stored in BioPerl to incorporate 
branch/edge/bipartition objects, and move the bootstrap 
scores to them. That won't be easy and will break many 
things... 
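
As a rough sketch of the idea behind option 2, the toy code below 
keys each score by the bipartition its branch defines instead of 
by a node. It is not tied to BioPerl's object model: the nested 
hash layout and the names leaves and split_supports are 
assumptions made for this illustration. The two trees are the 
input tree and tree #4 from the output above.

```perl
use strict;
use warnings;

# Toy tree: a leaf is a plain string; a clade is a hash with
# 'children' and, optionally, 'support' for its ancestral branch.
sub leaves {
    my ($clade) = @_;
    return ($clade) unless ref $clade;
    my @names = map { leaves($_) } @{ $clade->{children} };
    return sort @names;
}

# Collect scores keyed by bipartition. The key is the sorted leaf
# set on the side of the branch that lacks the reference leaf, so
# it does not depend on where the tree is rooted.
sub split_supports {
    my ($tree) = @_;
    my @all      = leaves($tree);
    my $ref_leaf = $all[0];
    my %support;
    my @todo = ($tree);
    while (@todo) {
        my $clade = shift @todo;
        next unless ref $clade;
        push @todo, @{ $clade->{children} };
        next unless defined $clade->{support};
        my %below = map { $_ => 1 } leaves($clade);
        my @side  = sort keys %below;
        @side = grep { !$below{$_} } @all if $below{$ref_leaf};
        $support{ join ',', @side } = $clade->{support};
    }
    return \%support;
}

# (A,(B,C)68,D) and the correctly rerooted (B,C,(A,D)68)
my $original = { children =>
    [ 'A', { children => [ 'B', 'C' ], support => 68 }, 'D' ] };
my $rerooted = { children =>
    [ 'B', 'C', { children => [ 'A', 'D' ], support => 68 } ] };

for my $tree ($original, $rerooted) {
    my $s = split_supports($tree);
    print "$_ => $s->{$_}\n" for sort keys %$s;
}
```

Both trees give the same key for the score 68, since the branch 
separating {B,C} from {A,D} is the same branch whichever way the 
tree is rooted.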


What do you think?

--Tristan









