[Bioperl-l] Bootstrap, root, reroot...
Tristan Lefebure
tristan.lefebure at gmail.com
Thu Jul 9 15:50:20 UTC 2009
Hello,
I have been running into problems when rerooting trees that
contain bootstrap scores. Basically, after rerooting the
tree, some scores end up in the wrong place (i.e. on the
wrong node) and some nodes lose their score. I found this
thread from Bank Beszteri, back in 2007, which describes
exactly the same problem:
http://lists.open-bio.org/pipermail/bioperl-l/2007-May/025599.html
I attach a script that reproduces the bug and implements the
fix that Bank described (at least this is my understanding,
and it works on this example):
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

# Read the tree from the DATA section; internal node labels are
# interpreted as bootstrap scores.
my $in = Bio::TreeIO->new(-format           => 'newick',
                          -fh               => \*DATA,
                          -internal_node_id => 'bootstrap');
my $out = Bio::TreeIO->new(-format => 'newick',
                           -file   => ">out.tree");

while ( my $t = $in->next_tree ) {
    my $old_root = $t->get_root_node();
    my ($b)      = $t->find_node(-id => "B");
    my $b_anc    = $b->ancestor;
    $out->write_tree($t);    # tree #1: the original tree

    # reroot with B -> wrong, and the tree is kind of weird
    $t->reroot($b);
    $out->write_tree($t);    # tree #2

    # reroot with the ancestor of B -> wrong
    $t->reroot($b_anc);
    $out->write_tree($t);    # tree #3

    # a fix, following Bank Beszteri's description: walk from the
    # old root up to the new root, moving each score one node down
    my $node = $old_root;
    while ( my $anc_node = $node->ancestor ) {
        $node->bootstrap( $anc_node->bootstrap() );
        $anc_node->bootstrap('');
        $node = $anc_node;
    }
    $out->write_tree($t);    # tree #4 -> good this time
}
__DATA__
(A:52,(B:46,C:50)68:11,D:70);
Here is the output:
(A:52,(B:46,C:50)68:11,D:70);
((C:50,(A:52,D:70):11)68:46)B;
(B:46,C:50,(A:52,D:70):11)68;
(B:46,C:50,(A:52,D:70)68:11);
Trees #2 and #3 have the score 68 attached to the wrong
node, while tree #4 is OK. (BTW, tree #2 is really weird:
unless B is a real ancestor (a fossil?), it does not make
much sense to me.)
My understanding is that the problem is linked to the
well-known difficulty of distinguishing node labels from
branch labels in newick trees. Bootstrap scores are branch
attributes, not node attributes, but since Bio::TreeI has no
branch/edge/bipartition object they are attached to a node,
and in fact reflect the bootstrap score of the ancestral
branch leading to that node. Trouble naturally arises when
you are dealing with an unrooted tree or when you reroot a
tree: a child can become an ancestor and, if the bootstrap
score is not moved from the old child to the new child, it
ends up attached to the wrong node.
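To make the node-versus-branch point concrete, here is a
minimal sketch in plain Perl that does not use BioPerl at
all. The hash-based node layout, the helper names
(new_node, reroot, to_newick) and the internal node names R
and X are all made up for illustration; this is not how
Bio::Tree stores or reroots trees, only a toy model of
"each node carries the data of the branch above it".

```perl
use strict;
use warnings;

# Hypothetical minimal node (not a Bio::Tree object). 'length' and
# 'support' describe the branch between this node and its parent.
sub new_node {
    my ($name, $parent, $length, $support) = @_;
    return { name   => $name,   parent  => $parent,
             length => $length, support => $support };
}

# Reroot on $new_root: flip the parent pointers on the path to the old
# root and hand each branch's length and support to its new child node.
sub reroot {
    my ($new_root) = @_;
    my @path = ($new_root);
    push @path, $path[-1]{parent} while defined $path[-1]{parent};
    # Start at the old root so no branch data is overwritten before use.
    for my $i (reverse 1 .. $#path) {
        my ($child, $parent) = @path[ $i - 1, $i ];
        @{$parent}{qw(parent length support)} =
            ($child, @{$child}{qw(length support)});
    }
    @{$new_root}{qw(parent length support)} = (undef, undef, undef);
    return $new_root;
}

# Serialize the subtree below $node; children are found via parent pointers.
sub to_newick {
    my ($node, $nodes) = @_;
    my @kids = grep { defined $_->{parent} && $_->{parent} == $node } @$nodes;
    my $s = @kids
        ? '(' . join(',', map { to_newick($_, $nodes) } @kids) . ')'
        : $node->{name};
    $s .= $node->{support}      if defined $node->{support};
    $s .= ':' . $node->{length} if defined $node->{length};
    return $s;
}

# The example tree: (A:52,(B:46,C:50)68:11,D:70);
my %n;
$n{R} = new_node('R');                  # original root
$n{A} = new_node('A', $n{R}, 52);
$n{X} = new_node('X', $n{R}, 11, 68);   # ancestor of B and C
$n{B} = new_node('B', $n{X}, 46);
$n{C} = new_node('C', $n{X}, 50);
$n{D} = new_node('D', $n{R}, 70);
my @nodes = @n{qw(A X B C D R)};

print to_newick($n{R}, \@nodes), ";\n";   # (A:52,(B:46,C:50)68:11,D:70);
reroot($n{X});
print to_newick($n{X}, \@nodes), ";\n";   # (B:46,C:50,(A:52,D:70)68:11);
```

After rerooting on X, the score 68 sits on the old root R,
which is now the child end of the X-R branch. That is the
same placement as in tree #4.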
I see two possible fixes:
1- Incorporate Bank's fix into the reroot() method, i.e. if
there are bootstrap scores, after rerooting, the ones on the
path from the old root to the new root should be moved to
the right nodes.
2- Modify the way trees are stored in BioPerl to incorporate
a branch/edge/bipartition object, and move the bootstrap
scores to it. That won't be easy and will break many
things...
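As a rough illustration of what option 2 would buy us, here
is a small plain-Perl sketch in which the score is keyed by
the unordered pair of nodes at the two ends of a branch.
Everything in it (the %support and %parent hashes, edge_key,
set_support, get_support, and the internal node names R and
X) is hypothetical and only meant to show the idea; nothing
like this exists in Bio::Tree as far as I know.

```perl
use strict;
use warnings;

# Hypothetical edge store: support is keyed by the unordered pair of
# node names, so it does not depend on which end is the ancestor.
my %support;
sub edge_key    { return join '|', sort @_ }
sub set_support { my ($u, $v, $s) = @_; $support{ edge_key($u, $v) } = $s }
sub get_support { my ($u, $v) = @_; return $support{ edge_key($u, $v) } }

# Parent pointers give the rooting of (A:52,(B:46,C:50)68:11,D:70);
# 'R' is the original root and 'X' the ancestor of B and C.
my %parent = (A => 'R', X => 'R', D => 'R', B => 'X', C => 'X');
set_support('X', 'R', 68);

# Reroot on X: only the direction of the X-R branch flips.
delete $parent{X};
$parent{R} = 'X';

# The score is found from either end, whatever the rooting.
print get_support('R', $parent{R}), "\n";   # prints 68
```

With scores stored this way, rerooting never has to move
them, which is the whole appeal of having a real branch
object.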
What do you think?
--Tristan