[Bioperl-l] Unwise elimination of nodes in B:T:Node::remove_Descendent?
Chris Fields
cjfields at illinois.edu
Fri Feb 6 09:46:16 EST 2009
On Feb 6, 2009, at 8:21 AM, Hilmar Lapp wrote:
>
> On Feb 6, 2009, at 12:49 AM, Mark A. Jensen wrote:
>
>> A B C D E
>> |5 |5 |4 |4 |
>> \ / \ / /
>> x y /
>> |2 |1 /
>> \ / /
>> \ _/ / 10
>> \ / /
>> z /
>> |3 /
>> \ /
>> ROOT
>>
>>> [...]
>>> __DATA__
>>> (((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);
>>> [...]
>>
>> The first problem was that the "complementary approaches" didn't give
>> the same answer, and one threw an error. This is the bug in the
>> code above. If you look at the tree, the nodes he really wants to
>> keep are
>> the leaves [A,B,E], *plus* the internal nodes [x,y,z]; that is...
>>
>> $tree->splice(-keep_id=>[A,B,E,x,'y',z])
>> $tree->total_branch_length
>>
>> doesn't throw and returns 25, which is correct.
>
> If you replace 'y' with the root in the above, yes. Otherwise no.
>
>> The second problem is that removing [C, D] also gives him 25, which
>> is
>> what he wanted, but is not correct.
>
> Correct.
>
>> [...]The problem arises in this code at the end of remove_Descendent:
>>
>> # remove unecessary nodes if we have removed the part |||Node.pm
>> LINE 272
>> # which branches.
>> my $a1 = $self->ancestor;
>> if( $a1 ) {
>> my $bl = $self->branch_length || 0;
>> my @d = $self->each_Descendent;
>> if (scalar @d == 1) {
>> $d[0]->branch_length($bl + ($d[0]->branch_length || 0));
>> $a1->add_Descendent($d[0]);
>> $a1->remove_Descendent($self);
>> }
>> }
>>
>> When node C is removed from the example tree, the node 'y' is removed
>> by this code apparently as a convenience.
>
> I guess I'm confused by the above code snippet. If @d are the
> descendants of $a1, then I don't understand what the purpose of
> adding $d[0] as a descendant of $a1 is (after altering its branch
> length). Furthermore, if $a1 is the ancestor of $self, and $a1 has
> only one descendant, wouldn't that mean that $d[0] == $self?
>
> What am I missing?
>
> I also think it's a bad idea to simplify the tree (or collapse
> internal nodes of degree 1) as an implicit operation. It should be
> explicit (and I believe there is a method simplify() or something
> similar, isn't there? Ah - I see you quote contract_linear_paths()).
>
> Furthermore, I think it's also a bad idea to remove descendant leaf
> nodes if you remove an internal node. What if you really just wanted
> to remove the internal node because, for example, its branching
> point isn't well supported? So removing node 'y' should make z a
> node of degree 3, but not remove C and D unless you ask to remove
> the subtree beginning at 'y'.
>
> -hilmar
I suppose the best way to deal with some of these questions (and
ensure Node/Tree is acting as expected) is to come up with several
vetted test cases indicating what we expect the proper behavior to be
for remove_Descendant(), contract_linear_paths(), and any other
problematic Node/Tree/TreeFunctionI methods. In fact, I highly
recommend any code changes like this add tests to the test suite
demonstrating the issue.
Possibly related to all this is a fairly significant lingering bug
dealing with Bio::Tree::TreeFunctionsI::reroot() (http://bugzilla.open-bio.org/show_bug.cgi?id=2456
). Any takers?
chris
More information about the Bioperl-l
mailing list