[Bioperl-l] Bio::Species, Bio::Taxonomy::Node overhaul

Sendu Bala bix at sendu.me.uk
Mon Aug 7 13:57:19 UTC 2006


Hilmar Lapp wrote:
> 
> On Aug 7, 2006, at 4:38 AM, Sendu Bala wrote:
> 
>> So there will be a behaviour change - now get_lca really does only get
>> the lowest common ancestor of input nodes, which necessarily can't be
>> any of the input nodes themselves.
>>
>> I'd call the old behaviour a bug that has now been fixed. (Though the
>> code had a comment to the effect that it was a quite deliberate choice
>> on the part of the author.)
> 
> I inclined to call the new behavior a bug. Why would the lca between 
> node A and node B not be defined, or be an ancestor node of A instead of 
> A itself? [...]
> 
> Or am I missing something?

Well, it's the 'lowest common ancestor', isn't it? How can the ancestor 
of something be itself?

I'm interested that you think that the LCA has to be defined; the 
original implementation makes the same assumption in its comments.

Consider two lineages:

A---B---C---D---E

X---Y---Z---D---F

The old implementation would not only expect that E and F have an lca, 
but return the answer D, which is wrong. E and F do not have a common 
ancestor; their direct ancestors just happen to have the same 
descriptor. (In typical usage it was probably never wrong, since the 
descriptors used were script-unique, and unchangeable without using an 
internal method.)

Or more obviously:

A--B

X--C

B and C do not have an lca. I think there is the assumption that both 
nodes being compared belong to the same properly constructed tree, but 
you don't even need a Tree object to use get_lca().
my $lca = Bio::Tree::TreeFunctionsI->get_lca(@nodes);
(Not that I'd suggest anyone do that.)


> Likewise, why would the lca between a node A and its child 
> node B not be A but instead an ancestor node of A?

I could certainly be wrong, I just can't find anything authoritative 
that explicitly states the correct answer to that either way.

For example 
http://66.102.9.104/search?q=cache:5cDyg4Um8GEJ:dept-info.labri.fr/~gavoille/article/AGKR02

Defines the nearest common ancestor (== lowest common ancestor) like:

Let T be a rooted tree. A node x ∈ T is an ancestor of a
node y ∈ T if the path from the root of T to y goes through
x. A node v ∈ T is a common ancestor of x and y if it is
an ancestor of both x and y. The nearest common ancestor,
nca, of two nodes x, y is the common ancestor of x and y
whose distance to x (and to y) is smaller than the distance
to x of any other common ancestor of x and y.

According to that, v cannot be x or y.



More information about the Bioperl-l mailing list