[Biopython-dev] Bio.Phylo to_adjacency_matrix function
Eric Talevich
eric.talevich at gmail.com
Wed Jan 20 04:08:16 UTC 2010
On Tue, Jan 19, 2010 at 10:47 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Jan 19, 2010 at 3:22 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>>
>> On Tue, Jan 19, 2010 at 5:49 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> Hi Eric (and everyone else),
>>>
>>> I just spotted the to_adjacency_matrix function in utils:
>>> http://github.com/biopython/biopython/blob/master/Bio/Phylo/_utils.py
>>>
>>> It looks like your adjacency matrix starts as a numpy array of zeros,
>>> and then you set some edges to branch lengths. How do you tell
>>> apart a non-connection and a real connection of length zero?
>>
>> Shoot, you're right. I can think of three reasonable mitigations:
>> (a) Use a boolean or 0-1 matrix instead of branch lengths to indicate
>> adjacency -- this seems more standard in textbooks, actually.
>> (b) Issue a warning or raise an error if the given tree contains a
>> 0-length branch.
>> (c) Delete the function.
>>
>> Which do you recommend?
>> ....
>
> I did wonder about further options,
>
> (d) Since the distances are floats, we can use NaN as
> a flag for no connection. However, this does not seem
> very useful.
Or infinity -- I think that's reasonably common in graph algorithms
that use a matrix representation.
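A minimal sketch of the ambiguity and two of the fixes discussed above: a 0-1 matrix (option (a)) and infinity as the "no connection" flag. It uses NumPy on a hypothetical toy edge list rather than a real Bio.Phylo tree. The function names are made up for illustration; none of them is the commented-out `to_adjacency_matrix`.

```python
import numpy as np

# Hypothetical toy input: nodes 0..3, edges as (i, j, branch_length).
# Edge (1, 2) is a zero-length branch; nodes 2 and 3 are not directly joined.
EDGES = [(0, 1, 0.5), (1, 2, 0.0), (1, 3, 1.2)]
N = 4


def zeros_matrix(edges, n):
    """Zero-initialised matrix: 0 means both 'no edge' and 'edge of length 0'."""
    m = np.zeros((n, n))
    for i, j, length in edges:
        m[i, j] = m[j, i] = length
    return m


def inf_matrix(edges, n):
    """Infinity marks 'no direct connection'; the diagonal is 0
    (distance from a node to itself), as in shortest-path algorithms."""
    m = np.full((n, n), np.inf)
    np.fill_diagonal(m, 0.0)
    for i, j, length in edges:
        m[i, j] = m[j, i] = length
    return m


def boolean_adjacency(edges, n):
    """Option (a): a plain 0-1 (here boolean) adjacency matrix, ignoring lengths."""
    a = np.zeros((n, n), dtype=bool)
    for i, j, _ in edges:
        a[i, j] = a[j, i] = True
    return a


z = zeros_matrix(EDGES, N)
d = inf_matrix(EDGES, N)
a = boolean_adjacency(EDGES, N)

# With zeros, the real zero-length edge (1, 2) looks like the non-edge (2, 3).
print(z[1, 2] == z[2, 3])  # True
# With infinity, the two cases are distinguishable.
print(d[1, 2], d[2, 3])    # 0.0 inf
# The 0-1 matrix can be recovered from the infinity-based one.
print(np.array_equal(a, np.isfinite(d) & ~np.eye(N, dtype=bool)))  # True
```

Keeping both matrices is a design choice: the infinity version plugs straight into shortest-path code, while the boolean one is the textbook adjacency matrix.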
Anyway, I commented it out for now. The main problem is that I don't
have a clear use case for the function at the moment, just a notion
that it could be useful for some novel statistical analysis or
possibly rooting an unrooted tree based on a molecular clock. I'll
look at other libraries to see how they use adjacency matrices, if at
all.
> (e) Collapse nodes separated by a zero length branch
> while building the adjacency matrix.
>
> Or, raise an error (b) but provide a tree method to collapse
> nodes separated by a zero length branch which could be
> called to "clean up" a problematic tree before making the
> adjacency matrix.
Should be easy enough for the user to do manually:
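# list() first, since the tree is modified in the loop; skip leaves and the root
for clade in list(tree.find_clades(branch_length=0, terminal=False)):
    if clade is not tree.root:
        tree.collapse(clade)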
I'm going to do some serious work on the wiki documentation soon, so
this sort of operation should be fairly apparent to users.
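Applied to a toy tree, the clean-up loop looks roughly like this. The sketch assumes Biopython's `Bio.Phylo.read`, `find_clades` and `collapse`; `has_zero_length_branch` is a hypothetical helper standing in for the check from option (b), not a Bio.Phylo method.

```python
from io import StringIO

from Bio import Phylo


def has_zero_length_branch(tree):
    """Hypothetical check in the spirit of option (b): any branch of length 0?"""
    return any(c.branch_length == 0 for c in tree.find_clades())


# Toy tree: the internal clade (A,B) sits on a zero-length branch.
tree = Phylo.read(StringIO("((A:1,B:2):0,C:3);"), "newick")
print(has_zero_length_branch(tree))  # True

# Take a list() of the matches before editing the tree, and skip leaves:
# collapsing a terminal clade would remove that taxon from the tree.
for clade in list(tree.find_clades(branch_length=0, terminal=False)):
    if clade is not tree.root:  # collapse() cannot remove the root itself
        tree.collapse(clade)

print(has_zero_length_branch(tree))                  # False
print(len(tree.root.clades))                         # 3: A, B and C hang off the root
print(sorted(t.name for t in tree.get_terminals()))  # ['A', 'B', 'C']
```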
> P.S. Another potentially interesting thing would be a matrix using
> the bootstrap support values (where again you have a problem
> with zero bootstrap support vs no connection). I'm not sure if this
> has any practical uses though.
Well, the commented-out code is still visible if any brave scientist
is interested in modifying it for this purpose. I'm reading Joe
Felsenstein's book right now, so I'll probably get the urge to add
more mathy toys to Bio.Phylo soon. I'll check with the list before
committing them to the trunk, though. ;)
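For the bootstrap idea in the P.S., a minimal sketch of a support matrix that uses NaN for "no connection", so that a branch with 0 support stays distinguishable from a missing edge. Assumptions: support values live in `Clade.confidence` on the child clade (the branch above it), and `-1.0` is an arbitrary flag for edges whose support is unknown (such as leaf branches). `support_matrix` is hypothetical; the tree is built directly from `Bio.Phylo.BaseTree.Clade` and `Tree`.

```python
import numpy as np
from Bio.Phylo.BaseTree import Clade, Tree


def support_matrix(tree, missing=-1.0):
    """Hypothetical helper: symmetric matrix of bootstrap support between
    parent and child clades. NaN means 'no direct connection'; edges whose
    child clade has no confidence value get ``missing`` instead."""
    clades = list(tree.find_clades())  # preorder, so the root is index 0
    index = {id(c): i for i, c in enumerate(clades)}
    m = np.full((len(clades), len(clades)), np.nan)
    for parent in clades:
        for child in parent.clades:
            # The support value belongs to the branch leading to the child.
            value = child.confidence if child.confidence is not None else missing
            i, j = index[id(parent)], index[id(child)]
            m[i, j] = m[j, i] = value
    return m, clades


# Toy tree: clade (A,B) has 0 bootstrap support, clade (C,D) has 95.
ab = Clade(confidence=0, clades=[Clade(name="A"), Clade(name="B")])
cd = Clade(confidence=95, clades=[Clade(name="C"), Clade(name="D")])
tree = Tree(root=Clade(clades=[ab, cd]))

m, clades = support_matrix(tree)
print(m[0, 1])            # 0.0: a real edge with zero support
print(np.isnan(m[1, 4]))  # True: no edge between the two internal clades
```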