[Biopython-dev] Bio.Phylo to_adjacency_matrix function

Eric Talevich eric.talevich at gmail.com
Tue Jan 19 15:22:30 UTC 2010


On Tue, Jan 19, 2010 at 5:49 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi Eric (and everyone else),
>
> I just spotted the to_adjacency_matrix function in utils:
> http://github.com/biopython/biopython/blob/master/Bio/Phylo/_utils.py
>
> The docstring says:
>
>> Create an adjacency matrix (NumPy array) from clades/branches in tree.
>>
>> Also returns a list of all clades in tree ("allclades"), where the position
>> of each clade in the list corresponds to a row and column of the numpy
>> array. So, a cell i,j in the array represents the length of the branch from
>> allclades[i] to allclades[j].
>>
>> @return: tuple of (allclades, adjacency_matrix) where allclades is a list
>> and adjacency_matrix is a NumPy 2D array.
>
> It looks like your adjacency matrix starts as a numpy array of zeros,
> and then you set some edges to branch lengths. How do you tell
> apart a non-connection and a real connection of length zero? These
> do occur, for example if you have three identical sequences, then
> you might expect a single node with three children. However IIRC,
> in (some) NJ trees each node has two children by construction,
> so you get an extra node connected with a branch of length zero.

Shoot, you're right. I can think of three reasonable mitigations:
(a) Use a boolean or 0-1 matrix instead of branch lengths to indicate
adjacency -- this seems more standard in textbooks, actually.
(b) Issue a warning or raise an error if the given tree contains a
0-length branch.
(c) Delete the function.

Which do you recommend?
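
To make the ambiguity and options (a) and (b) concrete, here is a rough
sketch. It does not use the actual Bio.Phylo code: the toy edge list, the
node indices and all function names are made up for illustration, and the
only dependency assumed is NumPy. Rows are parents and columns are
children, matching the "from allclades[i] to allclades[j]" wording in the
docstring.

```python
import warnings

import numpy as np

# Hypothetical toy tree as (parent, child, branch_length) triples; node 0 is
# the root. The 1 -> 2 branch has length zero, as can happen when an NJ-style
# tree forces a binary split among identical sequences.
EDGES = [(0, 1, 0.5), (1, 2, 0.0), (1, 3, 0.2)]
N_NODES = 4


def length_matrix(edges, n):
    """Zero-initialised matrix filled with branch lengths."""
    mat = np.zeros((n, n))
    for parent, child, length in edges:
        mat[parent, child] = length
    return mat


def boolean_matrix(edges, n):
    """Option (a): True marks a branch, whatever its length."""
    mat = np.zeros((n, n), dtype=bool)
    for parent, child, _length in edges:
        mat[parent, child] = True
    return mat


def warn_on_zero_lengths(edges):
    """Option (b): warn about branches that would look like non-edges."""
    zero = [(parent, child) for parent, child, length in edges if length == 0]
    if zero:
        warnings.warn(
            "zero-length branches look like missing edges: %r" % (zero,)
        )
    return zero


lengths = length_matrix(EDGES, N_NODES)
adjacent = boolean_matrix(EDGES, N_NODES)
```

In the length matrix, the real branch at [1, 2] and the non-branch at
[2, 3] both hold 0.0, so they cannot be told apart. In the boolean matrix
only [1, 2] is True. Branch lengths could then be kept in a separate array
if anyone needs them.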

The idea was to give mathematicians something to play with. For
example, Chapter 2 of this report represents phylogenies this way,
using 0 or 1 to indicate the presence of a branch:
http://www.metaheuristics.net/~mdorigo/HomePageDorigo/thesis/dea/CatanzaroDEA.pdf

Thanks for the heads-up,
Eric



