[Biopython-dev] Bio.Phylo to_adjacency_matrix function

Tue Jan 19 10:49:31 UTC 2010

Hi Eric (and everyone else),

I just spotted the to_adjacency_matrix function in utils:
http://github.com/biopython/biopython/blob/master/Bio/Phylo/_utils.py

The docstring says:

> Create an adjacency matrix (NumPy array) from clades/branches in tree.
>
> Also returns a list of all clades in tree ("allclades"), where the position
> of each clade in the list corresponds to a row and column of the numpy
> array. So, a cell i,j in the array represents the length of the branch from
> allclades[i] to allclades[j].
>
> @return: tuple of (allclades, adjacency_matrix) where allclades is a list
> and adjacency_matrix is a NumPy 2D array.

It looks like your adjacency matrix starts as a NumPy array of zeros,
and then you set some of the cells to branch lengths. How do you tell
apart a non-connection and a real connection of length zero? These
do occur: for example, if you have three identical sequences, you
might expect a single node with three children. However, IIRC,
in (some) NJ trees each node has two children by construction,
so you get an extra node connected by a branch of length zero.
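
To make the ambiguity concrete, here is a minimal sketch. It is not the
code in _utils.py: the toy edge list, the node numbering and the idea of
filling the matrix with NaN instead of zero are all assumptions made up
for illustration, and NaN is only one possible way to mark "no edge".

```python
import numpy as np

# Hypothetical toy tree as (parent, child, branch_length) triples.
# Node 0 is the root; the 0 -> 1 branch deliberately has length zero,
# as can happen when a bifurcating tree is forced onto identical sequences.
edges = [(0, 1, 0.0), (0, 2, 0.3), (1, 3, 0.1), (1, 4, 0.1)]
n_nodes = 5

# Matrix initialised with zeros, as described above.
zeros_matrix = np.zeros((n_nodes, n_nodes))
# Assumed alternative: initialise with NaN so "no edge" is distinguishable.
nan_matrix = np.full((n_nodes, n_nodes), np.nan)

for parent, child, length in edges:
    zeros_matrix[parent, child] = length
    nan_matrix[parent, child] = length

# In the zero-filled matrix the real edge 0 -> 1 and the non-edge 0 -> 3
# hold the same value, so they cannot be told apart.
ambiguous = zeros_matrix[0, 1] == zeros_matrix[0, 3]

# In the NaN-filled matrix a boolean connectivity mask can be recovered.
connected = ~np.isnan(nan_matrix)
```

With the NaN fill, connected[0, 1] is true while connected[0, 3] is
false, even though the 0 -> 1 branch has length zero. Returning a separate
boolean matrix alongside the lengths would be another way to do it.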

Peter