[Biopython] NJ tree constructor never completes
Andrew Sanchez
aas229 at nau.edu
Mon Aug 14 15:21:54 UTC 2017
I am trying to construct a tree from a DistanceMatrix object of length 6303 (that is, 6303 taxa) with the following command: `tree = constructor.nj(bio_dmx)`.
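The matrix and constructor were created like this:
from Bio.Phylo.TreeConstruction import _DistanceMatrix, DistanceTreeConstructor
# names: list of taxon labels; nested_dmx: nested list of pairwise distances
bio_dmx = _DistanceMatrix(names, nested_dmx)
constructor = DistanceTreeConstructor()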
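In the Biopython Phylo examples, `_DistanceMatrix` is given a lower-triangular nested list in which each row runs up to and including its diagonal zero (e.g. `[[0], [1, 0], [2, 3, 0]]`). If `nested_dmx` starts life as a full square matrix, a helper along the lines below could convert it and catch asymmetries before the long run starts. This is a sketch; the name `to_lower_triangle` is hypothetical and not part of Biopython, and the symmetry check uses exact equality, which assumes the distances were computed identically in both directions.

```python
def to_lower_triangle(square):
    """Convert a full symmetric distance matrix (list of lists) into the
    lower-triangular nested list, diagonal included, used in the Biopython
    Phylo examples for _DistanceMatrix. Hypothetical helper, not Biopython API."""
    n = len(square)
    for i, row in enumerate(square):
        if len(row) != n:
            raise ValueError("row %d has %d entries, expected %d" % (i, len(row), n))
        for j in range(i):
            # exact comparison: assumes d(i, j) and d(j, i) were computed identically
            if square[i][j] != square[j][i]:
                raise ValueError("matrix is not symmetric at (%d, %d)" % (i, j))
    # row i keeps columns 0..i, so the last entry of each row is the diagonal
    return [list(square[i][: i + 1]) for i in range(n)]
```

With a hypothetical full matrix `square`, usage would be `bio_dmx = _DistanceMatrix(names, to_lower_triangle(square))`.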
I've tested my workflow on a much smaller distance matrix, following the examples at http://biopython.org/wiki/Phylo, and it worked fine. With this larger dataset, however, the process just hangs, and I don't know where to begin debugging. First of all, how long should I expect this to take? Wikipedia says: "...typical run times proportional to approximately the square of the number of taxa."
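One way to get a rough answer empirically is to time `constructor.nj` on two random subsets of the taxa (for example a few hundred and twice that), fit the growth exponent from those two timings, and extrapolate to 6303 taxa. The sketch below does only the arithmetic; the function names are hypothetical. The default exponent of 3 reflects the textbook neighbor-joining algorithm, which is O(n^3) in the number of taxa; a particular implementation may scale differently, which is why fitting the exponent from your own timings is safer.

```python
import math


def fit_exponent(n1, t1, n2, t2):
    """Exponent k such that run time grows like n**k, from timings
    t1 and t2 (seconds) measured at sizes n1 and n2."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate(n_measured, t_measured, n_target, exponent=3.0):
    """Predicted run time at n_target, assuming time ~ n**exponent."""
    return t_measured * (n_target / n_measured) ** exponent
```

For instance, with your own measured times `t1` and `t2` at sizes `n1` and `n2`, `extrapolate(n2, t2, 6303, fit_exponent(n1, t1, n2, t2)) / 3600` gives an estimate in hours. Small sizes are dominated by constant overhead, so the fit is more trustworthy when the measured runs take at least several seconds.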
Maybe it is normal for a tree of this size to take this long to construct? If so, is there a way to run `tree = constructor.nj(bio_dmx)` so that it prints some output, so I can at least see that something is happening?
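One option that requires no change to Biopython is the standard-library `faulthandler` module (Python 3.3 and later): `faulthandler.dump_traceback_later(interval, repeat=True, file=...)` writes the current stack of every thread to a file every `interval` seconds until `faulthandler.cancel_dump_traceback_later()` is called. The wrapper below is a sketch; its name and the `interval` default are hypothetical choices.

```python
import faulthandler
import sys


def run_with_heartbeat(func, *args, interval=600.0, file=None, **kwargs):
    """Call func(*args, **kwargs), dumping every thread's stack to `file`
    every `interval` seconds until func returns or raises.

    `file` must be a real file with a working fileno(); it defaults to
    sys.stderr. The periodic dump is always cancelled on exit."""
    out = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(interval, repeat=True, file=out)
    try:
        return func(*args, **kwargs)
    finally:
        faulthandler.cancel_dump_traceback_later()
```

Usage would be `tree = run_with_heartbeat(constructor.nj, bio_dmx, interval=600)`. Each dump shows which line of `TreeConstruction.py` is executing at that moment, which confirms the job is alive but not how far along it is. In a Jupyter notebook `sys.stderr` may not have a usable file descriptor; passing a regular file opened for writing as `file` avoids that.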
I was running this in an IPython session and eventually cancelled the process after it had been running for about 48 hours. The traceback from the keyboard interrupt was:
/home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in nj(self, distance_matrix)
697 node_dist[i] = 0
698 for j in range(0, len(dm)):
--> 699 node_dist[i] += dm[i, j]
700 node_dist[i] = node_dist[i] / (len(dm) - 2)
701
/home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in __getitem__(self, item)
166 raise TypeError("Invalid index type.")
167 # check index
--> 168 if row_index > len(self) - 1 or col_index > len(self) - 1:
169 raise IndexError("Index out of range.")
170 if row_index > col_index:
/home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in __len__(self)
284 def __len__(self):
285 """Matrix length"""
--> 286 return len(self.names)
287
288 def __repr__(self):
Does this output suggest that the job was in fact running fine and simply taking a very long time?
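One way to gauge this is to count how many `dm[i, j]` lookups the loop at lines 697 to 700 of the traceback performs. The sketch below assumes, from a reading of the traceback rather than of the full Biopython source, that this row-sum loop runs once per join with `len(dm)` shrinking from n down to 3, so each pass makes `len(dm) ** 2` calls to `__getitem__`. The function name is hypothetical.

```python
def row_sum_lookups(n):
    """Total dm[i, j] lookups made by the row-sum loop shown in the
    traceback, assuming it runs once per join while len(dm) shrinks
    from n to 3 (len(dm) ** 2 lookups per pass)."""
    return sum(k * k for k in range(3, n + 1))
```

For n = 6303 this comes to 83,487,991,659 lookups, each of which also runs the type and range checks visible in `__getitem__` and `__len__`. If each lookup costs on the order of a microsecond (an assumption, not a measurement), that loop alone would take roughly 23 hours, so a run exceeding 48 hours would be consistent with a job that is slow rather than stuck.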
Is there any other information that would help in figuring this out?
Thank you,
Andrew