[Biopython] NJ tree constructor never completes

Andrew Sanchez aas229 at nau.edu
Mon Aug 14 15:21:54 UTC 2017


I am trying to construct a tree from a `_DistanceMatrix` object of length 6303 (that is, 6303 taxa) with the following command: `tree = constructor.nj(bio_dmx)`.

The matrix and constructor were derived like so:

    from Bio.Phylo.TreeConstruction import _DistanceMatrix, DistanceTreeConstructor

    bio_dmx = _DistanceMatrix(names, nested_dmx)
    constructor = DistanceTreeConstructor()
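`_DistanceMatrix(names, matrix)` expects `matrix` as a nested lower-triangular list: row i holds the distances from taxon i to taxa 0 through i, ending with a zero on the diagonal, as in the Biopython Phylo examples. Assuming `nested_dmx` was derived from a full square matrix, a hypothetical helper like `to_lower_triangle` could build and sanity-check that shape. It is a sketch under that assumption, not part of Biopython.

```python
from typing import List, Sequence


def to_lower_triangle(square: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: turn a full symmetric distance matrix into the
    nested lower-triangular form (row i = distances to taxa 0..i, diagonal 0)."""
    n = len(square)
    for i, row in enumerate(square):
        # Every row must be full length, with a zero on the diagonal.
        if len(row) != n:
            raise ValueError(f"row {i} has {len(row)} entries, expected {n}")
        if row[i] != 0:
            raise ValueError(f"diagonal entry {i} is {row[i]}, expected 0")
        # The upper triangle is dropped, so it must mirror the lower one.
        for j in range(i):
            if row[j] != square[j][i]:
                raise ValueError(f"matrix is not symmetric at ({i}, {j})")
    # Keep only columns 0..i of each row.
    return [list(row[: i + 1]) for i, row in enumerate(square)]


# Possible use (full_matrix is hypothetical):
#   nested_dmx = to_lower_triangle(full_matrix)
#   bio_dmx = _DistanceMatrix(names, nested_dmx)
```

Checking the shape on a small matrix first makes it easier to rule out a malformed input before blaming run time.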

I've tested my workflow on a much smaller distance matrix, following the examples at http://biopython.org/wiki/Phylo, and it worked fine. With this larger dataset, however, the process just hangs, and I don't know where to begin debugging. First of all, how long should I expect this process to take? From Wikipedia: "...typical run times proportional to approximately the square of the number of taxa."
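One way to get a rough answer is to time the small run that worked and scale it up. Textbook neighbor joining is usually described as cubic in the number of taxa, so the sketch below lets the exponent be chosen; `extrapolate_runtime` is a hypothetical helper, and the estimate is only as good as the assumed exponent and the small-run timing.

```python
def extrapolate_runtime(t_small: float, n_small: int, n_large: int,
                        exponent: float = 3.0) -> float:
    """Scale a measured runtime to a larger input, assuming time ~ n ** exponent.

    exponent=3 matches the usual cubic description of neighbor joining;
    exponent=2 matches the quadratic figure quoted from Wikipedia.
    """
    return t_small * (n_large / n_small) ** exponent


# Example with made-up numbers: a 100-taxon run that took 1 second.
cubic = extrapolate_runtime(1.0, 100, 1000)           # 1000 seconds
quadratic = extrapolate_runtime(1.0, 100, 1000, 2.0)  # 100 seconds
```

The two assumptions differ by a factor of n_large / n_small, which for 6303 taxa against a few hundred is more than an order of magnitude.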

Maybe it is normal for a tree of this size to take this long to construct? If so, is there a way to run `tree = constructor.nj(bio_dmx)` so that it produces some output showing me that something is happening?
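`nj(self, distance_matrix)`, the signature shown in the interrupt traceback, takes no progress option. One way to watch a run is a standalone implementation that reports each merge step. The sketch below is plain neighbor joining written from the textbook description, not Biopython's code; `neighbor_joining`, its `progress` callback and the `nodeK` inner-node labels are all hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (node, neighbor, branch length)


def neighbor_joining(names: List[str], dist: List[List[float]],
                     progress: Optional[Callable[[int], None]] = None) -> List[Edge]:
    """Textbook neighbor joining on a full symmetric matrix (sketch, not Biopython).

    `progress`, if given, is called with the number of clusters left at the
    start of every merge step. Returns the edges of the unrooted tree."""
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(names)
    d = [list(row) for row in dist]  # work on a copy
    edges: List[Edge] = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        if progress is not None:
            progress(n)
        # Row sums: the quantity Biopython's node_dist loop computes
        # (before it divides by n - 2). O(n^2) work per step.
        r = [sum(row) for row in d]
        # Find the pair minimising Q(i, j) = (n - 2) * d[i][j] - r[i] - r[j].
        best_i, best_j = 0, 1
        best_q: Optional[float] = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_i, best_j, best_q = i, j, q
        i, j = best_i, best_j
        counter += 1
        new = f"node{counter}"  # hypothetical inner-node label
        # Branch lengths from the two joined clusters to the new node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((new, nodes[i], li))
        edges.append((new, nodes[j], d[i][j] - li))
        # Distances from the new node to every remaining cluster.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new]
    # Two clusters left: connect them directly.
    edges.append((nodes[0], nodes[1], d[0][1]))
    return edges
```

Passing `progress=lambda n: print(n, "clusters left", flush=True)` prints one line per merge step. Being pure Python and cubic overall, this sketch would itself be slow on 6303 taxa; it is mainly useful on a subset, to see how the time per step grows.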

I was running this in an IPython session and eventually cancelled the process, which had been running for about 48 hours. The keyboard interrupt produced the following traceback:

/home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in nj(self, distance_matrix)
   697                 node_dist[i] = 0
   698                 for j in range(0, len(dm)):
--> 699                     node_dist[i] += dm[i, j]
   700                 node_dist[i] = node_dist[i] / (len(dm) - 2)
   701

/home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in __getitem__(self, item)
   166                 raise TypeError("Invalid index type.")
   167             # check index
--> 168             if row_index > len(self) - 1 or col_index > len(self) - 1:
   169                 raise IndexError("Index out of range.")
   170             if row_index > col_index:

/home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in __len__(self)
   284     def __len__(self):
   285         """Matrix length"""
--> 286         return len(self.names)
   287
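The frames above show the interrupt landing in `__len__`, called from the index check in `__getitem__`, which is in turn called from the `node_dist` double loop at lines 697 to 700 of `TreeConstruction.py`. That loop reads `dm[i, j]` n × n times per pass. Assuming (the traceback does not show this) that the loop runs once per merge step while the matrix shrinks by one row per step down to 3 rows, the total number of reads can be counted as below; both function names are hypothetical.

```python
def node_dist_lookups(n_taxa: int) -> int:
    """Reads of dm[i, j] made by the node_dist loop, assuming one full
    n x n pass per merge step while n shrinks from n_taxa down to 3."""
    return sum(k * k for k in range(3, n_taxa + 1))


def node_dist_lookups_closed(n_taxa: int) -> int:
    """Same count in closed form (valid for n_taxa >= 2):
    sum_{k=1}^{N} k^2 = N(N+1)(2N+1)/6, minus the k = 1 and k = 2 terms (1 + 4)."""
    return n_taxa * (n_taxa + 1) * (2 * n_taxa + 1) // 6 - 5


if __name__ == "__main__":
    n = 6303
    print(f"{node_dist_lookups_closed(n):.3e} reads of dm[i, j] for {n} taxa")
```

For 6303 taxa this comes to roughly 8.3 × 10^10 reads, each a Python-level `__getitem__` call that itself calls `len()` twice (line 168), before counting the search for the closest pair. If each read took one microsecond (an illustrative figure, not a measurement), this loop alone would account for about 23 hours.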
   288     def __repr__(self):

Does this output suggest that the job was in fact running fine and was simply taking a very long time?

Is there any other info that would be helpful in figuring this out?

Thank you,
Andrew

