[Biopython] NJ tree constructor never completes
Peter Cock
p.j.a.cock at googlemail.com
Mon Aug 14 15:31:45 UTC 2017
Hi Andrew,
My guess is you are simply seeing the quadratic scaling being a
problem. Can you try timing a series of subsets, say 10 entries, 50,
100, 200, 250, 500, 1000 - that approach ought to be enough to
estimate how long the full 6000 or so would take.
It may be you would be better off with a compiled command line
phylogenetic tree for work at this scale?
Peter
On Mon, Aug 14, 2017 at 4:21 PM, Andrew Sanchez <aas229 at nau.edu> wrote:
> I am trying to construct a tree from a DistanceMatrix object with len of 6303 with the following command: `tree = constructor.nj(bio_dmx)`.
>
> The matrix and constructor were derived like so:
>
> bio_dmx = _DistanceMatrix(names, nested_dmx)
> constructor = DistanceTreeConstructor()
>
> I've tested my workflow on a much smaller distance matrix, just following the examples at http://biopython.org/wiki/Phylo and it worked just fine. When I try to do it with this larger dataset, the process just hangs. I don't know where to begin debugging. First of all, how long should I expect this process to take? From wikipedia: “...typical run times proportional to approximately the square of the number of taxa."
>
> Maybe it is normal for a tree of this size to take so long to construct? If so, is there a way to run tree = constructor.nj(bio_dmx) so that it produces some output that will allow me to at least see that something is happening?
>
> I was trying to do this in an IPython session, and eventually I just cancelled the process which had been going for about 48 hours. The result of the keyboard interrupt was:
>
> /home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in nj(self, distance_matrix)
> 697 node_dist[i] = 0
> 698 for j in range(0, len(dm)):
> --> 699 node_dist[i] += dm[i, j]
> 700 node_dist[i] = node_dist[i] / (len(dm) - 2)
> 701
>
> /home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in __getitem__(self, item)
> 166 raise TypeError("Invalid index type.")
> 167 # check index
> --> 168 if row_index > len(self) - 1 or col_index > len(self) - 1:
> 169 raise IndexError("Index out of range.")
> 170 if row_index > col_index:
>
> /home/aas229/anaconda3/envs/gbfilter/lib/python3.4/site-packages/Bio/Phylo/TreeConstruction.py in __len__(self)
> 284 def __len__(self):
> 285 """Matrix length"""
> --> 286 return len(self.names)
> 287
> 288 def __repr__(self):
>
> Does this output suggest that the job was in fact running just fine, but just taking a really long time?
>
> Is there any other info that would be helpful in figuring this out?
>
> Thank you,
> Andrew
> _______________________________________________
> Biopython mailing list - Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
More information about the Biopython
mailing list