[BioRuby] GSOC PhyloXML profiling, bottleneck is Bio::Tree#children
Christian Zmasek
czmasek at burnham.org
Fri Aug 7 20:47:31 UTC 2009
Hi, Diana:
I think your array based solution is fine.
Although, at one point Bioruby's Tree might need to be enhanced, in order to avoid being slowed down by this bfs_shortest_path method. This issue is likely to prohibit (time-wise) many types of algorithms acting on large tree objects.
Christian
________________________________________
From: Diana Jaunzeikare [rozziite at gmail.com]
Sent: Friday, August 07, 2009 10:31 AM
To: Christian Zmasek; Phyloinformatics Group; Pjotr Prins; bioruby at lists.open-bio.org; Naohisa GOTO
Subject: GSOC PhyloXML profiling, bottleneck is Bio::Tree#children
Hi all,
Here is update on Google Summer of Code Bioruby PhyloXML project. I was profiling and refactoring Bioruby PhyloXML Parser code and got 67% speed increase.
With profiling PhyloXML Writer the story is different. It takes 24minutes to write the 1.5MB mollusca taxonomy tree and forever other larger files. Again the bottleneck is bfs_shortest_path, which is called from Tree#children method. It takes forever to just iterate over all the children nodes.
To solve this I propose to save an array of the children of the node within my PhyloXML::Node (which corresponds to a clade) class. This would also ensure that when a phyloxml file is parsed and then written back, clades would be the same order in the input and output files.
Have a good weekend,
Diana
More information about the BioRuby
mailing list