[BioRuby] GSOC PhyloXML profiling, bottleneck is Bio::Tree#children
Diana Jaunzeikare
rozziite at gmail.com
Fri Aug 7 17:31:45 UTC 2009
Hi all,
Here is an update on the Google Summer of Code BioRuby PhyloXML project. I
have been profiling and refactoring the BioRuby PhyloXML Parser code and got
a 67% speed increase.
Profiling the PhyloXML Writer tells a different story. It takes 24 minutes to
write the 1.5 MB mollusca taxonomy tree, and far longer for larger files.
Again, the bottleneck is bfs_shortest_path, which is called from the
Tree#children method. Simply iterating over all the child nodes takes an
extremely long time.
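To illustrate why this gets slow, here is a minimal sketch of a hypothetical adjacency-list tree (`AdjacencyTree` is not a BioRuby class). It assumes, as described above, that children(node) finds the node's parent by running a breadth-first search from the root and then drops that parent from the node's neighbours. Each call can touch up to n nodes, so asking for the children of every node costs roughly O(n^2).

```ruby
# Hypothetical undirected tree stored as an adjacency list, with no
# parent pointers. Not BioRuby's Bio::Tree; illustration only.
class AdjacencyTree
  attr_reader :bfs_visits

  def initialize(root)
    @root = root
    @adj = Hash.new { |h, k| h[k] = [] }
    @adj[root]        # register the root even before it has edges
    @bfs_visits = 0   # counts nodes dequeued across all searches
  end

  def add_edge(a, b)
    @adj[a] << b
    @adj[b] << a
  end

  def nodes
    @adj.keys
  end

  # Breadth-first search from `from` to `to`; returns the node path.
  def bfs_shortest_path(from, to)
    prev = { from => nil }
    queue = [from]
    until queue.empty?
      cur = queue.shift
      @bfs_visits += 1
      break if cur == to
      @adj[cur].each do |n|
        next if prev.key?(n)
        prev[n] = cur
        queue << n
      end
    end
    path = [to]
    path.unshift(prev[path.first]) while prev[path.first]
    path
  end

  # Children = neighbours minus the parent, where the parent is the
  # second-to-last node on the root-to-node path found by BFS.
  def children(node)
    parent = bfs_shortest_path(@root, node)[-2]
    @adj[node] - [parent]
  end
end

# Usage: a star with 100 leaves. Visiting every node's children
# triggers one BFS per node, so the total work grows quadratically.
star = AdjacencyTree.new(0)
(1..100).each { |i| star.add_edge(0, i) }
star.nodes.each { |n| star.children(n) }
puts star.bfs_visits # => 5151
```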
To solve this, I propose storing an array of each node's children within my
PhyloXML::Node class (which corresponds to a clade). This would also ensure
that when a PhyloXML file is parsed and then written back, clades appear in
the same order in the input and output files.
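A minimal sketch of the proposed idea, assuming a hypothetical `CladeNode` class (not the actual PhyloXML::Node API): each node keeps an ordered array of its children, so child lookup is a plain attribute read with no graph search, and a pre-order walk (as a writer would use to emit nested clade elements) reproduces the input order.

```ruby
# Hypothetical clade node that keeps its children in an ordered array,
# so child lookup needs no graph search and input order is preserved.
class CladeNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name)
    @name = name
    @children = []
    @parent = nil
  end

  # Append a child in the order it was read from the input file.
  def add_child(child)
    child.parent = self
    @children << child
    child
  end
end

# Pre-order walk, in the order a writer would emit nested clades.
def each_clade(node, depth = 0, &block)
  yield node, depth
  node.children.each { |c| each_clade(c, depth + 1, &block) }
end

# Usage: "B" is added before "A" and stays first on output.
root = CladeNode.new("root")
b = root.add_child(CladeNode.new("B"))
root.add_child(CladeNode.new("A"))
b.add_child(CladeNode.new("C"))

names = []
each_clade(root) { |n, d| names << ("  " * d) + n.name }
puts names
```

The trade-off is a little extra memory per node and the need to keep the array in sync if the tree is edited, in exchange for constant-time child access and stable output order.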
Have a good weekend,
Diana