[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees

Tristan Lefebure tristan.lefebure at gmail.com
Thu Aug 7 13:35:24 EDT 2008


Hi list,

I'm using a script very similar to bp_taxonomy2tree.pl, which is distributed with
BioPerl (the only difference being that I'm using taxids instead of taxon
names). Basically, the script builds a taxonomic tree from a list of taxids,
using the NCBI taxonomy database. For each taxon, it creates a taxon object
and then merges this object into a tree object that keeps growing. It runs
very well with a small number of taxa, but with many taxa (>1000) it is very,
very slow (about a week for 3000 taxa).
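To make the per-taxon step concrete, here is a minimal Python sketch (a language-neutral illustration, not BioPerl code). It assumes a hypothetical `parent` dict mapping each taxid to its parent taxid, in the spirit of NCBI's nodes.dmp dump, where the root (taxid 1) is its own parent. The function name `lineage` and the taxids in the example are made up for illustration.

```python
def lineage(taxid, parent):
    """Return the root-to-leaf list of taxids for `taxid`.

    `parent` maps child taxid -> parent taxid; the root maps to itself.
    Raises KeyError if a taxid on the path is missing from `parent`.
    """
    path = [taxid]
    # Walk up parent pointers until we reach the self-parented root.
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    path.reverse()  # collected leaf-to-root, so flip to root-to-leaf
    return path
```

For example, with `parent = {1: 1, 10: 1, 20: 10, 30: 20}`, `lineage(30, parent)` returns `[1, 10, 20, 30]`.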

The slowness is due to the function merge_lineage (line 65), which merges the
existing tree object with a new taxon object. I guess that with a big tree
(i.e. more than 1000 leaves), the hard part is finding the nodes that the tree
and the new taxon object have in common...
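One possible way around the cost of finding common nodes, sketched in Python under the assumption that each taxon is available as a root-to-leaf list of taxids: keep a hash (dict) from taxid to tree node, so each new lineage is merged in time proportional to its length rather than to the size of the tree. This is a swapped-in technique for illustration only, not how merge_lineage is implemented; `Node` and `build_tree` are hypothetical names, and the sketch assumes a consistent taxonomy (each taxid has one parent) with a single shared root.

```python
class Node:
    """A tree node identified by its taxid."""

    def __init__(self, taxid):
        self.taxid = taxid
        self.children = []


def build_tree(lineages):
    """Merge root-to-leaf taxid lists into one tree.

    Returns (root, index), where index maps taxid -> Node. Each lookup is an
    average O(1) dict access, so merging a lineage costs O(len(lineage)).
    """
    index = {}
    root = None
    for path in lineages:
        prev = None
        for taxid in path:
            node = index.get(taxid)
            if node is None:
                # New node: create it and attach it under its parent once.
                node = Node(taxid)
                index[taxid] = node
                if prev is not None:
                    prev.children.append(node)
            # An existing node is already attached to its (single) parent.
            prev = node
        if root is None and path:
            root = index[path[0]]  # assumes all lineages share this root
    return root, index
```

For example, merging `[1, 10, 20, 30]`, `[1, 10, 20, 31]` and `[1, 40]` yields six nodes, with 10 and 40 under the root and 30 and 31 under node 20.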

Do you have any idea how to get around this problem? Should I look under
the hood of merge_lineage and try to improve it for large trees?

Thanks!

Version: bioperl-1.5.2_102
OS: GNU/Linux

-Tristan
