[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees

Sendu Bala bix at sendu.me.uk
Thu Aug 7 18:20:29 EDT 2008


Tristan Lefebure wrote:
> I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
> BioPerl (the only difference is that I'm using taxids instead of taxon 
> names). Basically, the script generates a taxonomic tree from a list of 
> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
> object and then merges this object into a tree object that keeps growing. It 
> runs very well with a small number of taxa, but with many taxa (>1000) it is 
> very, very, very slow (about a week for 3000 taxa).
> 
> The slowness is due to the function merge_lineage (line 65), which merges the 
> existing tree object with a new taxon object. I guess that the difficulty 
> with a big tree (i.e. more than 1000 leaves) is finding the nodes in common 
> between the tree and the new taxon object...
> 
> Do you have any idea how to get around the problem? Should I look under 
> the hood of merge_lineage to try to improve it for large trees?

Yes, please do. It might have been me that wrote that, in which case I 
didn't do anything fancy or consider the above problem.
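For comparison, one way to keep each merge cheap is to index every node by taxid, so that finding the part of a new lineage already present in the tree costs one hash lookup per rank instead of a search over the whole tree. The Python sketch below is hypothetical and is not BioPerl's merge_lineage: `LineageTree`, `Node` and the root-to-leaf list of (taxid, name) pairs are assumptions made for illustration. It also assumes NCBI lineages are consistent, i.e. if a taxid is already in the tree, all of its ancestors are too.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    taxid: int
    name: str
    parent: Optional[int] = None
    children: List[int] = field(default_factory=list)


class LineageTree:
    """Hypothetical tree that keeps a taxid -> Node index for O(1) lookups."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.root: Optional[int] = None

    def merge_lineage(self, lineage: List[Tuple[int, str]]) -> None:
        """Merge a root-to-leaf lineage of (taxid, name) pairs into the tree."""
        if not lineage:
            return
        # Scan from the leaf end for the deepest taxid already in the tree.
        # Assumption: every ancestor of a known taxid is also known.
        attach_at = -1
        for i in range(len(lineage) - 1, -1, -1):
            if lineage[i][0] in self.nodes:
                attach_at = i
                break
        if attach_at == -1 and self.root is not None:
            raise ValueError("lineage shares no node with the existing tree")
        # Attach the unseen tail of the lineage below the deepest known node.
        parent = lineage[attach_at][0] if attach_at >= 0 else None
        for taxid, name in lineage[attach_at + 1:]:
            self.nodes[taxid] = Node(taxid, name, parent)
            if parent is None:
                self.root = taxid
            else:
                self.nodes[parent].children.append(taxid)
            parent = taxid


if __name__ == "__main__":
    tree = LineageTree()
    tree.merge_lineage([(1, "root"), (2, "Bacteria"), (1224, "Proteobacteria")])
    tree.merge_lineage([(1, "root"), (2, "Bacteria"), (1239, "Firmicutes")])
    print(tree.nodes[2].children)  # [1224, 1239]
```

The cost of a merge here depends on the length of the new lineage, not on the number of leaves already in the tree, which is the property the 3000-taxa case needs.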

