[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees
Sendu Bala
bix at sendu.me.uk
Thu Aug 7 22:20:29 UTC 2008
Tristan Lefebure wrote:
> I'm using a script very similar to bp_taxonomy2tree.pl distributed with
> BioPerl (with the only difference that I'm using taxids instead of taxon
> names). Basically, the script generates a taxonomic tree given a list of
> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon
> object, and then merges this object into a tree object that keeps growing. It
> runs very well with a small number of taxa, but with many taxa (>1000) it is
> extremely slow (about a week for 3000 taxa).
>
> The slowness is due to the function merge_lineage (line 65), which merges the
> existing tree object with a new taxon object. I guess that the difficulty
> with a big tree (i.e. more than 1000 leaves) is finding the nodes in common
> between the tree and the new taxon object...
>
> Would you have any idea of how to get around the problem? Should I look under
> the hood of merge_lineage to try to improve it for large trees?
Yes, please do. It might have been me that wrote that, in which case I
didn't do anything fancy or consider the above problem.
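
As a rough illustration of one possible direction (not how merge_lineage is
actually implemented, and not BioPerl code): if finding the shared nodes is the
bottleneck, keeping an id-to-node index alongside the tree makes each lookup
constant time instead of a search over the whole tree. The sketch below is
plain Python; the Node and IndexedTree classes, the by_id index, and the
taxids used are all hypothetical names and values chosen for the example.

```python
class Node:
    """A tree node identified by a taxid (hypothetical minimal structure)."""

    def __init__(self, taxid, parent=None):
        self.taxid = taxid
        self.parent = parent
        self.children = []


class IndexedTree:
    """A tree that keeps a dict from taxid to node, so shared ancestors
    are found by lookup rather than by scanning every node."""

    def __init__(self):
        self.root = None
        self.by_id = {}

    def merge_lineage(self, lineage):
        """Merge one lineage, given as a list of taxids from root to leaf."""
        if not lineage:
            return
        if self.root is None:
            # First lineage: its first element becomes the root.
            self.root = Node(lineage[0])
            self.by_id[lineage[0]] = self.root
        elif lineage[0] != self.root.taxid:
            raise ValueError("lineage does not share the tree's root")

        parent = self.root
        for taxid in lineage[1:]:
            node = self.by_id.get(taxid)  # constant-time lookup
            if node is None:
                # Not in the tree yet: attach it under the current parent.
                node = Node(taxid, parent)
                parent.children.append(node)
                self.by_id[taxid] = node
            parent = node


# Made-up lineages sharing a root and one internal node.
tree = IndexedTree()
tree.merge_lineage([1, 10, 100])
tree.merge_lineage([1, 10, 101])
tree.merge_lineage([1, 20, 200])
```

With this layout, merging a lineage costs time proportional to the length of
that lineage, regardless of how many leaves the tree already holds. Whether
the same idea can be fitted into the existing tree classes would need checking
against the real code.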