[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species

Chris Fields cjfields at uiuc.edu
Mon Dec 18 20:55:55 UTC 2006


On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>>
>>> However, on a larger set of 190 species, which are all present in
>>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>>> something must be wrong with the merge_lineage method in the major
>>> rewrite of the taxonomy2tree script. Can someone please check this?
>>> I'm attaching the 190-species call to the script. Thanks,
>>>
>>> Gabriel
>>
>> I can confirm that. It is definitely dropping them in merge_lineage();
>> if you add a call to get_leaf_nodes() to check how many are present
>> after each merge_lineage() call, you can see it dropping nodes along
>> the trace.
>
> I confirm the 'dropped' nodes, but I also claim that this is not a bug.
>
> For example, the first 'drop' happens for the 101st species, which is
> 'Leptospira interrogans serovar Copenhageni'. This is a variation
> (descendant) of species 24: 'Leptospira interrogans'. So when the
> variation is added it becomes a leaf and 'Leptospira interrogans' is no
> longer a leaf, so the overall number of leaves does not increase.
>
> The next drop is for species 103, 'Prochlorococcus marinus subsp.
> pastoris str. CCMP1986', a subspecies of species 63, 'Prochlorococcus
> marinus'. Same deal. I didn't check any others, but I suspect the same
> issue arises in all cases.

Makes sense now. I would personally consider this a bug, since the
results are unexpected (at the very least, the docs need to be updated
to clarify this behavior). Some say tomato...

I suppose this is one of the issues one might run into when using the
NCBI taxonomy to build trees.
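
To make the effect Sendu describes concrete, here is a minimal Python sketch. It is not BioPerl and not the actual merge_lineage() implementation: it models the tree as a hypothetical dict mapping each taxon to the set of its children, and the lineages are abbreviated, illustrative stand-ins (intermediate NCBI ranks omitted, not verbatim NCBI data). It records the leaf count after each merge, in the spirit of the get_leaf_nodes() check above, and then lists which input taxa ended up as internal nodes.

```python
# Simplified, hypothetical model: the tree is a dict mapping each taxon name
# to the set of its children. This is NOT BioPerl's merge_lineage().

def merge_lineage(children, lineage):
    """Merge a root-to-taxon lineage (list of names) into the tree."""
    children.setdefault(lineage[0], set())
    for parent, child in zip(lineage, lineage[1:]):
        children[parent].add(child)
        children.setdefault(child, set())

def leaves(children):
    """Return the taxa that currently have no children."""
    return {name for name, kids in children.items() if not kids}

# Abbreviated, illustrative lineages (intermediate NCBI ranks omitted).
lineages = [
    ["Bacteria", "Spirochaetes", "Leptospira", "Leptospira interrogans"],
    ["Bacteria", "Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus"],
    ["Bacteria", "Spirochaetes", "Leptospira", "Leptospira interrogans",
     "Leptospira interrogans serovar Copenhageni"],
]

tree = {}
leaf_counts = []
for lineage in lineages:
    merge_lineage(tree, lineage)
    # Analogous to checking the leaf count after each merge.
    leaf_counts.append(len(leaves(tree)))

# Input taxa that a later descendant pushed into the interior of the tree.
input_taxa = [lineage[-1] for lineage in lineages]
internal_taxa = [taxon for taxon in input_taxa if tree[taxon]]

print(leaf_counts)    # [1, 2, 2]
print(internal_taxa)  # ['Leptospira interrogans']
```

All three input taxa are still in the tree; the third merge simply turns an existing leaf into an internal node, which is why the leaf count stalls at 2. Reporting something like internal_taxa alongside the leaves is one possible way to surface input taxa that are not leaves.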

> Gabriel, please confirm this isn't a bug, or suggest how you propose to
> see your taxa when they are not all leaves of the tree.

Having the nodes appear internally seems semantically correct to me.
Is there any other sensible way to represent them?

> PS. I changed the merge_lineage() algorithm to be 18x faster (from the
> absurd 3 mins for making the 190-species tree to a more reasonable
> 10 s), without changing the tree produced.

Definitely an improvement!

chris


