[Biopython-dev] Code review request for phyloxml branch

Hilmar Lapp hlapp at gmx.net
Fri Sep 25 07:39:03 EDT 2009


On Sep 25, 2009, at 5:59 AM, Peter wrote:

> On Fri, Sep 25, 2009 at 5:34 AM, Eric Talevich
> <eric.talevich at gmail.com> wrote:
>>>
>>> On a related point, do you think a BioSQL TaxonTree subclass is
>>> possible? i.e. Something mimicking the new Tree objects (as a
>>> subclass), but which loads data on demand from the taxon tables in
>>> a BioSQL database? This would provide a nice way to work with the
>>> NCBI taxonomy (once loaded into BioSQL), which is a very large
>>> tree. For an example use case, I might want to extract just the
>>> bacteria as a subtree, and save that to a file.
>>>
>>
>> Doing BioSQL integration was on the original roadmap, but research
>> hasn't taken me back there lately. I would like to do it
>> eventually... anyway, that would solve the indexing issue nicely.
>> I'll drop the extra attributes -- I get the impression they're not
>> meant to be accessed directly in BioSQL either, so there's no use
>> for them in Biopython.
>
> As things stand, there is no usage of the left/right index fields in
> Biopython.

The left/right fields are really a crutch for doing hierarchical
(recursive) queries in SQL more efficiently. Many SQL databases lack
native support for recursive queries, and the left/right index values
allow you to rewrite an otherwise recursive query as a single
set-based query that hits the database only once.
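
To illustrate the point, here is a minimal sketch using Python's sqlite3
module. The table below is a simplified stand-in for a taxon table with
left/right columns, not the actual BioSQL schema, and the names
subtree_names and lineage_names are made up for this example. Each
function gets a whole subtree or lineage back from one range query, with
no recursion:

```python
import sqlite3

# Hypothetical, simplified taxon table with nested-set (left/right) columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT, "
    "parent_taxon_id INTEGER, left_value INTEGER, right_value INTEGER)"
)
# A node's (left, right) interval encloses the intervals of all its descendants.
rows = [
    (1, "root", None, 1, 10),
    (2, "Bacteria", 1, 2, 7),
    (3, "Proteobacteria", 2, 3, 4),
    (4, "Firmicutes", 2, 5, 6),
    (5, "Archaea", 1, 8, 9),
]
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?, ?)", rows)


def subtree_names(conn, name):
    """Names of the node called `name` and all its descendants, in one query."""
    sql = (
        "SELECT child.name FROM taxon AS child, taxon AS anc "
        "WHERE anc.name = ? "
        "AND child.left_value BETWEEN anc.left_value AND anc.right_value "
        "ORDER BY child.left_value"
    )
    return [row[0] for row in conn.execute(sql, (name,))]


def lineage_names(conn, name):
    """Names from the root down to the node called `name`, in one query."""
    sql = (
        "SELECT anc.name FROM taxon AS node, taxon AS anc "
        "WHERE node.name = ? "
        "AND node.left_value BETWEEN anc.left_value AND anc.right_value "
        "ORDER BY anc.left_value"
    )
    return [row[0] for row in conn.execute(sql, (name,))]


print(subtree_names(conn, "Bacteria"))
print(lineage_names(conn, "Firmicutes"))
```

Without the left/right columns, both lookups would need one query per
tree level (following parent_taxon_id), or a recursive query where the
database supports one.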

Within an object-oriented programming language that supports recursion
these values are of no use - they don't let you traverse a tree faster
than you would already be able to do through recursing up or down your
tree data structure. If there's a natural order of nodes, you can
speed up finding nodes through binary search. But for pulling out
lineages or subtrees I doubt that this will help at all - it'll have
to be your data structure (such as having double links, i.e. each node
pointing to both its parent and its children) that makes those
operations efficient.
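
As a rough sketch of what I mean by double links (the Node class here is
hypothetical and is not the Tree class from the phyloxml branch): each
node keeps a reference to its parent as well as its children, so a
lineage is a walk upwards and a subtree is a plain recursion downwards,
neither of which needs any left/right index.

```python
class Node:
    """Hypothetical doubly linked tree node: knows its parent and its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def lineage(self):
        """Names from the root down to this node, found by following parent links."""
        path = []
        node = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]

    def subtree(self):
        """Yield this node and all its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.subtree()


root = Node("root")
bacteria = Node("Bacteria", root)
proteo = Node("Proteobacteria", bacteria)
firmicutes = Node("Firmicutes", bacteria)
archaea = Node("Archaea", root)

print(firmicutes.lineage())
print([node.name for node in bacteria.subtree()])
```

The lineage costs one step per level of depth, and the subtree costs one
step per node returned, which is as good as it gets for either operation.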

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================




