[Biopython-dev] Code review request for phyloxml branch
Hilmar Lapp
hlapp at gmx.net
Fri Sep 25 11:39:03 UTC 2009
On Sep 25, 2009, at 5:59 AM, Peter wrote:
> On Fri, Sep 25, 2009 at 5:34 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
>>>
>>> On a related point, do you think a BioSQL TaxonTree subclass is
>>> possible? i.e., something mimicking the new Tree objects (as a
>>> subclass), but which loads data on demand from the taxon tables in a
>>> BioSQL database? This would provide a nice way to work with the NCBI
>>> taxonomy (once loaded into BioSQL), which is a very large tree. For an
>>> example use case, I might want to extract just the bacteria as a
>>> subtree, and save that to a file.
>>>
>>
>> Doing BioSQL integration was on the original roadmap, but research
>> hasn't taken me back there lately. I would like to do it eventually...
>> anyway, that would solve the indexing issue nicely. I'll drop the extra
>> attributes -- I get the impression they're not meant to be accessed
>> directly in BioSQL either, so there's no use for them in Biopython.
>
> As things stand, there is no usage of the left/right index fields in
> Biopython.

The left/right fields are really a crutch for doing hierarchical
(recursive) queries in SQL more efficiently. Many SQL databases don't
have native support for recursive queries, and the left/right index
values let you rewrite an otherwise recursive query as a single,
non-recursive set query.
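
To make the left/right trick concrete, here is a minimal Python sketch using an in-memory SQLite table. The table and column names (`taxon`, `name`, `parent`, `left_value`, `right_value`) and the toy taxa are hypothetical simplifications loosely modeled on, not identical to, the BioSQL taxon table. The left/right values come from one depth-first walk. After that, both a subtree and a lineage are single range queries with no recursion in SQL.

```python
import sqlite3

# Toy taxonomy as child -> parent links (hypothetical data, not NCBI).
PARENTS = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Eukaryota": "root",
    "Homo": "Eukaryota",
}


def nested_set_values(parents):
    """Assign (left, right) values to each node by a depth-first walk."""
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)
    values = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in sorted(children.get(node, [])):
            visit(child)
        counter += 1
        values[node] = (left, counter)

    for root in children[None]:
        visit(root)
    return values


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (name TEXT PRIMARY KEY, parent TEXT,"
    " left_value INTEGER, right_value INTEGER)"
)
for name, (left, right) in nested_set_values(PARENTS).items():
    conn.execute("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                 (name, PARENTS[name], left, right))


def subtree(conn, name):
    # Every descendant's left value lies inside the ancestor's interval,
    # so the whole subtree is one non-recursive range query.
    left, right = conn.execute(
        "SELECT left_value, right_value FROM taxon WHERE name = ?", (name,)
    ).fetchone()
    rows = conn.execute(
        "SELECT name FROM taxon WHERE left_value BETWEEN ? AND ?"
        " ORDER BY left_value", (left, right))
    return [row[0] for row in rows]


def lineage(conn, name):
    # Ancestors are exactly the rows whose interval encloses this node's.
    rows = conn.execute(
        "SELECT a.name FROM taxon a, taxon n WHERE n.name = ?"
        " AND a.left_value < n.left_value AND a.right_value > n.right_value"
        " ORDER BY a.left_value", (name,))
    return [row[0] for row in rows]
```

For example, `subtree(conn, "Bacteria")` pulls out the bacterial subtree in one query. The cost is paid on writes: inserting a node means renumbering every interval to its right, which is why the values only pay off inside the database.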
Within an object-oriented programming language that supports recursion
these values are of no use: they don't let you traverse a tree any faster
than you already can by recursing up or down your tree data structure.
If there's a natural order of nodes, you can speed up finding nodes
through binary search. But for pulling out lineages or subtrees I doubt
that this will help at all; it will have to be your data structure (such
as having double links, i.e., parent as well as child pointers) that
makes those operations efficient.
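
In memory, the same operations need no left/right values at all. Below is a minimal sketch that assumes a hypothetical `Node` class with links in both directions, a parent pointer plus a list of children. It is not Biopython's Tree API. A lineage comes from walking up the parent links. A subtree comes from recursing down the children. Finding a node by name uses binary search (`bisect`) over a list sorted in that natural order.

```python
from bisect import bisect_left


class Node:
    """Tree node with links in both directions (hypothetical class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def walk(self):
        """Yield this node and all descendants, depth first (recursive)."""
        yield self
        for child in self.children:
            yield from child.walk()

    def subtree(self):
        """Names in the subtree; cost is proportional to the subtree size."""
        return [node.name for node in self.walk()]

    def lineage(self):
        """Ancestor names from the root down; cost is proportional to depth."""
        path = []
        node = self.parent
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


def find(sorted_nodes, keys, name):
    """Binary search over nodes pre-sorted by name (the 'natural order')."""
    i = bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return sorted_nodes[i]
    return None


# Small example tree (hypothetical data).
root = Node("root")
bacteria = Node("Bacteria", root)
proteo = Node("Proteobacteria", bacteria)
ecoli = Node("Escherichia", proteo)
Node("Firmicutes", bacteria)
euk = Node("Eukaryota", root)
Node("Homo", euk)

# Sort once, then every lookup by name is O(log n).
nodes = sorted(root.walk(), key=lambda n: n.name)
keys = [n.name for n in nodes]
```

The double linking is what makes this efficient. Without the `parent` pointer, extracting a lineage would mean searching down from the root. Binary search only speeds up locating the starting node.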
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================