[Biopython-dev] Code review request for phyloxml branch

Peter biopython at maubp.freeserve.co.uk
Fri Sep 25 05:59:08 EDT 2009


On Fri, Sep 25, 2009 at 5:34 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
>>
>> On a related point, do you think a BioSQL TaxonTree subclass is possible?
>> i.e. Something mimicking the new Tree objects (as a subclass), but which
>> loads data on demand from the taxon tables in a BioSQL database? This
>> would provide a nice way to work with the NCBI taxonomy (once loaded
>> into BioSQL), which is a very large tree. For an example use case, I might
>> want to extract just the bacteria as a subtree, and save that to a file.
>>
>
> Doing BioSQL integration was on the original roadmap, but research hasn't
> taken me back there lately. I would like to do it eventually... anyway, that
> would solve the indexing issue nicely. I'll drop the extra attributes -- I
> get the impression they're not meant to be accessed directly in BioSQL
> either, so there's no use for them in Biopython.

As things stand, there is no usage of the left/right index fields in
Biopython.

The current Biopython BioSQL code focusses on the database
variants of the Seq and SeqRecord objects. The only interaction
with the taxon tables is to load/retrieve the species annotations,
and for this we don't need the complications of the left/right index.
We leave them empty if we populate the taxonomy via Entrez
(recalculating the left/right values is computationally expensive).
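
To illustrate what recalculating involves, here is a minimal sketch of
assigning nested-set left/right values with one depth-first pass over the
whole tree. The function name and the dict-of-children input are
hypothetical, chosen only for this example; this is not existing Biopython
or BioSQL code.

```python
def nested_set_index(children, root):
    """Return {node: (left, right)} for the tree rooted at `root`.

    `children` maps each node to a list of its child nodes (hypothetical
    input format for this sketch).
    """
    index = {}
    left = {}
    counter = 1
    # Iterative depth-first traversal, so very deep trees do not hit
    # Python's recursion limit.
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            # All descendants are numbered; close this node's interval.
            index[node] = (left[node], counter)
        else:
            # First visit: open the interval, then schedule the children.
            left[node] = counter
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
        counter += 1
    return index


toy_tree = {
    "root": ["Bacteria", "Eukaryota"],
    "Bacteria": ["Escherichia coli"],
    "Eukaryota": ["Homo sapiens"],
}
toy_index = nested_set_index(toy_tree, "root")
```

Every node is visited, and inserting a single taxon shifts the values of
all nodes numbered after it, which is why keeping these fields current on
a tree the size of the NCBI taxonomy is costly.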

However, any "DBTaxonTree" object (or whatever we call it) could
potentially offer us a way to (a) populate and (b) use these
alternative indexes to speed up various subtree operations.
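
As a rough sketch of (b), assuming the left/right values are populated, a
whole subtree can be fetched with a single range query instead of walking
parent links one level at a time. The table below is a simplified stand-in
built in an in-memory SQLite database (the name column is folded into the
taxon table for brevity), and subtree_names is a hypothetical helper, not
part of Biopython.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for a taxon table; an assumption for illustration.
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " parent_taxon_id INTEGER,"
    " name TEXT,"
    " left_value INTEGER,"
    " right_value INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "root", 1, 10),
        (2, 1, "Bacteria", 2, 5),
        (3, 2, "Escherichia coli", 3, 4),
        (4, 1, "Eukaryota", 6, 9),
        (5, 4, "Homo sapiens", 7, 8),
    ],
)


def subtree_names(conn, name):
    """Return the names of `name` and all its descendants, in tree order."""
    row = conn.execute(
        "SELECT left_value, right_value FROM taxon WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    left, right = row
    # A node is in the subtree exactly when its left value falls inside
    # the interval of the subtree's root.
    cursor = conn.execute(
        "SELECT name FROM taxon"
        " WHERE left_value BETWEEN ? AND ?"
        " ORDER BY left_value",
        (left, right),
    )
    return [r[0] for r in cursor]


bacteria = subtree_names(conn, "Bacteria")
```

With the toy data above, the query for "Bacteria" touches only the rows
whose left value lies between 2 and 5, regardless of how deep the subtree
is.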

Peter

