[BioSQL-l] left_value and right_value in taxon table

aaron.j.mackey at gsk.com aaron.j.mackey at gsk.com
Tue Apr 8 15:58:56 UTC 2008


I believe that the first thing the load_ncbi_taxonomy.pl script does is to 
wipe out everything already in the table.  So your incremental update 
strategy (with deferred left/right calculation) won't work.

Depending on the type of update you're making (e.g. you only add one new 
terminal taxonomic node, having no children), the incremental updates are 
pretty fast, computationally speaking (no tree traversal is required).  I 
won't be able to recite them off the top of my head, but Joe Celko's "SQL 
For Smarties" book has the necessary code.  In a nutshell: if the overall 
topology of the tree remains unchanged, you'll need to increment by 2 the 
right/left values of each node "to the right" of the new node you've 
inserted, but it's a tiny bit more complicated than that.
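
For what it's worth, here is a rough sketch of the leaf-insert case, not the
code from Celko's book and not anything load_ncbi_taxonomy.pl does.  It
assumes a cut-down taxon table in an in-memory SQLite database (only
taxon_id, parent_taxon_id, left_value and right_value, with no unique
constraints on left/right), and the helper names make_demo_db, insert_leaf
and nested_set are made up for illustration.  The "tiny bit more
complicated" part shows up in the first UPDATE: the parent and every other
ancestor of the new node also need their right value bumped by 2, not just
the nodes to the right.

```python
import sqlite3


def make_demo_db():
    """Build a tiny, simplified taxon table (an assumption, not the real DDL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        " taxon_id INTEGER PRIMARY KEY,"
        " parent_taxon_id INTEGER,"
        " left_value INTEGER,"
        " right_value INTEGER)"
    )
    # root (1,6) with two childless children: taxon 2 (2,3) and taxon 3 (4,5)
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?, ?)",
        [(1, None, 1, 6), (2, 1, 2, 3), (3, 1, 4, 5)],
    )
    return conn


def insert_leaf(conn, taxon_id, parent_taxon_id):
    """Add a childless node as the last child of an existing parent."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT right_value FROM taxon WHERE taxon_id = ?",
            (parent_taxon_id,),
        ).fetchone()
        if row is None or row[0] is None:
            raise ValueError("parent is missing or has no nested set values")
        parent_right = row[0]
        # Open a gap of 2: the parent, its ancestors and everything to the
        # right get a larger right_value ...
        conn.execute(
            "UPDATE taxon SET right_value = right_value + 2"
            " WHERE right_value >= ?",
            (parent_right,),
        )
        # ... and everything strictly to the right gets a larger left_value.
        conn.execute(
            "UPDATE taxon SET left_value = left_value + 2"
            " WHERE left_value > ?",
            (parent_right,),
        )
        # The new leaf occupies the gap.
        conn.execute(
            "INSERT INTO taxon VALUES (?, ?, ?, ?)",
            (taxon_id, parent_taxon_id, parent_right, parent_right + 1),
        )


def nested_set(conn):
    """Return {taxon_id: (left_value, right_value)} for inspection."""
    rows = conn.execute("SELECT taxon_id, left_value, right_value FROM taxon")
    return {tid: (left, right) for tid, left, right in rows}
```

Calling insert_leaf(conn, 4, 2) on the demo tree puts the new node at (3,4)
under taxon 2 and shifts the rest accordingly.  On a table with unique
constraints on left_value/right_value the two UPDATEs may need extra care,
depending on how the database checks uniqueness during a multi-row update.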

-Aaron

biosql-l-bounces at lists.open-bio.org wrote on 04/08/2008 11:24:41 AM:

> > > Dear all,
> > >
> > > I hope that I am not the 100th person asking the following questions:
> > > 1) what are left and right values in the taxon table for?
> > >
> >
> >  they hold the nested set values. Nested sets are an enumeration algorithm
> > described in Joe Celko's SQL for Smarties books, and Aaron Mackey gives a
> > good introduction here:
> >
> >  http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
> >
> >  (This is in the schema DDL file, though obviously should be documented
> > better. Good candidate for an FAQ, I suppose.)
> 
> That link does a good job of explaining the idea.
> 
> > > 2) How are they computed
> >
> >  load_ncbi_taxonomy.pl recomputes them automatically after each update. It's
> > a simple recursive depth-first graph traversal algorithm.
> 
> I have the impression the recomputation is slow, and also moderately
> complex.  This is fine for a weekly (or even daily) update which runs
> the load_ncbi_taxonomy.pl script.
> 
> We (Biopython) are interested in incremental updates triggered when a
> new sequence is added to the database with a novel taxon id.  Eric is
> looking at downloading the missing taxon data and updating the
> taxon/taxon_name tables "on the fly", transparently to the user.
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 (Biopython bug)
> 
> Hilmar, am I right in thinking the following:  Suppose when loading a
> new sequence into the database with a novel NCBI taxon, we record a
> new minimal taxon/taxon_names entry (without the lineage, a single
> taxon entry with null left/right entries).  If the user then runs
> load_ncbi_taxonomy.pl, assuming the NCBI's online database contains
> the new taxon, will this update nicely?  i.e. When the new sequence is
> retrieved from the database, its full lineage will be available.
> 
> Thanks
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 




