[BioSQL-l] left_value and right_value in taxon table

Peter biopython at maubp.freeserve.co.uk
Tue Apr 8 15:24:41 UTC 2008


> > Dear all,
> >
> > I hope that I am not the 100th person asking the following questions:
> > 1) what are left and right values in the taxon table for?
> >
>
>  They hold the nested set values. Nested sets are an enumeration algorithm
> described in Joe Celko's SQL for Smarties books, and Aaron Mackey gives a
> good introduction here:
>
>  http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
>
>  (This is in the schema DDL file, though obviously should be documented
> better. Good candidate for an FAQ, I suppose.)

That link does a good job of explaining the idea.
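For anyone new to the idea, here is a toy illustration. It is not BioSQL code. With nested sets, "all descendants of X" and "all ancestors of X" each become a single range comparison on left_value/right_value, so you do not have to walk parent_taxon_id recursively. The sketch uses Python's built-in sqlite3 with a simplified table that keeps only the taxon columns needed here. The tree and the helper names descendants/lineage are made up for illustration.

```python
import sqlite3

# Simplified stand-in for the BioSQL taxon table (only the columns
# needed to show the nested-set idea).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE taxon (
           taxon_id INTEGER PRIMARY KEY,
           parent_taxon_id INTEGER,
           left_value INTEGER,
           right_value INTEGER
       )"""
)
# Hand-numbered toy tree: 1 is the root with children 2 and 5;
# 2 has children 3 and 4. Each node's (left, right) interval
# encloses the intervals of everything below it.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [(1, None, 1, 10), (2, 1, 2, 7), (3, 2, 3, 4), (4, 2, 5, 6), (5, 1, 8, 9)],
)


def descendants(conn, taxon_id):
    """Ids of all taxa strictly below taxon_id, via one range test."""
    sql = """SELECT child.taxon_id
             FROM taxon AS child, taxon AS parent
             WHERE parent.taxon_id = ?
               AND child.left_value > parent.left_value
               AND child.right_value < parent.right_value
             ORDER BY child.left_value"""
    return [row[0] for row in conn.execute(sql, (taxon_id,))]


def lineage(conn, taxon_id):
    """Ids of all ancestors of taxon_id, root first."""
    sql = """SELECT parent.taxon_id
             FROM taxon AS child, taxon AS parent
             WHERE child.taxon_id = ?
               AND parent.left_value < child.left_value
               AND parent.right_value > child.right_value
             ORDER BY parent.left_value"""
    return [row[0] for row in conn.execute(sql, (taxon_id,))]


print(descendants(conn, 2))  # [3, 4]
print(lineage(conn, 4))      # [1, 2]
```

The trade-off is that reads become cheap while any change to the tree may require renumbering many rows.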

> > 2) How are they computed?
>
>  load_ncbi_taxonomy.pl recomputes them automatically after each update. It's
> a simple recursive depth-first graph traversal algorithm.

I have the impression that the recomputation is slow, and also moderately
complex.  That is acceptable for a weekly (or even daily) update which runs
the load_ncbi_taxonomy.pl script.
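To show why a rebuild touches the whole table, here is a rough Python sketch of the depth-first numbering Hilmar describes. It is not the Perl code in load_ncbi_taxonomy.pl. The function name assign_nested_set and its parent-map input are hypothetical. The key point is that each node's left value depends on how many nodes were visited before it. Adding one taxon therefore shifts the numbers of everything visited after it.

```python
from collections import defaultdict


def assign_nested_set(parent_of):
    """Compute nested-set numbers by depth-first traversal.

    parent_of maps taxon_id -> parent taxon_id. A node whose parent is
    None, or which is its own parent, is treated as a root.
    Returns a dict taxon_id -> (left_value, right_value).
    """
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None or parent == node:
            roots.append(node)
        else:
            children[parent].append(node)

    values = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1          # left value: assigned on the way down
        left = counter
        # Sort children so the numbering is deterministic.
        for child in sorted(children[node]):
            visit(child)
        counter += 1          # right value: assigned on the way back up
        values[node] = (left, counter)

    for root in sorted(roots):
        visit(root)
    return values


tree = {1: None, 2: 1, 3: 2, 4: 2, 5: 1}
for taxon_id, (left, right) in sorted(assign_nested_set(tree).items()):
    print(taxon_id, left, right)
```

Recursion depth here equals tree depth, not the number of taxa. That keeps a recursive version practical for a taxonomy, although an explicit stack would avoid Python's recursion limit entirely.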

We (Biopython) are interested in incremental updates triggered when a
new sequence is added to the database with a novel taxon id.  Eric is
looking at downloading the missing taxon data and updating the
taxon/taxon_name tables "on the fly", transparently to the user.

http://bugzilla.open-bio.org/show_bug.cgi?id=2475 (Biopython bug)

Hilmar, am I right in thinking the following:  Suppose when loading a
new sequence into the database with a novel NCBI taxon, we record a
new minimal taxon/taxon_name entry (without the lineage, a single
taxon entry with null left/right entries).  If the user then runs
load_ncbi_taxonomy.pl, assuming the NCBI's online database contains
the new taxon, will this update nicely?  i.e. When the new sequence is
retrieved from the database, its full lineage will be available.
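To make the question concrete, here is a toy sqlite3 sketch of that workflow. It does not use load_ncbi_taxonomy.pl or Biopython. The table is a simplified stand-in for taxon, the ids are made up, and renumber is a hypothetical helper standing in for whatever the loader actually does. A placeholder row with NULL left/right values is not matched by any nested-set range comparison, because comparisons with NULL are never true in SQL. Its lineage therefore comes back empty until a renumbering that includes its parent link. Step 2 only simulates what a full reload would be expected to do, which is exactly what I am asking about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE taxon (
           taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER,
           left_value INTEGER, right_value INTEGER)"""
)


def lineage(conn, taxon_id):
    """Ancestors of taxon_id (root first) via a nested-set range test."""
    sql = """SELECT parent.taxon_id FROM taxon AS child, taxon AS parent
             WHERE child.taxon_id = ?
               AND parent.left_value < child.left_value
               AND parent.right_value > child.right_value
             ORDER BY parent.left_value"""
    return [row[0] for row in conn.execute(sql, (taxon_id,))]


def renumber(conn):
    """Hypothetical stand-in for a full depth-first renumbering."""
    children, roots = {}, []
    for tid, parent in conn.execute("SELECT taxon_id, parent_taxon_id FROM taxon"):
        if parent is None:
            roots.append(tid)
        else:
            children.setdefault(parent, []).append(tid)
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in sorted(children.get(node, [])):
            visit(child)
        counter += 1
        conn.execute(
            "UPDATE taxon SET left_value = ?, right_value = ? WHERE taxon_id = ?",
            (left, counter, node),
        )

    for root in sorted(roots):
        visit(root)


# Existing, already numbered lineage: 1 -> 2 -> 3.
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                 [(1, None, 1, 6), (2, 1, 2, 5), (3, 2, 3, 4)])

# Step 1: on-the-fly placeholder for a novel taxon 4 (no parent, NULL values).
conn.execute("INSERT INTO taxon VALUES (4, NULL, NULL, NULL)")
before = lineage(conn, 4)

# Step 2: simulate a later reload that supplies the parent link and renumbers.
conn.execute("UPDATE taxon SET parent_taxon_id = 2 WHERE taxon_id = 4")
renumber(conn)
after = lineage(conn, 4)
print(before, after)
```

In this toy version, `before` is empty and `after` is the full path [1, 2]. That is the behaviour I am hoping the real script gives.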

Thanks

Peter


