From stefan.guenther at charite.de  Wed Apr  2 05:35:05 2008
From: stefan.guenther at charite.de (Stefan Guenther)
Date: Wed, 02 Apr 2008 11:35:05 +0200
Subject: [BioSQL-l] swissprot - gene names
Message-ID: <47F35349.4010105@charite.de>

Hi,

I have loaded the Swiss-Prot flat file (uniprot_sprot.dat) into the 
BioSQL schema using load_seqdatabase.pl. Now I'm looking for the 
Swiss-Prot gene names in the bioseqdb tables, but I cannot find them in 
the bioentry table. Aren't they included?

Stefan


From hlapp at gmx.net  Wed Apr  2 11:44:36 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 2 Apr 2008 11:44:36 -0400
Subject: [BioSQL-l] swissprot - gene names
In-Reply-To: <47F35349.4010105@charite.de>
References: <47F35349.4010105@charite.de>
Message-ID: <FEC1F993-DF82-4B0F-AAA5-587C7688167D@gmx.net>

Are you talking about the gene symbols and names in the GN line?  
These will be in bioentry_qualifier_value associations, with the tag  
being gene_name (I think; check your term table to be sure).
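A minimal sketch of that lookup, using Python's built-in sqlite3 on a toy in-memory database. The tables keep only the BioSQL columns the query needs (the real schema has more), the tag name `gene_name` is the unconfirmed guess from the reply above (hence the helper for searching the `term` table), and the accession and gene values are made up for illustration.

```python
import sqlite3

# Toy subset of the BioSQL tables: only the columns this query needs.
SCHEMA = """
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER, term_id INTEGER, value TEXT, rank INTEGER);
"""


def qualifier_values(conn, accession, tag="gene_name"):
    """Return the values stored under `tag` for one bioentry, in rank order."""
    rows = conn.execute(
        """SELECT q.value
             FROM bioentry b
             JOIN bioentry_qualifier_value q ON q.bioentry_id = b.bioentry_id
             JOIN term t ON t.term_id = q.term_id
            WHERE b.accession = ? AND t.name = ?
            ORDER BY q.rank""",
        (accession, tag),
    )
    return [value for (value,) in rows]


def tag_names_like(conn, pattern):
    """List term names matching a LIKE pattern, to find the real tag name."""
    return [name for (name,) in conn.execute(
        "SELECT name FROM term WHERE name LIKE ? ORDER BY name", (pattern,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO bioentry VALUES (1, 'TEST1')")   # made-up entry
conn.execute("INSERT INTO term VALUES (1, 'gene_name')")
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (1, 1, ?, ?)",
    [("geneB", 2), ("geneA", 1)],
)
print(tag_names_like(conn, "%gene%"))   # ['gene_name']
print(qualifier_values(conn, "TEST1"))  # ['geneA', 'geneB']
```

Against a real database you would run the same SQL through your own driver; only the connection code changes.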

Let me know if that doesn't work.

	-hilmar

On Apr 2, 2008, at 5:35 AM, Stefan Guenther wrote:
> Hi,
>
> I have loaded the Swiss-Prot flat file (uniprot_sprot.dat) into the  
> BioSQL schema using load_seqdatabase.pl. Now I'm looking for the  
> Swiss-Prot gene names in the bioseqdb tables, but I cannot find them in  
> the bioentry table. Aren't they included?
>
> Stefan
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ericgibert at yahoo.fr  Fri Apr  4 09:43:34 2008
From: ericgibert at yahoo.fr (Eric Gibert)
Date: Fri, 4 Apr 2008 13:43:34 +0000 (GMT)
Subject: [BioSQL-l] left_value and right_value in taxon table
Message-ID: <942508.77770.qm@web26507.mail.ukl.yahoo.com>

Dear all,

I hope that I am not the 100th person asking the following questions:
1) what are left and right values in the taxon table for?
2) How are they computed?

Thank you for your input or link to an explanation page.

Eric



From hlapp at gmx.net  Fri Apr  4 18:40:56 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 4 Apr 2008 18:40:56 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <942508.77770.qm@web26507.mail.ukl.yahoo.com>
References: <942508.77770.qm@web26507.mail.ukl.yahoo.com>
Message-ID: <60C3976C-26BF-4981-9159-207835C9B5D0@gmx.net>

Hi Eric,

On Apr 4, 2008, at 9:43 AM, Eric Gibert wrote:
> Dear all,
>
> I hope that I am not the 100th person asking the following questions:
> 1) what are left and right values in the taxon table for?

They hold the nested set values. Nested sets are an enumeration  
algorithm described in Joe Celko's SQL for Smarties books, and Aaron  
Mackey gives a good introduction here:

http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html

(This is noted in the schema DDL file, though it obviously should be  
documented better. Good candidate for an FAQ, I suppose.)
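To make the idea concrete: each node's (left_value, right_value) interval encloses the intervals of all of its descendants, so subtree and lineage lookups become plain range comparisons instead of recursive walks up parent_taxon_id. The sketch below uses a hand-numbered five-node toy tree in an in-memory SQLite table reduced to four taxon columns; it illustrates the technique and is not the query code used by BioSQL or Bioperl-db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE taxon (
    taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER,
    left_value INTEGER, right_value INTEGER)""")
# Toy tree: 1 is the root with children 2 and 3; node 2 has children 4 and 5.
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", [
    (1, None, 1, 10), (2, 1, 2, 7), (4, 2, 3, 4), (5, 2, 5, 6), (3, 1, 8, 9),
])


def descendants(conn, taxon_id):
    """All nodes whose interval lies strictly inside the given node's."""
    rows = conn.execute(
        """SELECT d.taxon_id FROM taxon d JOIN taxon n ON n.taxon_id = ?
            WHERE d.left_value > n.left_value AND d.right_value < n.right_value
            ORDER BY d.left_value""", (taxon_id,))
    return [r[0] for r in rows]


def lineage(conn, taxon_id):
    """All ancestors, root first: nodes whose interval encloses the given node's."""
    rows = conn.execute(
        """SELECT a.taxon_id FROM taxon a JOIN taxon n ON n.taxon_id = ?
            WHERE a.left_value < n.left_value AND a.right_value > n.right_value
            ORDER BY a.left_value""", (taxon_id,))
    return [r[0] for r in rows]


print(descendants(conn, 2))  # [4, 5]
print(lineage(conn, 5))      # [1, 2]
```

Both queries are a single non-recursive SELECT regardless of tree depth, which is the point of storing the nested set values.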

> 2) How are they computed?

load_ncbi_taxonomy.pl recomputes them automatically after each  
update. It's a simple recursive depth-first graph traversal algorithm.
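A minimal sketch of such a depth-first numbering in Python, assuming the tree is given as a child-to-parent mapping like the one in NCBI's nodes.dmp, where the root lists itself as its own parent. Children are visited in ascending id order (the loader queries logged later in this thread also order children by ncbi_taxon_id). The function name and input format are illustrative assumptions, not load_ncbi_taxonomy.pl's actual code.

```python
from collections import defaultdict


def nested_set_values(parent_of, root):
    """Number every node depth-first, returning {node: (left, right)}.

    parent_of maps child id -> parent id. An entry whose parent is itself
    (as the root is in NCBI's nodes.dmp) is not treated as a child.
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if child != parent:
            children[parent].append(child)

    values = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1          # left value: assigned on the way down
        left = counter
        for child in sorted(children[node]):
            visit(child)
        counter += 1          # right value: assigned on the way back up
        values[node] = (left, counter)

    visit(root)
    return values


print(nested_set_values({1: 1, 2: 1, 3: 1, 4: 2, 5: 2}, root=1))
# {4: (3, 4), 5: (5, 6), 2: (2, 7), 3: (8, 9), 1: (1, 10)}
```

Plain recursion is assumed to be fine for a tree as shallow as a taxonomy; a much deeper tree would need an explicit stack instead.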

	-hilmar


From biopython at maubp.freeserve.co.uk  Tue Apr  8 11:24:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Apr 2008 16:24:41 +0100
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <60C3976C-26BF-4981-9159-207835C9B5D0@gmx.net>
References: <942508.77770.qm@web26507.mail.ukl.yahoo.com>
	<60C3976C-26BF-4981-9159-207835C9B5D0@gmx.net>
Message-ID: <320fb6e00804080824x2bd92d41p884c8a4a61c04702@mail.gmail.com>

> > Dear all,
> >
> > I hope that I am not the 100th person asking the following questions:
> > 1) what are left and right values in the taxon table for?
> >
>
>  They hold the nested set values. Nested sets are an enumeration algorithm
> described in Joe Celko's SQL for Smarties books, and Aaron Mackey gives a
> good introduction here:
>
>  http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
>
>  (This is noted in the schema DDL file, though it obviously should be documented
> better. Good candidate for an FAQ, I suppose.)

That link does a good job of explaining the idea.

> > 2) How are they computed?
>
>  load_ncbi_taxonomy.pl recomputes them automatically after each update. It's
> a simple recursive depth-first graph traversal algorithm.

I have the impression the recomputation is slow, and also moderately
complex.  This is fine for a weekly (or even daily) update which runs
the load_ncbi_taxonomy.pl script.

We (Biopython) are interested in incremental updates triggered when a
new sequence is added to the database with a novel taxon id.  Eric is
looking at downloading the missing taxon data and updating the
taxon/taxon_name tables "on the fly", transparently to the user.

http://bugzilla.open-bio.org/show_bug.cgi?id=2475 (Biopython bug)

Hilmar, am I right in thinking the following?  Suppose that when loading
a new sequence with a novel NCBI taxon into the database, we record a
new minimal taxon/taxon_name entry (without the lineage: a single
taxon entry with NULL left/right values).  If the user then runs
load_ncbi_taxonomy.pl, and NCBI's taxonomy database contains the new
taxon, will this update nicely?  That is, when the new sequence is
retrieved from the database, will its full lineage be available?

Thanks

Peter

From aaron.j.mackey at gsk.com  Tue Apr  8 11:58:56 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 8 Apr 2008 11:58:56 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <320fb6e00804080824x2bd92d41p884c8a4a61c04702@mail.gmail.com>
Message-ID: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>

I believe that the first thing the load_ncbi_taxonomy.pl script does is to 
wipe out everything already in the table.  So your incremental update 
strategy (with deferred left/right calculation) won't work.

Depending on the type of update you're making (e.g. you only add one new 
terminal taxonomic node, having no children), the incremental updates are 
pretty fast, computationally speaking (no tree traversal is required).  I 
can't recite them off the top of my head, but Joe Celko's "SQL For 
Smarties" book has the necessary code.  In a nutshell, if the overall 
topology of the tree remains unchanged, you need to increment the 
right/left values of each node "to the right" of the newly inserted node 
by 2, but it's a tiny bit more complicated than that.
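The simplest incremental case, adding one new leaf, can be sketched with the classic shift-by-2 update: every right_value at or beyond the parent's current right_value, and every left_value beyond it, moves up by 2, and the new leaf takes the two freed numbers. This is a Python/SQLite illustration on a toy tree, not code from Celko's book or from BioSQL. It assumes the toy table has no UNIQUE constraints on left_value/right_value; if your schema declares them (check your DDL), shifting values in place can collide mid-statement and needs more care.

```python
import sqlite3


def insert_leaf(conn, new_id, parent_id):
    """Insert new_id as the last child of parent_id, keeping nested sets valid."""
    (r,) = conn.execute(
        "SELECT right_value FROM taxon WHERE taxon_id = ?", (parent_id,)
    ).fetchone()
    # Open a gap of two at position r: ancestors and nodes to the right shift.
    conn.execute(
        "UPDATE taxon SET right_value = right_value + 2 WHERE right_value >= ?", (r,))
    conn.execute(
        "UPDATE taxon SET left_value = left_value + 2 WHERE left_value > ?", (r,))
    # The new leaf occupies the gap.
    conn.execute("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                 (new_id, parent_id, r, r + 1))


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY,
    parent_taxon_id INTEGER, left_value INTEGER, right_value INTEGER)""")
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", [
    (1, None, 1, 10), (2, 1, 2, 7), (4, 2, 3, 4), (5, 2, 5, 6), (3, 1, 8, 9),
])
insert_leaf(conn, 6, parent_id=2)
print(conn.execute(
    "SELECT taxon_id, left_value, right_value FROM taxon ORDER BY left_value"
).fetchall())
# [(1, 1, 12), (2, 2, 9), (4, 3, 4), (5, 5, 6), (6, 7, 8), (3, 10, 11)]
```

No traversal is needed, but note that the two UPDATEs still touch every ancestor and every node to the right of the insertion point.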

-Aaron



From hlapp at gmx.net  Tue Apr  8 19:57:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 8 Apr 2008 19:57:41 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>
References: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>
Message-ID: <5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>


On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
> I believe that the first thing the load_ncbi_taxonomy.pl script  
> does is to
> wipe out everything already in the table.

That may have been true in its beginnings but hasn't been for a  
long time :-) It only updates changed nodes, adds new ones, and  
deletes retired ones (unless you say --nodelete). The script does  
recompute *all* nested set values, though.

> [...]
> depending on the type of update you're making (e.g. you only add  
> one new
> terminal taxonomic node, having no children), the incremental  
> updates are
> pretty fast, computationally speaking (no tree traversal is  
> required).  I
> won't be able to recite them off the top of my head, but Joe  
> Celko's "SQL
> For Smarties" book has the necessary code.  In a nutshell, it's  
> something
> like if the overall topology of the tree remains unchanged, you'll  
> need to
> increment the right/left values of each node "to the right" of the new
> node you've inserted by 2, but it's a tiny bit more complicated  
> than that.

Although some cases are indeed very cheap, in practice it turns out  
that on average you still need to traverse and update at least half  
of the nodes, so personally I really doubt you would save any  
significant amount of time compared to simply redoing all of them.  
And it's not that time-intensive either; it typically takes about  
10-20 minutes, depending on CPU etc.
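One way to check that estimate against a real tree is to count, for every possible parent, how many rows a single leaf insertion would modify under the shift-by-2 scheme quoted above, and average the share. The sketch below works on an in-memory {node: (left, right)} mapping and assumes every node is equally likely to receive the new child; the function names are hypothetical, and the five-node toy tree says nothing about the NCBI taxonomy itself.

```python
def rows_touched(values, parent):
    """Rows whose left or right value shifts when a leaf is added under parent.

    A node changes if its right value is >= r (the parent's right value);
    any node with left > r also has right > r, so one test covers both.
    """
    r = values[parent][1]
    return sum(1 for _, right in values.values() if right >= r)


def mean_fraction_touched(values):
    """Average share of rows modified, with each node equally likely as parent."""
    n = len(values)
    return sum(rows_touched(values, p) for p in values) / (n * n)


toy = {1: (1, 10), 2: (2, 7), 4: (3, 4), 5: (5, 6), 3: (8, 9)}
print(mean_fraction_touched(toy))  # 0.6
```

Feeding it the left/right values dumped from a loaded taxon table would give the corresponding figure for that tree.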

	-hilmar


From hlapp at gmx.net  Wed Apr  9 00:31:31 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 9 Apr 2008 00:31:31 -0400
Subject: [BioSQL-l] BioSQL-1.0.0 + bioperl-db_1.5.2_100 + gbrowse
	question
In-Reply-To: <47FA895F.4080407@embarqmail.com>
References: <47F52F80.7030409@embarqmail.com>
	<E68ABF4F-D65B-43AF-9DC8-C554ACF1719D@gmx.net>
	<47FA895F.4080407@embarqmail.com>
Message-ID: <1094E323-C648-42AD-BE4B-623978FF7035@gmx.net>

Hi Doug,

thanks for your sleuthing work, it's much appreciated. I'm sure  
someone among the GBrowse crowd (I've copied the GBrowse mailing  
list) can make the fix based on what you found out, and they can  
also help you with the other problems you are having if you  
describe the issues you are seeing.

As for the path tables, which ones are you referring to? I'm not sure  
the GBrowse BioSQL adapter uses any of them, so even if you do  
populate them it wouldn't change much. There is the load_ontology.pl  
script in Bioperl-db that can automatically compute the transitive  
closure for ontology terms, but as far as I am aware the feature or  
sequence loading scripts don't do that for seqfeature_path or  
bioentry_path, respectively.
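For reference, a transitive closure records every pair of nodes connected by a chain of direct relationships, together with a distance; that is roughly what the *_path tables are meant to hold. A minimal Python sketch of the computation from (parent, child) edges follows. It is not load_ontology.pl's algorithm: it keeps only the shortest distance per pair and omits zero-length self paths, and whether a real path table stores those is a schema convention you would need to check.

```python
from collections import defaultdict, deque


def transitive_closure(edges):
    """Map (ancestor, descendant) -> shortest distance, from (parent, child) edges.

    Runs a breadth-first search from every node, so each reachable pair is
    recorded once with the length of its shortest path. Self pairs
    (distance 0) are not included.
    """
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))

    closure = {}
    for start in nodes:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            for child in children[node]:
                if child not in seen:
                    seen.add(child)
                    closure[(start, child)] = dist + 1
                    queue.append((child, dist + 1))
    return closure


print(sorted(transitive_closure([(1, 2), (2, 3), (1, 4)]).items()))
# [((1, 2), 1), ((1, 3), 2), ((1, 4), 1), ((2, 3), 1)]
```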

	-hilmar

On Apr 7, 2008, at 4:51 PM, doug brown wrote:

> Hi Hilmar,
>
>   Thank you for your reply.
>
>   After much gnashing of teeth and much more code trawling, it  
> seems as if the basic problem I encountered was indeed one of  
> configuration.  The current coding of  
> Bio::DB::Das::BioSQL::BioDatabaseAdaptor uses 'location' as the  
> keyword for representing the database host. The GFF adapter used,  
> and I guess still uses, 'host' for that purpose. The distributed  
> sample file 06.biosql.conf specified 'host' rather than 'location'.
> Here is the correct configuration file section that now works for me:
>
> description = Magnaporthe grisea V5 genbank BioSQL
> db_args     = driver    mysql
>               dbname    M_grisea_genbank_biosql
>               location  mycelium.fgl.ncsu.edu
>               user      www
>               pass      ""
>               namespace genbank
>               version   1
>
> gbrowse starts up OK now. Could you please tell me who would be  
> the responsible party so that I can send him/her a note?  There are  
> also other problems with the file that I am trying to resolve.  
> Thank you.
>
> Now I need to figure out how to get gbrowse to display all of my  
> features (I finally have a database rich enough to represent them  
> but not so complex as to be unwieldy) .....
>
> Oh, I noticed that the various path tables are not populated and  
> one piece of documentation said something about "... database  
> dependent...". Could you point me to any code samples that populate  
> those tables? Or, perhaps, pointers to specific mailing list  
> message chains even.
>
> In general, any clues or bread crumbs would be greatly appreciated.
>
> Regards,
> Doug Brown
>
> Hilmar Lapp wrote:
>>
>> Hi Doug,
>>
>> I'm not exactly sure what the problem is but the error you are  
>> seeing is raised by code within GBrowse. I'd recommend that you post  
>> this to the GBrowse list. I have tried to trace the meaning of the  
>> Gbrowse conf parameters, and it seems to me that biodbname is  
>> obsolete and should be namespace instead. However, I also can't  
>> find where the error is being generated using the error message as  
>> guide, so I suspect you are using an outdated version of GBrowse.  
>> The folks on the Gbrowse list should be able to tell you more  
>> specifics, though.
>>
>>  -hilmar
>>
>> On Apr 3, 2008, at 3:26 PM, doug brown wrote:
>>> Hello Hilmar,
>>>
>>>   First off, congratulations for achieving the version 1.0  
>>> release of BioSQL.
>>>
>>> I am attempting to get BioSQL working with bioperl and gbrowse.  
>>> All are, I believe, the most recent versions of the software.  
>>> Unfortunately, I am running into problems with getting gbrowse  
>>> to access the BioSQL database.
>>>
>>> It is my sincere hope that my problem is a configuration issue.  
>>> However, after trying multiple permutations of gbrowse  
>>> configuration params and much trawling through the bioperl code,  
>>> I am unable to resolve the problem. Could you take a moment and  
>>> see if there is an obvious solution to my problem?
>>>
>>> I have long awaited the 1.0 mature release of BioSQL and I am  
>>> eager to start using it in place of my ad hoc in-house databases!
>>>
>>> Here is the error from the apache log files:
>>> ------------- EXCEPTION  -------------
>>> MSG: error while executing query in Bio::DB::Das::BioSQL::PartialSeqAdaptor::find_by_query: No database selected
>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_query /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1248
>>> STACK Bio::DB::Das::BioSQL::BioDatabaseAdaptor::fetch_Seq_by_accession /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/DB/Das/BioSQL/BioDatabaseAdaptor.pm:120
>>> STACK Bio::DB::Das::BioSQL::get_feature_by_name /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm:220
>>> STACK Bio::Graphics::Browser::_feature_get /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/Graphics/Browser.pm:1884
>>> STACK Bio::Graphics::Browser::name2segments /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/Graphics/Browser.pm:1828
>>> STACK main::lookup_features_from_db /Library/WebServer/CGI-Executables/gbrowse:1677
>>> STACK main::get_features /Library/WebServer/CGI-Executables/gbrowse:1577
>>> STACK toplevel /Library/WebServer/CGI-Executables/gbrowse:182
>>>
>>> My configuration file:
>>> [GENERAL]
>>> # based on 06.biosql.conf,v 1.2.6.2.2.1 2006/06/07 20:50:29 which was obtained
>>> # from the basic gbrowse installation
>>> description = Magnaporthe grisea V5 genbank BioSQL
>>> db_adaptor  = Bio::DB::Das::BioSQL
>>> #nb: the dashes are required and were missing in the original
>>> db_args     = driver    mysql
>>>           -dbname    M_grisea_genbank_biosql
>>>           -namespace bioperl
>>>           -biodbname genbank
>>>           -version     1
>>>           -host      mycelium.fgl.ncsu.edu
>>>           -user      www
>>>           -pass      ""
>>>           -port 3306
>>>
>>> plugins =  FastaDumper RestrictionAnnotator
>>> # deb 2-apr-08 SequenceDumper can't be found, so don't use it.
>>> #SequenceDumper
>>>
>>> ... remainder is unchanged ...
>>>
>>> database creation and load:
>>> mysql -udebrown -p -h mycelium.fgl.ncsu.edu
>>> drop database M_grisea_genbank_biosql;
>>> create database M_grisea_genbank_biosql;
>>> grant select on M_grisea_genbank_biosql.* to 'www'@'marray.fgl.ncsu.edu';
>>> use M_grisea_genbank_biosql;
>>>
>>> mysql -udebrown -p -h mycelium.fgl.ncsu.edu M_grisea_genbank_biosql <C:\installKits\biosql-1.0.0\sql\biosqldb-mysql.sql
>>>
>>> C:\Perl\site\bin\load_seqdatabase.bat\load_seqdatabase.bat --dsn "dbi:mysql:database=M_grisea_genbank_biosql;host=mycelium.fgl.ncsu.edu" -dbuser debrown --dbpass XXXXXXX --format genbank genbank\CH476760.gb --namespace genbank
>>>
>>> machine (node) layout:
>>> marray.fgl.ncsu.edu is the web server
>>> mycelium.fgl.ncsu.edu is the database server
>>> dougslaptop.fgl.ncsu.edu is a development system.
>>>
>>>
>>> Thank you for your time,
>>>
>>> Regards,
>>> Doug Brown
>>> -- 
>>> Doug Brown - Bioinformatics
>>> Fungal Genomics Laboratory
>>> Center for Integrated Fungal Research
>>> North Carolina State University
>>> Campus Box 7251, Raleigh, NC 27695-7251
>>> https://www.fungalgenomics.ncsu.edu/~debrown/
>>> Tel: (919) 513-0394, Fax (919) 513-0024
>>> e-mail: doug_brown at ncsu.edu
>>>
>>
>>
>>
>>
>



From biopython at maubp.freeserve.co.uk  Wed Apr  9 06:02:27 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Apr 2008 11:02:27 +0100
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>
References: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>
	<5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>
Message-ID: <320fb6e00804090302i5ab447acx912ce205c5580b4@mail.gmail.com>

On Wed, Apr 9, 2008 at 12:57 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
>  The [load_ncbi_taxonomy.pl] script does recompute *all*
>  nested set values, though...
>  Though you can have very cheap cases indeed, in reality it
>  turns out that on average you still need to traverse and update
>  at least half of the nodes, so personally I really doubt you
>  would save any significant amount of time by not just redoing
>  all of them. And it's not that time-intensive either;  typically it
>  takes about 10-20mins, depending on CPU etc.

This does mean that, in general, trying to fully update the taxon table
when adding a new sequence with a novel NCBI taxon id would take at
least 10 minutes (in addition to the drawback of having the Bio* projects
reimplement much of the load_ncbi_taxonomy.pl script's logic).

This probably helps explain why when the NCBI taxon ID wasn't already
defined, the old Biopython code would actually create new taxon table
entries for the entire lineage (based on the species lineage names in
a GenBank file) without linking into any existing taxon table entries
which may have matched.  Because these new entries were independent of
everything else, their left/right values could be calculated trivially
(starting above the largest existing left/right value). This had the
advantage of recording as much information as possible (without having
to use load_ncbi_taxonomy.pl at all), but left the taxon table full of
redundant entries.
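The numbering trick described above can be sketched in a few lines: a lineage that is not linked into the existing tree is numbered as a simple chain placed just above the current maximum left/right value, so its intervals nest inside each other but overlap nothing already in the table. This is an illustrative reconstruction of the idea, not the old Biopython code; the function name is hypothetical and the lineage is abbreviated.

```python
def chain_nested_values(max_existing, lineage):
    """Number a root-first lineage as a chain placed after all existing values.

    Returns [(name, left, right), ...]; node i of n gets
    left = max_existing + 1 + i and right = max_existing + 2 * n - i.
    """
    n = len(lineage)
    return [(name, max_existing + 1 + i, max_existing + 2 * n - i)
            for i, name in enumerate(lineage)]


print(chain_nested_values(10, ["cellular organisms", "Eukaryota", "Fungi"]))
# [('cellular organisms', 11, 16), ('Eukaryota', 12, 15), ('Fungi', 13, 14)]
```

Because the whole chain sits outside the interval of the existing root, a lineage query on the new leaf finds only its own duplicated ancestors, which is exactly the redundancy described above.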

I think that in this case, when trying to load a sequence with a novel
NCBI taxon id, the best solution may be just to add a single minimal
taxon table entry with NULL left/right values (and let the
load_ncbi_taxonomy.pl fill in the lineage later).
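A minimal sketch of that fallback, using an in-memory SQLite database with reduced taxon and taxon_name tables (the real BioSQL tables have more columns). The function names and the NCBI id 999999 are hypothetical. The point is that the new row has no parent and NULL left_value/right_value, so such placeholders are easy to find until a later taxonomy load is expected to place them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, ncbi_taxon_id INTEGER UNIQUE,
    parent_taxon_id INTEGER, left_value INTEGER, right_value INTEGER);
CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
"""


def add_minimal_taxon(conn, ncbi_taxon_id, scientific_name):
    """Record a bare taxon: no parent and NULL nested-set values."""
    cur = conn.execute(
        "INSERT INTO taxon (ncbi_taxon_id, parent_taxon_id, left_value, right_value)"
        " VALUES (?, NULL, NULL, NULL)", (ncbi_taxon_id,))
    conn.execute("INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
                 (cur.lastrowid, scientific_name))
    return cur.lastrowid


def unplaced_taxa(conn):
    """NCBI ids of taxa not yet positioned in the nested-set tree."""
    return [r[0] for r in conn.execute(
        "SELECT ncbi_taxon_id FROM taxon WHERE left_value IS NULL"
        " ORDER BY ncbi_taxon_id")]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_minimal_taxon(conn, 999999, "Placeholder species")  # hypothetical taxon
print(unplaced_taxa(conn))  # [999999]
```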

Peter

From aaron.j.mackey at gsk.com  Wed Apr  9 08:45:43 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 9 Apr 2008 08:45:43 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>
Message-ID: <OFA9D0B6AA.59010AAB-ON85257426.0045DC28-85257426.00461B57@gsk.com>

> On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
> > I believe that the first thing the load_ncbi_taxonomy.pl script 
> > does is to
> > wipe out everything already in the table.
> 
> That may have been true in its beginnings but hasn't been for a 
> long time :-) It only updates changed nodes, adds new ones, and 
> deletes retired ones (unless you say --nodelete). The script does 
> recompute *all* nested set values, though.

Ahh right, I remember all that now.  It was the wiping out of the 
left/right values that I was thinking of.

Thanks,

-Aaron


From mmokrejs at ribosome.natur.cuni.cz  Fri Apr 11 20:50:58 2008
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Sat, 12 Apr 2008 02:50:58 +0200
Subject: [BioSQL-l] left_value and right_value in taxon table
Message-ID: <48000772.4060903@ribosome.natur.cuni.cz>

>> On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
>> > I believe that the first thing the load_ncbi_taxonomy.pl script 
>> > does is to
>> > wipe out everything already in the table.
>> 
>> That may have been true in its beginnings but hasn't been for a 
>> long time :-) It only updates changed nodes, adds new ones, and 
>> deletes retired ones (unless you say --nodelete). The script does 
>> recompute *all* nested set values, though.
> 
> Ahh right, I remember all that now.  It was the wiping out of the 
> left/right values that I was thinking of.
> 
> Thanks,
> 
> -Aaron

Hi,

Maybe you meant the other taxonomy loading script? ;-)

http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/scripts/load_itis_taxonomy.pl

<quote>
You can use this script to load the taxonomy data into a fresh instance of
biosql. Otherwise an already existing ITIS tree will be deleted first.
</quote>

I just don't understand why the very first sentence of the documentation
within the script says something about 'update':

<quote>
This script loads or updates a biosql schema with phylodb extension
with the ITIS taxonomy as a phylogenetic trees, one tree for each
kingdom.
</quote>


Regards,
Martin

From mmokrejs at ribosome.natur.cuni.cz  Fri Apr 11 21:32:14 2008
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Sat, 12 Apr 2008 03:32:14 +0200
Subject: [BioSQL-l] [Bioperl-l] Loading sequences with novel NCBI
	taxon_id
In-Reply-To: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
Message-ID: <4800111E.3030802@ribosome.natur.cuni.cz>

Chris Fields wrote:
> The counter to that perspective (using new sequences with old tax info) 
> would be to regularly update NCBI taxonomy, particularly in 
> circumstances prior to adding new sequences.  Hilmar mentioned that once 
> tax is loaded it doesn't take as long to update, so you could set up a 
> cron job to update regularly.
> 
> I remember someone mentioning weekly or monthly updates on the list 
> quite a while ago, but I'm unsure how often NCBI updates tax information 
> (i.e. with every release, monthly, weekly, etc).  I can see instances 
> popping up where you used an up-to-date taxonomy but a new sequence 
> contains a tax ID not present.  I think bioperl-db handles these but I'm 
> not sure what other Bio* do.
> 

I spent some time benchmarking this and inspecting the MySQL log files.
Below is what the current load_ncbi_taxonomy.pl script (slightly modified
to print timestamps) does on an initial import into MySQL, followed by an
update of the database using exactly the same dataset (it still has to
walk through all the data):

$ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \
  --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \
  --chunksize=0 --verbose=2 --mycnf=~/.my.cnf
Sat Apr 12 01:58:43 MEST 2008
Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump:
       ... retrieving all taxon nodes in the database
Sat Apr 12 01:58:43 MEST 2008
       ... reading in taxon nodes from nodes.dmp
Sat Apr 12 01:58:58 MEST 2008
       ... insert / update / delete taxon nodes
                10000/421098 done (in 5 secs, 2000.0 rows/s)
                20000/421098 done (in 4 secs, 2500.0 rows/s)
...
                420000/421098 done (in 4 secs, 2500.0 rows/s)
Sat Apr 12 02:02:21 MEST 2008
       ... (committing nodes)
Sat Apr 12 02:02:21 MEST 2008
       ... rebuilding nested set left/right values
                10000 done (in 24 secs, 416.7 rows/s)
                20000 done (in 26 secs, 384.6 rows/s)
                30000 done (in 24 secs, 416.7 rows/s)
...
                420004 done (in 23 secs, 434.8 rows/s)
Sat Apr 12 02:19:25 MEST 2008
       ... reading in taxon names from names.dmp
Sat Apr 12 02:19:25 MEST 2008
       ... deleting old taxon names
Sat Apr 12 02:19:25 MEST 2008
       ... inserting new taxon names
                10000 done (in 8 secs, 1250.0 rows/s)
                20000 done (in 8 secs, 1250.0 rows/s)
...
                580000 done (in 5 secs, 2000.0 rows/s)
Sat Apr 12 02:24:48 MEST 2008
       ... cleaning up
Sat Apr 12 02:24:49 MEST 2008
Done.
$


I then re-imported the same data to mimic future updates at least
roughly; no record should need an UPDATE, except for zapping the left
and right values to NULL. :((

$ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \
  --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \
  --chunksize=0 --verbose=2 --mycnf=~/.my.cnf
Sat Apr 12 02:35:20 MEST 2008
Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump:
        ... retrieving all taxon nodes in the database
Sat Apr 12 02:35:26 MEST 2008
       ... reading in taxon nodes from nodes.dmp
Sat Apr 12 02:35:46 MEST 2008
       ... insert / update / delete taxon nodes
                10000/421098 done (in 0 secs, 10000.0 rows/s)
                20000/421098 done (in 0 secs, 10000.0 rows/s)
...
                410000/421098 done (in 0 secs, 10000.0 rows/s)
                420000/421098 done (in 0 secs, 10000.0 rows/s)
Sat Apr 12 02:35:55 MEST 2008
       ... (committing nodes)
Sat Apr 12 02:35:55 MEST 2008
       ... rebuilding nested set left/right values
                10000 done (in 9 secs, 1111.1 rows/s)
                20000 done (in 9 secs, 1111.1 rows/s)
...
                410004 done (in 8 secs, 1250.0 rows/s)
                420004 done (in 9 secs, 1111.1 rows/s)
Sat Apr 12 02:41:54 MEST 2008
       ... reading in taxon names from names.dmp
Sat Apr 12 02:41:54 MEST 2008
       ... deleting old taxon names
Sat Apr 12 02:41:55 MEST 2008
       ... inserting new taxon names
                10000 done (in 5 secs, 2000.0 rows/s)
                20000 done (in 5 secs, 2000.0 rows/s)
...
                570000 done (in 6 secs, 1666.7 rows/s)
                580000 done (in 5 secs, 2000.0 rows/s)
Sat Apr 12 02:47:27 MEST 2008
       ... cleaning up
Sat Apr 12 02:47:27 MEST 2008
Done.
$ ls -la /var/log/mysql/mysql.log 
-rw-rw---- 1 mysql mysql 483443314 Apr 12 03:15 /var/log/mysql/mysql.log
$

Pentium 4 M laptop, 1.8 GHz, 1 GB RAM, mysql-5.0.56 with SQL text
logging enabled (logging every SQL command as text is slower than
binary logging). The log was cleared before the tests. I could provide
some bits from the log or upload it somewhere if anybody else would
like to dig into the details.


I believe the recalculation step could be made faster. See what
happens:

                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '1' ORDER BY ncbi_taxon_id
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '10239' ORDER BY ncbi_taxon_id
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12333' ORDER BY ncbi_taxon_id
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12335' ORDER BY ncbi_taxon_id
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '4'
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '5'
                     31 Query       UPDATE taxon SET left_value = '4', right_value = '5' WHERE taxon_id = '12335'
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12340' ORDER BY ncbi_taxon_id
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '6'
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '7'
                     31 Query       UPDATE taxon SET left_value = '6', right_value = '7' WHERE taxon_id = '12340'


The left_value and right_value columns are already NULL when the
table is created, so there is no need to write NULL into them again.
Avoiding that would mean writing a wrapper function that mimics
update(), but first does a 'SELECT * FROM', compares the stored values
with those to be written, and includes in the final UPDATE statement
only the columns whose values have changed. We use such a smart
wrapper in our own Python code.
;-)
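A minimal sketch of such a wrapper in Python with sqlite3 (the wrapper mentioned above is not shown in the thread, so this is an independent illustration): read the current row, compare column by column, and UPDATE only the columns that differ. Since `None == None` in Python, writing NULL over NULL is skipped, which is the wasted work visible in the log above. Table and column names are formatted into the SQL, so they must come from trusted code; only the values are bound as parameters.

```python
import sqlite3


def update_changed(conn, table, key_col, key, new_values):
    """UPDATE only the columns whose stored value differs; return their names.

    table, key_col and the keys of new_values are trusted identifiers
    (they are formatted into the SQL); all values are bound as parameters.
    """
    if not new_values:
        return []
    cols = list(new_values)
    row = conn.execute(
        f"SELECT {', '.join(cols)} FROM {table} WHERE {key_col} = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    changed = [c for c, old in zip(cols, row) if old != new_values[c]]
    if changed:
        assignments = ", ".join(f"{c} = ?" for c in changed)
        conn.execute(
            f"UPDATE {table} SET {assignments} WHERE {key_col} = ?",
            [new_values[c] for c in changed] + [key],
        )
    return changed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY,"
             " left_value INTEGER, right_value INTEGER)")
conn.execute("INSERT INTO taxon VALUES (12335, NULL, NULL)")
print(update_changed(conn, "taxon", "taxon_id", 12335,
                     {"left_value": None, "right_value": None}))  # []
print(update_changed(conn, "taxon", "taxon_id", 12335,
                     {"left_value": 4, "right_value": 5}))  # ['left_value', 'right_value']
```

The trade-off is an extra SELECT per row, which only pays off when most writes would otherwise be no-ops, as in the re-import case above.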

When the left and right columns have to be reset to NULL during an
update of an existing database, I think it would be much faster to
drop the columns and re-create them with NULL values.


I think the possibility of creating empty taxon and taxon_name
tables as MyISAM tables, and converting them to InnoDB only after all
the imports and updates are done, deserves more investigation. One
would probably have to think a bit more about the foreign keys, but
they might not even be lost during the conversion back and forth.

Actually, this is easy to check. Dump your current taxon and
taxon_name tables (maybe even without the data, using mysqldump's
--no-data option), run
'ALTER TABLE taxon ... type=MyISAM'
followed by
'ALTER TABLE taxon ... type=InnoDB'
then dump the database structure again and diff it against
the original.

But, time for sleep here.
Martin


From hlapp at gmx.net  Fri Apr 11 22:48:29 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 22:48:29 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <48000772.4060903@ribosome.natur.cuni.cz>
References: <48000772.4060903@ribosome.natur.cuni.cz>
Message-ID: <82CF4290-2F0F-4E8D-86C9-B4C59072C74C@gmx.net>


On Apr 11, 2008, at 8:50 PM, Martin MOKREJŠ wrote:
>>> On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
>>> > I believe that the first thing the load_ncbi_taxonomy.pl script
>>> > does is to wipe out everything already in the table.
>>> That may have been true in its beginnings but hasn't been for
>>> a long time :-) It only updates changed nodes, adds new ones, and  
>>> deletes retired ones (unless you say --nodelete). The script does  
>>> recompute *all* nested set values, though.
>> Ahh right, I remember all that now.  It was the wiping out of the  
>> left/right values that I was thinking of.
>> Thanks,
>> -Aaron
>
> Hi,
>
> Maybe you have meant the other taxonomy loading script? ;-)
>
> http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/scripts/load_itis_taxonomy.pl

This loads a taxonomy too, but into the PhyloDB tables (because the  
ITIS taxonomy consists of multiple hierarchies, not a single one like  
NCBI).

>
> <quote>
> You can use this script to load the taxonomy data into a fresh
> instance of biosql. Otherwise an already existing ITIS tree will be
> deleted first.
> </quote>
>
> I just don't understand why the very first sentence of the
> documentation within the scripts says something about 'update':
>
> <quote>
> This script loads or updates a biosql schema with phylodb extension
> with the ITIS taxonomy as a phylogenetic trees, one tree for each
> kingdom.
> </quote>

The 'update' here is forward-looking :) At this point there won't
really be an update, as any existing trees within the ITIS namespace
are deleted first.

	-hilmar

>
>
> Regards,
> Martin
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Fri Apr 11 23:23:21 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 23:23:21 -0400
Subject: [BioSQL-l] Loading sequences with novel NCBI taxon_id
In-Reply-To: <4800111E.3030802@ribosome.natur.cuni.cz>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
	<4800111E.3030802@ribosome.natur.cuni.cz>
Message-ID: <BC9F045A-AA5C-48A3-9DE4-289EA3F8628A@gmx.net>


On Apr 11, 2008, at 9:32 PM, Martin MOKREJŠ wrote:
> I decided to re-import the same data to mimic at least somehow
> the future updates, although no record should be UPDATEd,
> except zapping left and right values with NULL. :((

Not sure what made you frown here?

> [...]
>
> I believe the recalculation step could be made faster. See what
> happens:
> [...]
> The columns left_value and right_value are NULL when the
> table is created, so there is no need to write NULL into
> them again.

But that's only true the first time you load. For almost all real
databases, every run of the script except the first won't be able to
take advantage of that.

> This would mean writing a wrapper function that
> would mimic update(), but first do a 'SELECT * FROM', compare
> the current values with those to be written, and include in
> the final UPDATE statement only those columns whose values
> have changed. We use such a smart wrapper in our Python code.
> ;-)

What you see is the "optimization" for MySQL. For all other RDBMSs it  
does both left and right in one update.

BTW, note that a SELECT does not have zero cost: it requires both an
index read and a table read, only to find, on average 50% of the time,
that you will need to update anyway. So what you gain 50% of the time
you lose the other 50% of the time.

>
> When the left and right columns have to be set to NULL while
> updating an existing database, I think it would be much faster
> to drop the columns and re-create them with NULL values.

In terms of speed, that may be how MySQL works indeed. In PostgreSQL  
it would even be transactional (but very slow with concurrent  
queries), but with most databases you are now outside of a  
transaction (because it is DDL), which not only leaves the data in an  
inconsistent state, but will also immediately break any application
you run against it because the table structure changed under its feet.


> [...] I think the possibility of creating the empty taxon and taxon_name
> tables as MyISAM tables, and converting them to InnoDB only after
> all the imports and updates are done, deserves more investigation.

I'm sure there are lots of hacks and tricks that would make this  
faster for one particular RDBMS, and you are welcome to explore
those. But the script is written to deal with several RDBMSs, and it
does so in as transactionally safe a manner as possible. The assumption is that
you are running this against a live database that is being queried  
concurrently.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr 12 14:10:44 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:10:44 -0400
Subject: [BioSQL-l] personal vs list email
Message-ID: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>

I'm not sure why, but I have received several Bioperl- or BioSQL-related
email inquiries directed to me *personally* over the past few
weeks.

I have been responding as I get to them, but I feel that I am doing  
both the senders and this community a poor service, because sometimes  
someone else on the list could have responded much faster, and when I  
respond, others on the list who happen to be interested in the same  
question don't get to see the answer.

So from now on, as a policy, I will redirect *every* email that is sent
to me personally and asks a question related to one of the projects to
the respective mailing list. If you don't want this, please
conspicuously say so at the top of your email, and in that case, if
you do ask a project-related question, be prepared to wait and
possibly to have to follow up.

As an aside, it's pretty safe to assume that all other core
developers, and quite possibly *all* developers, are following a
similar policy, whether expressly or not.

Isn't this somewhere in the FAQ too?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Apr 12 16:17:43 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 12 Apr 2008 15:17:43 -0500
Subject: [BioSQL-l] personal vs list email
In-Reply-To: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>
References: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>
Message-ID: <E7962E90-8309-4ADA-B002-950793B61D74@uiuc.edu>


On Apr 12, 2008, at 1:10 PM, Hilmar Lapp wrote:

> I'm not sure why, but I have received several Bioperl- or BioSQL-related
> email inquiries directed to me *personally* over the past
> few weeks.
>
> I have been responding as I get to them, but I feel that I am doing  
> both the senders and this community a poor service, because  
> sometimes someone else on the list could have responded much faster,  
> and when I respond, others on the list who happen to be interested  
> in the same question don't get to see the answer.
>
> So from now on, as a policy, I will redirect *every* email that is sent
> to me personally and asks a question related to one of the projects to
> the respective mailing list. If you don't want this, please
> conspicuously say so at the top of your email, and in that case, if
> you do ask a project-related question, be prepared to wait and
> possibly to have to follow up.
>
> As an aside, it's pretty safe to assume that all other core
> developers, and quite possibly *all* developers, are following a
> similar policy, whether expressly or not.

I agree; I'm sure several other core devs feel the same way.  I always  
try to forward these to the list if I feel it is more relevant there.

> Isn't this somewhere in the FAQ too?
>
> 	-hilmar

No, but I've added it to the bioperl FAQ; might be worth checking over  
and editing.

chris


From aaron.j.mackey at gsk.com  Mon Apr 14 09:00:52 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 14 Apr 2008 09:00:52 -0400
Subject: [BioSQL-l] [Bioperl-l] personal vs list email
In-Reply-To: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>
Message-ID: <OF3ED0BD19.1CBA005A-ON8525742B.00473A95-8525742B.00477DEC@gsk.com>

I try to take it even one step further: I require the person to re-ask 
their question on the mailing list (and then try to answer it there). This 
has the added benefit of causing the person to pause a moment to reflect 
on their question, and (sometimes) to spend a bit more time preparing the 
question for broader public consumption.

-Aaron


From biopython at maubp.freeserve.co.uk  Wed Apr 23 05:04:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Apr 2008 10:04:33 +0100
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
Message-ID: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>

Dear list,

In addition to loading the NCBI taxonomy, the load_ncbi_taxonomy.pl
script also recalculates the left/right values.

Is there a separate BioSQL script which ONLY recalculates the left/right values?

I was asked this by a Biopython user.  Possible use-cases include
people using a non-NCBI taxonomy.

Peter

From hlapp at gmx.net  Wed Apr 23 10:14:19 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 23 Apr 2008 10:14:19 -0400
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
In-Reply-To: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>
References: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>
Message-ID: <57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>

No there isn't but it's a good idea. Would you mind posting it as a  
BioSQL bug/feature request?

	-hilmar

On Apr 23, 2008, at 5:04 AM, Peter wrote:
> Dear list,
>
> In addition to loading the NCBI taxonomy, the load_ncbi_taxonomy.pl
> script also recalculates the left/right values.
>
> Is there a separate BioSQL script which ONLY recalculates the left/right values?
>
> I was asked this by a Biopython user.  Possible use-cases include
> people using a non-NCBI taxonomy.
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Wed Apr 23 11:41:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Apr 2008 16:41:00 +0100
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
In-Reply-To: <57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>
References: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>
	<57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>
Message-ID: <320fb6e00804230841p188b4897q6c7f08dcf1ead552@mail.gmail.com>

On Wed, Apr 23, 2008 at 3:14 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> No there isn't but it's a good idea. Would you mind posting it as a BioSQL
> bug/feature request?

Sure, I've filed an enhancement request:

Bug 2493 - New script to recalculate left/right values in the taxon table
http://bugzilla.open-bio.org/show_bug.cgi?id=2493

Peter

From mmokrejs at ribosome.natur.cuni.cz  Wed Apr 23 11:58:18 2008
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Wed, 23 Apr 2008 17:58:18 +0200
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
In-Reply-To: <320fb6e00804230841p188b4897q6c7f08dcf1ead552@mail.gmail.com>
References: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>	<57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>
	<320fb6e00804230841p188b4897q6c7f08dcf1ead552@mail.gmail.com>
Message-ID: <480F5C9A.60007@ribosome.natur.cuni.cz>

I would propose making the current script more modular, with a
command-line option that would just establish the database handle,
update the fields and close $dbh.
M.
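A mode that only recomputes the nested set values could look roughly like the following hypothetical Python/sqlite3 sketch. It is not taken from load_ncbi_taxonomy.pl. It assumes only the taxon_id, parent_taxon_id, left_value and right_value columns of the taxon table, treats a taxon whose parent is NULL or itself as a root, and orders siblings by taxon_id (the query log quoted earlier in this archive shows the real script ordering by ncbi_taxon_id). Hooking it up to a command-line option, or porting it to a Perl DBI handle, is left out.

```python
import sqlite3


def recompute_nested_sets(conn):
    """Hypothetical stand-alone recomputation of taxon.left_value/right_value.

    Reads the parent_taxon_id links, numbers the tree(s) with an iterative
    depth-first walk (children in ascending taxon_id order), and writes all
    values back in one transaction. Returns the number of taxa numbered.
    Taxa whose parent_taxon_id points at a missing row are never reached.
    """
    children, roots = {}, []
    for taxon_id, parent_id in conn.execute(
            "SELECT taxon_id, parent_taxon_id FROM taxon"):
        if parent_id is None or parent_id == taxon_id:
            roots.append(taxon_id)
        else:
            children.setdefault(parent_id, []).append(taxon_id)

    left, right, counter = {}, {}, 0
    for root in sorted(roots):
        stack = [(root, False)]  # (node, leaving?) pairs
        while stack:
            node, leaving = stack.pop()
            counter += 1
            if leaving:
                right[node] = counter
            else:
                left[node] = counter
                stack.append((node, True))
                # push in reverse so the smallest child id is popped first
                for child in sorted(children.get(node, []), reverse=True):
                    stack.append((child, False))

    with conn:  # commit on success, roll back on any error
        # clear old values first so no duplicate numbers coexist mid-update
        conn.execute("UPDATE taxon SET left_value = NULL, right_value = NULL")
        conn.executemany(
            "UPDATE taxon SET left_value = ?, right_value = ? WHERE taxon_id = ?",
            [(left[n], right[n], n) for n in left])
    return len(left)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY,"
                 " parent_taxon_id INTEGER, left_value INTEGER, right_value INTEGER)")
    conn.executemany("INSERT INTO taxon (taxon_id, parent_taxon_id) VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1)])
    print(recompute_nested_sets(conn))  # 5
    print(conn.execute("SELECT taxon_id, left_value, right_value"
                       " FROM taxon ORDER BY left_value").fetchall())
```

The iterative walk avoids recursion-depth limits; clearing all values before rewriting them mirrors the NULL-then-set pattern visible in the MySQL log, but in bulk.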


Peter wrote:
> On Wed, Apr 23, 2008 at 3:14 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> No there isn't but it's a good idea. Would you mind posting it as a BioSQL
>> bug/feature request?
> 
> Sure, I've filed an enhancement request:
> 
> Bug 2493 - New script to recalculate left/right values in the taxon table
> http://bugzilla.open-bio.org/show_bug.cgi?id=2493
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

From darin.london at duke.edu  Tue Apr 29 12:52:54 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 29 Apr 2008 12:52:54 -0400
Subject: [BioSQL-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200804291653.m3TGqsF5020841@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

This is a reminder to submit your proposals for talks to the BOSC submission system before May 11.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a full paper. The small Abstract text should be a summary, while the longer abstract should provide more details (including the open-source license requirement details).
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

			 
From ericgibert at yahoo.fr  Fri Apr  4 13:43:34 2008
From: ericgibert at yahoo.fr (Eric Gibert)
Date: Fri, 4 Apr 2008 13:43:34 +0000 (GMT)
Subject: [BioSQL-l] left_value and right_value in taxon table
Message-ID: <942508.77770.qm@web26507.mail.ukl.yahoo.com>

Dear all,

I hope that I am not the 100th person asking the following questions:
1) what are left and right values in the taxon table for?
2) How are they computed?

Thank you for your input or link to an explanation page.

Eric




From hlapp at gmx.net  Fri Apr  4 22:40:56 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 4 Apr 2008 18:40:56 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <942508.77770.qm@web26507.mail.ukl.yahoo.com>
References: <942508.77770.qm@web26507.mail.ukl.yahoo.com>
Message-ID: <60C3976C-26BF-4981-9159-207835C9B5D0@gmx.net>

Hi Eric,

On Apr 4, 2008, at 9:43 AM, Eric Gibert wrote:
> Dear all,
>
> I hope that I am not the 100th person asking the following questions:
> 1) what are left and right values in the taxon table for?

They hold the nested set values. Nested sets are an enumeration
algorithm described in Joe Celko's SQL for Smarties books, and Aaron
Mackey gives a good introduction here:

http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html

(This is in the schema DDL file, though obviously should be  
documented better. Good candidate for an FAQ, I suppose.)
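To make the answer to question 1 concrete: the payoff of nested set values is that "everything below this taxon" or "the full lineage of this taxon" becomes a single range comparison instead of a recursive walk over parent_taxon_id. The sketch below assumes only the taxon_id, parent_taxon_id, left_value and right_value columns (the real taxon table has more), uses sqlite3 so it runs standalone, and uses a made-up five-node tree.

```python
import sqlite3

# Toy tree (made-up ids), numbered by a depth-first walk:
#   1 (1,10)
#   +-- 2 (2,7)
#   |   +-- 3 (3,4)
#   |   +-- 4 (5,6)
#   +-- 5 (8,9)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY,"
             " parent_taxon_id INTEGER, left_value INTEGER, right_value INTEGER)")
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                 [(1, None, 1, 10), (2, 1, 2, 7), (3, 2, 3, 4),
                  (4, 2, 5, 6), (5, 1, 8, 9)])


def descendants(conn, taxon_id):
    """All taxa strictly below taxon_id: their interval nests inside its interval."""
    sql = ("SELECT c.taxon_id FROM taxon p JOIN taxon c"
           " ON c.left_value > p.left_value AND c.right_value < p.right_value"
           " WHERE p.taxon_id = ? ORDER BY c.left_value")
    return [row[0] for row in conn.execute(sql, (taxon_id,))]


def lineage(conn, taxon_id):
    """All ancestors of taxon_id, root first: their intervals enclose its interval."""
    sql = ("SELECT a.taxon_id FROM taxon a JOIN taxon c"
           " ON a.left_value < c.left_value AND a.right_value > c.right_value"
           " WHERE c.taxon_id = ? ORDER BY a.left_value")
    return [row[0] for row in conn.execute(sql, (taxon_id,))]


print(descendants(conn, 2))  # [3, 4]
print(lineage(conn, 4))      # [1, 2]
```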

> 2) How are they computed

load_ncbi_taxonomy.pl recomputes them automatically after each  
update. It's a simple recursive depth-first graph traversal algorithm.
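The traversal described above can be sketched in a few lines. This is an illustration on an in-memory mapping, not the code used by load_ncbi_taxonomy.pl; visiting siblings in ascending id order is an arbitrary choice made here. Each node takes its left value on the way down and its right value on the way back up, so a node with k descendants ends up with right - left = 2k + 1.

```python
from collections import defaultdict


def nested_set_values(parent_of, root):
    """Number a tree with nested-set (left, right) values by depth-first traversal.

    parent_of maps each node id to its parent id; the root may map to None
    or to itself. Children are visited in ascending id order.
    Returns {node: (left, right)}.
    """
    children = defaultdict(list)
    for node, parent in parent_of.items():
        if parent is not None and parent != node:
            children[parent].append(node)

    values = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1              # entering the node: next number is its left value
        left = counter
        for child in sorted(children[node]):
            visit(child)          # every descendant gets numbers between left and right
        counter += 1              # leaving the node: next number is its right value
        values[node] = (left, counter)

    visit(root)
    return values


tree = {1: 1, 2: 1, 3: 2, 4: 2, 5: 1}   # made-up ids; node 1 is its own parent
print(nested_set_values(tree, 1))
# {3: (3, 4), 4: (5, 6), 2: (2, 7), 5: (8, 9), 1: (1, 10)}
```

For very deep trees, an explicit stack instead of recursion avoids Python's default recursion limit.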

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Tue Apr  8 15:24:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Apr 2008 16:24:41 +0100
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <60C3976C-26BF-4981-9159-207835C9B5D0@gmx.net>
References: <942508.77770.qm@web26507.mail.ukl.yahoo.com>
	<60C3976C-26BF-4981-9159-207835C9B5D0@gmx.net>
Message-ID: <320fb6e00804080824x2bd92d41p884c8a4a61c04702@mail.gmail.com>

> > Dear all,
> >
> > I hope that I am not the 100th person asking the following questions:
> > 1) what are left and right values in the taxon table for?
> >
>
>  They hold the nested set values. Nested sets are an enumeration algorithm
> described in Joe Celko's SQL for Smarties books, and Aaron Mackey gives a
> good introduction here:
>
>  http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
>
>  (This is in the schema DDL file, though obviously should be documented
> better. Good candidate for an FAQ, I suppose.)

That link does a good job of explaining the idea.

> > 2) How are they computed
>
>  load_ncbi_taxonomy.pl recomputes them automatically after each update. It's
> a simple recursive depth-first graph traversal algorithm.

I have the impression the recomputation is slow, and also moderately
complex.  This is fine for a weekly (or even daily) update which runs
the load_ncbi_taxonomy.pl script.

We (Biopython) are interested in incremental updates triggered when a
new sequence is added to the database with a novel taxon id.  Eric is
looking at downloading the missing taxon data and updating the
taxon/taxon_name tables "on the fly", transparently to the user.

http://bugzilla.open-bio.org/show_bug.cgi?id=2475 (Biopython bug)

Hilmar, am I right in thinking the following? Suppose that when loading a
new sequence into the database with a novel NCBI taxon, we record a
new minimal taxon/taxon_name entry (without the lineage, a single
taxon entry with null left/right values). If the user then runs
load_ncbi_taxonomy.pl, assuming NCBI's online database contains
the new taxon, will this update nicely? I.e., when the new sequence is
retrieved from the database, will its full lineage be available?

Thanks

Peter


From aaron.j.mackey at gsk.com  Tue Apr  8 15:58:56 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 8 Apr 2008 11:58:56 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <320fb6e00804080824x2bd92d41p884c8a4a61c04702@mail.gmail.com>
Message-ID: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>

I believe that the first thing the load_ncbi_taxonomy.pl script does is to 
wipe out everything already in the table.  So your incremental update
strategy (with deferred left/right calculation) won't work.

Depending on the type of update you're making (e.g. you only add one new
terminal taxonomic node, having no children), the incremental updates are
pretty fast, computationally speaking (no tree traversal is required).  I
won't be able to recite them off the top of my head, but Joe Celko's "SQL
For Smarties" book has the necessary code.  In a nutshell, it's something
like this: if the overall topology of the tree remains unchanged, you'll need to
increment the right/left values of each node "to the right" of the new
node you've inserted by 2, but it's a tiny bit more complicated than that.
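For the simplest case mentioned here (one new leaf, no other topology change), the textbook nested set insert can be sketched as follows. This is a hypothetical illustration in Python with sqlite3, not what load_ncbi_taxonomy.pl does (as noted elsewhere in this thread, that script recomputes all values). The toy table keeps only four taxon columns and declares no UNIQUE constraints on left_value or right_value; if your schema has such constraints, an in-place shift like this can trip a row-by-row uniqueness check in some database engines.

```python
import sqlite3


def make_demo_tree():
    """In-memory toy taxon table (made-up ids) already numbered as a nested set:
    1 (1,10) has children 2 (2,7) and 5 (8,9); 2 has children 3 (3,4) and 4 (5,6).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY,"
                 " parent_taxon_id INTEGER, left_value INTEGER, right_value INTEGER)")
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                     [(1, None, 1, 10), (2, 1, 2, 7), (3, 2, 3, 4),
                      (4, 2, 5, 6), (5, 1, 8, 9)])
    conn.commit()
    return conn


def insert_leaf(conn, new_id, parent_id):
    """Add new_id as the last (rightmost) child of parent_id.

    Every left/right value at or beyond the parent's current right value is
    shifted up by 2 to open a gap, and the new leaf takes the two freed numbers.
    All statements run in one transaction.
    """
    with conn:
        row = conn.execute("SELECT right_value FROM taxon WHERE taxon_id = ?",
                           (parent_id,)).fetchone()
        if row is None:
            raise KeyError("unknown parent taxon_id {!r}".format(parent_id))
        gap = row[0]
        conn.execute("UPDATE taxon SET right_value = right_value + 2"
                     " WHERE right_value >= ?", (gap,))
        conn.execute("UPDATE taxon SET left_value = left_value + 2"
                     " WHERE left_value > ?", (gap,))
        conn.execute("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                     (new_id, parent_id, gap, gap + 1))


conn = make_demo_tree()
insert_leaf(conn, 6, 2)
print(conn.execute("SELECT taxon_id, left_value, right_value"
                   " FROM taxon ORDER BY left_value").fetchall())
# [(1, 1, 12), (2, 2, 9), (3, 3, 4), (4, 5, 6), (6, 7, 8), (5, 10, 11)]
```

Note that even this cheap case touches every row whose values lie to the right of the insertion point, which is why the number of updated rows depends heavily on where in the tree the new leaf lands.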

-Aaron

biosql-l-bounces at lists.open-bio.org wrote on 04/08/2008 11:24:41 AM:

> > > Dear all,
> > >
> > > I hope that I am not the 100th person asking the following questions:
> > > 1) what are left and right values in the taxon table for?
> > >
> >
> >  They hold the nested set values. Nested sets are an enumeration algorithm
> > described in Joe Celko's SQL for Smarties books, and Aaron Mackey gives a
> > good introduction here:
> >
> >  http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
> >
> >  (This is in the schema DDL file, though obviously should be documented
> > better. Good candidate for an FAQ, I suppose.)
> 
> That link does a good job of explaining the idea.
> 
> > > 2) How are they computed
> >
> >  load_ncbi_taxonomy.pl recomputes them automatically after each update. It's
> > a simple recursive depth-first graph traversal algorithm.
> 
> I have the impression the recomputation is slow, and also moderately
> complex.  This is fine for a weekly (or even daily) update which runs
> the load_ncbi_taxonomy.pl script.
> 
> We (Biopython) are interested in incremental updates triggered when a
> new sequence is added to the database with a novel taxon id.  Eric is
> looking at downloading the missing taxon data and updating the
> taxon/taxon_name tables "on the fly", transparently to the user.
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 (Biopython bug)
> 
> Hilmar, am I right in thinking the following:  Suppose when loading a
> new sequence into the database with a novel NCBI taxon, we record a
> new minimal taxon/taxon_names entry (without the lineage, a single
> taxon entry with null left/right entries).  If the user then runs
> load_ncbi_taxonomy.pl, assuming the NCBI's online database contains
> the new taxon, will this update nicely?  i.e. When the new sequence is
> retrieved from the database, its full lineage will be available.
> 
> Thanks
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 


From hlapp at gmx.net  Tue Apr  8 23:57:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 8 Apr 2008 19:57:41 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>
References: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>
Message-ID: <5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>


On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
> I believe that the first thing the load_ncbi_taxonomy.pl script
> does is to wipe out everything already in the table.

That may have been true in its beginnings but hasn't been for a
long time :-) It only updates changed nodes, adds new ones, and  
deletes retired ones (unless you say --nodelete). The script does  
recompute *all* nested set values, though.

> [...]
> Depending on the type of update you're making (e.g. you only add
> one new terminal taxonomic node, having no children), the
> incremental updates are pretty fast, computationally speaking (no
> tree traversal is required).  I won't be able to recite them off
> the top of my head, but Joe Celko's "SQL For Smarties" book has
> the necessary code.  In a nutshell, it's something like this: if
> the overall topology of the tree remains unchanged, you'll need to
> increment the right/left values of each node "to the right" of the
> new node you've inserted by 2, but it's a tiny bit more complicated
> than that.

Though you can have very cheap cases indeed, in reality it turns out  
that on average you still need to traverse and update at least half  
of the nodes, so personally I really doubt you would save any  
significant amount of time by not just redoing all of them. And it's  
not that time-intensive either; typically it takes about 10-20 minutes,
depending on CPU etc.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Apr  9 04:31:31 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 9 Apr 2008 00:31:31 -0400
Subject: [BioSQL-l] BioSQL-1.0.0 + bioperl-db_1.5.2_100 + gbrowse
	question
In-Reply-To: <47FA895F.4080407@embarqmail.com>
References: <47F52F80.7030409@embarqmail.com>
	<E68ABF4F-D65B-43AF-9DC8-C554ACF1719D@gmx.net>
	<47FA895F.4080407@embarqmail.com>
Message-ID: <1094E323-C648-42AD-BE4B-623978FF7035@gmx.net>

Hi Doug,

thanks for your sleuthing work, it's much appreciated. I'm sure  
someone among the GBrowse crowd (I've copied the GBrowse mailing  
list) can make the fix according to what you found out, and they can  
help you out with the other problems you are having, too, if you
describe in more detail the issues you are seeing.

As for the path tables, which ones are you referring to? I'm not sure  
the GBrowse BioSQL adapter uses any of them, so even if you do  
populate them it wouldn't change much. There is the load_ontology.pl  
script in Bioperl-db that can automatically compute the transitive  
closure for ontology terms, but as far as I am aware the feature or  
sequence loading scripts don't do that for seqfeature_path or  
bioentry_path, respectively.
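For readers wondering what computing a transitive closure for a path table involves, here is a minimal sketch: from direct parent/child edges, derive every (ancestor, descendant) pair with its shortest distance. It is a generic Python illustration, not code from load_ontology.pl, and it deliberately does not try to match the exact columns of term_path, seqfeature_path or bioentry_path (check the schema DDL for those). Whether a loader should also store distance-0 self-paths, or every path length rather than only the shortest, is a schema question this sketch does not settle.

```python
from collections import defaultdict, deque


def transitive_closure(edges):
    """Return {(ancestor, descendant): shortest distance} for a DAG.

    edges is an iterable of direct (parent, child) pairs, e.g. is_a links
    between terms or parent/child links between features. Self-pairs at
    distance 0 are not included.
    """
    children = defaultdict(set)
    nodes = set()
    for parent, child in edges:
        children[parent].add(child)
        nodes.update((parent, child))

    closure = {}
    for start in nodes:
        # breadth-first search, so each node is first reached via a shortest path
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for child in children[node]:
                if child not in dist:
                    dist[child] = dist[node] + 1
                    closure[(start, child)] = dist[child]
                    queue.append(child)
    return closure


print(sorted(transitive_closure([(1, 2), (2, 3), (2, 4)]).items()))
# [((1, 2), 1), ((1, 3), 2), ((1, 4), 2), ((2, 3), 1), ((2, 4), 1)]
```

Each resulting pair would become one row in a path table; the cost grows with the number of ancestor/descendant pairs, which is one reason these tables are usually populated by a separate step rather than during sequence loading.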

	-hilmar

On Apr 7, 2008, at 4:51 PM, doug brown wrote:

> Hi Hilmar,
>
>   Thank you for your reply.
>
>   After much gnashing of teeth and much more code trawling, it  
> seems as if the basic problem I encountered was indeed one of
> configuration.  The current coding of  
> Bio::DB::Das::BioSQL::BioDatabaseAdaptor uses 'location' as the  
> keyword for representing the database host. The GFF adapter used,  
> and I guess still uses, 'host' for that purpose. The distributed  
> sample file 06.biosql.conf specified 'host' rather than 'location'.
> Here is the correct configuration file section that now works for me:
>
> description = Magnaporthe grisea V5 genbank BioSQL
> db_args     = driver    mysql
>               dbname    M_grisea_genbank_biosql
>               location  mycelium.fgl.ncsu.edu
>               user      www
>               pass      ""
>               namespace genbank
>               version   1
>
> gbrowse starts up OK now. Could you please tell me who would be
> the responsible party so that I can send him/her a note?  There are  
> also other problems with the file that I am trying to resolve.  
> Thank you.
>
> Now I need to figure out how to get gbrowse to display all of my  
> features (I finally have a database rich enough to represent them  
> but not so complex as to be unwieldy) .....
>
> Oh, I noticed that the various path tables are not populated and  
> one piece of documentation said something about "... database  
> dependent...". Could you point me to any code samples that populate  
> those tables? Or, perhaps, pointers to specific mailing list  
> message chains even.
>
> In general, any clues or bread crumbs would be greatly appreciated.
>
> Regards,
> Doug Brown
>
> Hilmar Lapp wrote:
>>
>> Hi Doug,
>>
>> I'm not exactly sure what the problem is but the error you are  
>> seeing is raised by code within GBrowse. I'd recommend that you post
>> this to the GBrowse list. I have tried to trace the meaning of the  
>> Gbrowse conf parameters, and it seems to me that biodbname is  
>> obsolete and should be namespace instead. However, I also can't  
>> find where the error is being generated using the error message as  
>> guide, so I suspect you are using an outdated version of GBrowse.  
>> The folks on the Gbrowse list should be able to tell you more  
>> specifics, though.
>>
>>  -hilmar
>>
>> On Apr 3, 2008, at 3:26 PM, doug brown wrote:
>>> Hello Hilmar,
>>>
>>>   First off, congratulations for achieving the version 1.0  
>>> release of BioSQL.
>>>
>>> I am attempting to get BioSQL working with bioperl and gbrowse.  
>>> All are, I believe, the most recent versions of the software.  
>>> Unfortunately, I am running into problems with getting gbrowse
>>> to access the BioSQL database.
>>>
>>> It is my sincere hope that my problem is a configuration issue.  
>>> However, after trying multiple permutations of gbrowse  
>>> configuration params and much trawling through the bioperl code,  
>>> I am unable to resolve the problem. Could you take a moment and  
>>> see if there is an obvious solution to my problem?
>>>
>>> I have long awaited the 1.0 mature release of BioSQL and I am  
>>> eager to start using it in place of my ad hoc in-house databases!
>>>
>>> Here is the error from the apache log files:
>>> ------------- EXCEPTION  -------------
>>> MSG: error while executing query in Bio::DB::Das::BioSQL::PartialSeqAdaptor::find_by_query: No database selected
>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_query /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1248
>>> STACK Bio::DB::Das::BioSQL::BioDatabaseAdaptor::fetch_Seq_by_accession /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/DB/Das/BioSQL/BioDatabaseAdaptor.pm:120
>>> STACK Bio::DB::Das::BioSQL::get_feature_by_name /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm:220
>>> STACK Bio::Graphics::Browser::_feature_get /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/Graphics/Browser.pm:1884
>>> STACK Bio::Graphics::Browser::name2segments /Library/Perl/5.8.6/darwin-thread-multi-2level/Bio/Graphics/Browser.pm:1828
>>> STACK main::lookup_features_from_db /Library/WebServer/CGI-Executables/gbrowse:1677
>>> STACK main::get_features /Library/WebServer/CGI-Executables/gbrowse:1577
>>> STACK toplevel /Library/WebServer/CGI-Executables/gbrowse:182
>>>
>>> My configuration file:
>>> [GENERAL]
>>> # based on 06.biosql.conf,v 1.2.6.2.2.1 2006/06/07 20:50:29 which was obtained
>>> # from the basic gbrowse installation
>>> description = Magnaporthe grisea V5 genbank BioSQL
>>> db_adaptor  = Bio::DB::Das::BioSQL
>>> #nb: the dashes are required and were missing in the original
>>> db_args     = driver    mysql
>>>           -dbname    M_grisea_genbank_biosql
>>>           -namespace bioperl
>>>           -biodbname genbank
>>>           -version     1
>>>           -host      mycelium.fgl.ncsu.edu
>>>           -user      www
>>>           -pass      ""
>>>           -port 3306
>>>
>>> plugins =  FastaDumper RestrictionAnnotator
>>> # deb 2-apr-08 SequenceDumper can't be found. So, don't use it.
>>> #SequenceDumper
>>>
>>> ... remainder is unchanged ...
>>>
>>> database creation and load:
>>> mysql -udebrown -p -h mycelium.fgl.ncsu.edu
>>> drop database M_grisea_genbank_biosql;
>>> create database M_grisea_genbank_biosql;
>>> grant select on M_grisea_genbank_biosql.* to www at marray.fgl.ncsu.edu;
>>> use M_grisea_genbank_biosql;
>>>
>>> mysql -udebrown -p -h mycelium.fgl.ncsu.edu M_grisea_genbank_biosql <C:\installKits\biosql-1.0.0\sql\biosqldb-mysql.sql
>>>
>>> C:\Perl\site\bin\load_seqdatabase.bat\load_seqdatabase.bat --dsn "dbi:mysql:database=M_grisea_genbank_biosql;host=mycelium.fgl.ncsu.edu" -dbuser debrown --dbpass XXXXXXX --format genbank genbank\CH476760.gb --namespace genbank
>>>
>>> machine (node) layout:
>>> marray.fgl.ncsu.edu is the web server
>>> mycelium.fgl.ncsu.edu is the database server
>>> dougslaptop.fgl.ncsu.edu is a development system.
>>>
>>>
>>> Thank you for your time,
>>>
>>> Regards,
>>> Doug Brown
>>> -- 
>>> Doug Brown - Bioinformatics
>>> Fungal Genomics Laboratory
>>> Center for Integrated Fungal Research
>>> North Carolina State University
>>> Campus Box 7251, Raleigh, NC 27695-7251
>>> https://www.fungalgenomics.ncsu.edu/~debrown/
>>> Tel: (919) 513-0394, Fax (919) 513-0024
>>> e-mail: doug_brown at ncsu.edu
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>
> -- 
> Doug Brown - Bioinformatics
> Fungal Genomics Laboratory
> Center for Integrated Fungal Research
> North Carolina State University
> Campus Box 7251, Raleigh, NC 27695-7251
> https://www.fungalgenomics.ncsu.edu/~debrown/
> Tel: (919) 513-0394, Fax (919) 513-0024
> e-mail: doug_brown at ncsu.edu

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Wed Apr  9 10:02:27 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Apr 2008 11:02:27 +0100
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>
References: <OF04FBA8CC.8898BA9E-ON85257425.005776DB-85257425.0057CB97@gsk.com>
	<5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>
Message-ID: <320fb6e00804090302i5ab447acx912ce205c5580b4@mail.gmail.com>

On Wed, Apr 9, 2008 at 12:57 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
>  The [load_ncbi_taxonomy.pl] script does recompute *all*
>  nested set values, though...
>  Though you can have very cheap cases indeed, in reality it
>  turns out that on average you still need to traverse and update
>  at least half of the nodes, so personally I really doubt you
>  would save any significant amount of time by not just redoing
>  all of them. And it's not that time-intensive either;  typically it
>  takes about 10-20mins, depending on CPU etc.

This does mean that in general, trying to fully update the taxon table
when adding a new sequence with a novel NCBI taxon id would take at
least 10mins (in addition to the drawback of having the Bio* project
reimplement much of the load_ncbi_taxonomy.pl script's logic).

This probably helps explain why when the NCBI taxon ID wasn't already
defined, the old Biopython code would actually create new taxon table
entries for the entire lineage (based on the species lineage names in
a GenBank file) without linking into any existing taxon table entries
which may have matched.  Because these new entries were independent of
everything else, their left/right values could be calculated trivially
(starting above the largest existing left/right value). This had the
advantage of recording as much information as possible (without having
to use load_ncbi_taxonomy.pl at all), but left the taxon table full of
redundant entries.
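
That trivial calculation can be sketched as follows. This is a minimal illustration, not the old Biopython code: it assumes the new lineage is given as a list ordered from root to leaf and that `max_existing` is the largest left/right value already present in the taxon table (the function name is hypothetical). Each node encloses the one below it, so a lineage of n nodes simply occupies the next 2n values.

```python
def lineage_nested_set(lineage, max_existing):
    """Assign nested-set values to a standalone root-to-leaf chain.

    lineage: taxon names ordered from root to leaf (hypothetical input format).
    max_existing: largest left/right value already used in the taxon table.
    Returns [(name, left_value, right_value), ...] in the same order.
    """
    n = len(lineage)
    # Node i gets the i-th free value on the way down and the
    # mirror-image value on the way back up, so each node nests its child.
    return [(name, max_existing + 1 + i, max_existing + 2 * n - i)
            for i, name in enumerate(lineage)]
```

For example, `lineage_nested_set(["A", "B", "C"], 10)` gives A (11, 16), B (12, 15) and C (13, 14). Because the chain never links to existing nodes, nothing else in the table has to be renumbered, which is exactly why it is cheap and also why it leaves redundant entries behind.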

I think that in this case, when trying to load a sequence with a novel
NCBI taxon id, the best solution may be just to add a single minimal
taxon table entry with NULL left/right values (and let the
load_ncbi_taxonomy.pl fill in the lineage later).
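
A minimal sketch of that proposal, using Python's sqlite3 module and a cut-down stand-in for the taxon table (only some of the real BioSQL columns; the helper names are hypothetical). A real loader would presumably also record the scientific name in taxon_name; that is left out here.

```python
import sqlite3

def create_demo_taxon_table(conn):
    # Simplified stand-in for the BioSQL taxon table, not the full schema.
    conn.execute("""CREATE TABLE taxon (
        taxon_id        INTEGER PRIMARY KEY,
        ncbi_taxon_id   INTEGER UNIQUE,
        parent_taxon_id INTEGER,
        node_rank       TEXT,
        left_value      INTEGER UNIQUE,
        right_value     INTEGER UNIQUE)""")

def get_or_add_minimal_taxon(conn, ncbi_taxon_id):
    """Return the taxon_id for ncbi_taxon_id, adding a bare row if it is new.

    The new row has no parent, rank or nested-set values (all NULL); a later
    load_ncbi_taxonomy.pl run is expected to fill in the lineage.
    """
    row = conn.execute("SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
                       (ncbi_taxon_id,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO taxon (ncbi_taxon_id, parent_taxon_id, node_rank,"
        " left_value, right_value) VALUES (?, NULL, NULL, NULL, NULL)",
        (ncbi_taxon_id,))
    return cur.lastrowid
```

NULLs do not collide under a UNIQUE constraint, so any number of such placeholder rows can coexist until the nested-set values are rebuilt.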

Peter


From aaron.j.mackey at gsk.com  Wed Apr  9 12:45:43 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 9 Apr 2008 08:45:43 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <5EDD43ED-57DD-4BAC-9A93-3B7DAE67ACB4@gmx.net>
Message-ID: <OFA9D0B6AA.59010AAB-ON85257426.0045DC28-85257426.00461B57@gsk.com>

> On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
> > I believe that the first thing the load_ncbi_taxonomy.pl script 
> > does is to
> > wipe out everything already in the table.
> 
> That may have been true in its beginnings but hasn't been for a
> long time :-) It only updates changed nodes, adds new ones, and 
> deletes retired ones (unless you say --nodelete). The script does 
> recompute *all* nested set values, though.

Ahh right, I remember all that now.  It was the wiping out of the 
left/right values that I was thinking of.

Thanks,

-Aaron


From mmokrejs at ribosome.natur.cuni.cz  Sat Apr 12 00:50:58 2008
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Sat, 12 Apr 2008 02:50:58 +0200
Subject: [BioSQL-l] left_value and right_value in taxon table
Message-ID: <48000772.4060903@ribosome.natur.cuni.cz>

>> On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
>> > I believe that the first thing the load_ncbi_taxonomy.pl script 
>> > does is to
>> > wipe out everything already in the table.
>> 
>> That may have been true in its beginnings but hasn't been for a
>> long time :-) It only updates changed nodes, adds new ones, and 
>> deletes retired ones (unless you say --nodelete). The script does 
>> recompute *all* nested set values, though.
> 
> Ahh right, I remember all that now.  It was the wiping out of the 
> left/right values that I was thinking of.
> 
> Thanks,
> 
> -Aaron

Hi,

Maybe you meant the other taxonomy loading script? ;-)

http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/scripts/load_itis_taxonomy.pl

<quote>
You can use this script to load the taxonomy data into a fresh instance of
biosql. Otherwise an already existing ITIS tree will be deleted first.
</quote>

I just don't understand why the very first sentence of the documentation
within the script says something about 'update':

<quote>
This script loads or updates a biosql schema with phylodb extension
with the ITIS taxonomy as a phylogenetic trees, one tree for each
kingdom.
</quote>


Regards,
Martin


From mmokrejs at ribosome.natur.cuni.cz  Sat Apr 12 01:32:14 2008
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Sat, 12 Apr 2008 03:32:14 +0200
Subject: [BioSQL-l] [Bioperl-l] Loading sequences with novel NCBI
	taxon_id
In-Reply-To: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
Message-ID: <4800111E.3030802@ribosome.natur.cuni.cz>

Chris Fields wrote:
> The counter to that perspective (using new sequences with old tax info) 
> would be to regularly update NCBI taxonomy, particularly in 
> circumstances prior to adding new sequences.  Hilmar mentioned that once 
> tax is loaded it doesn't take as long to update, so you could set up a 
> cron job to update regularly.
> 
> I remember someone mentioning weekly or monthly updates on the list 
> quite a while ago, but I'm unsure how often NCBI updates tax information 
> (i.e. with every release, monthly, weekly, etc).  I can see instances 
> popping up where you used an up-to-date taxonomy but a new sequence
> contains a tax ID not present.  I think bioperl-db handles these but I'm 
> not sure what other Bio* do.
> 

I spent some time benchmarking this and inspecting the mysql log files.
Here is what the current load_ncbi_taxonomy.pl script, with a minor
modification to show timestamps, does on an initial import into mysql
and then on an update of the database using exactly the same dataset
(the update still has to walk through all the data):

$ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \
  --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \
  --chunksize=0 --verbose=2 --mycnf=~/.my.cnf
Sat Apr 12 01:58:43 MEST 2008
Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump:
       ... retrieving all taxon nodes in the database
Sat Apr 12 01:58:43 MEST 2008
       ... reading in taxon nodes from nodes.dmp
Sat Apr 12 01:58:58 MEST 2008
       ... insert / update / delete taxon nodes
                10000/421098 done (in 5 secs, 2000.0 rows/s)
                20000/421098 done (in 4 secs, 2500.0 rows/s)
...
                420000/421098 done (in 4 secs, 2500.0 rows/s)
Sat Apr 12 02:02:21 MEST 2008
       ... (committing nodes)
Sat Apr 12 02:02:21 MEST 2008
       ... rebuilding nested set left/right values
                10000 done (in 24 secs, 416.7 rows/s)
                20000 done (in 26 secs, 384.6 rows/s)
                30000 done (in 24 secs, 416.7 rows/s)
...
                420004 done (in 23 secs, 434.8 rows/s)
Sat Apr 12 02:19:25 MEST 2008
       ... reading in taxon names from names.dmp
Sat Apr 12 02:19:25 MEST 2008
       ... deleting old taxon names
Sat Apr 12 02:19:25 MEST 2008
       ... inserting new taxon names
                10000 done (in 8 secs, 1250.0 rows/s)
                20000 done (in 8 secs, 1250.0 rows/s)
...
                580000 done (in 5 secs, 2000.0 rows/s)
Sat Apr 12 02:24:48 MEST 2008
       ... cleaning up
Sat Apr 12 02:24:49 MEST 2008
Done.
$


I decided to re-import the same data to mimic future updates at
least roughly; no record should need to be UPDATEd, except for
zapping the left and right values to NULL. :((

$ ./load_ncbi_taxonomy.pl --dbname=biosqldb --driver=mysql --host=127.0.01 \
  --port=3306 --directory=/home/mmokrejs/bioinformatics/databases/ncbitax/dump \
  --chunksize=0 --verbose=2 --mycnf=~/.my.cnf
Sat Apr 12 02:35:20 MEST 2008
Loading NCBI taxon database in /home/mmokrejs/bioinformatics/databases/ncbitax/dump:
        ... retrieving all taxon nodes in the database
Sat Apr 12 02:35:26 MEST 2008
       ... reading in taxon nodes from nodes.dmp
Sat Apr 12 02:35:46 MEST 2008
       ... insert / update / delete taxon nodes
                10000/421098 done (in 0 secs, 10000.0 rows/s)
                20000/421098 done (in 0 secs, 10000.0 rows/s)
...
                410000/421098 done (in 0 secs, 10000.0 rows/s)
                420000/421098 done (in 0 secs, 10000.0 rows/s)
Sat Apr 12 02:35:55 MEST 2008
       ... (committing nodes)
Sat Apr 12 02:35:55 MEST 2008
       ... rebuilding nested set left/right values
                10000 done (in 9 secs, 1111.1 rows/s)
                20000 done (in 9 secs, 1111.1 rows/s)
...
                410004 done (in 8 secs, 1250.0 rows/s)
                420004 done (in 9 secs, 1111.1 rows/s)
Sat Apr 12 02:41:54 MEST 2008
       ... reading in taxon names from names.dmp
Sat Apr 12 02:41:54 MEST 2008
       ... deleting old taxon names
Sat Apr 12 02:41:55 MEST 2008
       ... inserting new taxon names
                10000 done (in 5 secs, 2000.0 rows/s)
                20000 done (in 5 secs, 2000.0 rows/s)
...
                570000 done (in 6 secs, 1666.7 rows/s)
                580000 done (in 5 secs, 2000.0 rows/s)
Sat Apr 12 02:47:27 MEST 2008
       ... cleaning up
Sat Apr 12 02:47:27 MEST 2008
Done.
$ ls -la /var/log/mysql/mysql.log 
-rw-rw---- 1 mysql mysql 483443314 Apr 12 03:15 /var/log/mysql/mysql.log
$

Pentium 4 M laptop, 1.8 GHz, 1 GB RAM, mysql-5.0.56 with SQL text
logging enabled (every SQL statement is logged as text, which is
slower than binary logging). The log was cleared before the tests.
I could provide some bits from the log or upload it somewhere
if anybody else would like to dig into the details.
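
As a rough summary, the phase durations implied by the timestamps printed in the two runs above can be computed with a few lines of Python. This is only a sketch: the times are copied from the output above, while the phase labels are descriptive shorthand rather than the script's own messages.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def seconds_between(start, end):
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# (phase, start, end): timestamps copied from the two runs above.
RUNS = {
    "initial load": [
        ("insert/update/delete nodes", "01:58:58", "02:02:21"),
        ("rebuild nested set", "02:02:21", "02:19:25"),
        ("insert taxon names", "02:19:25", "02:24:48"),
        ("whole run", "01:58:43", "02:24:49"),
    ],
    "re-import": [
        ("insert/update/delete nodes", "02:35:46", "02:35:55"),
        ("rebuild nested set", "02:35:55", "02:41:54"),
        ("insert taxon names", "02:41:55", "02:47:27"),
        ("whole run", "02:35:20", "02:47:27"),
    ],
}

def summarize(runs):
    """Return {run: {phase: seconds}}."""
    return {run: {phase: seconds_between(start, end) for phase, start, end in phases}
            for run, phases in runs.items()}

if __name__ == "__main__":
    for run, phases in summarize(RUNS).items():
        for phase, secs in phases.items():
            print(f"{run:<12}  {phase:<26}  {secs:5d} s ({secs // 60} min {secs % 60:02d} s)")
```

On these numbers the nested-set rebuild took about 17 minutes (1024 s) of the roughly 26-minute initial load, and about 6 minutes (359 s) of the roughly 12-minute re-import.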


I believe the recalculation step could be made faster. See what
happens:

                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '1' ORDER BY ncbi_taxon_id
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '10239' ORDER BY ncbi_taxon_id
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12333' ORDER BY ncbi_taxon_id
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12335' ORDER BY ncbi_taxon_id
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '4'
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '5'
                     31 Query       UPDATE taxon SET left_value = '4', right_value = '5' WHERE taxon_id = '12335'
                     31 Query       SELECT taxon_id, left_value, right_value FROM taxon WHERE parent_taxon_id = '12340' ORDER BY ncbi_taxon_id
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE left_value = '6'
                     31 Query       UPDATE taxon SET left_value = NULL, right_value = NULL WHERE right_value = '7'
                     31 Query       UPDATE taxon SET left_value = '6', right_value = '7' WHERE taxon_id = '12340'


The columns left_value and right_value are NULL when the table
is created, so there is no need to write NULL into them again.
This would mean writing a wrapper function which would mimic
update(), but would first do a 'SELECT * FROM', compare the current
values with those to be written, and include in the final UPDATE
statement only those columns whose values have changed. We use
such a smart wrapper in our own Python code.
;-)
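
A minimal sketch of such a wrapper, written against a Python sqlite3 connection (the helper name `update_changed` is hypothetical and not taken from any BioSQL or Biopython code). It costs one extra SELECT per row, so it only pays off when a good share of rows turn out to be unchanged.

```python
import sqlite3

def update_changed(conn, table, key_col, key, new_values):
    """UPDATE only those columns of one row whose stored value differs.

    new_values: {column: value}. Returns the list of columns written
    (empty if nothing changed). Table and column names are interpolated
    into the SQL text, so they must come from trusted code, never user input.
    """
    cols = list(new_values)
    row = conn.execute(
        f"SELECT {', '.join(cols)} FROM {table} WHERE {key_col} = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    # None == None is True in Python, so NULL -> NULL counts as unchanged.
    changed = [c for c, old in zip(cols, row) if old != new_values[c]]
    if changed:
        assignments = ", ".join(f"{c} = ?" for c in changed)
        conn.execute(f"UPDATE {table} SET {assignments} WHERE {key_col} = ?",
                     [new_values[c] for c in changed] + [key])
    return changed
```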

When the left and right columns have to be reset to NULL during an
update of an existing database, I think it would be much faster
to drop the columns and re-create them with NULL values.


I think it is worth investigating further whether the empty taxon
and taxon_name tables could be created as MyISAM tables and converted
to InnoDB tables only after all the imports and updates. One would
probably have to think a bit more about the foreign keys, but it
might be that they would not even be lost during the conversion
back and forth.

This is actually easy to check. Dump your current taxon and taxon_name
tables (perhaps just their structure, using mysqldump --no-data), run
'ALTER TABLE taxon ... type=MyISAM'
followed by
'ALTER TABLE taxon ... type=InnoDB',
then dump the database structure again and compare it by diff with
the original.
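
If the two structure dumps are saved to files, the final comparison can be done with diff itself or, as in this sketch, with Python's standard difflib module. The file names are simply whatever the dumps were saved as; nothing here depends on particular names.

```python
import difflib
import sys

def schema_diff(before, after, before_name="before.sql", after_name="after.sql"):
    """Return unified-diff lines between two schema dumps given as strings."""
    return list(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=before_name, tofile=after_name))

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python schema_diff.py original_dump.sql converted_dump.sql
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        diff = schema_diff(f1.read(), f2.read(), sys.argv[1], sys.argv[2])
    sys.stdout.writelines(diff)
    print("structures identical" if not diff else f"{len(diff)} diff lines")
```

An empty result means the round trip left the table definitions, including any foreign key clauses, textually unchanged.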

But, time for sleep here.
Martin


From hlapp at gmx.net  Sat Apr 12 02:48:29 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 22:48:29 -0400
Subject: [BioSQL-l] left_value and right_value in taxon table
In-Reply-To: <48000772.4060903@ribosome.natur.cuni.cz>
References: <48000772.4060903@ribosome.natur.cuni.cz>
Message-ID: <82CF4290-2F0F-4E8D-86C9-B4C59072C74C@gmx.net>


On Apr 11, 2008, at 8:50 PM, Martin MOKREJŠ wrote:
>>> On Apr 8, 2008, at 11:58 AM, aaron.j.mackey at gsk.com wrote:
>>> > I believe that the first thing the load_ncbi_taxonomy.pl script  
>>> > does is to
>>> > wipe out everything already in the table.
>>> That may have been true in its beginnings but hasn't been for
>>> a long time :-) It only updates changed nodes, adds new ones, and  
>>> deletes retired ones (unless you say --nodelete). The script does  
>>> recompute *all* nested set values, though.
>> Ahh right, I remember all that now.  It was the wiping out of the  
>> left/right values that I was thinking of.
>> Thanks,
>> -Aaron
>
> Hi,
>
> Maybe you meant the other taxonomy loading script? ;-)
>
> http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/scripts/load_itis_taxonomy.pl

This loads a taxonomy too, but into the PhyloDB tables (because the  
ITIS taxonomy consists of multiple hierarchies, not a single one like  
NCBI).

>
> <quote>
> You can use this script to load the taxonomy data into a fresh  
> instance of
> biosql. Otherwise an already existing ITIS tree will be deleted first.
> </quote>
>
> I just don't understand why the very first sentence of the
> documentation within the script says something about 'update':
>
> <quote>
> This script loads or updates a biosql schema with phylodb extension
> with the ITIS taxonomy as a phylogenetic trees, one tree for each
> kingdom.
> </quote>

The 'update' here is forward looking :) At this point there won't  
really be an update as any existing trees within the ITIS namespace  
are deleted first.

	-hilmar

>
>
> Regards,
> Martin

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr 12 03:23:21 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 11 Apr 2008 23:23:21 -0400
Subject: [BioSQL-l] Loading sequences with novel NCBI taxon_id
In-Reply-To: <4800111E.3030802@ribosome.natur.cuni.cz>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
	<4800111E.3030802@ribosome.natur.cuni.cz>
Message-ID: <BC9F045A-AA5C-48A3-9DE4-289EA3F8628A@gmx.net>


On Apr 11, 2008, at 9:32 PM, Martin MOKREJŠ wrote:
> I decided to re-import the same data to mimic future updates at
> least roughly; no record should need to be UPDATEd, except for
> zapping the left and right values to NULL. :((

Not sure what made you frown here?

> [...]
>
> I believe the recalculation step could be made faster. See what
> happens:
> [...]
> The columns left_value and right_value are NULL when the table
> is created, so there is no need to write NULL into them again.

But that's only true the first time you load. For almost all real
databases, every run of the script except the first won't be able to
take advantage of that.

> This would mean writing a wrapper function which would mimic
> update(), but would first do a 'SELECT * FROM', compare the current
> values with those to be written, and include in the final UPDATE
> statement only those columns whose values have changed. We use
> such a smart wrapper in our own Python code.
> ;-)

What you see is the "optimization" for MySQL. For all other RDBMSs it  
does both left and right in one update.

BTW note that SELECT does not have zero cost, it requires both an  
index and a table read, only to find on average 50% of the time that  
you will need to update anyway. So what you gain 50% of the time you  
lose the other 50% of the time.

>
> When the left and right columns have to be reset to NULL during an
> update of an existing database, I think it would be much faster
> to drop the columns and re-create them with NULL values.

In terms of speed, that may be how MySQL works indeed. In PostgreSQL  
it would even be transactional (but very slow with concurrent  
queries), but with most databases you are now outside of a  
transaction (because it is DDL), which not only leaves the data in an  
inconsistent state, but also will immediately break any application  
you run against it because the table structure changed under its feet.


> [...] I think it is worth investigating further whether the empty
> taxon and taxon_name tables could be created as MyISAM tables and
> converted to InnoDB tables only after all the imports and updates.

I'm sure there are lots of hacks and tricks that would make this  
faster for one particular RDBMS, and you are welcome to explore
those. But the script is written to deal with several RDBMSs, and it  
does so as transactionally safe as possible. The assumption is that  
you are running this against a live database that is being queried  
concurrently.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr 12 18:10:44 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 12 Apr 2008 14:10:44 -0400
Subject: [BioSQL-l] personal vs list email
Message-ID: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>

I'm not sure why but I have received several Bioperl or BioSQL- 
related email inquiries directed to me *personally* over the past few  
weeks.

I have been responding as I get to them, but I feel that I am doing  
both the senders and this community a poor service, because sometimes  
someone else on the list could have responded much faster, and when I  
respond, others on the list who happen to be interested in the same  
question don't get to see the answer.

So from now on as a policy I will redirect *every* email sent to me  
personally and that asks a question related to one of the projects to  
the respective mailing list. If you don't want this, please  
conspicuously say so at the top of your email, and in that case if  
you do ask a project-related question, be prepared to wait and to
possibly need to follow up.

As an aside, it's a pretty safe assumption to make that all other  
core developers, and quite possibly *all* developers are following a  
similar policy, whether expressly or not.

Isn't this somewhere in the FAQ too?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Apr 12 20:17:43 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 12 Apr 2008 15:17:43 -0500
Subject: [BioSQL-l] personal vs list email
In-Reply-To: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>
References: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>
Message-ID: <E7962E90-8309-4ADA-B002-950793B61D74@uiuc.edu>


On Apr 12, 2008, at 1:10 PM, Hilmar Lapp wrote:

> I'm not sure why but I have received several Bioperl or BioSQL- 
> related email inquiries directed to me *personally* over the past  
> few weeks.
>
> I have been responding as I get to them, but I feel that I am doing  
> both the senders and this community a poor service, because  
> sometimes someone else on the list could have responded much faster,  
> and when I respond, others on the list who happen to be interested  
> in the same question don't get to see the answer.
>
> So from now on as a policy I will redirect *every* email sent to me  
> personally and that asks a question related to one of the projects  
> to the respective mailing list. If you don't want this, please  
> conspicuously say so at the top of your email, and in that case if  
> you do ask a project-related question, be prepared to wait and to
> possibly need to follow up.
>
> As an aside, it's a pretty safe assumption to make that all other  
> core developers, and quite possibly *all* developers are following a  
> similar policy, whether expressly or not.

I agree; I'm sure several other core devs feel the same way.  I always  
try to forward these to the list if I feel it is more relevant there.

> Isn't this somewhere in the FAQ too?
>
> 	-hilmar

No, but I've added it to the bioperl FAQ; might be worth checking over  
and editing.

chris


From aaron.j.mackey at gsk.com  Mon Apr 14 13:00:52 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 14 Apr 2008 09:00:52 -0400
Subject: [BioSQL-l] [Bioperl-l] personal vs list email
In-Reply-To: <E5D49A1A-24F0-4224-9980-30F418EED978@gmx.net>
Message-ID: <OF3ED0BD19.1CBA005A-ON8525742B.00473A95-8525742B.00477DEC@gsk.com>

I try to take it even one step further: I require the person to re-ask 
their question on the mailing list (and then try to answer it there). This 
has the added benefit of causing the person to pause a moment to reflect 
on their question, and (sometimes) to spend a bit more time preparing the 
question for broader public consumption.

-Aaron


From biopython at maubp.freeserve.co.uk  Wed Apr 23 09:04:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Apr 2008 10:04:33 +0100
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
Message-ID: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>

Dear list,

In addition to loading the NCBI taxonomy, the load_ncbi_taxonomy.pl
script also recalculates the left/right values.

Is there a separate BioSQL script which ONLY recalculates the left/right values?

I was asked this by a Biopython user.  Possible use-cases include
people using a non-NCBI taxonomy.

Peter
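
Pending such a script, here is a minimal in-memory sketch of the calculation itself, assuming the parent links have already been read from taxon.parent_taxon_id into a Python dict. It is not code from load_ncbi_taxonomy.pl: it only shows the depth-first numbering that nested-set values rest on, and writing the values back to the taxon table is left out. Siblings are visited in ascending id order here; the query log quoted earlier in this archive shows the script ordering siblings by ncbi_taxon_id.

```python
from collections import defaultdict

def rebuild_nested_set(parent_of):
    """Compute nested-set (left_value, right_value) pairs from parent links.

    parent_of: {node_id: parent_id}; a root has parent None or itself
    (NCBI's root node 1 is its own parent in nodes.dmp).
    Returns {node_id: (left, right)}. Nodes not reachable from a root,
    e.g. whose parent is missing from the dict, get no values.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None or parent == node:
            roots.append(node)
        else:
            children[parent].append(node)
    for kids in children.values():
        kids.sort()  # deterministic sibling order

    values, left, counter = {}, {}, 0
    for root in sorted(roots):
        counter += 1
        left[root] = counter
        # Iterative depth-first walk, so deep lineages cannot hit the recursion limit.
        stack = [(root, iter(children.get(root, [])))]
        while stack:
            node, kids = stack[-1]
            child = next(kids, None)
            if child is None:  # all children done: assign the right value
                counter += 1
                values[node] = (left[node], counter)
                stack.pop()
            else:              # descend: assign the child's left value
                counter += 1
                left[child] = counter
                stack.append((child, iter(children.get(child, []))))
    return values
```

With these values, a node's descendants are exactly the nodes whose left value lies between its own left and right values, which is what makes subtree queries on the taxon table cheap.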


From hlapp at gmx.net  Wed Apr 23 14:14:19 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 23 Apr 2008 10:14:19 -0400
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
In-Reply-To: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>
References: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>
Message-ID: <57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>

No there isn't but it's a good idea. Would you mind posting it as a  
BioSQL bug/feature request?

	-hilmar

On Apr 23, 2008, at 5:04 AM, Peter wrote:
> Dear list,
>
> In addition to loading the NCBI taxonomy, the load_ncbi_taxonomy.pl
> script also recalculates the left/right values.
>
> Is there a separate BioSQL script which ONLY recalculates the
> left/right values?
>
> I was asked this by a Biopython user.  Possible use-cases include
> people using a non-NCBI taxonomy.
>
> Peter

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Wed Apr 23 15:41:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Apr 2008 16:41:00 +0100
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
In-Reply-To: <57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>
References: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>
	<57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>
Message-ID: <320fb6e00804230841p188b4897q6c7f08dcf1ead552@mail.gmail.com>

On Wed, Apr 23, 2008 at 3:14 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> No there isn't but it's a good idea. Would you mind posting it as a BioSQL
> bug/feature request?

Sure, I've filed an enhancement request:

Bug 2493 - New script to recalculate left/right values in the taxon table
http://bugzilla.open-bio.org/show_bug.cgi?id=2493

Peter


From mmokrejs at ribosome.natur.cuni.cz  Wed Apr 23 15:58:18 2008
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Wed, 23 Apr 2008 17:58:18 +0200
Subject: [BioSQL-l] BioSQL script to update taxon table left/right values
In-Reply-To: <320fb6e00804230841p188b4897q6c7f08dcf1ead552@mail.gmail.com>
References: <320fb6e00804230204w5161e6afg143a66a3cbb8aa66@mail.gmail.com>	<57EF37D9-295A-495A-ADFD-E13722DBD642@gmx.net>
	<320fb6e00804230841p188b4897q6c7f08dcf1ead552@mail.gmail.com>
Message-ID: <480F5C9A.60007@ribosome.natur.cuni.cz>

I would propose making the current script more modular and adding
a command-line option which would just establish the database
handle, update the left/right fields and close $dbh.
M.


Peter wrote:
> On Wed, Apr 23, 2008 at 3:14 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> No there isn't but it's a good idea. Would you mind posting it as a BioSQL
>> bug/feature request?
> 
> Sure, I've filed an enhancement request:
> 
> Bug 2493 - New script to recalculate left/right values in the taxon table
> http://bugzilla.open-bio.org/show_bug.cgi?id=2493
> 
> Peter


From darin.london at duke.edu  Tue Apr 29 16:52:54 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 29 Apr 2008 12:52:54 -0400
Subject: [BioSQL-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200804291653.m3TGqsF5020841@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

This is a reminder to submit your proposals for talks to the BOSC submission system before May 11.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a short abstract text to be pasted into it, and for a full paper. The short abstract text should be a summary, while the longer abstract should provide more details (including the open-source license requirement details).
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers