From biopython at maubp.freeserve.co.uk Thu Oct 28 12:54:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 28 Oct 2010 17:54:47 +0100
Subject: [BioSQL-l] SQLite support
In-Reply-To: <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net>
References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com>
    <320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com>
    <320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com>
    <320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com>
    <320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com>
    <070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu>
    <320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com>
    <320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com>
    <320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com>
    <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net>
Message-ID:

On Tue, Jul 20, 2010 at 10:50 PM, Hilmar Lapp wrote:
>
>
> On Jul 20, 2010, at 4:12 PM, Peter wrote:
>
>> Did you guys manage to sit down together to look at the BioSQL
>> on SQLite3 schema during BOSC/ISMB?
>
>
> Yes. I not so much, sadly, but I was lucky enough that one of the
> participants, Chris Bottoms, volunteered to take on that task, and I believe
> more or less completed it. I'm indebted to you, Chris!
>
> I don't think he has svn write access, so if I'm not mistaken it's not
> committed yet. Rather than bothering with getting him an account on the
> open-bio machine, Chris Fields and I were going to migrate BioSQL over to
> github this week, and then we can go from there.
>
>        -hilmar

Hi Hilmar,

BioSQL moved over to github successfully, does that mean Brad (or I)
have your blessing to check in the proposed SQLite schema as is?
Or are there some tweaks from BOSC/ISMB?

Thanks,

Peter

From cjfields at illinois.edu Fri Oct 29 14:02:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Oct 2010 13:02:25 -0500
Subject: [BioSQL-l] SQLite support
In-Reply-To:
References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com>
    <320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com>
    <320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com>
    <320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com>
    <320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com>
    <070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu>
    <320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com>
    <320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com>
    <320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com>
    <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net>
Message-ID: <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu>

Should be easy enough to add this in to the main repo. The best
way to check it is via the various language-specific adaptors for
BioSQL (bioperl-db, etc).

Peter, do you want the honors, or should I go ahead?

chris

On Oct 29, 2010, at 11:41 AM, Christopher Bottoms wrote:

> Peter, Hilmar, Chris, Brad, and others,
>
> At BOSC, I modified a version from biopython, resulting in the
> attached file. I did this for Hilmar and he is welcome to do whatever
> he wants with it. I regretted not having anything to test it against,
> but it did create the SQLite tables without emitting errors or
> warnings.
>
> Thanks Hilmar for providing this opportunity! I learned more about
> SQLite in two days than I had the previous two years.
>
> Sincerely,
> Christopher
>
>
> On Thu, Oct 28, 2010 at 11:54 AM, Peter wrote:
>> On Tue, Jul 20, 2010 at 10:50 PM, Hilmar Lapp wrote:
>>>
>>>
>>> On Jul 20, 2010, at 4:12 PM, Peter wrote:
>>>
>>>> Did you guys manage to sit down together to look at the BioSQL
>>>> on SQLite3 schema during BOSC/ISMB?
>>>
>>>
>>> Yes.
>>> I not so much, sadly, but I was lucky enough that one of the
>>> participants, Chris Bottoms, volunteered to take on that task, and I believe
>>> more or less completed it. I'm indebted to you, Chris!
>>>
>>> I don't think he has svn write access, so if I'm not mistaken it's not
>>> committed yet. Rather than bothering with getting him an account on the
>>> open-bio machine, Chris Fields and I were going to migrate BioSQL over to
>>> github this week, and then we can go from there.
>>>
>>> -hilmar
>>
>> Hi Hilmar,
>>
>> BioSQL moved over to github successfully, does that mean Brad (or I)
>> have your blessing to check in the proposed SQLite schema as is?
>> Or are there some tweaks from BOSC/ISMB?
>>
>> Thanks,
>>
>> Peter
>>
>

From biopython at maubp.freeserve.co.uk Fri Oct 29 14:27:35 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 29 Oct 2010 19:27:35 +0100
Subject: [BioSQL-l] SQLite support
In-Reply-To: <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu>
References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com>
    <320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com>
    <320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com>
    <320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com>
    <320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com>
    <070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu>
    <320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com>
    <320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com>
    <320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com>
    <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net>
    <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu>
Message-ID:

On Fri, Oct 29, 2010 at 7:02 PM, Chris Fields wrote:
>
> Should be easy enough to add this in to the main repo. The best
> way to check it is via the various language-specific adaptors for
> BioSQL (bioperl-db, etc).
>
> Peter, do you want the honors, or should I go ahead?

I think Brad deserves the privilege of checking it in, but otherwise
I'm happy to do it (I have been nagging about this after all ;)
I'd like Brad's draft (which we've been using in Biopython for a
while) committed first, then any of Chris B's changes on top.

I've just taken a look at Chris B's changes - the good news is
the Biopython unit tests work with his version of the schema.
Do BioPerl or any other Bio* bindings exist for BioSQL on
SQLite yet?

Chris B has removed AUTOINCREMENT with a note at the
start explaining why. That looks OK, other than the fact the ID
of deleted rows may be reused (not sure if that matters to us).
Given this (tiny?) risk, is the performance gain significant?

More surprising to me is he has introduced extra PRIMARY
KEY columns to tables that lacked an explicit key, e.g. adding
location_qualifier_value_id to table location_qualifier_value.
The naming convention appears to be table_name_id when
table_name is the table. I'd like to understand why this was
done and if it is beneficial in some way (I don't like the fact
this differs from the other schemas).

As part of the above, any composite primary keys are now
just UNIQUE statements (e.g. tables bioentry_dbxref and
bioentry_reference) with a new extra PRIMARY KEY instead.

A minor point: there is some whitespace formatting issue in
table seqfeature_dbxref (probably tabs vs spaces, shows up
in the git diff output).

Finally in table taxon_name I think we are missing a
UNIQUE constraint or composite PRIMARY KEY (see the
MySQL schema), but this was true in Brad's schema too.

Regards,

Peter

From cjfields at illinois.edu Fri Oct 29 14:58:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Oct 2010 13:58:26 -0500
Subject: [BioSQL-l] SQLite support
In-Reply-To:
References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com>
    <320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com>
    <320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com>
    <320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com>
    <320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com>
    <070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu>
    <320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com>
    <320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com>
    <320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com>
    <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net>
    <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu>
Message-ID: <39C443BC-6A2C-467E-AF75-0D4F1F8B8A21@illinois.edu>

On Oct 29, 2010, at 1:27 PM, Peter wrote:

> On Fri, Oct 29, 2010 at 7:02 PM, Chris Fields wrote:
>>
>> Should be easy enough to add this in to the main repo. The best
>> way to check it is via the various language-specific adaptors for
>> BioSQL (bioperl-db, etc).
>>
>> Peter, do you want the honors, or should I go ahead?
>
> I think Brad deserves the privilege of checking it in, but otherwise
> I'm happy to do it (I have been nagging about this after all ;)
> I'd like Brad's draft (which we've been using in Biopython for a
> while) committed first, then any of Chris B's changes on top.

Works for me, just need to get it added in.

> I've just taken a look at Chris B's changes - the good news is
> the Biopython unit tests work with his version of the schema.
> Do BioPerl or any other Bio* bindings exist for BioSQL on
> SQLite yet?

No, I don't think so. Not sure how much work it would be, but we
could probably use MySQL or Pg bindings to get it going.
We have also discussed creating DBIx::Class bindings to BioSQL
(Perl ORM), though I haven't heard much on this since BOSC. That
basically removes the need for creating database-specific bindings.

> Chris B has removed AUTOINCREMENT with a note at the
> start explaining why. That looks OK, other than the fact the ID
> of deleted rows may be reused (not sure if that matters to us).
> Given this (tiny?) risk, is the performance gain significant?
>
> More surprising to me is he has introduced extra PRIMARY
> KEY columns to tables that lacked an explicit key, e.g. adding
> location_qualifier_value_id to table location_qualifier_value.
> The naming convention appears to be table_name_id when
> table_name is the table. I'd like to understand why this was
> done and if it is beneficial in some way (I don't like the fact
> this differs from the other schemas).
>
> As part of the above, any composite primary keys are now
> just UNIQUE statements (e.g. tables bioentry_dbxref and
> bioentry_reference) with a new extra PRIMARY KEY instead.
>
> A minor point: there is some whitespace formatting issue in
> table seqfeature_dbxref (probably tabs vs spaces, shows up
> in the git diff output).
>
> Finally in table taxon_name I think we are missing a
> UNIQUE constraint or composite PRIMARY KEY (see the
> MySQL schema), but this was true in Brad's schema too.
>
> Regards,
>
> Peter

Not sure about all these, but I don't think they necessarily block adding this in.

chris

From biopython at maubp.freeserve.co.uk Sat Oct 30 09:06:31 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 30 Oct 2010 14:06:31 +0100
Subject: [BioSQL-l] SQLite support
In-Reply-To:
References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com>
    <320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com>
    <320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com>
    <320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com>
    <320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com>
    <070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu>
    <320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com>
    <320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com>
    <320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com>
    <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net>
    <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu>
Message-ID:

On Fri, Oct 29, 2010 at 7:58 PM, Chris Fields wrote:
> On Oct 29, 2010, at 1:27 PM, Peter wrote:
>
>> I think Brad deserves the privilege of checking it in, but otherwise
>> I'm happy to do it (I have been nagging about this after all ;)
>> I'd like Brad's draft (which we've been using in Biopython for a
>> while) committed first, then any of Chris B's changes on top.
>
> Works for me, just need to get it added in.
>

Since you sounded keen Chris, and Brad wasn't replying, I went
ahead and checked it in:
http://github.com/biosql/biosql/tree/4315be111d7d9eaa47bb3674eeed89e045d2c07a

On Fri, Oct 29, 2010 at 10:59 PM, Christopher Bottoms wrote:
> On Fri, Oct 29, 2010 at 1:27 PM, Peter wrote:
>> Chris B has removed AUTOINCREMENT with a note at the
>> start explaining why. That looks OK, other than the fact the ID
>> of deleted rows may be reused (not sure if that matters to us).
>> Given this (tiny?) risk, is the performance gain significant?
>
> Using AUTOINCREMENT causes "INSERTs to run a little slower" (see
> http://www.sqlite.org/autoinc.html). That doesn't sound like it is
> game-changing.
>

I've checked that in too,
http://github.com/biosql/biosql/commit/4ef99fbb48366631cca0845baaa3b63ded948c35

>> More surprising to me is he has introduced extra PRIMARY
>> KEY columns to tables that lacked an explicit key, e.g. adding
>> location_qualifier_value_id to table location_qualifier_value.
>> The naming convention appears to be table_name_id when
>> table_name is the table. I'd like to understand why this was
>> done and if it is beneficial in some way (I don't like the fact
>> this differs from the other schemas).
>>
>
> I thought that there was a new version of BioSQL coming out for which
> this was going to be done for all of the schemas. Roger Hall, for
> example, did the same things with the MySQL schema for BioSQL.
>

I wasn't aware of that - and it hasn't been checked in (at least, not
on the master branch). If all the BioSQL schemas are getting these
new id fields, then it makes sense to add them to the SQLite schema
too of course.

>>
>> A minor point: there is some whitespace formatting issue in
>> table seqfeature_dbxref (probably tabs vs spaces, shows up
>> in the git diff output).
>>
>
> Sorry, I thought I tried to maintain the original formatting as much
> as possible.

No problem.

>> Finally in table taxon_name I think we are missing a
>> UNIQUE constraint or composite PRIMARY KEY (see the
>> MySQL schema), but this was true in Brad's schema too.
>>
>
> This is what is in the file I sent earlier:
>
> CREATE TABLE taxon_name (
>       taxon_name_id    INTEGER PRIMARY KEY,
>       taxon_id         INTEGER,
>       name             VARCHAR(255)  NOT NULL,
>       name_class       VARCHAR(32)  NOT NULL,
>       UNIQUE (taxon_id,name,name_class)
> );

Looks like a false alarm - I must have misread the diff or something.
Sorry.

Peter
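
For anyone following the AUTOINCREMENT and UNIQUE points in this thread, here is a minimal sketch using Python's standard sqlite3 module. It is not taken from the BioSQL schema files or from either commit: the `demo` table, both function names and the sample row are assumptions made up for illustration, and only the `taxon_name` definition follows the one quoted in the thread. The first function checks whether the id of a deleted row is handed out again; the second checks that a surrogate `INTEGER PRIMARY KEY` plus a `UNIQUE` clause still rejects a duplicate (taxon_id, name, name_class) row.

```python
import sqlite3


def id_after_delete(autoincrement: bool) -> int:
    """Insert three rows, delete the last, insert one more; return the new id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, not part of BioSQL; only the key definition varies.
    pk = "INTEGER PRIMARY KEY"
    if autoincrement:
        pk += " AUTOINCREMENT"
    conn.execute(f"CREATE TABLE demo (demo_id {pk}, name VARCHAR(255) NOT NULL)")
    conn.executemany("INSERT INTO demo (name) VALUES (?)", [("a",), ("b",), ("c",)])
    conn.execute("DELETE FROM demo WHERE demo_id = 3")
    cur = conn.execute("INSERT INTO demo (name) VALUES (?)", ("d",))
    new_id = cur.lastrowid
    conn.close()
    return new_id


def duplicate_taxon_name_rejected() -> bool:
    """Return True if the UNIQUE clause blocks a repeated taxon_name row."""
    conn = sqlite3.connect(":memory:")
    # Table definition as quoted in the thread (surrogate key plus UNIQUE).
    conn.execute(
        """
        CREATE TABLE taxon_name (
            taxon_name_id    INTEGER PRIMARY KEY,
            taxon_id         INTEGER,
            name             VARCHAR(255)  NOT NULL,
            name_class       VARCHAR(32)  NOT NULL,
            UNIQUE (taxon_id,name,name_class)
        )
        """
    )
    insert = "INSERT INTO taxon_name (taxon_id, name, name_class) VALUES (?, ?, ?)"
    row = (9606, "Homo sapiens", "scientific name")  # sample data for illustration
    conn.execute(insert, row)
    try:
        conn.execute(insert, row)  # same triple again, new surrogate id
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False


if __name__ == "__main__":
    print("without AUTOINCREMENT:", id_after_delete(False))
    print("with AUTOINCREMENT:   ", id_after_delete(True))
    print("duplicate rejected:   ", duplicate_taxon_name_rejected())
```

Without AUTOINCREMENT the new row gets id 3 again, because SQLite picks one more than the largest id currently in the table; with AUTOINCREMENT it gets 4. Whether that reuse matters depends on whether anything outside the database holds on to old ids, which is the open question raised above.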