From biopython at maubp.freeserve.co.uk Thu Nov 6 06:53:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 11:53:13 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
Message-ID: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>

I've recently been looking into some object-relational mappers, which
caused me to look more closely at the BioSQL schema. Many of these
packages require a primary key, but not all can cope with a composite
primary key. However, some BioSQL tables don't have any primary key
at all.

Several BioSQL tables have composite primary keys; for example, the
term_dbxref table has a composite key of (term_id, dbxref_id), and
also an index on dbxref_id.

However, some BioSQL tables do not have a primary key, for example:

-- corresponds to the names table of the NCBI taxonomy database
CREATE TABLE taxon_name (
    taxon_id INT(10) UNSIGNED NOT NULL,
    name VARCHAR(255) BINARY NOT NULL,
    name_class VARCHAR(32) BINARY NOT NULL,
    UNIQUE (taxon_id,name,name_class)
) TYPE=INNODB;

CREATE INDEX taxnametaxonid ON taxon_name(taxon_id);
CREATE INDEX taxnamename ON taxon_name(name);

Why don't taxon_name, bioentry_path, term_relationship,
bioentry_qualifier_value, and seqfeature_path have a primary key (just a
uniqueness criterion)?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk Thu Nov 6 07:37:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 12:37:42 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <200811061323.43749.raoul.bonnal@itb.cnr.it>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> <200811061323.43749.raoul.bonnal@itb.cnr.it>
Message-ID: <320fb6e00811060437h367804c7y6d46ed36d1b619ae@mail.gmail.com>

On Thu, Nov 6, 2008 at 12:23 PM, Raoul Jean Pierre Bonnal wrote:
>
> Dear Peter,
> I'm writing the wrapper for BioRuby using DataMapper, an ORM (Active Record is
> similar).
Hi Raoul,

I'm looking at a Python-based ORM to use with BioSQL. [The existing
Biopython BioSQL bridge uses raw SQL to turn the sequences and features
into Biopython objects - this all seems to work fine, but it doesn't
offer the full flexibility of an ORM framework.]

>
> I think we can consider moving or branching the BioSQL schema to the approach
> suggested by this kind of ORM, with a pk for every table named "id" and a
> table name in plural. Fk names are quite correct.
>

I don't think it makes sense to add a single primary key to many of
these tables (e.g. term_dbxref). The existing composite primary keys
seem fine (it's just a shame some ORMs can't cope).

I was thinking the tables currently lacking any primary key could get
one (based on the current UNIQUE rule). So for example, the taxon_name
table could use (taxon_id,name,name_class) as its primary key. I don't
know how big a change this would be - but superficially it looks
backwards compatible. This is why I was asking why they didn't have a
PK in the first place.

>
> PS: DataMapper handles composite PKs very well, much better than ActiveRecord.
>

For Python, currently Django and also I believe SQLObject don't
support composite primary keys. I'll take a look at SQLAlchemy next,
which should cope better.

Peter

From hlapp at gmx.net Thu Nov 6 16:39:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Nov 2008 16:39:55 -0500
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
Message-ID: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>

Hi Peter,

it's a known enhancement request. I know that some ORMs have trouble
reverse engineering the mapping if there is no PK defined.
Semantically, however, in the absence of a primary key constraint the first unique key constraint is equivalent to a primary key (in fact some ER modeling tools will automatically do the conversion); unique keys are also called alternate keys (alternate to the primary key). So for now feel free to either change the UK constraint to PK where there is no PK defined and your reverse engineering tool needs it. If you don't use a reverse engineering tool, just set the columns of the UK constraint as the compound primary key if there isn't a surrogate PK. BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not be backwards compatible for existing language bindings, which is why I'd like to make those changes first that should be fully backwards compatible. -hilmar On Nov 6, 2008, at 6:53 AM, Peter wrote: > I've recently been looking into some object-relational mappers which > caused me to look more closely at the BioSQL schema. Many of these > packages require a primary key, but not all can cope with a composite > primary key. However, some BioSQL tables don't have any primary key > at all. > > Several BioSQL tables have composite primary keys, for example the > term_dbxref table has a composite key of (term_id, dbxref_id), and > also an index on dbxref_id as well. > > However, some BioSQL tables do not have a primary key, for example: > > -- corresponds to the names table of the NCBI taxonomy databaase > CREATE TABLE taxon_name ( > taxon_id INT(10) UNSIGNED NOT NULL, > name VARCHAR(255) BINARY NOT NULL, > name_class VARCHAR(32) BINARY NOT NULL, > UNIQUE (taxon_id,name,name_class) > ) TYPE=INNODB; > > CREATE INDEX taxnametaxonid ON taxon_name(taxon_id); > CREATE INDEX taxnamename ON taxon_name(name); > > Why don't taxon_name, bioentry_path, term_relationship, > bioentry_qualifier_value, seqfeature_path have a primary key (just a > uniqueness criteria)? 
> > Thanks, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Thu Nov 6 17:12:28 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Nov 2008 22:12:28 +0000 Subject: [BioSQL-l] Tables without a (composite) primary key In-Reply-To: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net> References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net> Message-ID: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com> On Thu, Nov 6, 2008 at 9:39 PM, Hilmar Lapp wrote: > > Hi Peter, > > it's a known enhancement request. I know that some ORMs have trouble reverse > engineering the mapping if there is no PK defined. Oh right, "Surrogate primary keys on all tables" on this page: http://www.biosql.org/wiki/Enhancement_Requests > Semantically, however, in the absence of a primary key constraint the first > unique key constraint is equivalent to a primary key (in fact some ER > modeling tools will automatically do the conversion); unique keys are also > called alternate keys (alternate to the primary key). > > So for now feel free to either change the UK constraint to PK where there is > no PK defined and your reverse engineering tool needs it. If you don't use a > reverse engineering tool, just set the columns of the UK constraint as the > compound primary key if there isn't a surrogate PK. OK - I'll bear that in mind. > BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not > be backwards compatible for existing language bindings, which is why I'd > like to make those changes first that should be fully backwards compatible. That sounds sensible. 
Thanks Hilmar!

Peter

P.S. Is there any agreed terminology: compound primary key versus
composite primary key?

From raoul.bonnal at itb.cnr.it Thu Nov 6 07:23:43 2008
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Thu, 06 Nov 2008 13:23:43 +0100
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
Message-ID: <200811061323.43749.raoul.bonnal@itb.cnr.it>

Dear Peter,
I'm writing the wrapper for BioRuby using DataMapper, an ORM (Active Record is
similar).

I think we can consider moving or branching the BioSQL schema to the approach
suggested by this kind of ORM, with a pk for every table named "id" and a
table name in plural. Fk names are quite correct.

PS: DataMapper handles composite PKs very well, much better than ActiveRecord.

On Thursday 06 November 2008 12:53:13 Peter wrote:
> I've recently been looking into some object-relational mappers, which
> caused me to look more closely at the BioSQL schema. Many of these
> packages require a primary key, but not all can cope with a composite
> primary key. However, some BioSQL tables don't have any primary key
> at all.
>
> Several BioSQL tables have composite primary keys; for example, the
> term_dbxref table has a composite key of (term_id, dbxref_id), and
> also an index on dbxref_id.
> > However, some BioSQL tables do not have a primary key, for example: > > -- corresponds to the names table of the NCBI taxonomy databaase > CREATE TABLE taxon_name ( > taxon_id INT(10) UNSIGNED NOT NULL, > name VARCHAR(255) BINARY NOT NULL, > name_class VARCHAR(32) BINARY NOT NULL, > UNIQUE (taxon_id,name,name_class) > ) TYPE=INNODB; > > CREATE INDEX taxnametaxonid ON taxon_name(taxon_id); > CREATE INDEX taxnamename ON taxon_name(name); > > Why don't taxon_name, bioentry_path, term_relationship, > bioentry_qualifier_value, seqfeature_path have a primary key (just a > uniqueness criteria)? > > Thanks, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From mark.schreiber at novartis.com Thu Nov 6 21:07:12 2008 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 7 Nov 2008 10:07:12 +0800 Subject: [BioSQL-l] Tables without a (composite) primary key In-Reply-To: <200811061323.43749.raoul.bonnal@itb.cnr.it> Message-ID: Hi - In the Java JPA it is possible to use an embedded object as a primary key. This gets you around the situations where the primary key is composite. It also effectively gets you around those tables where there is no key but it is implicit as all the fields are unique (as in taxon_name). What you end up with is an object that holds taxon_id, name, name_class, and inside that object you have an embedded key object that contains the same three fields. In this way any changes that are made to the object are still associated with the original row via the unchanged embedded PK object and are updated accordingly. While I agree that an explicit PK's for all BioSQL tables would be nicer for ORM frameworks many frameworks have ways to get around this, possibly those in Ruby or Python do as well. 
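The embedded-key idea described above can be sketched in Python rather than JPA. This is only an analogy under stated assumptions: the class names TaxonNameKey and TaxonName and the save() helper are hypothetical, the table is created in an in-memory SQLite database instead of MySQL, and no real ORM is involved. The object keeps a frozen copy of the three key fields as loaded, and that copy is what locates the row when the editable fields are written back.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonNameKey:
    """Hypothetical embedded key: the three columns of the UNIQUE constraint."""
    taxon_id: int
    name: str
    name_class: str


@dataclass
class TaxonName:
    """Hypothetical row object: editable fields plus the key as last stored."""
    key: TaxonNameKey
    taxon_id: int
    name: str
    name_class: str

    @classmethod
    def from_row(cls, row):
        # The key and the editable fields start out with the same values.
        return cls(TaxonNameKey(*row), *row)


def save(conn, obj):
    """Write the editable fields back to the row identified by obj.key."""
    cur = conn.execute(
        "UPDATE taxon_name SET taxon_id = ?, name = ?, name_class = ? "
        "WHERE taxon_id = ? AND name = ? AND name_class = ?",
        (obj.taxon_id, obj.name, obj.name_class,
         obj.key.taxon_id, obj.key.name, obj.key.name_class),
    )
    # Choice made for this sketch: refresh the key so later saves still match.
    obj.key = TaxonNameKey(obj.taxon_id, obj.name, obj.name_class)
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon_name ("
        " taxon_id INTEGER NOT NULL,"
        " name VARCHAR(255) NOT NULL,"
        " name_class VARCHAR(32) NOT NULL,"
        " UNIQUE (taxon_id, name, name_class))"
    )
    conn.execute(
        "INSERT INTO taxon_name VALUES (9606, 'Homo sapiens', 'scientific name')"
    )
    row = conn.execute(
        "SELECT taxon_id, name, name_class FROM taxon_name"
    ).fetchone()
    obj = TaxonName.from_row(row)
    obj.name, obj.name_class = "human", "common name"
    updated = save(conn, obj)
    stored = conn.execute(
        "SELECT taxon_id, name, name_class FROM taxon_name"
    ).fetchall()
    return updated, stored


print(demo())
```

Because the WHERE clause uses the key copy rather than the edited fields, the update reaches the original row even though every column of taxon_name is part of the key.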
- Mark biosql-l-bounces at lists.open-bio.org wrote on 11/06/2008 08:23:43 PM: > Dear Peter, > I'm writing the wrapper for BioRuby using DataMapper an ORM (Active Record is > similar). > > I think we can cosider to move or branch BioSQL' schema to the approach > suggested by this kind of ORMs, with a pk for every table named "id" and a > table name in plural. Fk names are quite correct. > > PS: DataMapper handles very well composite PK, much better tha ActiveRecord. > > Il gioved? 06 novembre 2008 12:53:13 Peter ha scritto: > > I've recently been looking into some object-relational mappers which > > caused me to look more closely at the BioSQL schema. Many of these > > packages require a primary key, but not all can cope with a composite > > primary key. However, some BioSQL tables don't have any primary key > > at all. > > > > Several BioSQL tables have composite primary keys, for example the > > term_dbxref table has a composite key of (term_id, dbxref_id), and > > also an index on dbxref_id as well. > > > > However, some BioSQL tables do not have a primary key, for example: > > > > -- corresponds to the names table of the NCBI taxonomy databaase > > CREATE TABLE taxon_name ( > > taxon_id INT(10) UNSIGNED NOT NULL, > > name VARCHAR(255) BINARY NOT NULL, > > name_class VARCHAR(32) BINARY NOT NULL, > > UNIQUE (taxon_id,name,name_class) > > ) TYPE=INNODB; > > > > CREATE INDEX taxnametaxonid ON taxon_name(taxon_id); > > CREATE INDEX taxnamename ON taxon_name(name); > > > > Why don't taxon_name, bioentry_path, term_relationship, > > bioentry_qualifier_value, seqfeature_path have a primary key (just a > > uniqueness criteria)? 
> >
> > Thanks,
> >
> > Peter

From biopython at maubp.freeserve.co.uk Fri Nov 7 13:35:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 18:35:31 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net> <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>
Message-ID: <320fb6e00811071035y496ea4d8p93aa0f54633950f@mail.gmail.com>

I've ruled out using Django v1.0 and the current version of SQLObject
with BioSQL as they don't (yet) support composite primary keys.
However, SQLAlchemy 0.5.0 seems to be happy with the current BioSQL schema as is :) http://www.djangoproject.com/ http://www.sqlobject.org/ http://www.sqlalchemy.org/ Hilmar: >> BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not >> be backwards compatible for existing language bindings, which is why I'd >> like to make those changes first that should be fully backwards compatible. Peter: > That sounds sensible. Actually I may have initially misunderstood you. Are you saying for tables which already have a composite primary key (e.g. term_dbxref) you plan to add/replace this with a surrogate (single column) PK - just to accommodate certain simplistic ORMs? I'm not so keen on this, it seems like an invasive change with little benefit, but potentially making lots of work updating the Bio* bindings. However, would the smaller step of adding composite primary keys to tables currently lacking them be possible on the BioSQL v1.0.x roadmap? e.g. for taxon_name using (taxon_id,name,name_class) as the composite primary key, currently specified to be unique. Or might this also cause trouble for the Bio* binding? If this was possible, it *might* be useful for certain ORMs. Was there a reason why tables like taxon_name never had a (composite/compound) primary key in the first place? Peter From biopython at maubp.freeserve.co.uk Fri Nov 14 15:48:02 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 20:48:02 +0000 Subject: [BioSQL-l] parent_taxon_id of a root node In-Reply-To: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> Message-ID: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> On Fri, Oct 3, 2008m, I wrote: > > Hello all, > > I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will set > the parent_taxon_id of the NCBI root node in the taxon table to point > to itself. 
> I would have expected this to be NULL indicating no
> parent. If someone is using the database directly, extracting a
> lineage could trigger an infinite loop. Can anyone explain the
> rationale here?
>
> Note that when Biopython adds entries to the taxon table, it uses NULL
> for a root node. When retrieving sequences from a BioSQL database,
> Biopython does cope with a root node with a NULL parent or a
> self-parent - would it be safe to assume BioPerl and Java can also cope
> with both situations?
>
> Thanks,
>
> Peter
>

Hi again,

I thought I'd raise this question again (as I didn't see any response
last time), as I've just been bitten by the self-parent taxon problem
this afternoon. This was for a simple web front end to part of a
BioSQL database using SQLAlchemy in Python - but that's not important.

I was using a simple loop to build up lineages, which was working fine
until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to
just time out. I'd forgotten about the self-parent root nodes used by
load_ncbi_taxonomy.pl, which had triggered an infinite loop.

I hit another (less serious) problem stemming from these self-parent
root nodes when I wanted to generate a list of sub-lineages (child
entries), essentially:

SELECT * FROM taxon WHERE parent_taxon_id=12345;

When calling this on a root node, I had to modify this to explicitly
exclude itself from the children:

SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345;

So to repeat my earlier question, is there a reason why
parent_taxon_id isn't just NULL for root nodes? Was this a deliberate
design choice - because if not, I think this could be regarded as a
bug in load_ncbi_taxonomy.pl.
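A lineage walk that copes with both styles of root node might look like the sketch below. It assumes a cut-down taxon table holding only taxon_id and parent_taxon_id (the real BioSQL table has more columns) and uses an in-memory SQLite database; the function names lineage, children and demo are made up for illustration. The set of visited ids stops the loop on a self-parent root, and on any other cycle, while a NULL parent ends it in the ordinary way.

```python
import sqlite3


def lineage(conn, taxon_id):
    """Return taxon ids from taxon_id up to the root, inclusive."""
    path = []
    seen = set()
    current = taxon_id
    # Stop on a NULL parent, or on any id already visited (self-parent root).
    while current is not None and current not in seen:
        seen.add(current)
        path.append(current)
        row = conn.execute(
            "SELECT parent_taxon_id FROM taxon WHERE taxon_id = ?",
            (current,),
        ).fetchone()
        current = row[0] if row else None
    return path


def children(conn, taxon_id):
    """Return direct children, never listing a self-parent root as its own child."""
    rows = conn.execute(
        "SELECT taxon_id FROM taxon "
        "WHERE parent_taxon_id = ? AND taxon_id <> ? ORDER BY taxon_id",
        (taxon_id, taxon_id),
    )
    return [r[0] for r in rows]


def demo(root_parent):
    """Build a four-node tree whose root has the given parent value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [(1, root_parent), (2, 1), (3, 2), (4, 1)],
    )
    return lineage(conn, 3), children(conn, 1)


print(demo(None))  # root stored with a NULL parent
print(demo(1))     # root stored as its own parent
```

Both calls give the same lineage and the same child list, so code written this way does not depend on which convention the loader used.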
Thanks Peter From hlapp at gmx.net Sat Nov 15 13:34:45 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Nov 2008 13:34:45 -0500 Subject: [BioSQL-l] parent_taxon_id of a root node In-Reply-To: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> Message-ID: Sorry Peter - it looks like this slipped my attention (Oct was crazy). Thanks for raising it again. I agree with you, this looks like a bug. Would you mind filing it? It's possible that has secretly been assumed as policy and hence led to some people identifying the root node by equating parent and taxon_id, but surely this sounds like the wrong way of doing it, so it deserves fixing. -hilmar On Nov 14, 2008, at 3:48 PM, Peter wrote: > On Fri, Oct 3, 2008m, I wrote: >> >> Hello all, >> >> I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will >> set >> the parent_taxon_id of the NCBI root node in the taxon table to point >> to itself. I would have expected this to be NULL indicating no >> parent. If someone is using the database directly, extracting a >> lineage could trigger an infinite loop. Can anyone explain the >> rational here? >> >> Note that when Biopython adds entries to the taxon table, it uses >> NULL >> for a root node. When retrieving sequences from a BioSQL database, >> Biopython does cope with a root node with a NULL parent or a >> self-parent - would it safe to assume BioPerl and Java can also cope >> with both situations? >> >> Thanks, >> >> Peter >> > > Hi again, > > I thought I'd raise this question again (as I didn't see any response > last time), as I've just been bitten by the self-parent taxon problem > this afternoon. This was for a simple webfront end to part of a > BioSQL database using SQLAlchemy in python - but that's not important. 
> > I was using a simple loop to build up lineages, which was working fine > until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to > just time out. I'd forgotten about the self-parent root nodes used by > load_ncbi_taxonomy.pl which had triggered an infinite loop. > > I hit another (less serious) problem stemming for these self-parent > root nodes when I wanted to generate a list of sub-lineages (child > entries), essentially: > > SELECT * FROM taxon WHERE parent_taxon_id=12345; > > When calling this on a root node, I had to modify this to explicitly > exclude itself from the children: > > SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345; > > So to repeat my earlier question, is there a reason why > parent_taxon_id isn't just NULL for root nodes? Was this a deliberate > design choice - because if not, I think this could be regarded as a > bug in load_ncbi_taxonomy.pl. > > Thanks > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Sun Nov 16 09:58:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 16 Nov 2008 14:58:20 +0000 Subject: [BioSQL-l] parent_taxon_id of a root node In-Reply-To: References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> Message-ID: <320fb6e00811160658s6282022by3681364e14aecf69@mail.gmail.com> On Sat, Nov 15, 2008 at 6:34 PM, Hilmar Lapp wrote: > > Sorry Peter - it looks like this slipped my attention (Oct was crazy). > Thanks for raising it again. I agree with you, this looks like a bug. Would > you mind filing it? 
Sure, http://bugzilla.open-bio.org/show_bug.cgi?id=2664 > It's possible that has secretly been assumed as policy and hence led to some > people identifying the root node by equating parent and taxon_id, but surely > this sounds like the wrong way of doing it, so it deserves fixing. In the short term we should just make sure all the Bio* projects can cope with either style root node (Biopython can), but in the long term are self parent taxon entries something that could be banned via the schema? Regards, Peter From biopython at maubp.freeserve.co.uk Wed Nov 26 13:37:51 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 26 Nov 2008 18:37:51 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <20081125211622.GE83220@sobchak.mgh.harvard.edu> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> On Tue, Nov 25, 2008 at 9:16 PM, Brad Chapman wrote: > Hi Peter; > Hope all is going well with you. I was glancing at the BioSQL > mailing list archives last night and saw your messages earlier this > month about using an ORM mapper with BioSQL. > > Some of my current work is using a BioSQL storage backend with a > javascript web interface. The middleware uses Pylons and SQLAlchemy. > This uses some parts of BioSQL not well represented via an object front > end like bioentry_relationship, and so it has been convenient to work > with these via SQLAlchemy directly. I've been using TurboGears with SQLAlchemy, and so far everything has been OK. This was essentially independent of the Biopython BioSQL mapping. > To your initial question, SQLAlchemy can handle those non-primary key > tables without a problem by setting "primary_key = True" for all of > the unique columns. Yes, SQLAlchemy seems pretty good. 
The only catch was for a table with no primary key defined at all (the taxon_name table) which required a little more work setting up the ORM mapping, but which also seems to work fine. > What I have done thus far is definitely non-complete, and also > includes some add-on tables for storing experimental data linked to > BioSQL. However, I am attaching it here just to give you an idea > (init.py is the __init__.py of the module). You would use it like: > > from Wherever.BioSQL import get_session, biosql > > session = get_session("production") > entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345") > > If you, or anyone else, is developing something similar, > I'd be happy to help with something generalized. > > Brad > I'll probably take a look at this next week - I'm on the road at the moment. Thanks for sharing, Peter From hlapp at gmx.net Wed Nov 26 14:28:16 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 26 Nov 2008 14:28:16 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> Message-ID: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> On Nov 26, 2008, at 1:37 PM, Peter wrote: >> To your initial question, SQLAlchemy can handle those non-primary key >> tables without a problem by setting "primary_key = True" for all of >> the unique columns. > > Yes, SQLAlchemy seems pretty good. The only catch was for a table > with no primary key defined at all (the taxon_name table) which > required a little more work setting up the ORM mapping, but which also > seems to work fine. It has one unique key defined on (name, name_class, taxon_id). Is that not what you are seeing? 
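The distinction being discussed can be sketched with SQLite standing in for MySQL (an assumption; column types are simplified and the helper names are invented for illustration). Both table definitions reject a duplicate row, which is the sense in which the unique key behaves like a primary key, but only the second one reports primary key columns when the schema is introspected, which is the information a reverse-engineering ORM looks for.

```python
import sqlite3

UNIQUE_ONLY = """
CREATE TABLE taxon_name (
    taxon_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    name_class VARCHAR(32) NOT NULL,
    UNIQUE (taxon_id, name, name_class)
)"""

# Same table, with the unique constraint declared as the primary key instead.
WITH_PK = UNIQUE_ONLY.replace("UNIQUE (", "PRIMARY KEY (")


def primary_key_columns(ddl):
    """Return the primary key column names SQLite reports, in key order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # pk is 0 for non-key columns, otherwise the 1-based position in the key.
    info = conn.execute("PRAGMA table_info(taxon_name)").fetchall()
    keyed = sorted((row[5], row[1]) for row in info if row[5] > 0)
    return [name for _, name in keyed]


def rejects_duplicate(ddl):
    """Return True if inserting the same row twice raises IntegrityError."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    row = (9606, "Homo sapiens", "scientific name")
    conn.execute("INSERT INTO taxon_name VALUES (?, ?, ?)", row)
    try:
        conn.execute("INSERT INTO taxon_name VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


print(primary_key_columns(UNIQUE_ONLY), rejects_duplicate(UNIQUE_ONLY))
print(primary_key_columns(WITH_PK), rejects_duplicate(WITH_PK))
```

With the UNIQUE-only definition the introspected key is empty, so a mapper has to be told by hand which columns identify a row.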
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From gabrielle_doan at gmx.net Thu Nov 27 08:51:16 2008 From: gabrielle_doan at gmx.net (Gabrielle Doan) Date: Thu, 27 Nov 2008 14:51:16 +0100 Subject: [BioSQL-l] BioSQL schema of v1.0.1 Message-ID: <492EA5D4.7090809@gmx.net> Hi Hilmar, recently I've noticed that BioSQL has a new release. But in the BioSQL core schema, Release v1.0.1 there is still the BioSQL schema of v1.0.0 inside. Can you update it please? Thanks a lot. Cheers, Gabrielle From hlapp at gmx.net Thu Nov 27 12:33:25 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 27 Nov 2008 12:33:25 -0500 Subject: [BioSQL-l] BioSQL schema of v1.0.1 In-Reply-To: <492EA5D4.7090809@gmx.net> References: <492EA5D4.7090809@gmx.net> Message-ID: <3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net> Hi Gabrielle, I'm not sure what you mean. The changes that v1.0.1 introduces are also in the main trunk, the biosql-release-1_0_1 tag, and in the distribution downloadable from the website. It's easily possible though that I may have overlooked something - could you elaborate what prompted you to your conclusion below? -hilmar On Nov 27, 2008, at 8:51 AM, Gabrielle Doan wrote: > Hi Hilmar, > > recently I've noticed that BioSQL has a new release. But in the > BioSQL core schema, Release v1.0.1 there is still the BioSQL schema > of v1.0.0 inside. Can you update it please? > Thanks a lot. 
> > Cheers, > Gabrielle > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Nov 27 13:07:30 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 27 Nov 2008 13:07:30 -0500 Subject: [BioSQL-l] BioSQL schema of v1.0.1 In-Reply-To: <492EDD4C.3090601@gmx.net> References: <492EA5D4.7090809@gmx.net> <3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net> <492EDD4C.3090601@gmx.net> Message-ID: <605BB761-BBFC-4683-AEBA-1F1BAB22617A@gmx.net> Ahh - I see. The schema has not changed from 1.0.0 to 1.0.1 in terms of entity structure and their relations. The only schema changes were to widen the column width of bioentry.accession and dbxref.accession. A complete list of changes is in the file Changes in the root directory of the distribution. So the schema diagram and all documentation still fully apply, except for the width of those two columns, which is necessary to deal with certain annotations such as pathways. Does that resolve the issue for you? -hilmar On Nov 27, 2008, at 12:47 PM, Gabrielle Doan wrote: > Hi Hilmar, > > I downloaded v1.0.1 from the website. When I looked throught the doc > folder I found a pdf which describes the BioSQL schema v1.0 and not > v1.0.1. Was the schema v1.0.1. left off intentionally? > I'm grateful if you can give me a reply. > > cheers, > Gabrielle > > Hilmar Lapp schrieb: >> Hi Gabrielle, >> I'm not sure what you mean. The changes that v1.0.1 introduces are >> also in the main trunk, the biosql-release-1_0_1 tag, and in the >> distribution downloadable from the website. >> It's easily possible though that I may have overlooked something - >> could you elaborate what prompted you to your conclusion below? 
>> -hilmar >> On Nov 27, 2008, at 8:51 AM, Gabrielle Doan wrote: >>> Hi Hilmar, >>> >>> recently I've noticed that BioSQL has a new release. But in the >>> BioSQL core schema, Release v1.0.1 there is still the BioSQL >>> schema of v1.0.0 inside. Can you update it please? >>> Thanks a lot. >>> >>> Cheers, >>> Gabrielle >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Nov 28 05:43:01 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 10:43:01 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> Message-ID: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> On Wed, Nov 26, 2008 at 7:28 PM, Hilmar Lapp wrote: > On Nov 26, 2008, at 1:37 PM, Peter wrote: >> Yes, SQLAlchemy seems pretty good. The only catch was for a table >> with no primary key defined at all (the taxon_name table) which >> required a little more work setting up the ORM mapping, but which also >> seems to work fine. > > It has one unique key defined on (name, name_class, taxon_id). Is that not > what you are seeing? 
>
> -hilmar

According to the MySQL schema, taxon_name has a unique constraint but
does NOT have a primary key:

CREATE TABLE taxon_name (
    taxon_id INT(10) UNSIGNED NOT NULL,
    name VARCHAR(255) BINARY NOT NULL,
    name_class VARCHAR(32) BINARY NOT NULL,
    UNIQUE (taxon_id,name,name_class)
) TYPE=INNODB;

As you said, since (taxon_id,name,name_class) is unique, this tuple
can be used as a substitute primary key in the ORM mapping (which for
SQLAlchemy I seem to have to do manually).

SQLAlchemy would do this automatically if the schema actually used
(taxon_id,name,name_class) as a primary key explicitly, i.e. why not
this:

CREATE TABLE taxon_name (
    taxon_id INT(10) UNSIGNED NOT NULL,
    name VARCHAR(255) BINARY NOT NULL,
    name_class VARCHAR(32) BINARY NOT NULL,
    PRIMARY KEY (taxon_id,name,name_class)
) TYPE=INNODB;

See also this thread where I wrote:
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001386.html
> Was there a reason why tables like taxon_name never had a
> (composite/compound) primary key in the first place?

Thanks,

Peter

From n.j.loman at bham.ac.uk Fri Nov 28 06:12:32 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 11:12:32 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
Message-ID: <492FD220.60505@bham.ac.uk>

Peter wrote:
>>> Yes, SQLAlchemy seems pretty good. The only catch was for a table
>>> with no primary key defined at all (the taxon_name table) which
>>> required a little more work setting up the ORM mapping, but which also
>>> seems to work fine.
>> It has one unique key defined on (name, name_class, taxon_id). Is that not
>> what you are seeing?
>> >> -hilmar > > According to the MySQL schema, taxon_name has a unique restraint but > does NOT have a primary key: Just to say I've had good results using the Django (http://www.djangoproject.com) ORM system with a BioSQL database. You can get going quite quickly using the Django introspection feature (configure your settings.py file and run python manage.py inspectdb to get a models file). However, the version of Django I use (not sure about latest) didn't support multi-column indexes as primary key, so I had to add another auto_increment column to use as a primary key. Cheers, Nick. From biopython at maubp.freeserve.co.uk Fri Nov 28 06:55:22 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 11:55:22 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <492FD220.60505@bham.ac.uk> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> Message-ID: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> Nick Loman wrote: > >> According to the MySQL schema, taxon_name has a unique restraint but >> does NOT have a primary key: > > Just to say I've had good results using the Django > (http://www.djangoproject.com) ORM system with a BioSQL database. > > You can get going quite quickly using the Django introspection feature > (configure your settings.py file and run python manage.py inspectdb to get a > models file). > > However, the version of Django I use (not sure about latest) didn't support > multi-column indexes as primary key, so I had to add another auto_increment > column to use as a primary key. When I started looking at web-frameworks and ORM, I didn't want to modify a perfectly good schema (BioSQL) just to cope with a limited tool. 
I investigated Django earlier this month, and rejected it because it doesn't yet support multi-column indices as primary keys. They've had an open bug on this for 3 years, with no expected date yet:
http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
http://code.djangoproject.com/ticket/373

My impression is that Django's philosophy is that they expect you to define your objects which then automatically defines the database schema. Note the title of this FAQ page refers to an existing schema as a "legacy" database:
http://docs.djangoproject.com/en/dev/howto/legacy-databases/
If Django can cope with an existing schema, then it does look like an excellent package, and seems well documented.

I also rejected another python ORM system, SQLObject, on similar grounds. Their documentation says "SQLObject does not support primary keys made up of multiple columns (that probably won't change)". In fact, they currently are even less flexible in that SQLObject requires an *integer* primary key on each table!

This left SQLAlchemy as the remaining python ORM candidate, which seems to cope just fine with the unmodified BioSQL schema. Brad and I have reported using BioSQL with SQLAlchemy successfully (within the python web-frameworks TurboGears and Pylons respectively).

Peter

From raoul.bonnal at itb.cnr.it Fri Nov 28 08:51:02 2008
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri, 28 Nov 2008 14:51:02 +0100
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
Message-ID: <200811281451.03433.raoul.bonnal@itb.cnr.it>

On Friday 28 November 2008 12:55:22 Peter wrote:
> Nick Loman wrote:....
> My impression is that Django's philosophy is that they expect you to
> define your objects which then automatically defines the database
> schema. Note the title of this FAQ page refers to an existing schema
> as a "legacy" database:
> http://docs.djangoproject.com/en/dev/howto/legacy-databases/
> If Django can cope with an existing schema, then it does look like an
> excellent package, and seems well documented.

You are right. This philosophy is shared by other ORMs such as ActiveRecord and DataMapper. In Ruby I prefer DataMapper because it is simpler and more configurable than AR.
ORMs usually have particular conventions and requirements, so choose the one that fits the BioSQL schema best without modifying the schema. I wasted a lot of time digging into the API to understand how the ORM handles relationships.
Lastly, check whether your ORM handles transactions for free.

Ciao.

--
Ra

From n.j.loman at bham.ac.uk Fri Nov 28 10:59:53 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 15:59:53 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
Message-ID: <49301579.2060700@bham.ac.uk>

Peter wrote:
>> However, the version of Django I use (not sure about latest) didn't support
>> multi-column indexes as primary key, so I had to add another auto_increment
>> column to use as a primary key.
>
> When I started looking at web-frameworks and ORM, I didn't want to
> modify a perfectly good schema (BioSQL) just to cope with a limited
> tool.

Fair enough.
> I investigated Django earlier this month, and rejected it because it > doesn't yet support multi-column indices as primary keys. They've had > an open bug on this for 3 years, with no expected date yet: > http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys > http://code.djangoproject.com/ticket/373 > > My impression is that Django's philosophy is that they expect you to > define your objects which then automatically defines the database > schema. Note the title of this FAQ page refers to an existing schema > as a "legacy" database: > http://docs.djangoproject.com/en/dev/howto/legacy-databases/ > If Django can cope with an existing schema, then it does look like an > excellent package, and seems well documented. You can still use Django even if you don't want to modify your database, with the caveat that certain functions (e.g. adding a new taxon via the ORM) will not work correctly. If you are just querying data that still might be sufficiently useful. And Django will let you fall back to raw SQL if you should need to at any point. I personally was extremely skeptical about an ORM because of the added level of complexity, and sometimes difficulty understanding the relationship between the generated models and the underlying database. However, Django (like Python's) basic principles of DRY and "don't do anything magic" mean that I find the results acceptable enough for my applications. I wouldn't make an argument to change BioSQL to suit Django, but I would commend Django to anyone using Python who wants an ORM - particularly if they are building a dynamic web site! Cheers, Nick. 
From biopython at maubp.freeserve.co.uk Fri Nov 28 11:57:47 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 16:57:47 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <49301579.2060700@bham.ac.uk> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> <49301579.2060700@bham.ac.uk> Message-ID: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> >> I investigated Django earlier this month, and rejected it because it >> doesn't yet support multi-column indices as primary keys. They've had >> an open bug on this for 3 years, with no expected date yet: >> http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys >> http://code.djangoproject.com/ticket/373 >> >> My impression is that Django's philosophy is that they expect you to >> define your objects which then automatically defines the database >> schema. Note the title of this FAQ page refers to an existing schema >> as a "legacy" database: >> http://docs.djangoproject.com/en/dev/howto/legacy-databases/ >> If Django can cope with an existing schema, then it does look like an >> excellent package, and seems well documented. > > You can still use Django even if you don't want to modify your database, > with the caveat that certain functions (e.g. adding a new taxon via the ORM) > will not work correctly. If you are just querying data that still might be > sufficiently useful. Maybe - but given so much of BioSQL uses composite primary keys etc it was my impression that trying to use Django would be making life difficult for myself. If you already are familiar with Django, then perhaps this wouldn't be so bad. > I wouldn't make an argument to change BioSQL to suit Django, ... Agreed. > ... 
but I would commend Django to anyone using Python who wants an > ORM - particularly if they are building a dynamic web site! I agree but ONLY if you are not trying to use an existing schema with composite primary keys and/or tables with no primary key. For these SQLAlchemy seems to be the current best bet with python, leading to the choice of either TurboGears (which I went for) or Pylons (picked by Brad). Peter From n.j.loman at bham.ac.uk Fri Nov 28 12:00:58 2008 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Fri, 28 Nov 2008 17:00:58 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> <49301579.2060700@bham.ac.uk> <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> Message-ID: <493023CA.9030108@bham.ac.uk> Peter wrote: >> You can still use Django even if you don't want to modify your database, >> with the caveat that certain functions (e.g. adding a new taxon via the ORM) >> will not work correctly. If you are just querying data that still might be >> sufficiently useful. > > Maybe - but given so much of BioSQL uses composite primary keys etc it > was my impression that trying to use Django would be making life > difficult for myself. If you already are familiar with Django, then > perhaps this wouldn't be so bad. Depends again on what you want to do. If you wanted to knock up a quick web-based genome viewer for example you might not need to amend (or even access) the taxon table as it would be pre-populated. And if you REALLY had to, you could just fashion some SQL to do it. >> ... 
but I would commend Django to anyone using Python who wants an >> ORM - particularly if they are building a dynamic web site! > > I agree but ONLY if you are not trying to use an existing schema with > composite primary keys and/or tables with no primary key. For these > SQLAlchemy seems to be the current best bet with python, leading to > the choice of either TurboGears (which I went for) or Pylons (picked > by Brad). Well, I wouldn't be that prescriptive - I would say people can just use what they feel comfortable with and they can get good results from quickly. I've had good experiences with Django so I wouldn't put people off it just because of the primary key issue which can be partially solved easily enough with a single ALTER TABLE statement :) Cheers, Nick. From biopython at maubp.freeserve.co.uk Fri Nov 28 12:46:01 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 17:46:01 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <493023CA.9030108@bham.ac.uk> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> <49301579.2060700@bham.ac.uk> <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> <493023CA.9030108@bham.ac.uk> Message-ID: <320fb6e00811280946r391eaac9q8c54ae6a4a59595c@mail.gmail.com> > Peter wrote: >> I agree but ONLY if you are not trying to use an existing schema with >> composite primary keys and/or tables with no primary key. For these >> SQLAlchemy seems to be the current best bet with python, leading to >> the choice of either TurboGears (which I went for) or Pylons (picked >> by Brad). 
Nick wrote: > Well, I wouldn't be that prescriptive - I would say people can just use what > they feel comfortable with and they can get good results from quickly. I've > had good experiences with Django so I wouldn't put people off it just > because of the primary key issue which can be partially solved easily enough > with a single ALTER TABLE statement :) Maybe adding a few surrogate primary keys to tables to make Django (or your ORM of choice) happy isn't such a big deal. However, I put a lot of value on the shared standard nature of the BioSQL schema, and prefer to modify it as little as possible - not just in case I break something another software package relies on, but also to reduce long term maintenance and re-installation hassles. In your case you had existing experience with Django, while I had no prior investment in it or any other ORM tool. I can therefore understand your choice - and might even have done the same in your position. I'm not convinced that the BioSQL schema needs to be changed for v1.1.x to help ORM software either (surrogate primary keys on all tables - something mooted on the roadmap). 
http://www.biosql.org/wiki/Enhancement_Requests Peter From hlapp at gmx.net Fri Nov 28 13:31:55 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 28 Nov 2008 13:31:55 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> Message-ID: On Nov 28, 2008, at 5:43 AM, Peter wrote: > Why not this: > > CREATE TABLE taxon_name ( > taxon_id INT(10) UNSIGNED NOT NULL, > name VARCHAR(255) BINARY NOT NULL, > name_class VARCHAR(32) BINARY NOT NULL, > PRIMARY KEY (taxon_id,name,name_class) > ) TYPE=INNODB; It's part of the changes planned for the next release indeed. At the time this was written it didn't seem to matter much as they are really semantically equivalent, and ORM tools weren't around much at the time :-) I do hope that no-one is using a dynamically configuring ORM at run time so that this change can be a drop-in replacement that's fully backwards compatible. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Nov 28 13:41:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 18:41:52 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> Message-ID: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> On Fri, Nov 28, 2008 at 6:31 PM, Hilmar Lapp wrote: > > On Nov 28, 2008, at 5:43 AM, Peter wrote: > >> Why not this: >> >> CREATE TABLE taxon_name ( >> taxon_id INT(10) UNSIGNED NOT NULL, >> name VARCHAR(255) BINARY NOT NULL, >> name_class VARCHAR(32) BINARY NOT NULL, >> PRIMARY KEY (taxon_id,name,name_class) >> ) TYPE=INNODB; > > > It's part of the changes planned for the next release indeed. By next release, do you mean BioSQL v1.0.2 or v1.1.0 here? > At the time this was written it didn't seem to matter much as they are really > semantically equivalent, and ORM tools weren't around much at the time :-) I see - that kind of explains the reason why some tables have explicit composite primary keys, while others just have a unique set of fields. > I do hope that no-one is using a dynamically configuring ORM at run time so > that this change can be a drop-in replacement that's fully backwards > compatible. Some dynamically configuring ORM code would never have coped with these tables in the first place - so it doesn't matter here. In other cases the user can tell the ORM to treat the tuple (taxon_id,name,name_class) as a primary key - and this should still be fine even when this is explicit in the database schema. 
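The point discussed here (a UNIQUE constraint over NOT NULL columns enforces the same rows as a PRIMARY KEY, yet a tool that reflects the schema sees no key) can be illustrated with a small sketch. It is only an illustration under stated assumptions: SQLite stands in for MySQL so that the example is self-contained, the table names taxon_name_uk and taxon_name_pk and the helpers pk_columns and rejects_duplicates are made up for this example, and the column types are simplified.

```python
import sqlite3

# Assumption: SQLite is used as a stand-in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_name_uk (          -- hypothetical name: UNIQUE-only variant
    taxon_id   INTEGER NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32) NOT NULL,
    UNIQUE (taxon_id, name, name_class)
);
CREATE TABLE taxon_name_pk (          -- hypothetical name: explicit PRIMARY KEY variant
    taxon_id   INTEGER NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32) NOT NULL,
    PRIMARY KEY (taxon_id, name, name_class)
);
""")


def pk_columns(table):
    """Return the primary key columns that schema reflection reports, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is 0 for non-key columns.
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]


def rejects_duplicates(table):
    """Insert the same row twice and report whether the second insert is refused."""
    row = (9606, "Homo sapiens", "scientific name")
    sql = f"INSERT INTO {table} VALUES (?, ?, ?)"
    conn.execute(sql, row)
    try:
        conn.execute(sql, row)
    except sqlite3.IntegrityError:
        return True
    return False


uk_key = pk_columns("taxon_name_uk")   # nothing for a reflecting ORM to use
pk_key = pk_columns("taxon_name_pk")   # the composite key is visible
uk_rejects = rejects_duplicates("taxon_name_uk")
pk_rejects = rejects_duplicates("taxon_name_pk")
print(uk_key, pk_key, uk_rejects, pk_rejects)
```

Both variants refuse the duplicate row, which is the sense in which they are semantically equivalent; only the second exposes a key to introspection, which is presumably why an ORM needs the tuple declared by hand for the first.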
I expect (and hope) this will be a backwards compatible change.

Peter

From hlapp at gmx.net Fri Nov 28 13:46:26 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 28 Nov 2008 13:46:26 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>
Message-ID: <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net>

On Nov 28, 2008, at 1:41 PM, Peter wrote:
>>
>> It's part of the changes planned for the next release indeed.
>
> By next release, do you mean BioSQL v1.0.2 or v1.1.0 here?

That would be 1.0.2. Otherwise there would be no need to worry about backward compatibility (as 1.1x won't be by definition).

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Fri Nov 28 13:57:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 18:57:40 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
Message-ID: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>

Hi all,

The BioSQL schema allows multiple ontologies, so that things like entries in seqfeature_qualifier_value can say what they mean by "locus_tag".

Currently BioPerl and Biopython (and I assume the other projects but haven't checked) use a couple of ad-hoc ontology names for storing annotation. In particular, if there is no predefined entry for a novel ontology term, it gets added on the fly. This is very convenient as it means a BioSQL database can be used without first importing a predefined ontology.
However there are downsides, for example spelling errors in the keys of a GenBank file get treated as ontology entries.

Have these ad-hoc ontologies ever been defined? i.e. For table bioentry_qualifier_value terms, which ad-hoc ontology name should be used? Biopython uses ad-hoc ontologies named 'SeqFeature Keys', 'SeqFeature Sources', 'Annotation Tags' for various different tables (which I believe is the same for BioPerl).

On a related point, it might make more sense to use a predefined ontology, like SOFA or SO from http://www.sequenceontology.org/ where a novel term is treated as an error (or perhaps falls back on the ad-hoc ontology). How do the various Bio* projects cope with annotations in the database for different or multiple ontologies? Or has this not been considered?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk Fri Nov 28 15:04:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 20:04:33 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
In-Reply-To: <49304392.4080908@eaglegenomics.com>
References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com> <49304392.4080908@eaglegenomics.com>
Message-ID: <320fb6e00811281204i3bae31e4kc18f70121244b4d1@mail.gmail.com>

On Fri, Nov 28, 2008 at 7:16 PM, Richard Holland wrote:
>
> BioJava does what BioPerl does and pretty much makes it up as it goes
> along, using whatever the input files tell it.

OK, good. But which ontology names do you use for which tables? i.e. Do you also use ad-hoc ontologies named 'SeqFeature Keys', 'SeqFeature Sources' and 'Annotation Tags'?

To be a little more specific, here are some examples - which I presume (hope) are all copying BioPerl's conventions.

In recording a bioentry date, Biopython sets bioentry_qualifier_value.term_id to point to a term table entry "date_changed" which belongs to the ad-hoc "Annotation Tags" ontology.
In recording most bioentry annotations (a list of keywords), Biopython sets bioentry_qualifier_value.term_id to point to a term table entry for that annotation type (e.g. "keywords") which belongs to the ad-hoc "Annotation Tags" ontology.

In recording a seqfeature, Biopython sets seqfeature.seqfeature_key_id to point to a term table entry for that feature type (e.g. "CDS", "misc_feature", "gene") which belongs to the ad-hoc "SeqFeature Keys" ontology.

Biopython always sets seqfeature.type_term_id to point to a term table entry for "EMBL/GenBank/SwissProt" within the ad-hoc "SeqFeature Sources" ontology.

In recording most of a seqfeature's qualifiers (annotations), Biopython sets seqfeature_qualifier_value.term_id to point to a term table entry for the key (e.g. "locus_tag", "note", "translation") which belongs to the ad-hoc "Annotation Tags" ontology.

Notice that the ad-hoc "Annotation Tags" ontology serves double duty, doing both bioentry and seqfeature annotations. This doesn't seem entirely sensible.

On the other hand, when recording a seqfeature's location Biopython and BioPerl leave location.term_id as NULL (rather than using any particular ontology term). This seems arbitrary.

Relating to this, if we want to record a composite location type (typically "join"), we'd want to use the location_qualifier_value table. BioPerl seems to leave this table empty (presumably assuming all composite locations are joins) which is what Biopython currently does too. Here we can't just set location_qualifier_value.term_id as NULL (why not?) so we have to introduce something. The BioSQL projects should first agree what ontology term and what ontology this should be stored with.

> The trouble with throwing exceptions when things don't meet standards is
> that people complain when their custom files don't work, and can't be
> made to work without editing the file itself. ...

I'm not sure if you are talking about parsing files, or loading them into BioSQL.
I agree that when parsing sometimes some leeway is required. In terms of *optionally* enforcing a strict ontology, throwing an error is a good thing if the input file doesn't follow the ontology - this indicates a problem with the file (or perhaps an out of date ontology). I would certainly leave the default behaviour as is with the ad-hoc ontologies extended on the fly. > I think the best approach is to always to use what the file says, and > trust that it's accurate. What needs to be agreed between projects is > any additional annotations that get introduced outside the context of > file parsing, and the names of the ontologies used for the file > annotations so that all projects use the same ontologies and don't > replicate them inside the BioSQL database. It would be nice to > standardise these names and the additional custom terms across the > projects, in much the same way as people tried already to standardise > the way general objects get mapped to BioSQL. This is what I am trying to get at here - documenting the existing "ad hoc" ontology usage. My impression is that it has not been documented, and that the BioPerl behaviour is the defacto BioSQL standard. I'd like to pin down this standard, and extend it for situations like the location_qualifier_value.term_id and perhaps location.term_id where BioPerl seems to ignore the ontology issue. Peter From holland at eaglegenomics.com Fri Nov 28 14:16:34 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 28 Nov 2008 19:16:34 +0000 Subject: [BioSQL-l] BioSQL and ontology "standards". In-Reply-To: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com> References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com> Message-ID: <49304392.4080908@eaglegenomics.com> BioJava does what BioPerl does and pretty much makes it up as it goes along, using whatever the input files tell it. 
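The two behaviours contrasted in this thread (terms added to an ad-hoc ontology on the fly, versus an optional strict mode that treats an unknown term as an error) might be sketched roughly as follows. This is not how BioPerl, Biopython or BioJava actually implement it: the ontology and term tables are cut down to the columns needed here, SQLite is used only to keep the sketch self-contained, and get_ontology_id, get_term_id and the strict flag are hypothetical names.

```python
import sqlite3

# Assumption: heavily simplified versions of the BioSQL ontology and term tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        VARCHAR(32) NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
""")


def get_ontology_id(name):
    """Look up an ontology by name, creating it if it does not exist yet."""
    row = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(
        "INSERT INTO ontology (name) VALUES (?)", (name,)
    ).lastrowid


def get_term_id(term_name, ontology_name, strict=False):
    """Return the term_id for a term, adding the term on the fly unless strict."""
    ontology_id = get_ontology_id(ontology_name)
    row = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
        (term_name, ontology_id),
    ).fetchone()
    if row is not None:
        return row[0]
    if strict:
        # Strict mode: an unknown term points at a bad file or a stale ontology.
        raise KeyError(f"{term_name!r} is not a term in ontology {ontology_name!r}")
    # Default mode: extend the ad-hoc ontology, typos included.
    return conn.execute(
        "INSERT INTO term (name, ontology_id) VALUES (?, ?)",
        (term_name, ontology_id),
    ).lastrowid


first = get_term_id("locus_tag", "Annotation Tags")
again = get_term_id("locus_tag", "Annotation Tags")   # reuses the existing term
typo = get_term_id("locus_tga", "Annotation Tags")    # silently becomes a new term
try:
    get_term_id("locus_tgg", "Annotation Tags", strict=True)
    strict_rejected = False
except KeyError:
    strict_rejected = True
term_count = conn.execute("SELECT COUNT(*) FROM term").fetchone()[0]
```

The misspelt key ends up as a term of its own in the default mode, which is the downside raised at the start of the thread, while the strict call leaves the term table untouched.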
The trouble with throwing exceptions when things don't meet standards is that people complain when their custom files don't work, and can't be made to work without editing the file itself. By custom I mean not only things they've written themselves, but also files coming from established tools which don't follow the rules (NEXUS format is a classic example of this - the most popular tools that output NEXUS pretty much ignore the format specification). Even the standards providers themselves often don't comply with their own rules (several Genbank examples supplied from NCBI/Entrez break any parser which tries to be completely strict with the declared format). I think the best approach is to always to use what the file says, and trust that it's accurate. What needs to be agreed between projects is any additional annotations that get introduced outside the context of file parsing, and the names of the ontologies used for the file annotations so that all projects use the same ontologies and don't replicate them inside the BioSQL database. It would be nice to standardise these names and the additional custom terms across the projects, in much the same way as people tried already to standardise the way general objects get mapped to BioSQL. cheers, Richard Peter wrote: > Hi all, > > The BioSQL schema allows multiple ontologies, so that things like > entries in seqfeature_qualifier_value can say when they mean by > "locus_tag". > > Currently BioPerl and Biopython (and I assume the other projects but > haven't checked) use a couple of ad-hoc ontology names for storing > annotation. In particular, if there is no predefined entry for a > novel ontology term, it gets added on the fly. This is very > convenient as it means a BioSQL database can be used without first > importing a predefined ontology. However there are downsides, for > example spelling errors in the keys of a GenBank file get treated as a > ontology entries. > > Have these ad-hoc ontologies ever been defined? 
i.e. For table > bioentry_qualifier_value terms, which ad-hoc ontology name should be > used? Biopython uses ad-hoc ontology named 'SeqFeature Keys', > 'SeqFeature Sources', 'Annotation Tags' for various different tables > (which I believe is the same for BioPerl). > > On a related point, it might make more sense to use a predefined > ontology, like SOFA or SO from http://www.sequenceontology.org/ where > a novel term is treated as an error (or perhaps falls back on the > ad-hoc ontology). How do the various Bio* projects cope with > annotations in the database for different or multiple ontologies? Or > has this not been considered? > > Thanks, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From d.m.a.martin at dundee.ac.uk Tue Nov 25 06:09:08 2008 From: d.m.a.martin at dundee.ac.uk (David Martin) Date: Tue, 25 Nov 2008 11:09:08 +0000 Subject: [BioSQL-l] Passwords on biosql databases Message-ID: <492BDCF2.6F09.00E0.0@dundee.ac.uk> I have set up a biosql database on Postgres. The Bio::DB::BioDB module croaks complaining that it needs the password. I have tried the obvious things (-password -passwd and reading what docs I could find) but to no avail. Any clues? Assuming the database is on postgres and is called biosql with user biosqluser and password biosqlpassword I have been trying: my $dbadp = Bio::DB::BioDB->new(-database => 'biosql', -user => 'biosqluser', -dbname => 'biosql', -host => 'postgres', -passwd=>'biosqlpassword', -driver => 'Pg'); regards ..d David Martin PhD College of Life Sciences University of Dundee The University of Dundee is a Scottish Registered Charity, No. SC015096. 
From chapmanb at 50mail.com Tue Nov 25 16:16:22 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 25 Nov 2008 16:16:22 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
Message-ID: <20081125211622.GE83220@sobchak.mgh.harvard.edu>

Hi Peter;
Hope all is going well with you. I was glancing at the BioSQL mailing list archives last night and saw your messages earlier this month about using an ORM mapper with BioSQL.

Some of my current work is using a BioSQL storage backend with a javascript web interface. The middleware uses Pylons and SQLAlchemy. This uses some parts of BioSQL not well represented via an object front end like bioentry_relationship, and so it has been convenient to work with these via SQLAlchemy directly.

To your initial question, SQLAlchemy can handle those non-primary key tables without a problem by setting "primary_key = True" for all of the unique columns.

What I have done thus far is definitely incomplete, and also includes some add-on tables for storing experimental data linked to BioSQL. However, I am attaching it here just to give you an idea (init.py is the __init__.py of the module). You would use it like:

    from Wherever.BioSQL import get_session, biosql
    session = get_session("production")
    entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345")

If you, or anyone else, is developing something similar, I'd be happy to help with something generalized.

Brad
-------------- next part --------------
A non-text attachment was scrubbed...
Name: init.py
Type: text/x-python
Size: 1116 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BioSQL.py Type: text/x-python Size: 6881 bytes Desc: not available URL:
From hlapp at gmx.net Thu Nov 6 21:39:55 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Nov 2008 16:39:55 -0500 Subject: [BioSQL-l] Tables without a (composite) primary key In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> Message-ID: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net> Hi Peter, it's a known enhancement request. I know that some ORMs have trouble reverse engineering the mapping if there is no PK defined. Semantically, however, in the absence of a primary key constraint the first unique key constraint is equivalent to a primary key (in fact some ER modeling tools will automatically do the conversion); unique keys are also called alternate keys (alternate to the primary key). So for now feel free to either change the UK constraint to PK where there is no PK defined and your reverse engineering tool needs it. If you don't use a reverse engineering tool, just set the columns of the UK constraint as the compound primary key if there isn't a surrogate PK. BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not be backwards compatible for existing language bindings, which is why I'd like to make those changes first that should be fully backwards compatible. -hilmar On Nov 6, 2008, at 6:53 AM, Peter wrote: > I've recently been looking into some object-relational mappers which > caused me to look more closely at the BioSQL schema. Many of these > packages require a primary key, but not all can cope with a composite > primary key. However, some BioSQL tables don't have any primary key > at all. > > Several BioSQL tables have composite primary keys, for example the > term_dbxref table has a composite key of (term_id, dbxref_id), and > also an index on dbxref_id as well.
> > However, some BioSQL tables do not have a primary key, for example: > > -- corresponds to the names table of the NCBI taxonomy databaase > CREATE TABLE taxon_name ( > taxon_id INT(10) UNSIGNED NOT NULL, > name VARCHAR(255) BINARY NOT NULL, > name_class VARCHAR(32) BINARY NOT NULL, > UNIQUE (taxon_id,name,name_class) > ) TYPE=INNODB; > > CREATE INDEX taxnametaxonid ON taxon_name(taxon_id); > CREATE INDEX taxnamename ON taxon_name(name); > > Why don't taxon_name, bioentry_path, term_relationship, > bioentry_qualifier_value, seqfeature_path have a primary key (just a > uniqueness criteria)? > > Thanks, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Thu Nov 6 22:12:28 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Nov 2008 22:12:28 +0000 Subject: [BioSQL-l] Tables without a (composite) primary key In-Reply-To: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net> References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net> Message-ID: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com> On Thu, Nov 6, 2008 at 9:39 PM, Hilmar Lapp wrote: > > Hi Peter, > > it's a known enhancement request. I know that some ORMs have trouble reverse > engineering the mapping if there is no PK defined. 
Oh right, "Surrogate primary keys on all tables" on this page: http://www.biosql.org/wiki/Enhancement_Requests > Semantically, however, in the absence of a primary key constraint the first > unique key constraint is equivalent to a primary key (in fact some ER > modeling tools will automatically do the conversion); unique keys are also > called alternate keys (alternate to the primary key). > > So for now feel free to either change the UK constraint to PK where there is > no PK defined and your reverse engineering tool needs it. If you don't use a > reverse engineering tool, just set the columns of the UK constraint as the > compound primary key if there isn't a surrogate PK. OK - I'll bear that in mind. > BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not > be backwards compatible for existing language bindings, which is why I'd > like to make those changes first that should be fully backwards compatible. That sounds sensible. Thanks Hilmar! Peter P.S. Is there any agreed terminology: compound primary key versus composite primary key? From raoul.bonnal at itb.cnr.it Thu Nov 6 12:23:43 2008 From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal) Date: Thu, 06 Nov 2008 13:23:43 +0100 Subject: [BioSQL-l] Tables without a (composite) primary key In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> Message-ID: <200811061323.43749.raoul.bonnal@itb.cnr.it> Dear Peter, I'm writing the wrapper for BioRuby using DataMapper, an ORM (Active Record is similar). I think we could consider moving or branching the BioSQL schema towards the approach suggested by this kind of ORM, with a PK named "id" for every table and plural table names. FK names are quite correct. PS: DataMapper handles composite PKs very well, much better than ActiveRecord.
On Thursday 06 November 2008 12:53:13 Peter wrote: > I've recently been looking into some object-relational mappers which > caused me to look more closely at the BioSQL schema. Many of these > packages require a primary key, but not all can cope with a composite > primary key. However, some BioSQL tables don't have any primary key > at all. > > Several BioSQL tables have composite primary keys, for example the > term_dbxref table has a composite key of (term_id, dbxref_id), and > also an index on dbxref_id as well. > > However, some BioSQL tables do not have a primary key, for example: > > -- corresponds to the names table of the NCBI taxonomy database > CREATE TABLE taxon_name ( > taxon_id INT(10) UNSIGNED NOT NULL, > name VARCHAR(255) BINARY NOT NULL, > name_class VARCHAR(32) BINARY NOT NULL, > UNIQUE (taxon_id,name,name_class) > ) TYPE=INNODB; > > CREATE INDEX taxnametaxonid ON taxon_name(taxon_id); > CREATE INDEX taxnamename ON taxon_name(name); > > Why don't taxon_name, bioentry_path, term_relationship, > bioentry_qualifier_value, seqfeature_path have a primary key (just a > uniqueness criteria)? > > Thanks, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From mark.schreiber at novartis.com Fri Nov 7 02:07:12 2008 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 7 Nov 2008 10:07:12 +0800 Subject: [BioSQL-l] Tables without a (composite) primary key In-Reply-To: <200811061323.43749.raoul.bonnal@itb.cnr.it> Message-ID: Hi - In the Java JPA it is possible to use an embedded object as a primary key. This gets you around the situations where the primary key is composite. It also effectively gets you around those tables where there is no key but it is implicit as all the fields are unique (as in taxon_name).
What you end up with is an object that holds taxon_id, name, name_class, and inside that object you have an embedded key object that contains the same three fields. In this way any changes that are made to the object are still associated with the original row via the unchanged embedded PK object and are updated accordingly. While I agree that an explicit PK's for all BioSQL tables would be nicer for ORM frameworks many frameworks have ways to get around this, possibly those in Ruby or Python do as well. - Mark biosql-l-bounces at lists.open-bio.org wrote on 11/06/2008 08:23:43 PM: > Dear Peter, > I'm writing the wrapper for BioRuby using DataMapper an ORM (Active Record is > similar). > > I think we can cosider to move or branch BioSQL' schema to the approach > suggested by this kind of ORMs, with a pk for every table named "id" and a > table name in plural. Fk names are quite correct. > > PS: DataMapper handles very well composite PK, much better tha ActiveRecord. > > Il gioved? 06 novembre 2008 12:53:13 Peter ha scritto: > > I've recently been looking into some object-relational mappers which > > caused me to look more closely at the BioSQL schema. Many of these > > packages require a primary key, but not all can cope with a composite > > primary key. However, some BioSQL tables don't have any primary key > > at all. > > > > Several BioSQL tables have composite primary keys, for example the > > term_dbxref table has a composite key of (term_id, dbxref_id), and > > also an index on dbxref_id as well. 
> > > > However, some BioSQL tables do not have a primary key, for example: > > > > -- corresponds to the names table of the NCBI taxonomy databaase > > CREATE TABLE taxon_name ( > > taxon_id INT(10) UNSIGNED NOT NULL, > > name VARCHAR(255) BINARY NOT NULL, > > name_class VARCHAR(32) BINARY NOT NULL, > > UNIQUE (taxon_id,name,name_class) > > ) TYPE=INNODB; > > > > CREATE INDEX taxnametaxonid ON taxon_name(taxon_id); > > CREATE INDEX taxnamename ON taxon_name(name); > > > > Why don't taxon_name, bioentry_path, term_relationship, > > bioentry_qualifier_value, seqfeature_path have a primary key (just a > > uniqueness criteria)? > > > > Thanks, > > > > Peter > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. 
From biopython at maubp.freeserve.co.uk Fri Nov 7 18:35:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Nov 2008 18:35:31 +0000 Subject: [BioSQL-l] Tables without a (composite) primary key In-Reply-To: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com> References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com> <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net> <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com> Message-ID: <320fb6e00811071035y496ea4d8p93aa0f54633950f@mail.gmail.com> I've ruled out using Django v1.0 and the current version of SQLObjects with BioSQL as they don't (yet) support composite primary keys. However, SQLAlchemy 0.5.0 seems to be happy with the current BioSQL schema as is :) http://www.djangoproject.com/ http://www.sqlobject.org/ http://www.sqlalchemy.org/ Hilmar: >> BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not >> be backwards compatible for existing language bindings, which is why I'd >> like to make those changes first that should be fully backwards compatible. Peter: > That sounds sensible. Actually I may have initially misunderstood you. Are you saying for tables which already have a composite primary key (e.g. term_dbxref) you plan to add/replace this with a surrogate (single column) PK - just to accommodate certain simplistic ORMs? I'm not so keen on this, it seems like an invasive change with little benefit, but potentially making lots of work updating the Bio* bindings. However, would the smaller step of adding composite primary keys to tables currently lacking them be possible on the BioSQL v1.0.x roadmap? e.g. for taxon_name using (taxon_id,name,name_class) as the composite primary key, currently specified to be unique. Or might this also cause trouble for the Bio* binding? If this was possible, it *might* be useful for certain ORMs. Was there a reason why tables like taxon_name never had a (composite/compound) primary key in the first place? 
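The difference between the current UNIQUE rule and the proposed composite primary key, as a schema-reflecting tool might see it, can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module rather than MySQL; the SQLite column types and the helper name primary_key_columns are assumptions made for the example and are not part of BioSQL.

```python
import sqlite3

# Assumed SQLite translation of the taxon_name DDL quoted in this thread.
UNIQUE_ONLY = """
CREATE TABLE taxon_name (
    taxon_id   INTEGER      NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32)  NOT NULL,
    UNIQUE (taxon_id, name, name_class)
)"""

# The proposed variant: same columns, but declared as a composite primary key.
COMPOSITE_PK = UNIQUE_ONLY.replace("UNIQUE (", "PRIMARY KEY (")


def primary_key_columns(ddl):
    """Return the primary key columns, in key order, that reflection reports."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # pk is the 1-based position within the primary key, or 0 if not in it.
    rows = conn.execute("PRAGMA table_info(taxon_name)").fetchall()
    conn.close()
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]


print(primary_key_columns(UNIQUE_ONLY))
print(primary_key_columns(COMPOSITE_PK))
```

With the UNIQUE-only definition no primary key columns are reported, which is the situation an ORM's reverse engineering step has to work around by hand.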
Peter From biopython at maubp.freeserve.co.uk Fri Nov 14 20:48:02 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 20:48:02 +0000 Subject: [BioSQL-l] parent_taxon_id of a root node In-Reply-To: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> Message-ID: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> On Fri, Oct 3, 2008, I wrote: > > Hello all, > > I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will set > the parent_taxon_id of the NCBI root node in the taxon table to point > to itself. I would have expected this to be NULL indicating no > parent. If someone is using the database directly, extracting a > lineage could trigger an infinite loop. Can anyone explain the > rationale here? > > Note that when Biopython adds entries to the taxon table, it uses NULL > for a root node. When retrieving sequences from a BioSQL database, > Biopython does cope with a root node with a NULL parent or a > self-parent - would it be safe to assume BioPerl and Java can also cope > with both situations? > > Thanks, > > Peter > Hi again, I thought I'd raise this question again (as I didn't see any response last time), as I've just been bitten by the self-parent taxon problem this afternoon. This was for a simple web front end to part of a BioSQL database using SQLAlchemy in Python - but that's not important. I was using a simple loop to build up lineages, which was working fine until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to just time out. I'd forgotten about the self-parent root nodes used by load_ncbi_taxonomy.pl which had triggered an infinite loop.
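A lineage loop that copes with both styles of root node could look something like the sketch below. It assumes the (taxon_id, parent_taxon_id) pairs have already been read from the taxon table into a plain dict; the function name lineage and the sample ids are made up for illustration and this is not the Biopython code.

```python
def lineage(parents, taxon_id):
    """Return the taxon ids from the root down to taxon_id.

    parents maps taxon_id -> parent_taxon_id, with None standing in for a
    NULL parent. The walk stops at a NULL parent, at a self-parent root,
    or at any other node that has already been visited.
    """
    path = []
    seen = set()
    current = taxon_id
    while current is not None and current not in seen:
        seen.add(current)
        path.append(current)
        current = parents.get(current)
    path.reverse()
    return path


# Root stored the way load_ncbi_taxonomy.pl does it (root is its own parent).
self_parent_root = {1: 1, 10: 1, 100: 10}
# Root stored with a NULL parent, as Biopython does it.
null_parent_root = {1: None, 10: 1, 100: 10}

print(lineage(self_parent_root, 100))
print(lineage(null_parent_root, 100))
```

Tracking visited ids costs a little memory but means the loop terminates whichever convention the loader used, and also if the table ever contains a longer cycle.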
I hit another (less serious) problem stemming from these self-parent root nodes when I wanted to generate a list of sub-lineages (child entries), essentially: SELECT * FROM taxon WHERE parent_taxon_id=12345; When calling this on a root node, I had to modify this to explicitly exclude itself from the children: SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345; So to repeat my earlier question, is there a reason why parent_taxon_id isn't just NULL for root nodes? Was this a deliberate design choice - because if not, I think this could be regarded as a bug in load_ncbi_taxonomy.pl. Thanks Peter From hlapp at gmx.net Sat Nov 15 18:34:45 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Nov 2008 13:34:45 -0500 Subject: [BioSQL-l] parent_taxon_id of a root node In-Reply-To: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> Message-ID: Sorry Peter - it looks like this slipped my attention (Oct was crazy). Thanks for raising it again. I agree with you, this looks like a bug. Would you mind filing it? It's possible that this has secretly been assumed as policy and hence led to some people identifying the root node by equating parent and taxon_id, but surely this sounds like the wrong way of doing it, so it deserves fixing. -hilmar On Nov 14, 2008, at 3:48 PM, Peter wrote: > On Fri, Oct 3, 2008, I wrote: >> >> Hello all, >> >> I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will >> set >> the parent_taxon_id of the NCBI root node in the taxon table to point >> to itself. I would have expected this to be NULL indicating no >> parent. If someone is using the database directly, extracting a >> lineage could trigger an infinite loop. Can anyone explain the >> rationale here? >> >> Note that when Biopython adds entries to the taxon table, it uses >> NULL >> for a root node.
When retrieving sequences from a BioSQL database, >> Biopython does cope with a root node with a NULL parent or a >> self-parent - would it safe to assume BioPerl and Java can also cope >> with both situations? >> >> Thanks, >> >> Peter >> > > Hi again, > > I thought I'd raise this question again (as I didn't see any response > last time), as I've just been bitten by the self-parent taxon problem > this afternoon. This was for a simple webfront end to part of a > BioSQL database using SQLAlchemy in python - but that's not important. > > I was using a simple loop to build up lineages, which was working fine > until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to > just time out. I'd forgotten about the self-parent root nodes used by > load_ncbi_taxonomy.pl which had triggered an infinite loop. > > I hit another (less serious) problem stemming for these self-parent > root nodes when I wanted to generate a list of sub-lineages (child > entries), essentially: > > SELECT * FROM taxon WHERE parent_taxon_id=12345; > > When calling this on a root node, I had to modify this to explicitly > exclude itself from the children: > > SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345; > > So to repeat my earlier question, is there a reason why > parent_taxon_id isn't just NULL for root nodes? Was this a deliberate > design choice - because if not, I think this could be regarded as a > bug in load_ncbi_taxonomy.pl. 
> > Thanks > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Sun Nov 16 14:58:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 16 Nov 2008 14:58:20 +0000 Subject: [BioSQL-l] parent_taxon_id of a root node In-Reply-To: References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com> <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com> Message-ID: <320fb6e00811160658s6282022by3681364e14aecf69@mail.gmail.com> On Sat, Nov 15, 2008 at 6:34 PM, Hilmar Lapp wrote: > > Sorry Peter - it looks like this slipped my attention (Oct was crazy). > Thanks for raising it again. I agree with you, this looks like a bug. Would > you mind filing it? Sure, http://bugzilla.open-bio.org/show_bug.cgi?id=2664 > It's possible that has secretly been assumed as policy and hence led to some > people identifying the root node by equating parent and taxon_id, but surely > this sounds like the wrong way of doing it, so it deserves fixing. In the short term we should just make sure all the Bio* projects can cope with either style root node (Biopython can), but in the long term are self parent taxon entries something that could be banned via the schema? 
Regards, Peter From biopython at maubp.freeserve.co.uk Wed Nov 26 18:37:51 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 26 Nov 2008 18:37:51 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <20081125211622.GE83220@sobchak.mgh.harvard.edu> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> On Tue, Nov 25, 2008 at 9:16 PM, Brad Chapman wrote: > Hi Peter; > Hope all is going well with you. I was glancing at the BioSQL > mailing list archives last night and saw your messages earlier this > month about using an ORM mapper with BioSQL. > > Some of my current work is using a BioSQL storage backend with a > javascript web interface. The middleware uses Pylons and SQLAlchemy. > This uses some parts of BioSQL not well represented via an object front > end like bioentry_relationship, and so it has been convenient to work > with these via SQLAlchemy directly. I've been using TurboGears with SQLAlchemy, and so far everything has been OK. This was essentially independent of the Biopython BioSQL mapping. > To your initial question, SQLAlchemy can handle those non-primary key > tables without a problem by setting "primary_key = True" for all of > the unique columns. Yes, SQLAlchemy seems pretty good. The only catch was for a table with no primary key defined at all (the taxon_name table) which required a little more work setting up the ORM mapping, but which also seems to work fine. > What I have done thus far is definitely non-complete, and also > includes some add-on tables for storing experimental data linked to > BioSQL. However, I am attaching it here just to give you an idea > (init.py is the __init__.py of the module). 
You would use it like: > > from Wherever.BioSQL import get_session, biosql > > session = get_session("production") > entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345") > > If you, or anyone else, is developing something similar, > I'd be happy to help with something generalized. > > Brad > I'll probably take a look at this next week - I'm on the road at the moment. Thanks for sharing, Peter From hlapp at gmx.net Wed Nov 26 19:28:16 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 26 Nov 2008 14:28:16 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> Message-ID: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> On Nov 26, 2008, at 1:37 PM, Peter wrote: >> To your initial question, SQLAlchemy can handle those non-primary key >> tables without a problem by setting "primary_key = True" for all of >> the unique columns. > > Yes, SQLAlchemy seems pretty good. The only catch was for a table > with no primary key defined at all (the taxon_name table) which > required a little more work setting up the ORM mapping, but which also > seems to work fine. It has one unique key defined on (name, name_class, taxon_id). Is that not what you are seeing? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From gabrielle_doan at gmx.net Thu Nov 27 13:51:16 2008 From: gabrielle_doan at gmx.net (Gabrielle Doan) Date: Thu, 27 Nov 2008 14:51:16 +0100 Subject: [BioSQL-l] BioSQL schema of v1.0.1 Message-ID: <492EA5D4.7090809@gmx.net> Hi Hilmar, recently I've noticed that BioSQL has a new release. But in the BioSQL core schema, Release v1.0.1 there is still the BioSQL schema of v1.0.0 inside. Can you update it please? Thanks a lot. 
Cheers, Gabrielle From hlapp at gmx.net Thu Nov 27 17:33:25 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 27 Nov 2008 12:33:25 -0500 Subject: [BioSQL-l] BioSQL schema of v1.0.1 In-Reply-To: <492EA5D4.7090809@gmx.net> References: <492EA5D4.7090809@gmx.net> Message-ID: <3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net> Hi Gabrielle, I'm not sure what you mean. The changes that v1.0.1 introduces are also in the main trunk, the biosql-release-1_0_1 tag, and in the distribution downloadable from the website. It's easily possible though that I may have overlooked something - could you elaborate what prompted you to your conclusion below? -hilmar On Nov 27, 2008, at 8:51 AM, Gabrielle Doan wrote: > Hi Hilmar, > > recently I've noticed that BioSQL has a new release. But in the > BioSQL core schema, Release v1.0.1 there is still the BioSQL schema > of v1.0.0 inside. Can you update it please? > Thanks a lot. > > Cheers, > Gabrielle > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Nov 27 18:07:30 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 27 Nov 2008 13:07:30 -0500 Subject: [BioSQL-l] BioSQL schema of v1.0.1 In-Reply-To: <492EDD4C.3090601@gmx.net> References: <492EA5D4.7090809@gmx.net> <3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net> <492EDD4C.3090601@gmx.net> Message-ID: <605BB761-BBFC-4683-AEBA-1F1BAB22617A@gmx.net> Ahh - I see. The schema has not changed from 1.0.0 to 1.0.1 in terms of entity structure and their relations. The only schema changes were to widen the column width of bioentry.accession and dbxref.accession. A complete list of changes is in the file Changes in the root directory of the distribution. 
So the schema diagram and all documentation still fully apply, except for the width of those two columns, which is necessary to deal with certain annotations such as pathways. Does that resolve the issue for you? -hilmar On Nov 27, 2008, at 12:47 PM, Gabrielle Doan wrote: > Hi Hilmar, > > I downloaded v1.0.1 from the website. When I looked throught the doc > folder I found a pdf which describes the BioSQL schema v1.0 and not > v1.0.1. Was the schema v1.0.1. left off intentionally? > I'm grateful if you can give me a reply. > > cheers, > Gabrielle > > Hilmar Lapp schrieb: >> Hi Gabrielle, >> I'm not sure what you mean. The changes that v1.0.1 introduces are >> also in the main trunk, the biosql-release-1_0_1 tag, and in the >> distribution downloadable from the website. >> It's easily possible though that I may have overlooked something - >> could you elaborate what prompted you to your conclusion below? >> -hilmar >> On Nov 27, 2008, at 8:51 AM, Gabrielle Doan wrote: >>> Hi Hilmar, >>> >>> recently I've noticed that BioSQL has a new release. But in the >>> BioSQL core schema, Release v1.0.1 there is still the BioSQL >>> schema of v1.0.0 inside. Can you update it please? >>> Thanks a lot. 
>>> >>> Cheers, >>> Gabrielle >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Nov 28 10:43:01 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 10:43:01 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> Message-ID: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> On Wed, Nov 26, 2008 at 7:28 PM, Hilmar Lapp wrote: > On Nov 26, 2008, at 1:37 PM, Peter wrote: >> Yes, SQLAlchemy seems pretty good. The only catch was for a table >> with no primary key defined at all (the taxon_name table) which >> required a little more work setting up the ORM mapping, but which also >> seems to work fine. > > It has one unique key defined on (name, name_class, taxon_id). Is that not > what you are seeing? > > -hilmar According to the MySQL schema, taxon_name has a unique restraint but does NOT have a primary key: CREATE TABLE taxon_name ( taxon_id INT(10) UNSIGNED NOT NULL, name VARCHAR(255) BINARY NOT NULL, name_class VARCHAR(32) BINARY NOT NULL, UNIQUE (taxon_id,name,name_class) ) TYPE=INNODB; As you said, since (taxon_id,name,name_class) is unique, this tuple can be used as a substitute primary key in the ORM mapping (which for SQLAlchemy I seem to have to do manually). SQLAlchemy would do this automatically if the schema actually used (taxon_id,name,name_class) as a primary key explicitly. i.e. 
Why not this: CREATE TABLE taxon_name ( taxon_id INT(10) UNSIGNED NOT NULL, name VARCHAR(255) BINARY NOT NULL, name_class VARCHAR(32) BINARY NOT NULL, PRIMARY KEY (taxon_id,name,name_class) ) TYPE=INNODB; See also this thread where I wrote: http://lists.open-bio.org/pipermail/biosql-l/2008-November/001386.html > Was there a reason why tables like taxon_name never had a > (composite/compound) primary key in the first place? Thanks, Peter From n.j.loman at bham.ac.uk Fri Nov 28 11:12:32 2008 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Fri, 28 Nov 2008 11:12:32 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> Message-ID: <492FD220.60505@bham.ac.uk> Peter wrote: >>> Yes, SQLAlchemy seems pretty good. The only catch was for a table >>> with no primary key defined at all (the taxon_name table) which >>> required a little more work setting up the ORM mapping, but which also >>> seems to work fine. >> It has one unique key defined on (name, name_class, taxon_id). Is that not >> what you are seeing? >> >> -hilmar > > According to the MySQL schema, taxon_name has a unique restraint but > does NOT have a primary key: Just to say I've had good results using the Django (http://www.djangoproject.com) ORM system with a BioSQL database. You can get going quite quickly using the Django introspection feature (configure your settings.py file and run python manage.py inspectdb to get a models file). However, the version of Django I use (not sure about latest) didn't support multi-column indexes as primary key, so I had to add another auto_increment column to use as a primary key. Cheers, Nick. 
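The workaround Nick describes, an extra auto-increment column serving as the primary key while the existing uniqueness rule stays in place, might look roughly like this. The sketch uses Python's built-in sqlite3 module instead of MySQL; the column name id, the SQLite column types and the helper add_name are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE taxon_name (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- hypothetical surrogate key
    taxon_id   INTEGER      NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32)  NOT NULL,
    UNIQUE (taxon_id, name, name_class)            -- original rule is kept
)""")


def add_name(taxon_id, name, name_class):
    """Insert one row; return its surrogate id, or None for a duplicate."""
    try:
        cur = conn.execute(
            "INSERT INTO taxon_name (taxon_id, name, name_class) VALUES (?, ?, ?)",
            (taxon_id, name, name_class),
        )
    except sqlite3.IntegrityError:
        return None
    return cur.lastrowid


first = add_name(9606, "Homo sapiens", "scientific name")
second = add_name(9606, "human", "genbank common name")
duplicate = add_name(9606, "human", "genbank common name")
print(first, second, duplicate)
```

The single-column key satisfies ORMs that cannot handle composite keys, but it is a change to the schema, so code that inserts rows positionally without naming the columns would need checking.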
From biopython at maubp.freeserve.co.uk Fri Nov 28 11:55:22 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 11:55:22 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <492FD220.60505@bham.ac.uk> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> Message-ID: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> Nick Loman wrote: > >> According to the MySQL schema, taxon_name has a unique restraint but >> does NOT have a primary key: > > Just to say I've had good results using the Django > (http://www.djangoproject.com) ORM system with a BioSQL database. > > You can get going quite quickly using the Django introspection feature > (configure your settings.py file and run python manage.py inspectdb to get a > models file). > > However, the version of Django I use (not sure about latest) didn't support > multi-column indexes as primary key, so I had to add another auto_increment > column to use as a primary key. When I started looking at web-frameworks and ORM, I didn't want to modify a perfectly good schema (BioSQL) just to cope with a limited tool. I investigated Django earlier this month, and rejected it because it doesn't yet support multi-column indices as primary keys. They've had an open bug on this for 3 years, with no expected date yet: http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys http://code.djangoproject.com/ticket/373 My impression is that Django's philosophy is that they expect you to define your objects which then automatically defines the database schema. 
Note the title of this FAQ page refers to an existing schema as a "legacy" database: http://docs.djangoproject.com/en/dev/howto/legacy-databases/ If Django can cope with an existing schema, then it does look like an excellent package, and seems well documented. I also rejected another python ORM system, SQLObject, on similar grounds. Their documentation says "SQLObject does not support primary keys made up of multiple columns (that probably won't change)". In fact, they currently are even less flexible in that SQLObject requires an *integer* primary key on each table! This left SQLAlchemy as the remaining python ORM candidate, which seems to cope just fine with the unmodified BioSQL schema. Brad and I have reported using BioSQL with SQLAlchemy successfully (within the python web-frameworks Pylons and TurboGears respectively). Peter From raoul.bonnal at itb.cnr.it Fri Nov 28 13:51:02 2008 From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal) Date: Fri, 28 Nov 2008 14:51:02 +0100 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> Message-ID: <200811281451.03433.raoul.bonnal@itb.cnr.it> On Friday 28 November 2008 12:55:22 Peter wrote: > Nick Loman wrote:.... > My impression is that Django's philosophy is that they expect you to > define your objects which then automatically defines the database > schema. Note the title of this FAQ page refers to an existing schema > as a "legacy" database: > http://docs.djangoproject.com/en/dev/howto/legacy-databases/ > If Django can cope with an existing schema, then it does look like an > excellent package, and seems well documented. You are right. This philosophy is the same as that of other ORMs like ActiveRecord and DataMapper. In Ruby I prefer DataMapper because it is simpler and more configurable than AR.
Usually they have particular conventions and requirements. So choose the one which fits best with the BioSQL schema without modifying the schema. I wasted a lot of time digging into the API to understand how the ORM handles relationships. Last, check if your ORM can handle transactions for free. Ciao. -- Ra From n.j.loman at bham.ac.uk Fri Nov 28 15:59:53 2008 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Fri, 28 Nov 2008 15:59:53 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> Message-ID: <49301579.2060700@bham.ac.uk> Peter wrote: >> However, the version of Django I use (not sure about latest) didn't support >> multi-column indexes as primary key, so I had to add another auto_increment >> column to use as a primary key. > > When I started looking at web-frameworks and ORM, I didn't want to > modify a perfectly good schema (BioSQL) just to cope with a limited > tool. Fair enough. > I investigated Django earlier this month, and rejected it because it > doesn't yet support multi-column indices as primary keys. They've had > an open bug on this for 3 years, with no expected date yet: > http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys > http://code.djangoproject.com/ticket/373 > > My impression is that Django's philosophy is that they expect you to > define your objects which then automatically defines the database > schema.
Note the title of this FAQ page refers to an existing schema > as a "legacy" database: > http://docs.djangoproject.com/en/dev/howto/legacy-databases/ > If Django can cope with an existing schema, then it does look like an > excellent package, and seems well documented. You can still use Django even if you don't want to modify your database, with the caveat that certain functions (e.g. adding a new taxon via the ORM) will not work correctly. If you are just querying data that still might be sufficiently useful. And Django will let you fall back to raw SQL if you should need to at any point. I personally was extremely skeptical about an ORM because of the added level of complexity, and sometimes difficulty understanding the relationship between the generated models and the underlying database. However, Django (like Python's) basic principles of DRY and "don't do anything magic" mean that I find the results acceptable enough for my applications. I wouldn't make an argument to change BioSQL to suit Django, but I would commend Django to anyone using Python who wants an ORM - particularly if they are building a dynamic web site! Cheers, Nick. From biopython at maubp.freeserve.co.uk Fri Nov 28 16:57:47 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 16:57:47 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <49301579.2060700@bham.ac.uk> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> <49301579.2060700@bham.ac.uk> Message-ID: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> >> I investigated Django earlier this month, and rejected it because it >> doesn't yet support multi-column indices as primary keys. 
They've had >> an open bug on this for 3 years, with no expected date yet: >> http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys >> http://code.djangoproject.com/ticket/373 >> >> My impression is that Django's philosophy is that they expect you to >> define your objects which then automatically defines the database >> schema. Note the title of this FAQ page refers to an existing schema >> as a "legacy" database: >> http://docs.djangoproject.com/en/dev/howto/legacy-databases/ >> If Django can cope with an existing schema, then it does look like an >> excellent package, and seems well documented. > > You can still use Django even if you don't want to modify your database, > with the caveat that certain functions (e.g. adding a new taxon via the ORM) > will not work correctly. If you are just querying data that still might be > sufficiently useful. Maybe - but given so much of BioSQL uses composite primary keys etc it was my impression that trying to use Django would be making life difficult for myself. If you already are familiar with Django, then perhaps this wouldn't be so bad. > I wouldn't make an argument to change BioSQL to suit Django, ... Agreed. > ... but I would commend Django to anyone using Python who wants an > ORM - particularly if they are building a dynamic web site! I agree but ONLY if you are not trying to use an existing schema with composite primary keys and/or tables with no primary key. For these SQLAlchemy seems to be the current best bet with python, leading to the choice of either TurboGears (which I went for) or Pylons (picked by Brad). 
Peter From n.j.loman at bham.ac.uk Fri Nov 28 17:00:58 2008 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Fri, 28 Nov 2008 17:00:58 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> <49301579.2060700@bham.ac.uk> <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> Message-ID: <493023CA.9030108@bham.ac.uk> Peter wrote: >> You can still use Django even if you don't want to modify your database, >> with the caveat that certain functions (e.g. adding a new taxon via the ORM) >> will not work correctly. If you are just querying data that still might be >> sufficiently useful. > > Maybe - but given so much of BioSQL uses composite primary keys etc it > was my impression that trying to use Django would be making life > difficult for myself. If you already are familiar with Django, then > perhaps this wouldn't be so bad. Depends again on what you want to do. If you wanted to knock up a quick web-based genome viewer for example you might not need to amend (or even access) the taxon table as it would be pre-populated. And if you REALLY had to, you could just fashion some SQL to do it. >> ... but I would commend Django to anyone using Python who wants an >> ORM - particularly if they are building a dynamic web site! > > I agree but ONLY if you are not trying to use an existing schema with > composite primary keys and/or tables with no primary key. For these > SQLAlchemy seems to be the current best bet with python, leading to > the choice of either TurboGears (which I went for) or Pylons (picked > by Brad). 
Well, I wouldn't be that prescriptive - I would say people can just use what they feel comfortable with and they can get good results from quickly. I've had good experiences with Django so I wouldn't put people off it just because of the primary key issue which can be partially solved easily enough with a single ALTER TABLE statement :) Cheers, Nick. From biopython at maubp.freeserve.co.uk Fri Nov 28 17:46:01 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 17:46:01 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <493023CA.9030108@bham.ac.uk> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <492FD220.60505@bham.ac.uk> <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com> <49301579.2060700@bham.ac.uk> <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com> <493023CA.9030108@bham.ac.uk> Message-ID: <320fb6e00811280946r391eaac9q8c54ae6a4a59595c@mail.gmail.com> > Peter wrote: >> I agree but ONLY if you are not trying to use an existing schema with >> composite primary keys and/or tables with no primary key. For these >> SQLAlchemy seems to be the current best bet with python, leading to >> the choice of either TurboGears (which I went for) or Pylons (picked >> by Brad). Nick wrote: > Well, I wouldn't be that prescriptive - I would say people can just use what > they feel comfortable with and they can get good results from quickly. I've > had good experiences with Django so I wouldn't put people off it just > because of the primary key issue which can be partially solved easily enough > with a single ALTER TABLE statement :) Maybe adding a few surrogate primary keys to tables to make Django (or your ORM of choice) happy isn't such a big deal. 
However, I put a lot of value on the shared standard nature of the BioSQL schema, and prefer to modify it as little as possible - not just in case I break something another software package relies on, but also to reduce long term maintenance and re-installation hassles. In your case you had existing experience with Django, while I had no prior investment in it or any other ORM tool. I can therefore understand your choice - and might even have done the same in your position. I'm not convinced that the BioSQL schema needs to be changed for v1.1.x to help ORM software either (surrogate primary keys on all tables - something mooted on the roadmap). http://www.biosql.org/wiki/Enhancement_Requests Peter From hlapp at gmx.net Fri Nov 28 18:31:55 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 28 Nov 2008 13:31:55 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> Message-ID: On Nov 28, 2008, at 5:43 AM, Peter wrote: > Why not this: > > CREATE TABLE taxon_name ( > taxon_id INT(10) UNSIGNED NOT NULL, > name VARCHAR(255) BINARY NOT NULL, > name_class VARCHAR(32) BINARY NOT NULL, > PRIMARY KEY (taxon_id,name,name_class) > ) TYPE=INNODB; It's part of the changes planned for the next release indeed. At the time this was written it didn't seem to matter much as they are really semantically equivalent, and ORM tools weren't around much at the time :-) I do hope that no-one is using a dynamically configuring ORM at run time so that this change can be a drop-in replacement that's fully backwards compatible. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Nov 28 18:41:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 18:41:52 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> Message-ID: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> On Fri, Nov 28, 2008 at 6:31 PM, Hilmar Lapp wrote: > > On Nov 28, 2008, at 5:43 AM, Peter wrote: > >> Why not this: >> >> CREATE TABLE taxon_name ( >> taxon_id INT(10) UNSIGNED NOT NULL, >> name VARCHAR(255) BINARY NOT NULL, >> name_class VARCHAR(32) BINARY NOT NULL, >> PRIMARY KEY (taxon_id,name,name_class) >> ) TYPE=INNODB; > > > It's part of the changes planned for the next release indeed. By next release, do you mean BioSQL v1.0.2 or v1.1.0 here? > At the time this was written it didn't seem to matter much as they are really > semantically equivalent, and ORM tools weren't around much at the time :-) I see - that kind of explains the reason why some tables have explicit composite primary keys, while others just have a unique set of fields. > I do hope that no-one is using a dynamically configuring ORM at run time so > that this change can be a drop-in replacement that's fully backwards > compatible. Some dynamically configuring ORM code would never have coped with these tables in the first place - so it doesn't matter here. In other cases the user can tell the ORM to treat the tuple (taxon_id,name,name_class) as a primary key - and this should still be fine even when this is explicit in the database schema. 
I expect (and hope) this will be a backwards compatible change. Peter From hlapp at gmx.net Fri Nov 28 18:46:26 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 28 Nov 2008 13:46:26 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> Message-ID: <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net> On Nov 28, 2008, at 1:41 PM, Peter wrote: >> >> It's part of the changes planned for the next release indeed. > > By next release, do you mean BioSQL v1.0.2 or v1.1.0 here? That would be 1.0.2. Otherwise there would be no need to worry about backward compatibility (as 1.1x won't be by definition). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Fri Nov 28 18:57:40 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 18:57:40 +0000 Subject: [BioSQL-l] BioSQL and ontology "standards". Message-ID: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com> Hi all, The BioSQL schema allows multiple ontologies, so that things like entries in seqfeature_qualifier_value can say what they mean by "locus_tag". Currently BioPerl and Biopython (and I assume the other projects but haven't checked) use a couple of ad-hoc ontology names for storing annotation. In particular, if there is no predefined entry for a novel ontology term, it gets added on the fly. This is very convenient as it means a BioSQL database can be used without first importing a predefined ontology.
However there are downsides, for example spelling errors in the keys of a GenBank file get treated as ontology entries. Have these ad-hoc ontologies ever been defined? i.e. For table bioentry_qualifier_value terms, which ad-hoc ontology name should be used? Biopython uses ad-hoc ontologies named 'SeqFeature Keys', 'SeqFeature Sources', 'Annotation Tags' for various different tables (which I believe is the same for BioPerl). On a related point, it might make more sense to use a predefined ontology, like SOFA or SO from http://www.sequenceontology.org/ where a novel term is treated as an error (or perhaps falls back on the ad-hoc ontology). How do the various Bio* projects cope with annotations in the database for different or multiple ontologies? Or has this not been considered? Thanks, Peter From biopython at maubp.freeserve.co.uk Fri Nov 28 20:04:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 20:04:33 +0000 Subject: [BioSQL-l] BioSQL and ontology "standards". In-Reply-To: <49304392.4080908@eaglegenomics.com> References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com> <49304392.4080908@eaglegenomics.com> Message-ID: <320fb6e00811281204i3bae31e4kc18f70121244b4d1@mail.gmail.com> On Fri, Nov 28, 2008 at 7:16 PM, Richard Holland wrote: > > BioJava does what BioPerl does and pretty much makes it up as it goes > along, using whatever the input files tell it. OK, good. But which ontology names do you use for which tables? i.e. Do you also use ad-hoc ontologies named 'SeqFeature Keys', 'SeqFeature Sources' and 'Annotation Tags'? To be a little more specific, here are some examples - which I presume (hope) are all copying BioPerl's conventions. In recording a bioentry date, Biopython sets bioentry_qualifier_value.term_id to point to a term table entry "date_changed" which belongs to the ad-hoc "Annotation Tags" ontology.
In recording most bioentry annotations (a list of keywords), Biopython sets bioentry_qualifier_value.term_id to point to a term table entry for that annotation type (e.g. "keywords") which belongs to the ad-hoc "Annotation Tags" ontology. In recording a seqfeature, Biopython sets seqfeature.seqfeature_key_id to point to a term table entry for that feature type (e.g. "CDS", "misc_feature", "gene") which belongs to the ad-hoc "SeqFeature Keys" ontology. Biopython always sets seqfeature.type_term_id to point to a term table entry for "EMBL/GenBank/SwissProt" within the ad-hoc "SeqFeature Sources" ontology. In recording most of a seqfeature's qualifiers (annotations), Biopython sets seqfeature_qualifier_value.term_id to point to a term table entry for the key (e.g. "locus_tag", "note", "translation") which belongs to the ad-hoc "Annotation Tags" ontology. Notice that the ad-hoc "Annotation Tags" ontology serves double duty, doing both bioentry and seqfeature annotations. This doesn't seem entirely sensible. On the other hand, when recording a seqfeature's location Biopython and BioPerl leave location.term_id as NULL (rather than using any particular ontology term). This seems arbitrary. Relating to this, if we want to record a composite location type (typically "join"), we'd want to use the location_qualifier_value table. BioPerl seems to leave this table empty (presumably assuming all composite locations are joins) which is what Biopython currently does too. Here we can't just set location_qualifier_value.term_id as NULL (why not?) so we have to introduce something. The BioSQL projects should first agree what ontology term and what ontology this should be stored with. > The trouble with throwing exceptions when things don't meet standards is > that people complain when their custom files don't work, and can't be > made to work without editing the file itself. ... I'm not sure if you are talking about parsing files, or loading them into BioSQL.
I agree that when parsing sometimes some leeway is required. In terms of *optionally* enforcing a strict ontology, throwing an error is a good thing if the input file doesn't follow the ontology - this indicates a problem with the file (or perhaps an out of date ontology). I would certainly leave the default behaviour as is with the ad-hoc ontologies extended on the fly. > I think the best approach is to always to use what the file says, and > trust that it's accurate. What needs to be agreed between projects is > any additional annotations that get introduced outside the context of > file parsing, and the names of the ontologies used for the file > annotations so that all projects use the same ontologies and don't > replicate them inside the BioSQL database. It would be nice to > standardise these names and the additional custom terms across the > projects, in much the same way as people tried already to standardise > the way general objects get mapped to BioSQL. This is what I am trying to get at here - documenting the existing "ad hoc" ontology usage. My impression is that it has not been documented, and that the BioPerl behaviour is the de facto BioSQL standard. I'd like to pin down this standard, and extend it for situations like the location_qualifier_value.term_id and perhaps location.term_id where BioPerl seems to ignore the ontology issue. Peter From holland at eaglegenomics.com Fri Nov 28 19:16:34 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 28 Nov 2008 19:16:34 +0000 Subject: [BioSQL-l] BioSQL and ontology "standards". In-Reply-To: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com> References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com> Message-ID: <49304392.4080908@eaglegenomics.com> BioJava does what BioPerl does and pretty much makes it up as it goes along, using whatever the input files tell it.
The trouble with throwing exceptions when things don't meet standards is that people complain when their custom files don't work, and can't be made to work without editing the file itself. By custom I mean not only things they've written themselves, but also files coming from established tools which don't follow the rules (NEXUS format is a classic example of this - the most popular tools that output NEXUS pretty much ignore the format specification). Even the standards providers themselves often don't comply with their own rules (several Genbank examples supplied from NCBI/Entrez break any parser which tries to be completely strict with the declared format). I think the best approach is to always to use what the file says, and trust that it's accurate. What needs to be agreed between projects is any additional annotations that get introduced outside the context of file parsing, and the names of the ontologies used for the file annotations so that all projects use the same ontologies and don't replicate them inside the BioSQL database. It would be nice to standardise these names and the additional custom terms across the projects, in much the same way as people tried already to standardise the way general objects get mapped to BioSQL. cheers, Richard Peter wrote: > Hi all, > > The BioSQL schema allows multiple ontologies, so that things like > entries in seqfeature_qualifier_value can say when they mean by > "locus_tag". > > Currently BioPerl and Biopython (and I assume the other projects but > haven't checked) use a couple of ad-hoc ontology names for storing > annotation. In particular, if there is no predefined entry for a > novel ontology term, it gets added on the fly. This is very > convenient as it means a BioSQL database can be used without first > importing a predefined ontology. However there are downsides, for > example spelling errors in the keys of a GenBank file get treated as a > ontology entries. > > Have these ad-hoc ontologies ever been defined? 
i.e. For table > bioentry_qualifier_value terms, which ad-hoc ontology name should be > used? Biopython uses ad-hoc ontology named 'SeqFeature Keys', > 'SeqFeature Sources', 'Annotation Tags' for various different tables > (which I believe is the same for BioPerl). > > On a related point, it might make more sense to use a predefined > ontology, like SOFA or SO from http://www.sequenceontology.org/ where > a novel term is treated as an error (or perhaps falls back on the > ad-hoc ontology). How do the various Bio* projects cope with > annotations in the database for different or multiple ontologies? Or > has this not been considered? > > Thanks, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From d.m.a.martin at dundee.ac.uk Tue Nov 25 11:09:08 2008 From: d.m.a.martin at dundee.ac.uk (David Martin) Date: Tue, 25 Nov 2008 11:09:08 +0000 Subject: [BioSQL-l] Passwords on biosql databases Message-ID: <492BDCF2.6F09.00E0.0@dundee.ac.uk> I have set up a biosql database on Postgres. The Bio::DB::BioDB module croaks complaining that it needs the password. I have tried the obvious things (-password -passwd and reading what docs I could find) but to no avail. Any clues? Assuming the database is on postgres and is called biosql with user biosqluser and password biosqlpassword I have been trying: my $dbadp = Bio::DB::BioDB->new(-database => 'biosql', -user => 'biosqluser', -dbname => 'biosql', -host => 'postgres', -passwd=>'biosqlpassword', -driver => 'Pg'); regards ..d David Martin PhD College of Life Sciences University of Dundee The University of Dundee is a Scottish Registered Charity, No. SC015096. 
From chapmanb at 50mail.com Tue Nov 25 21:16:22 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 25 Nov 2008 16:16:22 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL Message-ID: <20081125211622.GE83220@sobchak.mgh.harvard.edu>

Hi Peter; Hope all is going well with you. I was glancing at the BioSQL mailing list archives last night and saw your messages earlier this month about using an ORM mapper with BioSQL.

Some of my current work is using a BioSQL storage backend with a javascript web interface. The middleware uses Pylons and SQLAlchemy. This uses some parts of BioSQL not well represented via an object front end like bioentry_relationship, and so it has been convenient to work with these via SQLAlchemy directly.

To your initial question, SQLAlchemy can handle those non-primary key tables without a problem by setting "primary_key = True" for all of the unique columns.

What I have done thus far is definitely non-complete, and also includes some add-on tables for storing experimental data linked to BioSQL. However, I am attaching it here just to give you an idea (init.py is the __init__.py of the module). You would use it like:

    from Wherever.BioSQL import get_session, biosql
    session = get_session("production")
    entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345")

If you, or anyone else, is developing something similar, I'd be happy to help with something generalized.

Brad

[Attachment scrubbed: init.py, text/x-python, 1116 bytes]
[Attachment scrubbed: BioSQL.py, text/x-python, 6881 bytes]
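Since the attachments did not survive in the archive, here is a minimal sketch of the approach Brad describes: marking every column of the existing UNIQUE rule as primary_key=True so that SQLAlchemy treats the tuple as the identity of a row. It is not a reconstruction of his BioSQL.py. It assumes SQLAlchemy 1.4 or later, uses an in-memory SQLite database in place of MySQL or Postgres, and the class name TaxonName is hypothetical.

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TaxonName(Base):
    """Hypothetical mapping of BioSQL's taxon_name table (not Brad's code)."""

    __tablename__ = "taxon_name"

    # The table itself declares no primary key; flagging all three columns
    # here only tells the ORM which tuple identifies a row.
    taxon_id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(String(255), primary_key=True)
    name_class = Column(String(32), primary_key=True)


# In-memory SQLite stands in for a real BioSQL server.
engine = create_engine("sqlite://")

# Create the table as in the unmodified schema: UNIQUE only, no PRIMARY KEY.
with engine.begin() as connection:
    connection.execute(
        text(
            "CREATE TABLE taxon_name ("
            " taxon_id INTEGER NOT NULL,"
            " name VARCHAR(255) NOT NULL,"
            " name_class VARCHAR(32) NOT NULL,"
            " UNIQUE (taxon_id, name, name_class))"
        )
    )

with Session(engine) as session:
    session.add(
        TaxonName(taxon_id=9606, name="Homo sapiens", name_class="scientific name")
    )
    session.commit()

    # Look the row up by its composite identity, in column declaration order.
    found = session.get(TaxonName, (9606, "Homo sapiens", "scientific name"))
    found_name = found.name
```

Because the key is declared only in the mapping, the database schema stays untouched, which is the property Peter was looking for earlier in the thread.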