From biopython at maubp.freeserve.co.uk  Thu Nov  6 06:53:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 11:53:13 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
Message-ID: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>

I've recently been looking into some object-relational mappers which
caused me to look more closely at the BioSQL schema.  Many of these
packages require a primary key, but not all can cope with a composite
primary key.  However, some BioSQL tables don't have any primary key
at all.

Several BioSQL tables have composite primary keys; for example, the
term_dbxref table has a composite key of (term_id, dbxref_id), plus
an index on dbxref_id.

However, some BioSQL tables do not have a primary key, for example:

-- corresponds to the names table of the NCBI taxonomy database
CREATE TABLE taxon_name (
       taxon_id		INT(10) UNSIGNED NOT NULL,
       name		VARCHAR(255) BINARY NOT NULL,
       name_class	VARCHAR(32) BINARY NOT NULL,
       UNIQUE (taxon_id,name,name_class)
) TYPE=INNODB;

CREATE INDEX taxnametaxonid ON taxon_name(taxon_id);
CREATE INDEX taxnamename    ON taxon_name(name);

Why don't taxon_name, bioentry_path, term_relationship,
bioentry_qualifier_value and seqfeature_path have a primary key (just a
uniqueness constraint)?
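
One way to see which tables are affected is to ask the database itself.
The following is only a minimal sketch: it assumes an SQLite copy of the
schema opened through Python's sqlite3 module (not the MySQL schema quoted
above), the two table definitions are simplified, and the helper name
tables_without_primary_key is hypothetical.

```python
import sqlite3


def tables_without_primary_key(conn):
    """Return the names of user tables that declare no primary key."""
    missing = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (name,) in tables:
        # PRAGMA table_info yields one row per column; the sixth field (pk)
        # is non-zero for columns that are part of the primary key.
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        if not any(col[5] for col in columns):
            missing.append(name)
    return missing


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for two BioSQL tables (assumption: SQLite types).
    CREATE TABLE term_dbxref (
        term_id   INTEGER NOT NULL,
        dbxref_id INTEGER NOT NULL,
        PRIMARY KEY (term_id, dbxref_id)
    );
    CREATE TABLE taxon_name (
        taxon_id   INTEGER NOT NULL,
        name       VARCHAR(255) NOT NULL,
        name_class VARCHAR(32) NOT NULL,
        UNIQUE (taxon_id, name, name_class)
    );
    """
)
print(tables_without_primary_key(conn))
```

Only taxon_name is reported here, because term_dbxref declares its
composite key explicitly.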

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Thu Nov  6 07:37:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 12:37:42 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <200811061323.43749.raoul.bonnal@itb.cnr.it>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
	<200811061323.43749.raoul.bonnal@itb.cnr.it>
Message-ID: <320fb6e00811060437h367804c7y6d46ed36d1b619ae@mail.gmail.com>

On Thu, Nov 6, 2008 at 12:23 PM, Raoul Jean Pierre Bonnal
<raoul.bonnal at itb.cnr.it> wrote:
>
> Dear Peter,
> I'm writing the wrapper for BioRuby using DataMapper, an ORM (Active Record is
> similar).

Hi Raoul,

I'm looking at a Python-based ORM to use with BioSQL.

[The existing Biopython BioSQL bridge uses raw SQL to turn the
sequences and features into Biopython objects - this all seems to work
fine, but it doesn't offer the full flexibility of an ORM framework.]

>
> I think we can consider moving or branching BioSQL's schema to the approach
> suggested by this kind of ORM, with a pk named "id" for every table and
> plural table names. Fk names are quite correct.
>

I don't think it makes sense to add a single primary key to many of
these tables (e.g. term_dbxref).  The existing composite primary keys
seem fine (it's just a shame some ORMs can't cope).

I was thinking the tables currently lacking any primary key could get
one (based on the current UNIQUE rule).  So for example, the
taxon_name could use (taxon_id,name,name_class) as its primary key.  I
don't know how big a change this would be - but superficially it looks
backwards compatible.  This is why I was asking why they didn't have
a PK in the first place.
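
The constraint side of that claim can be checked in miniature. This is a
hedged sketch, assuming SQLite through Python's sqlite3 module instead of
MySQL, simplified column types, and illustrative sample rows; the helper
rejects_duplicates is hypothetical. It only shows that both definitions
accept and reject the same rows, not that every Bio* binding would be
unaffected.

```python
import sqlite3

COLUMNS = """
    taxon_id   INTEGER NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32) NOT NULL,
"""


def rejects_duplicates(key_clause):
    """Create taxon_name with the given key clause and try a duplicate insert."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE taxon_name ({COLUMNS} {key_clause})")
    row = (9606, "Homo sapiens", "scientific name")
    insert = "INSERT INTO taxon_name VALUES (?, ?, ?)"
    conn.execute(insert, row)
    # A second name for the same taxon is fine under either definition.
    conn.execute(insert, (9606, "human", "genbank common name"))
    try:
        conn.execute(insert, row)  # exact duplicate of the first row
    except sqlite3.IntegrityError:
        return True
    return False


unique_ok = rejects_duplicates("UNIQUE (taxon_id, name, name_class)")
primary_ok = rejects_duplicates("PRIMARY KEY (taxon_id, name, name_class)")
print(unique_ok, primary_ok)
```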

>
> PS: DataMapper handles composite PKs very well, much better than ActiveRecord.
>

For Python, Django and, I believe, SQLObject currently don't
support composite primary keys.  I'll take a look at SQLAlchemy next,
which should cope better.

Peter

From hlapp at gmx.net  Thu Nov  6 16:39:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Nov 2008 16:39:55 -0500
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
Message-ID: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>

Hi Peter,

it's a known enhancement request. I know that some ORMs have trouble  
reverse engineering the mapping if there is no PK defined.

Semantically, however, in the absence of a primary key constraint the  
first unique key constraint is equivalent to a primary key (in fact  
some ER modeling tools will automatically do the conversion); unique  
keys are also called alternate keys (alternate to the primary key).

So for now feel free to change the UK constraint to a PK where  
there is no PK defined and your reverse engineering tool needs it. If  
you don't use a reverse engineering tool, just set the columns of the  
UK constraint as the compound primary key if there isn't a surrogate PK.
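
The fallback described above (use the declared primary key if there is
one, otherwise the columns of a unique constraint) could be sketched like
this. It assumes SQLite via Python's sqlite3 module in place of MySQL; the
function name key_columns is hypothetical, and picking the unique index
that sorts first by name is just one possible reading of "the first
unique key".

```python
import sqlite3


def key_columns(conn, table):
    """Return the primary key columns, or else those of a UNIQUE constraint."""
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Field 5 (pk) gives each column's 1-based position within the primary key.
    pk = sorted((col[5], col[1]) for col in info if col[5])
    if pk:
        return [name for _, name in pk]
    # No primary key: fall back to an index created by a UNIQUE constraint
    # (origin 'u'), taking the one that sorts first by name.
    uniques = sorted(
        row[1]
        for row in conn.execute(f'PRAGMA index_list("{table}")')
        if row[2] and row[3] == "u"
    )
    if not uniques:
        return []
    index_rows = conn.execute(f'PRAGMA index_info("{uniques[0]}")').fetchall()
    # index_info rows are (seqno, cid, name); sort by seqno to keep key order.
    return [row[2] for row in sorted(index_rows)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon_name (
        taxon_id   INTEGER NOT NULL,
        name       VARCHAR(255) NOT NULL,
        name_class VARCHAR(32) NOT NULL,
        UNIQUE (taxon_id, name, name_class)
    );
    CREATE INDEX taxnametaxonid ON taxon_name(taxon_id);
    CREATE INDEX taxnamename    ON taxon_name(name);
    """
)
print(key_columns(conn, "taxon_name"))
```

The two plain indexes are ignored because they are not unique, so the
three columns of the UNIQUE constraint come back in declaration order.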

BioSQL 1.1+ will have surrogate PKs on all tables, but this change may  
not be backwards compatible for existing language bindings, which is  
why I'd like to make those changes first that should be fully  
backwards compatible.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Thu Nov  6 17:12:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 22:12:28 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
	<082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>
Message-ID: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>

On Thu, Nov 6, 2008 at 9:39 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> Hi Peter,
>
> it's a known enhancement request. I know that some ORMs have trouble reverse
> engineering the mapping if there is no PK defined.

Oh right, "Surrogate primary keys on all tables" on this page:
http://www.biosql.org/wiki/Enhancement_Requests

> Semantically, however, in the absence of a primary key constraint the first
> unique key constraint is equivalent to a primary key (in fact some ER
> modeling tools will automatically do the conversion); unique keys are also
> called alternate keys (alternate to the primary key).
>
> So for now feel free to change the UK constraint to a PK where there is
> no PK defined and your reverse engineering tool needs it. If you don't use a
> reverse engineering tool, just set the columns of the UK constraint as the
> compound primary key if there isn't a surrogate PK.

OK - I'll bear that in mind.

> BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not
> be backwards compatible for existing language bindings, which is why I'd
> like to make those changes first that should be fully backwards compatible.

That sounds sensible.

Thanks Hilmar!

Peter

P.S. Is there any agreed terminology: compound primary key versus
composite primary key?

From raoul.bonnal at itb.cnr.it  Thu Nov  6 07:23:43 2008
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Thu, 06 Nov 2008 13:23:43 +0100
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
Message-ID: <200811061323.43749.raoul.bonnal@itb.cnr.it>

Dear Peter, 
I'm writing the wrapper for BioRuby using DataMapper, an ORM (Active Record is 
similar).

I think we can consider moving or branching BioSQL's schema to the approach 
suggested by this kind of ORM, with a pk named "id" for every table and 
plural table names. Fk names are quite correct.

PS: DataMapper handles composite PKs very well, much better than ActiveRecord.


From mark.schreiber at novartis.com  Thu Nov  6 21:07:12 2008
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 7 Nov 2008 10:07:12 +0800
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <200811061323.43749.raoul.bonnal@itb.cnr.it>
Message-ID: <OFF57EB0F7.4AC79B77-ON482574FA.000B2306-482574FA.000BA587@ah.novartis.com>

Hi -

In Java's JPA it is possible to use an embedded object as a primary key. 
This gets you around the situations where the primary key is composite. It 
also effectively gets you around those tables where no key is declared but 
one is implicit because the combination of all fields is unique (as in 
taxon_name).

What you end up with is an object that holds taxon_id, name, name_class, 
and inside that object you have an embedded key object that contains the 
same three fields. In this way any changes that are made to the object are 
still associated with the original row via the unchanged embedded PK 
object and are updated accordingly.
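
The same idea can be sketched outside Java. What follows is a Python
analogy using dataclasses, not JPA's actual embedded-id API; the class
names TaxonNameKey and TaxonName, the update_statement helper, and the
sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonNameKey:
    """Immutable copy of the three columns that identify a taxon_name row."""
    taxon_id: int
    name: str
    name_class: str


@dataclass
class TaxonName:
    """Mutable row object that remembers the key it was loaded with."""
    key: TaxonNameKey
    taxon_id: int
    name: str
    name_class: str

    @classmethod
    def from_row(cls, taxon_id, name, name_class):
        key = TaxonNameKey(taxon_id, name, name_class)
        return cls(key, taxon_id, name, name_class)

    def update_statement(self):
        """Build a parameterised UPDATE that targets the originally loaded row."""
        sql = (
            "UPDATE taxon_name SET taxon_id = ?, name = ?, name_class = ? "
            "WHERE taxon_id = ? AND name = ? AND name_class = ?"
        )
        params = (
            self.taxon_id, self.name, self.name_class,
            self.key.taxon_id, self.key.name, self.key.name_class,
        )
        return sql, params


entry = TaxonName.from_row(9606, "Homo sapien", "scientific name")
entry.name = "Homo sapiens"  # correct a typo; the embedded key keeps the old value
sql, params = entry.update_statement()
print(params)
```

Because the key object is frozen, the WHERE clause still identifies the
row as it was loaded, even though every field doubles as key material.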

While I agree that explicit PKs for all BioSQL tables would be nicer 
for ORM frameworks, many frameworks have ways to get around this; possibly 
those in Ruby or Python do as well.

- Mark



From biopython at maubp.freeserve.co.uk  Fri Nov  7 13:35:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 18:35:31 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
	<082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>
	<320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>
Message-ID: <320fb6e00811071035y496ea4d8p93aa0f54633950f@mail.gmail.com>

I've ruled out using Django v1.0 and the current version of SQLObject
with BioSQL as they don't (yet) support composite primary keys.
However, SQLAlchemy 0.5.0 seems to be happy with the current BioSQL
schema as is :)

http://www.djangoproject.com/  http://www.sqlobject.org/
http://www.sqlalchemy.org/

Hilmar:
>> BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not
>> be backwards compatible for existing language bindings, which is why I'd
>> like to make those changes first that should be fully backwards compatible.

Peter:
> That sounds sensible.

Actually I may have initially misunderstood you.

Are you saying for tables which already have a composite primary key
(e.g. term_dbxref) you plan to add/replace this with a surrogate
(single column) PK - just to accommodate certain simplistic ORMs?  I'm
not so keen on this, it seems like an invasive change with little
benefit, but potentially making lots of work updating the Bio*
bindings.

However, would the smaller step of adding composite primary keys to
tables currently lacking them be possible on the BioSQL v1.0.x
roadmap? e.g. for taxon_name using (taxon_id,name,name_class) as the
composite primary key, currently specified to be unique. Or might this
also cause trouble for the Bio* bindings?  If this was possible, it
*might* be useful for certain ORMs.

Was there a reason why tables like taxon_name never had a
(composite/compound) primary key in the first place?

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 14 15:48:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 20:48:02 +0000
Subject: [BioSQL-l] parent_taxon_id of a root node
In-Reply-To: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
Message-ID: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>

On Fri, Oct 3, 2008, I wrote:
>
> Hello all,
>
> I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will set
> the parent_taxon_id of the NCBI root node in the taxon table to point
> to itself.  I would have expected this to be NULL indicating no
> parent.  If someone is using the database directly, extracting a
> lineage could trigger an infinite loop.  Can anyone explain the
> rationale here?
>
> Note that when Biopython adds entries to the taxon table, it uses NULL
> for a root node.  When retrieving sequences from a BioSQL database,
> Biopython does cope with a root node with a NULL parent or a
> self-parent - would it be safe to assume BioPerl and Java can also cope
> with both situations?
>
> Thanks,
>
> Peter
>

Hi again,

I thought I'd raise this question again (as I didn't see any response
last time), as I've just been bitten by the self-parent taxon problem
this afternoon.  This was for a simple web front end to part of a
BioSQL database using SQLAlchemy in Python - but that's not important.

I was using a simple loop to build up lineages, which was working fine
until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to
just time out.  I'd forgotten about the self-parent root nodes used by
load_ncbi_taxonomy.pl which had triggered an infinite loop.
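
A lineage walk that copes with both conventions might look like the
sketch below. It assumes a cut-down taxon table holding only taxon_id and
parent_taxon_id, SQLite via Python's sqlite3 module, and made-up ids; the
function name lineage is hypothetical.

```python
import sqlite3


def lineage(conn, taxon_id):
    """Return taxon ids from the given node up to its root, root last."""
    path = []
    seen = set()
    current = taxon_id
    while current is not None and current not in seen:
        seen.add(current)
        path.append(current)
        row = conn.execute(
            "SELECT parent_taxon_id FROM taxon WHERE taxon_id = ?", (current,)
        ).fetchone()
        # A missing row or a NULL parent ends the walk here; the `seen`
        # test in the loop condition ends it on a self-parent or any cycle.
        current = row[0] if row else None
    return path


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?)",
    [
        (1, 1),      # self-parent root, as load_ncbi_taxonomy.pl writes it
        (2, 1),
        (3, 2),
        (10, None),  # NULL-parent root, as Biopython writes it
        (11, 10),
    ],
)
print(lineage(conn, 3))
print(lineage(conn, 11))
```

Tracking visited ids costs a little memory but terminates on any cycle,
not only on the one-node cycle of a self-parent root.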

I hit another (less serious) problem stemming from these self-parent
root nodes when I wanted to generate a list of sub-lineages (child
entries), essentially:

SELECT * FROM taxon WHERE parent_taxon_id=12345;

When calling this on a root node, I had to modify this to explicitly
exclude itself from the children:

SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345;
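
Wrapped in a helper, the same guard can be applied to every node so that
callers need not know whether they are asking about a root. This is a
minimal sketch assuming SQLite via Python's sqlite3 module, a cut-down
taxon table, and made-up ids; the name child_taxa is hypothetical.

```python
import sqlite3


def child_taxa(conn, taxon_id):
    """Return ids of direct children, never counting a node as its own child."""
    rows = conn.execute(
        "SELECT taxon_id FROM taxon "
        "WHERE parent_taxon_id = ? AND taxon_id <> ? ORDER BY taxon_id",
        (taxon_id, taxon_id),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2)],  # node 1 is a self-parent root
)
print(child_taxa(conn, 1))
print(child_taxa(conn, 2))
```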

So to repeat my earlier question, is there a reason why
parent_taxon_id isn't just NULL for root nodes?  Was this a deliberate
design choice?  If not, I think this could be regarded as a bug in
load_ncbi_taxonomy.pl.

Thanks

Peter

From hlapp at gmx.net  Sat Nov 15 13:34:45 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Nov 2008 13:34:45 -0500
Subject: [BioSQL-l] parent_taxon_id of a root node
In-Reply-To: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>
References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
	<320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>
Message-ID: <F41D532F-A9AE-427A-B10F-2160FF51E12F@gmx.net>

Sorry Peter - it looks like this slipped my attention (Oct was crazy).  
Thanks for raising it again. I agree with you, this looks like a bug.  
Would you mind filing it?

It's possible that this has tacitly been assumed as policy, leading  
some people to identify the root node by equating parent and  
taxon_id, but that sounds like the wrong way of doing it, so it  
deserves fixing.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Sun Nov 16 09:58:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 16 Nov 2008 14:58:20 +0000
Subject: [BioSQL-l] parent_taxon_id of a root node
In-Reply-To: <F41D532F-A9AE-427A-B10F-2160FF51E12F@gmx.net>
References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
	<320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>
	<F41D532F-A9AE-427A-B10F-2160FF51E12F@gmx.net>
Message-ID: <320fb6e00811160658s6282022by3681364e14aecf69@mail.gmail.com>

On Sat, Nov 15, 2008 at 6:34 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> Sorry Peter - it looks like this slipped my attention (Oct was crazy).
> Thanks for raising it again. I agree with you, this looks like a bug. Would
> you mind filing it?

Sure,
http://bugzilla.open-bio.org/show_bug.cgi?id=2664

> It's possible that this has tacitly been assumed as policy, leading some
> people to identify the root node by equating parent and taxon_id, but
> that sounds like the wrong way of doing it, so it deserves fixing.

In the short term we should just make sure all the Bio* projects can
cope with either style of root node (Biopython can), but in the long
term are self-parent taxon entries something that could be banned via
the schema?
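
One conceivable form of such a ban is a CHECK constraint. The sketch
below assumes SQLite via Python's sqlite3 module and a cut-down taxon
table; it is not part of any BioSQL release. MySQL releases of this period
accepted CHECK clauses without enforcing them, so there a trigger or an
application-side check would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxon (
        taxon_id        INTEGER PRIMARY KEY,
        parent_taxon_id INTEGER,
        -- NULL <> taxon_id evaluates to NULL, which a CHECK accepts,
        -- so a root with no parent is still allowed.
        CHECK (parent_taxon_id <> taxon_id)
    )
    """
)
conn.execute("INSERT INTO taxon VALUES (1, NULL)")  # root node, no parent
conn.execute("INSERT INTO taxon VALUES (2, 1)")     # ordinary child

try:
    conn.execute("INSERT INTO taxon VALUES (3, 3)")  # self-parent
    self_parent_rejected = False
except sqlite3.IntegrityError:
    self_parent_rejected = True

print(self_parent_rejected)
```

This only rules out the one-node cycle; longer cycles (A under B, B under
A) would still need to be caught by the loader or by the client code.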

Regards,

Peter

From biopython at maubp.freeserve.co.uk  Wed Nov 26 13:37:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Nov 2008 18:37:51 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>

On Tue, Nov 25, 2008 at 9:16 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
> Hope all is going well with you. I was glancing at the BioSQL
> mailing list archives last night and saw your messages earlier this
> month about using an ORM mapper with BioSQL.
>
> Some of my current work is using a BioSQL storage backend with a
> javascript web interface. The middleware uses Pylons and SQLAlchemy.
> This uses some parts of BioSQL not well represented via an object front
> end like bioentry_relationship, and so it has been convenient to work
> with these via SQLAlchemy directly.

I've been using TurboGears with SQLAlchemy, and so far everything has
been OK.  This was essentially independent of the Biopython BioSQL
mapping.

> To your initial question, SQLAlchemy can handle those non-primary key
> tables without a problem by setting "primary_key = True" for all of
> the unique columns.

Yes, SQLAlchemy seems pretty good.  The only catch was for a table
with no primary key defined at all (the taxon_name table) which
required a little more work setting up the ORM mapping, but which also
seems to work fine.

> What I have done thus far is definitely incomplete, and also
> includes some add-on tables for storing experimental data linked to
> BioSQL. However, I am attaching it here just to give you an idea
> (init.py is the __init__.py of the module). You would use it like:
>
> from Wherever.BioSQL import get_session, biosql
>
> session = get_session("production")
> entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345")
>
> If you, or anyone else, is developing something similar,
> I'd be happy to help with something generalized.
>
> Brad
>

I'll probably take a look at this next week - I'm on the road at the
moment.  Thanks for sharing,

Peter

From hlapp at gmx.net  Wed Nov 26 14:28:16 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Nov 2008 14:28:16 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
Message-ID: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>


On Nov 26, 2008, at 1:37 PM, Peter wrote:

>> To your initial question, SQLAlchemy can handle those non-primary key
>> tables without a problem by setting "primary_key = True" for all of
>> the unique columns.
>
> Yes, SQLAlchemy seems pretty good.  The only catch was for a table
> with no primary key defined at all (the taxon_name table) which
> required a little more work setting up the ORM mapping, but which also
> seems to work fine.


It has one unique key defined on (name, name_class, taxon_id). Is that  
not what you are seeing?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gabrielle_doan at gmx.net  Thu Nov 27 08:51:16 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Thu, 27 Nov 2008 14:51:16 +0100
Subject: [BioSQL-l] BioSQL schema of v1.0.1
Message-ID: <492EA5D4.7090809@gmx.net>

Hi Hilmar,

Recently I noticed that BioSQL has a new release. But the v1.0.1 
release of the BioSQL core schema still contains the v1.0.0 schema 
inside. Can you update it, please?
Thanks a lot.

Cheers,
Gabrielle

From hlapp at gmx.net  Thu Nov 27 12:33:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 27 Nov 2008 12:33:25 -0500
Subject: [BioSQL-l] BioSQL schema of v1.0.1
In-Reply-To: <492EA5D4.7090809@gmx.net>
References: <492EA5D4.7090809@gmx.net>
Message-ID: <3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net>

Hi Gabrielle,

I'm not sure what you mean. The changes that v1.0.1 introduces are  
also in the main trunk, the biosql-release-1_0_1 tag, and in the  
distribution downloadable from the website.

It's easily possible though that I may have overlooked something -  
could you elaborate on what prompted the conclusion below?

	-hilmar

On Nov 27, 2008, at 8:51 AM, Gabrielle Doan wrote:

> Hi Hilmar,
>
> recently I've noticed that BioSQL has a new release. But in the  
> BioSQL core schema, Release v1.0.1 there is still the BioSQL schema  
> of v1.0.0 inside. Can you update it please?
> Thanks a lot.
>
> Cheers,
> Gabrielle

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Nov 27 13:07:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 27 Nov 2008 13:07:30 -0500
Subject: [BioSQL-l] BioSQL schema of v1.0.1
In-Reply-To: <492EDD4C.3090601@gmx.net>
References: <492EA5D4.7090809@gmx.net>
	<3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net>
	<492EDD4C.3090601@gmx.net>
Message-ID: <605BB761-BBFC-4683-AEBA-1F1BAB22617A@gmx.net>

Ahh - I see. The schema has not changed from 1.0.0 to 1.0.1 in terms  
of entity structure and their relations. The only schema changes were  
to widen the column width of bioentry.accession and dbxref.accession.  
A complete list of changes is in the file Changes in the root  
directory of the distribution.

So the schema diagram and all documentation still fully apply, except  
for the width of those two columns, which is necessary to deal with  
certain annotations such as pathways.

Does that resolve the issue for you?

	-hilmar

On Nov 27, 2008, at 12:47 PM, Gabrielle Doan wrote:

> Hi Hilmar,
>
> I downloaded v1.0.1 from the website. When I looked through the doc  
> folder I found a PDF which describes the BioSQL schema v1.0 and not  
> v1.0.1. Was the v1.0.1 schema left out intentionally?
> I'd be grateful if you could give me a reply.
>
> cheers,
> Gabrielle
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Fri Nov 28 05:43:01 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 10:43:01 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
Message-ID: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>

On Wed, Nov 26, 2008 at 7:28 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> On Nov 26, 2008, at 1:37 PM, Peter wrote:
>> Yes, SQLAlchemy seems pretty good.  The only catch was for a table
>> with no primary key defined at all (the taxon_name table) which
>> required a little more work setting up the ORM mapping, but which also
>> seems to work fine.
>
> It has one unique key defined on (name, name_class, taxon_id). Is that not
> what you are seeing?
>
>        -hilmar

According to the MySQL schema, taxon_name has a unique constraint but
does NOT have a primary key:

CREATE TABLE taxon_name (
       taxon_id		INT(10) UNSIGNED NOT NULL,
       name		VARCHAR(255) BINARY NOT NULL,
       name_class	VARCHAR(32) BINARY NOT NULL,
       UNIQUE (taxon_id,name,name_class)
) TYPE=INNODB;

As you said, since (taxon_id,name,name_class) is unique, this tuple
can be used as a substitute primary key in the ORM mapping (which for
SQLAlchemy I seem to have to do manually).  SQLAlchemy would do this
automatically if the schema actually used (taxon_id,name,name_class)
as a primary key explicitly.  i.e. Why not this:

CREATE TABLE taxon_name (
       taxon_id		INT(10) UNSIGNED NOT NULL,
       name		VARCHAR(255) BINARY NOT NULL,
       name_class	VARCHAR(32) BINARY NOT NULL,
       PRIMARY KEY (taxon_id,name,name_class)
) TYPE=INNODB;
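
The practical difference between the two definitions shows up when a
tool introspects the table.  As a minimal sketch (using Python's
built-in sqlite3 module rather than MySQL, with the column types
simplified to ones SQLite accepts; the helper name
primary_key_columns is made up for this illustration), the
UNIQUE-only table reports no key columns while the PRIMARY KEY
version reports all three:

```python
import sqlite3

UNIQUE_ONLY = """
CREATE TABLE taxon_name (
    taxon_id   INTEGER      NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32)  NOT NULL,
    UNIQUE (taxon_id, name, name_class)
)"""

# Same table, but with the uniqueness criterion promoted to a primary key.
EXPLICIT_PK = UNIQUE_ONLY.replace("UNIQUE (", "PRIMARY KEY (")


def primary_key_columns(ddl):
    """Return the primary key column names that introspection reports."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk);
        # pk is 0 for non-key columns, else the 1-based position in the key.
        rows = conn.execute("PRAGMA table_info(taxon_name)").fetchall()
    finally:
        conn.close()
    keyed = sorted((row[5], row[1]) for row in rows if row[5])
    return [name for _, name in keyed]


print(primary_key_columns(UNIQUE_ONLY))   # no key columns reported
print(primary_key_columns(EXPLICIT_PK))   # the three columns, in key order
```

Other databases expose this information differently, so this only
illustrates the idea of what an introspecting ORM has to work with.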

See also this thread where I wrote:
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001386.html
> Was there a reason why tables like taxon_name never had a
> (composite/compound) primary key in the first place?

Thanks,

Peter

From n.j.loman at bham.ac.uk  Fri Nov 28 06:12:32 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 11:12:32 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
Message-ID: <492FD220.60505@bham.ac.uk>

Peter wrote:

>>> Yes, SQLAlchemy seems pretty good.  The only catch was for a table
>>> with no primary key defined at all (the taxon_name table) which
>>> required a little more work setting up the ORM mapping, but which also
>>> seems to work fine.
>> It has one unique key defined on (name, name_class, taxon_id). Is that not
>> what you are seeing?
>>
>>        -hilmar
> 
> According to the MySQL schema, taxon_name has a unique constraint but
> does NOT have a primary key:

Just to say I've had good results using the Django 
(http://www.djangoproject.com) ORM system with a BioSQL database.

You can get going quite quickly using the Django introspection feature 
(configure your settings.py file and run python manage.py inspectdb to 
get a models file).

However, the version of Django I use (not sure about latest) didn't 
support multi-column indexes as primary key, so I had to add another 
auto_increment column to use as a primary key.
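
A rough sketch of that workaround, assuming a surrogate column with
the hypothetical name taxon_name_id.  The MySQL statement is shown as
a string only; the runnable part uses Python's built-in sqlite3
module, where a primary key cannot be added with ALTER TABLE and the
table has to be rebuilt instead:

```python
import sqlite3

# MySQL form of the change (the column name taxon_name_id is hypothetical).
MYSQL_ALTER = (
    "ALTER TABLE taxon_name "
    "ADD COLUMN taxon_name_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY"
)
print(MYSQL_ALTER)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_name (
    taxon_id   INTEGER      NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32)  NOT NULL,
    UNIQUE (taxon_id, name, name_class)
);
INSERT INTO taxon_name VALUES (9606, 'Homo sapiens', 'scientific name');
INSERT INTO taxon_name VALUES (9606, 'human', 'genbank common name');

-- SQLite equivalent: rebuild the table with a surrogate integer key,
-- keeping the original uniqueness criterion in place.
CREATE TABLE taxon_name_new (
    taxon_name_id INTEGER PRIMARY KEY AUTOINCREMENT,
    taxon_id      INTEGER      NOT NULL,
    name          VARCHAR(255) NOT NULL,
    name_class    VARCHAR(32)  NOT NULL,
    UNIQUE (taxon_id, name, name_class)
);
INSERT INTO taxon_name_new (taxon_id, name, name_class)
    SELECT taxon_id, name, name_class FROM taxon_name ORDER BY rowid;
DROP TABLE taxon_name;
ALTER TABLE taxon_name_new RENAME TO taxon_name;
""")

rows = conn.execute(
    "SELECT taxon_name_id, name FROM taxon_name ORDER BY taxon_name_id"
).fetchall()
print(rows)
```

Existing INSERT statements that name their columns keep working,
since the new column fills itself in.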

Cheers,

Nick.


From biopython at maubp.freeserve.co.uk  Fri Nov 28 06:55:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 11:55:22 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <492FD220.60505@bham.ac.uk>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<492FD220.60505@bham.ac.uk>
Message-ID: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>

Nick Loman wrote:
>
>> According to the MySQL schema, taxon_name has a unique constraint but
>> does NOT have a primary key:
>
> Just to say I've had good results using the Django
> (http://www.djangoproject.com) ORM system with a BioSQL database.
>
> You can get going quite quickly using the Django introspection feature
> (configure your settings.py file and run python manage.py inspectdb to get a
> models file).
>
> However, the version of Django I use (not sure about latest) didn't support
> multi-column indexes as primary key, so I had to add another auto_increment
> column to use as a primary key.

When I started looking at web-frameworks and ORM, I didn't want to
modify a perfectly good schema (BioSQL) just to cope with a limited
tool.

I investigated Django earlier this month, and rejected it because it
doesn't yet support multi-column indices as primary keys.  They've had
an open bug on this for 3 years, with no expected date yet:
http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
http://code.djangoproject.com/ticket/373

My impression is that Django's philosophy is that they expect you to
define your objects which then automatically defines the database
schema.  Note the title of this FAQ page refers to an existing schema
as a "legacy" database:
http://docs.djangoproject.com/en/dev/howto/legacy-databases/
If Django can cope with an existing schema, then it does look like an
excellent package, and seems well documented.

I also rejected another python ORM system, SQLObject, on similar
grounds.  Their documentation says "SQLObject does not support primary
keys made up of multiple columns (that probably won't change)".  In
fact, they currently are even less flexible in that SQLObject requires
an *integer* primary key on each table!

This left SQLAlchemy as the remaining python ORM candidate, which
seems to cope just fine with the unmodified BioSQL schema.  Brad and I
have reported using BioSQL with SQLAlchemy successfully (within the
python web-frameworks TurboGears and Pylons respectively).

Peter

From raoul.bonnal at itb.cnr.it  Fri Nov 28 08:51:02 2008
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri, 28 Nov 2008 14:51:02 +0100
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
Message-ID: <200811281451.03433.raoul.bonnal@itb.cnr.it>

On Friday 28 November 2008 12:55:22 Peter wrote:
> Nick Loman wrote:....
> My impression is that Django's philosophy is that they expect you to
> define your objects which then automatically defines the database
> schema.  Note the title of this FAQ page refers to an existing schema
> as a "legacy" database:
> http://docs.djangoproject.com/en/dev/howto/legacy-databases/
> If Django can cope with an existing schema, then it does look like an
> excellent package, and seems well documented.
You are right. This philosophy is shared by other ORMs such as ActiveRecord
and DataMapper. In Ruby I prefer DataMapper because it is simpler and more
configurable than AR. These tools usually have their own conventions and
requirements, so choose the one that best fits the BioSQL schema without
modifying the schema.
I wasted a lot of time digging into the API to understand how the ORM handles
relationships.

Last, check if your ORM can handle transactions for free.

Ciao.

--
Ra


From n.j.loman at bham.ac.uk  Fri Nov 28 10:59:53 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 15:59:53 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>	
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>	
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>	
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>	
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
Message-ID: <49301579.2060700@bham.ac.uk>

Peter wrote:

>> However, the version of Django I use (not sure about latest) didn't support
>> multi-column indexes as primary key, so I had to add another auto_increment
>> column to use as a primary key.
> 
> When I started looking at web-frameworks and ORM, I didn't want to
> modify a perfectly good schema (BioSQL) just to cope with a limited
> tool.

Fair enough.

> I investigated Django earlier this month, and rejected it because it
> doesn't yet support multi-column indices as primary keys.  They've had
> an open bug on this for 3 years, with no expected date yet:
> http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
> http://code.djangoproject.com/ticket/373
> 
> My impression is that Django's philosophy is that they expect you to
> define your objects which then automatically defines the database
> schema.  Note the title of this FAQ page refers to an existing schema
> as a "legacy" database:
> http://docs.djangoproject.com/en/dev/howto/legacy-databases/
> If Django can cope with an existing schema, then it does look like an
> excellent package, and seems well documented.

You can still use Django even if you don't want to modify your database, 
with the caveat that certain functions (e.g. adding a new taxon via the 
ORM) will not work correctly. If you are just querying data that still 
might be sufficiently useful.

And Django will let you fall back to raw SQL if you should need to at 
any point.

I personally was extremely skeptical about an ORM because of the added 
level of complexity, and sometimes difficulty understanding the 
relationship between the generated models and the underlying database. 
However, Django's (like Python's) basic principles of DRY and "don't do 
anything magic" mean that I find the results acceptable enough for my 
applications.

I wouldn't make an argument to change BioSQL to suit Django, but I would 
commend Django to anyone using Python who wants an ORM - particularly if 
they are building a dynamic web site!

Cheers,

Nick.


From biopython at maubp.freeserve.co.uk  Fri Nov 28 11:57:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 16:57:47 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <49301579.2060700@bham.ac.uk>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
	<49301579.2060700@bham.ac.uk>
Message-ID: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>

>> I investigated Django earlier this month, and rejected it because it
>> doesn't yet support multi-column indices as primary keys.  They've had
>> an open bug on this for 3 years, with no expected date yet:
>> http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
>> http://code.djangoproject.com/ticket/373
>>
>> My impression is that Django's philosophy is that they expect you to
>> define your objects which then automatically defines the database
>> schema.  Note the title of this FAQ page refers to an existing schema
>> as a "legacy" database:
>> http://docs.djangoproject.com/en/dev/howto/legacy-databases/
>> If Django can cope with an existing schema, then it does look like an
>> excellent package, and seems well documented.
>
> You can still use Django even if you don't want to modify your database,
> with the caveat that certain functions (e.g. adding a new taxon via the ORM)
> will not work correctly. If you are just querying data that still might be
> sufficiently useful.

Maybe - but given so much of BioSQL uses composite primary keys etc it
was my impression that trying to use Django would be making life
difficult for myself.  If you already are familiar with Django, then
perhaps this wouldn't be so bad.

> I wouldn't make an argument to change BioSQL to suit Django, ...

Agreed.

> ... but I would commend Django to anyone using Python who wants an
> ORM - particularly if they are building a dynamic web site!

I agree but ONLY if you are not trying to use an existing schema with
composite primary keys and/or tables with no primary key.  For these
SQLAlchemy seems to be the current best bet with python, leading to
the choice of either TurboGears (which I went for) or Pylons (picked
by Brad).

Peter

From n.j.loman at bham.ac.uk  Fri Nov 28 12:00:58 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 17:00:58 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>	
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>	
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>	
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>	
	<492FD220.60505@bham.ac.uk>	
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>	
	<49301579.2060700@bham.ac.uk>
	<320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>
Message-ID: <493023CA.9030108@bham.ac.uk>

Peter wrote:

>> You can still use Django even if you don't want to modify your database,
>> with the caveat that certain functions (e.g. adding a new taxon via the ORM)
>> will not work correctly. If you are just querying data that still might be
>> sufficiently useful.
> 
> Maybe - but given so much of BioSQL uses composite primary keys etc it
> was my impression that trying to use Django would be making life
> difficult for myself.  If you already are familiar with Django, then
> perhaps this wouldn't be so bad.

Depends again on what you want to do. If you wanted to knock up a quick 
web-based genome viewer for example you might not need to amend (or even 
access) the taxon table as it would be pre-populated. And if you REALLY 
had to, you could just fashion some SQL to do it.

>> ... but I would commend Django to anyone using Python who wants an
>> ORM - particularly if they are building a dynamic web site!
> 
> I agree but ONLY if you are not trying to use an existing schema with
> composite primary keys and/or tables with no primary key.  For these
> SQLAlchemy seems to be the current best bet with python, leading to
> the choice of either TurboGears (which I went for) or Pylons (picked
> by Brad).

Well, I wouldn't be that prescriptive - I would say people can just use 
what they feel comfortable with and they can get good results from 
quickly. I've had good experiences with Django so I wouldn't put people 
off it just because of the primary key issue which can be partially 
solved easily enough with a single ALTER TABLE statement :)

Cheers,

Nick.

From biopython at maubp.freeserve.co.uk  Fri Nov 28 12:46:01 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 17:46:01 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <493023CA.9030108@bham.ac.uk>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
	<49301579.2060700@bham.ac.uk>
	<320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>
	<493023CA.9030108@bham.ac.uk>
Message-ID: <320fb6e00811280946r391eaac9q8c54ae6a4a59595c@mail.gmail.com>

> Peter wrote:
>> I agree but ONLY if you are not trying to use an existing schema with
>> composite primary keys and/or tables with no primary key.  For these
>> SQLAlchemy seems to be the current best bet with python, leading to
>> the choice of either TurboGears (which I went for) or Pylons (picked
>> by Brad).

Nick wrote:
> Well, I wouldn't be that prescriptive - I would say people can just use what
> they feel comfortable with and they can get good results from quickly. I've
> had good experiences with Django so I wouldn't put people off it just
> because of the primary key issue which can be partially solved easily enough
> with a single ALTER TABLE statement :)

Maybe adding a few surrogate primary keys to tables to make Django (or
your ORM of choice) happy isn't such a big deal.  However, I put a lot
of value on the shared standard nature of the BioSQL schema, and
prefer to modify it as little as possible - not just in case I break
something another software package relies on, but also to reduce long
term maintenance and re-installation hassles.

In your case you had existing experience with Django, while I had no
prior investment in it or any other ORM tool.  I can therefore
understand your choice - and might even have done the same in your
position.

I'm not convinced that the BioSQL schema needs to be changed for
v1.1.x to help ORM software either (surrogate primary keys on all
tables - something mooted on the roadmap).
http://www.biosql.org/wiki/Enhancement_Requests

Peter

From hlapp at gmx.net  Fri Nov 28 13:31:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 28 Nov 2008 13:31:55 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
Message-ID: <C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>


On Nov 28, 2008, at 5:43 AM, Peter wrote:

> Why not this:
>
> CREATE TABLE taxon_name (
>       taxon_id		INT(10) UNSIGNED NOT NULL,
>       name		VARCHAR(255) BINARY NOT NULL,
>       name_class	VARCHAR(32) BINARY NOT NULL,
>       PRIMARY KEY (taxon_id,name,name_class)
> ) TYPE=INNODB;


It's part of the changes planned for the next release indeed. At the  
time this was written it didn't seem to matter much as they are really  
semantically equivalent, and ORM tools weren't around much at the  
time :-)

I do hope that no-one is using a dynamically configuring ORM at run  
time so that this change can be a drop-in replacement that's fully  
backwards compatible.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Fri Nov 28 13:41:52 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 18:41:52 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>
Message-ID: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>

On Fri, Nov 28, 2008 at 6:31 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Nov 28, 2008, at 5:43 AM, Peter wrote:
>
>> Why not this:
>>
>> CREATE TABLE taxon_name (
>>      taxon_id          INT(10) UNSIGNED NOT NULL,
>>      name              VARCHAR(255) BINARY NOT NULL,
>>      name_class        VARCHAR(32) BINARY NOT NULL,
>>      PRIMARY KEY (taxon_id,name,name_class)
>> ) TYPE=INNODB;
>
>
> It's part of the changes planned for the next release indeed.

By next release, do you mean BioSQL v1.0.2 or v1.1.0 here?

> At the time this was written it didn't seem to matter much as they are really
> semantically equivalent, and ORM tools weren't around much at the time :-)

I see - that kind of explains the reason why some tables have explicit
composite primary keys, while others just have a unique set of fields.

> I do hope that no-one is using a dynamically configuring ORM at run time so
> that this change can be a drop-in replacement that's fully backwards
> compatible.

Some dynamically configuring ORM code would never have coped with
these tables in the first place - so it doesn't matter here.  In other
cases the user can tell the ORM to treat the tuple
(taxon_id,name,name_class) as a primary key - and this should still be
fine even when this is explicit in the database schema.  I expect (and
hope) this will be a backwards compatible change.
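
One way to check this expectation is to run identical client code
against both table definitions.  The following is only a sketch,
using Python's built-in sqlite3 module with simplified column types
rather than MySQL, and the helper name exercise is invented for the
illustration:

```python
import sqlite3

COLUMNS = """
    taxon_id   INTEGER      NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32)  NOT NULL,
"""
SCHEMAS = {
    "unique": "CREATE TABLE taxon_name (%s UNIQUE (taxon_id, name, name_class))"
              % COLUMNS,
    "primary": "CREATE TABLE taxon_name (%s PRIMARY KEY (taxon_id, name, name_class))"
               % COLUMNS,
}


def exercise(ddl):
    """Run the same insert, duplicate insert and lookup on one schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    key = (9606, "Homo sapiens", "scientific name")
    insert = "INSERT INTO taxon_name (taxon_id, name, name_class) VALUES (?, ?, ?)"
    conn.execute(insert, key)
    try:
        conn.execute(insert, key)          # same tuple a second time
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    found = conn.execute(
        "SELECT taxon_id, name, name_class FROM taxon_name "
        "WHERE taxon_id = ? AND name = ? AND name_class = ?",
        key,
    ).fetchall()
    conn.close()
    return duplicate_rejected, found


results = {label: exercise(ddl) for label, ddl in SCHEMAS.items()}
print(results["unique"] == results["primary"])
```

Code that treats the tuple as the key sees no difference between the
two variants; only tools that read the key from the database notice
the change.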

Peter

From hlapp at gmx.net  Fri Nov 28 13:46:26 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 28 Nov 2008 13:46:26 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>
	<320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>
Message-ID: <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net>


On Nov 28, 2008, at 1:41 PM, Peter wrote:

>>
>> It's part of the changes planned for the next release indeed.
>
> By next release, do you mean BioSQL v1.0.2 or v1.1.0 here?


That would be 1.0.2. Otherwise there would be no need to worry about  
backward compatibility (as 1.1x won't be by definition).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Fri Nov 28 13:57:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 18:57:40 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
Message-ID: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>

Hi all,

The BioSQL schema allows multiple ontologies, so that things like
entries in seqfeature_qualifier_value can say what they mean by
"locus_tag".

Currently BioPerl and Biopython (and I assume the other projects but
haven't checked) use a couple of ad-hoc ontology names for storing
annotation.  In particular, if there is no predefined entry for a
novel ontology term, it gets added on the fly.  This is very
convenient as it means a BioSQL database can be used without first
importing a predefined ontology.  However there are downsides, for
example spelling errors in the keys of a GenBank file get treated as
ontology entries.
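
To make the behaviour concrete, here is a minimal sketch of the
"add it on the fly" lookup.  It uses Python's built-in sqlite3 module
and heavily simplified ontology and term tables (the real BioSQL
tables have more columns); the function names are illustrative and
are not the actual BioPerl or Biopython loader code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        VARCHAR(32) NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology (ontology_id),
    UNIQUE (name, ontology_id)
);
""")


def get_ontology_id(name):
    """Return the id of the named ontology, creating it if needed."""
    row = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO ontology (name) VALUES (?)", (name,)
    ).lastrowid


def get_term_id(name, ontology_name):
    """Return the id of a term, silently adding it when it is new."""
    ontology_id = get_ontology_id(ontology_name)
    row = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
        (name, ontology_id),
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO term (name, ontology_id) VALUES (?, ?)",
        (name, ontology_id),
    ).lastrowid


good = get_term_id("locus_tag", "Annotation Tags")
again = get_term_id("locus_tag", "Annotation Tags")   # reuses the entry
typo = get_term_id("locus_tga", "Annotation Tags")    # misspelt key
names = [r[0] for r in conn.execute("SELECT name FROM term ORDER BY term_id")]
print(names)
```

The misspelt key ends up as a term in its own right, which is the
downside mentioned above.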

Have these ad-hoc ontologies ever been defined?  i.e. For table
bioentry_qualifier_value terms, which ad-hoc ontology name should be
used?  Biopython uses ad-hoc ontologies named 'SeqFeature Keys',
'SeqFeature Sources', 'Annotation Tags' for various different tables
(which I believe is the same for BioPerl).

On a related point, it might make more sense to use a predefined
ontology, like SOFA or SO from http://www.sequenceontology.org/ where
a novel term is treated as an error (or perhaps falls back on the
ad-hoc ontology).  How do the various Bio* projects cope with
annotations in the database for different or multiple ontologies?  Or
has this not been considered?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 28 15:04:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 20:04:33 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
In-Reply-To: <49304392.4080908@eaglegenomics.com>
References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>
	<49304392.4080908@eaglegenomics.com>
Message-ID: <320fb6e00811281204i3bae31e4kc18f70121244b4d1@mail.gmail.com>

On Fri, Nov 28, 2008 at 7:16 PM, Richard Holland wrote:
>
> BioJava does what BioPerl does and pretty much makes it up as it goes
> along, using whatever the input files tell it.

OK, good.  But which ontology names do you use for which tables?  i.e.
Do you also use ad-hoc ontologies named  'SeqFeature Keys',
'SeqFeature Sources' and 'Annotation Tags'?

To be a little more specific, here are some examples - which I presume
(hope) are all copying BioPerl's conventions.

In recording a bioentry date, Biopython sets
bioentry_qualifier_value.term_id to point to a term table entry
"date_changed" which belongs to the ad-hoc "Annotation Tags" ontology.

In recording most bioentry annotations (a list of keywords), Biopython
sets bioentry_qualifier_value.term_id to point to a term table entry
for that annotation type (e.g. "keywords") which belongs to the ad-hoc
"Annotation Tags" ontology.

In recording a seqfeature, Biopython sets seqfeature.type_term_id
to point to a term table entry for that feature type (e.g. "CDS",
"misc_feature", "gene") which belongs to the ad-hoc "SeqFeature Keys"
ontology.  Biopython always sets seqfeature.source_term_id to point to a
term table entry for "EMBL/GenBank/SwissProt" within the ad-hoc
"SeqFeature Sources" ontology.

In recording most of a seqfeature's qualifiers (annotations),
Biopython sets seqfeature_qualifier_value.term_id to point to a term
table entry for the key (e.g. "locus_tag", "note", "translation")
which belongs to the ad-hoc "Annotation Tags" ontology.

Notice that the ad-hoc "Annotation Tags" ontology serves double duty,
doing both bioentry and seqfeature annotations.  This doesn't seem
entirely sensible.
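
The conventions listed above can be summarised as a small lookup
table.  This is only a sketch: the seqfeature column names are taken
from the BioSQL seqfeature table definition as I understand it
(type_term_id for the feature key, source_term_id for the source),
and the dictionary itself is an illustration, not something any Bio*
project ships:

```python
from collections import defaultdict

# (table, column) -> ad-hoc ontology name, as described in this message.
AD_HOC_ONTOLOGY = {
    ("bioentry_qualifier_value", "term_id"): "Annotation Tags",
    ("seqfeature", "type_term_id"): "SeqFeature Keys",
    ("seqfeature", "source_term_id"): "SeqFeature Sources",
    ("seqfeature_qualifier_value", "term_id"): "Annotation Tags",
}

# Group the columns by ontology to find ontologies used in several places.
users = defaultdict(list)
for (table, column), ontology in AD_HOC_ONTOLOGY.items():
    users[ontology].append("%s.%s" % (table, column))

shared = {name: cols for name, cols in users.items() if len(cols) > 1}
print(shared)
```

Only "Annotation Tags" is shared between two tables, which is the
double duty noted above.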

On the other hand, when recording a seqfeature's location Biopython
and BioPerl leave location.term_id as NULL (rather than using any
particular ontology term).  This seems arbitrary.

Relating to this, if we want to record a composite location type
(typically "join"), we'd want to use the location_qualifier_value
table.  BioPerl seems to leave this table empty (presumably assuming
all composite locations are joins) which is what Biopython currently
does too.  Here we can't just set location_qualifier_value.term_id as
NULL (why not?) so we have to introduce something.  The BioSQL
projects should first agree what ontology term and what ontology this
should be stored with.

> The trouble with throwing exceptions when things don't meet standards is
> that people complain when their custom files don't work, and can't be
> made to work without editing the file itself. ...

I'm not sure if you are talking about parsing files, or loading them
into BioSQL.  I agree that when parsing sometimes some leeway is
required.

In terms of *optionally* enforcing a strict ontology, throwing an
error is a good thing if the input file doesn't follow the ontology -
this indicates a problem with the file (or perhaps an out of date
ontology).  I would certainly leave the default behaviour as is with
the ad-hoc ontologies extended on the fly.
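
As a minimal sketch of what such an optional strict mode might look
like (TermRegistry and UnknownTermError are hypothetical names, not
part of any Bio* project, and a real loader would consult the term
table rather than an in-memory set):

```python
class UnknownTermError(KeyError):
    """Raised in strict mode when a term is not in the ontology."""


class TermRegistry:
    """Holds the terms of one ontology, optionally refusing new ones."""

    def __init__(self, known_terms, strict=False):
        self.terms = set(known_terms)
        self.strict = strict

    def resolve(self, name):
        """Return the term name, adding it unless strict mode forbids it."""
        if name in self.terms:
            return name
        if self.strict:
            raise UnknownTermError(name)
        self.terms.add(name)      # default: extend the ontology on the fly
        return name


KNOWN = ["gene", "CDS", "misc_feature"]

lenient = TermRegistry(KNOWN)
lenient.resolve("misc_featrue")   # misspelling is quietly accepted

strict = TermRegistry(KNOWN, strict=True)
try:
    strict.resolve("misc_featrue")
    rejected = False
except UnknownTermError:
    rejected = True
print(rejected)
```

Keeping strict=False as the default preserves the current behaviour
for anyone who does not ask for the check.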

> I think the best approach is always to use what the file says, and
> trust that it's accurate. What needs to be agreed between projects is
> any additional annotations that get introduced outside the context of
> file parsing, and the names of the ontologies used for the file
> annotations so that all projects use the same ontologies and don't
> replicate them inside the BioSQL database. It would be nice to
> standardise these names and the additional custom terms across the
> projects, in much the same way as people tried already to standardise
> the way general objects get mapped to BioSQL.

This is what I am trying to get at here - documenting the existing "ad
hoc" ontology usage.  My impression is that it has not been
documented, and that the BioPerl behaviour is the de facto BioSQL
standard.

I'd like to pin down this standard, and extend it for situations like
the location_qualifier_value.term_id and perhaps location.term_id
where BioPerl seems to ignore the ontology issue.

Peter

From holland at eaglegenomics.com  Fri Nov 28 14:16:34 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 28 Nov 2008 19:16:34 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
In-Reply-To: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>
References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>
Message-ID: <49304392.4080908@eaglegenomics.com>

BioJava does what BioPerl does and pretty much makes it up as it goes
along, using whatever the input files tell it.

The trouble with throwing exceptions when things don't meet standards is
that people complain when their custom files don't work, and can't be
made to work without editing the file itself. By custom I mean not only
things they've written themselves, but also files coming from
established tools which don't follow the rules (NEXUS format is a
classic example of this - the most popular tools that output NEXUS
pretty much ignore the format specification). Even the standards
providers themselves often don't comply with their own rules (several
Genbank examples supplied from NCBI/Entrez break any parser which tries
to be completely strict with the declared format).

I think the best approach is always to use what the file says, and
trust that it's accurate. What needs to be agreed between projects is
any additional annotations that get introduced outside the context of
file parsing, and the names of the ontologies used for the file
annotations so that all projects use the same ontologies and don't
replicate them inside the BioSQL database. It would be nice to
standardise these names and the additional custom terms across the
projects, in much the same way as people tried already to standardise
the way general objects get mapped to BioSQL.

cheers,
Richard

Peter wrote:
> Hi all,
> 
> The BioSQL schema allows multiple ontologies, so that things like
> entries in seqfeature_qualifier_value can say what they mean by
> "locus_tag".
> 
> Currently BioPerl and Biopython (and I assume the other projects but
> haven't checked) use a couple of ad-hoc ontology names for storing
> annotation.  In particular, if there is no predefined entry for a
> novel ontology term, it gets added on the fly.  This is very
> convenient as it means a BioSQL database can be used without first
> importing a predefined ontology.  However there are downsides, for
> example spelling errors in the keys of a GenBank file get treated as
> ontology entries.
> 
> Have these ad-hoc ontologies ever been defined?  i.e. For table
> bioentry_qualifier_value terms, which ad-hoc ontology name should be
> used?  Biopython uses ad-hoc ontologies named 'SeqFeature Keys',
> 'SeqFeature Sources', 'Annotation Tags' for various different tables
> (which I believe is the same for BioPerl).
> 
> On a related point, it might make more sense to use a predefined
> ontology, like SOFA or SO from http://www.sequenceontology.org/ where
> a novel term is treated as an error (or perhaps falls back on the
> ad-hoc ontology).  How do the various Bio* projects cope with
> annotations in the database for different or multiple ontologies?  Or
> has this not been considered?
> 
> Thanks,
> 
> Peter

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From d.m.a.martin at dundee.ac.uk  Tue Nov 25 06:09:08 2008
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Tue, 25 Nov 2008 11:09:08 +0000
Subject: [BioSQL-l] Passwords on biosql databases
Message-ID: <492BDCF2.6F09.00E0.0@dundee.ac.uk>

I have set up a biosql database on Postgres. The Bio::DB::BioDB module croaks complaining that it needs the password. I have tried the obvious things (-password -passwd and reading what docs I could find) but to no avail.
 
Any clues?
 
Assuming the database is on postgres and is called biosql with user biosqluser and password biosqlpassword I have been trying:
 
my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'biosqluser',
                                -dbname   => 'biosql',
                                -host     => 'postgres',
                                -passwd   => 'biosqlpassword',
                                -driver   => 'Pg');
 
regards
 
..d
 
David Martin PhD
College of Life Sciences
University of Dundee 
The University of Dundee is a Scottish Registered Charity, No. SC015096.



From chapmanb at 50mail.com  Tue Nov 25 16:16:22 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 25 Nov 2008 16:16:22 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
Message-ID: <20081125211622.GE83220@sobchak.mgh.harvard.edu>

Hi Peter;
Hope all is going well with you. I was glancing at the BioSQL
mailing list archives last night and saw your messages earlier this
month about using an ORM mapper with BioSQL.

Some of my current work is using a BioSQL storage backend with a
javascript web interface. The middleware uses Pylons and SQLAlchemy.
This uses some parts of BioSQL not well represented via an object front
end like bioentry_relationship, and so it has been convenient to work
with these via SQLAlchemy directly.

To your initial question, SQLAlchemy can handle those non-primary key
tables without a problem by setting "primary_key = True" for all of
the unique columns.
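
As a rough sketch of that approach (this is not the attached module;
it assumes SQLAlchemy 1.4 or later, uses an in-memory SQLite database
with simplified column types as a stand-in for a real BioSQL
installation, and the class name TaxonName is my own choice):

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TaxonName(Base):
    __tablename__ = "taxon_name"
    # The table declares no primary key, so mark every column of its
    # UNIQUE (taxon_id, name, name_class) constraint as part of one.
    taxon_id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(String(255), primary_key=True)
    name_class = Column(String(32), primary_key=True)


engine = create_engine("sqlite://")
with engine.begin() as conn:
    # Stand-in for the existing table: a uniqueness criterion only.
    conn.execute(text(
        "CREATE TABLE taxon_name ("
        " taxon_id INTEGER NOT NULL,"
        " name VARCHAR(255) NOT NULL,"
        " name_class VARCHAR(32) NOT NULL,"
        " UNIQUE (taxon_id, name, name_class))"
    ))

key = (9606, "Homo sapiens", "scientific name")
with Session(engine) as session:
    session.add(TaxonName(taxon_id=key[0], name=key[1], name_class=key[2]))
    session.commit()
    # Look the row up by its composite key, in column declaration order.
    found = session.get(TaxonName, key)
    found_class = found.name_class
print(found_class)
```

The mapping lives entirely on the Python side, so the database
schema itself is left untouched.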

What I have done thus far is definitely non-complete, and also
includes some add-on tables for storing experimental data linked to
BioSQL. However, I am attaching it here just to give you an idea
(init.py is the __init__.py of the module). You would use it like:

from Wherever.BioSQL import get_session, biosql

session = get_session("production")
entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345")

If you, or anyone else, is developing something similar,
I'd be happy to help with something generalized.

Brad
-------------- next part --------------
A non-text attachment was scrubbed...
Name: init.py
Type: text/x-python
Size: 1116 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biosql-l/attachments/20081125/f4daa3a6/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BioSQL.py
Type: text/x-python
Size: 6881 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biosql-l/attachments/20081125/f4daa3a6/attachment-0001.py>

From biopython at maubp.freeserve.co.uk  Thu Nov  6 11:53:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 11:53:13 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
Message-ID: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>

I've recently been looking into some object-relational mappers which
caused me to look more closely at the BioSQL schema.  Many of these
packages require a primary key, but not all can cope with a composite
primary key.  However, some BioSQL tables don't have any primary key
at all.

Several BioSQL tables have composite primary keys, for example the
term_dbxref table has a composite key of (term_id, dbxref_id), and
also an index on dbxref_id as well.

However, some BioSQL tables do not have a primary key, for example:

-- corresponds to the names table of the NCBI taxonomy database
CREATE TABLE taxon_name (
       taxon_id		INT(10) UNSIGNED NOT NULL,
       name		VARCHAR(255) BINARY NOT NULL,
       name_class	VARCHAR(32) BINARY NOT NULL,
       UNIQUE (taxon_id,name,name_class)
) TYPE=INNODB;

CREATE INDEX taxnametaxonid ON taxon_name(taxon_id);
CREATE INDEX taxnamename    ON taxon_name(name);

Why do taxon_name, bioentry_path, term_relationship,
bioentry_qualifier_value and seqfeature_path have no primary key (just a
uniqueness constraint)?

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Thu Nov  6 12:37:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 12:37:42 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <200811061323.43749.raoul.bonnal@itb.cnr.it>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
	<200811061323.43749.raoul.bonnal@itb.cnr.it>
Message-ID: <320fb6e00811060437h367804c7y6d46ed36d1b619ae@mail.gmail.com>

On Thu, Nov 6, 2008 at 12:23 PM, Raoul Jean Pierre Bonnal
<raoul.bonnal at itb.cnr.it> wrote:
>
> Dear Peter,
> I'm writing the wrapper for BioRuby using DataMapper an ORM (Active Record is
> similar).

Hi Raoul,

I'm looking at a Python-based ORM to use with BioSQL.

[The existing Biopython BioSQL bridge uses raw SQL to turn the
sequences and features into Biopython objects - this all seems to work
fine, but it doesn't offer the full flexibility of an ORM framework.]

>
> I think we can consider moving or branching the BioSQL schema to the approach
> suggested by this kind of ORM, with a pk for every table named "id" and
> table names in the plural. Fk names are quite correct.
>

I don't think it makes sense to add a single primary key to many of
these tables (e.g. term_dbxref).  The existing composite primary keys
seem fine (it's just a shame some ORMs can't cope).

I was thinking the tables currently lacking any primary key could get
one (based on the current UNIQUE rule).  So for example, the
taxon_name could use (taxon_id,name,name_class) as its primary key.  I
don't know how big a change this would be - but superficially it looks
backwards compatible.  This is why I was asking why they didn't have
PK in the first place.

>
> PS: DataMapper handles composite PKs very well, much better than ActiveRecord.
>

For Python, currently Django and also I believe SQLObject don't
support composite primary keys.  I'll take a look at SQLAlchemy next,
which should cope better.

Peter


From hlapp at gmx.net  Thu Nov  6 21:39:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Nov 2008 16:39:55 -0500
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
Message-ID: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>

Hi Peter,

it's a known enhancement request. I know that some ORMs have trouble  
reverse engineering the mapping if there is no PK defined.

Semantically, however, in the absence of a primary key constraint the  
first unique key constraint is equivalent to a primary key (in fact  
some ER modeling tools will automatically do the conversion); unique  
keys are also called alternate keys (alternate to the primary key).

So for now feel free to change the UK constraint to a PK where  
there is no PK defined and your reverse engineering tool needs it. If  
you don't use a reverse engineering tool, just set the columns of the  
UK constraint as the compound primary key if there isn't a surrogate PK.
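
As a minimal sketch of that fallback, assuming SQLite through Python's
sqlite3 module as a stand-in for MySQL or PostgreSQL, a mapping layer
could look for a declared primary key and otherwise use the columns of
a unique index. The helper name key_columns and the simplified column
types are assumptions made for illustration; they are not part of
BioSQL or of any ORM.

```python
import sqlite3


def key_columns(conn, table):
    """Return the declared primary key columns, else those of a unique index."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    pk = [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5] > 0]
    if pk:
        return pk
    # PRAGMA index_list rows start with (seq, name, unique, ...)
    for index in conn.execute(f"PRAGMA index_list({table})").fetchall():
        if index[2]:
            # PRAGMA index_info rows: (seqno, cid, name)
            cols = conn.execute(f"PRAGMA index_info({index[1]})").fetchall()
            return [c[2] for c in sorted(cols, key=lambda c: c[0])]
    return []


# Simplified taxon_name: only a UNIQUE constraint, no PRIMARY KEY.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE taxon_name (
        taxon_id   INTEGER NOT NULL,
        name       VARCHAR(255) NOT NULL,
        name_class VARCHAR(32) NOT NULL,
        UNIQUE (taxon_id, name, name_class)
    )
""")
print(key_columns(conn, "taxon_name"))
```

For a table shaped like taxon_name this reports the three columns of
the unique constraint, which are the same columns one would otherwise
list by hand as the compound primary key.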

BioSQL 1.1+ will have surrogate PKs on all tables, but this change may  
not be backwards compatible for existing language bindings, which is  
why I'd like to make those changes first that should be fully  
backwards compatible.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Thu Nov  6 22:12:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Nov 2008 22:12:28 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
	<082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>
Message-ID: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>

On Thu, Nov 6, 2008 at 9:39 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> Hi Peter,
>
> it's a known enhancement request. I know that some ORMs have trouble reverse
> engineering the mapping if there is no PK defined.

Oh right, "Surrogate primary keys on all tables" on this page:
http://www.biosql.org/wiki/Enhancement_Requests

> Semantically, however, in the absence of a primary key constraint the first
> unique key constraint is equivalent to a primary key (in fact some ER
> modeling tools will automatically do the conversion); unique keys are also
> called alternate keys (alternate to the primary key).
>
> So for now feel free to either change the UK constraint to PK where there is
> no PK defined and your reverse engineering tool needs it. If you don't use a
> reverse engineering tool, just set the columns of the UK constraint as the
> compound primary key if there isn't a surrogate PK.

OK - I'll bear that in mind.

> BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not
> be backwards compatible for existing language bindings, which is why I'd
> like to make those changes first that should be fully backwards compatible.

That sounds sensible.

Thanks Hilmar!

Peter

P.S. Is there any agreed terminology: compound primary key versus
composite primary key?


From raoul.bonnal at itb.cnr.it  Thu Nov  6 12:23:43 2008
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Thu, 06 Nov 2008 13:23:43 +0100
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
Message-ID: <200811061323.43749.raoul.bonnal@itb.cnr.it>

Dear Peter, 
I'm writing the wrapper for BioRuby using DataMapper, an ORM (Active Record is 
similar).

I think we can consider moving or branching the BioSQL schema to the approach 
suggested by this kind of ORM, with a pk for every table named "id" and 
table names in the plural. Fk names are quite correct.

PS: DataMapper handles composite PKs very well, much better than ActiveRecord.



From mark.schreiber at novartis.com  Fri Nov  7 02:07:12 2008
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 7 Nov 2008 10:07:12 +0800
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <200811061323.43749.raoul.bonnal@itb.cnr.it>
Message-ID: <OFF57EB0F7.4AC79B77-ON482574FA.000B2306-482574FA.000BA587@ah.novartis.com>

Hi -

In the Java JPA it is possible to use an embedded object as a primary key. 
This gets you around the situations where the primary key is composite. It 
also effectively gets you around those tables where there is no key but it 
is implicit as all the fields are unique (as in taxon_name).

What you end up with is an object that holds taxon_id, name, name_class, 
and inside that object you have an embedded key object that contains the 
same three fields. In this way any changes that are made to the object are 
still associated with the original row via the unchanged embedded PK 
object and are updated accordingly.
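
The idea can be sketched outside Java as well. The following is a
rough Python illustration of the pattern only, not JPA itself: the
class names TaxonNameKey and TaxonName, the load and save helpers, the
sample row, and the use of SQLite via sqlite3 are all assumptions made
for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonNameKey:
    """Immutable copy of the identifying columns, as loaded from the row."""
    taxon_id: int
    name: str
    name_class: str


@dataclass
class TaxonName:
    taxon_id: int
    name: str
    name_class: str
    key: TaxonNameKey  # embedded key; stays fixed while the fields above change


def load(conn, taxon_id, name, name_class):
    row = conn.execute(
        "SELECT taxon_id, name, name_class FROM taxon_name "
        "WHERE taxon_id = ? AND name = ? AND name_class = ?",
        (taxon_id, name, name_class),
    ).fetchone()
    return TaxonName(*row, key=TaxonNameKey(*row))


def save(conn, obj):
    # Locate the row by the unchanged embedded key, then write the new values.
    conn.execute(
        "UPDATE taxon_name SET taxon_id = ?, name = ?, name_class = ? "
        "WHERE taxon_id = ? AND name = ? AND name_class = ?",
        (obj.taxon_id, obj.name, obj.name_class,
         obj.key.taxon_id, obj.key.name, obj.key.name_class),
    )
    # After a successful write the key is refreshed to match the stored row.
    obj.key = TaxonNameKey(obj.taxon_id, obj.name, obj.name_class)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_name (taxon_id INTEGER NOT NULL, name TEXT NOT NULL, "
    "name_class TEXT NOT NULL, UNIQUE (taxon_id, name, name_class))"
)
conn.execute("INSERT INTO taxon_name VALUES (9606, 'Homo sapiens', 'scientific name')")

entry = load(conn, 9606, "Homo sapiens", "scientific name")
entry.name = "human"
entry.name_class = "common name"
save(conn, entry)
print(conn.execute("SELECT * FROM taxon_name").fetchall())
```

Because every column of taxon_name is part of the key, the embedded
copy is the only way to find the original row once a field has been
edited.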

While I agree that explicit PKs for all BioSQL tables would be nicer 
for ORM frameworks, many frameworks have ways to get around this; possibly 
those in Ruby or Python do as well.

- Mark





From biopython at maubp.freeserve.co.uk  Fri Nov  7 18:35:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 18:35:31 +0000
Subject: [BioSQL-l] Tables without a (composite) primary key
In-Reply-To: <320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>
References: <320fb6e00811060353y18ea6b6cs9aa0dac56e1988a7@mail.gmail.com>
	<082DB965-12D9-4941-BF46-E49359C8C096@gmx.net>
	<320fb6e00811061412s7516b573uf26d5042e193fa45@mail.gmail.com>
Message-ID: <320fb6e00811071035y496ea4d8p93aa0f54633950f@mail.gmail.com>

I've ruled out using Django v1.0 and the current version of SQLObject
with BioSQL as they don't (yet) support composite primary keys.
However, SQLAlchemy 0.5.0 seems to be happy with the current BioSQL
schema as is :)

http://www.djangoproject.com/  http://www.sqlobject.org/
http://www.sqlalchemy.org/

Hilmar:
>> BioSQL 1.1+ will have surrogate PKs on all tables, but this change may not
>> be backwards compatible for existing language bindings, which is why I'd
>> like to make those changes first that should be fully backwards compatible.

Peter:
> That sounds sensible.

Actually I may have initially misunderstood you.

Are you saying for tables which already have a composite primary key
(e.g. term_dbxref) you plan to add/replace this with a surrogate
(single column) PK - just to accommodate certain simplistic ORMs?  I'm
not so keen on this, it seems like an invasive change with little
benefit, but potentially making lots of work updating the Bio*
bindings.

However, would the smaller step of adding composite primary keys to
tables currently lacking them be possible on the BioSQL v1.0.x
roadmap? e.g. for taxon_name using (taxon_id,name,name_class) as the
composite primary key, currently specified to be unique. Or might this
also cause trouble for the Bio* binding?  If this was possible, it
*might* be useful for certain ORMs.

Was there a reason why tables like taxon_name never had a
(composite/compound) primary key in the first place?

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 14 20:48:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 14 Nov 2008 20:48:02 +0000
Subject: [BioSQL-l] parent_taxon_id of a root node
In-Reply-To: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
Message-ID: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>

On Fri, Oct 3, 2008, I wrote:
>
> Hello all,
>
> I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will set
> the parent_taxon_id of the NCBI root node in the taxon table to point
> to itself.  I would have expected this to be NULL indicating no
> parent.  If someone is using the database directly, extracting a
> lineage could trigger an infinite loop.  Can anyone explain the
> rationale here?
>
> Note that when Biopython adds entries to the taxon table, it uses NULL
> for a root node.  When retrieving sequences from a BioSQL database,
> Biopython does cope with a root node with a NULL parent or a
> self-parent - would it be safe to assume BioPerl and Java can also cope
> with both situations?
>
> Thanks,
>
> Peter
>

Hi again,

I thought I'd raise this question again (as I didn't see any response
last time), as I've just been bitten by the self-parent taxon problem
this afternoon.  This was for a simple web front end to part of a
BioSQL database using SQLAlchemy in Python - but that's not important.

I was using a simple loop to build up lineages, which was working fine
until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to
just time out.  I'd forgotten about the self-parent root nodes used by
load_ncbi_taxonomy.pl which had triggered an infinite loop.
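
A defensive version of such a loop might look like the sketch below.
It assumes a cut-down taxon table with only taxon_id and
parent_taxon_id, held in SQLite via Python's sqlite3 module; the
function name lineage and the sample rows are hypothetical. The walk
stops on a NULL parent and also on any taxon_id it has already
visited, so a self-parent root (or any other cycle) cannot hang it.

```python
import sqlite3


def lineage(conn, taxon_id):
    """Return taxon ids from the given node up to its root."""
    path = []
    seen = set()
    current = taxon_id
    # Stop on a NULL parent, or on any node already visited (self-parent root).
    while current is not None and current not in seen:
        seen.add(current)
        path.append(current)
        row = conn.execute(
            "SELECT parent_taxon_id FROM taxon WHERE taxon_id = ?", (current,)
        ).fetchone()
        current = row[0] if row else None
    return path


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)")
# Node 1 is a root that points to itself, the style described for load_ncbi_taxonomy.pl.
conn.executemany("INSERT INTO taxon VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
print(lineage(conn, 3))
```

The same function returns the same path whether the root's parent is
stored as NULL or as its own taxon_id.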

I hit another (less serious) problem stemming from these self-parent
root nodes when I wanted to generate a list of sub-lineages (child
entries), essentially:

SELECT * FROM taxon WHERE parent_taxon_id=12345;

When calling this on a root node, I had to modify this to explicitly
exclude itself from the children:

SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345;

So to repeat my earlier question, is there a reason why
parent_taxon_id isn't just NULL for root nodes?  Was this a deliberate
design choice - because if not, I think this could be regarded as a
bug in  load_ncbi_taxonomy.pl.
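
For a database that has already been loaded, one possible one-off
cleanup is sketched here. It assumes a cut-down taxon table in SQLite
(via Python's sqlite3 module) and made-up rows; the helper name
children is hypothetical. Any client code that relies on the
self-parent convention would need checking before running something
like this against a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)")
# Node 1 is a self-parent root with two real children.
conn.executemany("INSERT INTO taxon VALUES (?, ?)", [(1, 1), (2, 1), (3, 1)])


def children(conn, taxon_id):
    """Plain child query, without any special case for the root."""
    rows = conn.execute(
        "SELECT taxon_id FROM taxon WHERE parent_taxon_id = ? ORDER BY taxon_id",
        (taxon_id,),
    ).fetchall()
    return [r[0] for r in rows]


before = children(conn, 1)  # the self-parent root lists itself as a child

# One-off cleanup: a node that is its own parent becomes a root with no parent.
conn.execute("UPDATE taxon SET parent_taxon_id = NULL WHERE parent_taxon_id = taxon_id")

after = children(conn, 1)
print(before, after)
```

After the update the plain child query no longer needs the extra
taxon_id<>12345 style exclusion.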

Thanks

Peter


From hlapp at gmx.net  Sat Nov 15 18:34:45 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Nov 2008 13:34:45 -0500
Subject: [BioSQL-l] parent_taxon_id of a root node
In-Reply-To: <320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>
References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
	<320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>
Message-ID: <F41D532F-A9AE-427A-B10F-2160FF51E12F@gmx.net>

Sorry Peter - it looks like this slipped my attention (Oct was crazy).  
Thanks for raising it again. I agree with you, this looks like a bug.  
Would you mind filing it?

It's possible that this has tacitly been assumed as policy and hence led  
to some people identifying the root node by equating parent and  
taxon_id, but surely this sounds like the wrong way of doing it, so it  
deserves fixing.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Sun Nov 16 14:58:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 16 Nov 2008 14:58:20 +0000
Subject: [BioSQL-l] parent_taxon_id of a root node
In-Reply-To: <F41D532F-A9AE-427A-B10F-2160FF51E12F@gmx.net>
References: <320fb6e00810030918u7dac6493wc017b4cc69ba2bc2@mail.gmail.com>
	<320fb6e00811141248j43c66959k308766ec9b2af166@mail.gmail.com>
	<F41D532F-A9AE-427A-B10F-2160FF51E12F@gmx.net>
Message-ID: <320fb6e00811160658s6282022by3681364e14aecf69@mail.gmail.com>

On Sat, Nov 15, 2008 at 6:34 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> Sorry Peter - it looks like this slipped my attention (Oct was crazy).
> Thanks for raising it again. I agree with you, this looks like a bug. Would
> you mind filing it?

Sure,
http://bugzilla.open-bio.org/show_bug.cgi?id=2664

> It's possible that has secretly been assumed as policy and hence led to some
> people identifying the root node by equating parent and taxon_id, but surely
> this sounds like the wrong way of doing it, so it deserves fixing.

In the short term we should just make sure all the Bio* projects can
cope with either style of root node (Biopython can), but in the long term
are self-parent taxon entries something that could be banned via the
schema?
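
One way a schema could express such a ban is a CHECK constraint. The
sketch below assumes a cut-down taxon table and uses SQLite through
Python's sqlite3 module, which does enforce CHECK; whether a given
server enforces it needs verifying, since MySQL releases of this era
parse CHECK clauses without enforcing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE taxon (
        taxon_id        INTEGER PRIMARY KEY,
        parent_taxon_id INTEGER,
        CHECK (parent_taxon_id IS NULL OR parent_taxon_id <> taxon_id)
    )
""")
conn.execute("INSERT INTO taxon VALUES (1, NULL)")  # root with no parent: accepted
conn.execute("INSERT INTO taxon VALUES (2, 1)")     # ordinary child: accepted

try:
    conn.execute("INSERT INTO taxon VALUES (3, 3)")  # self-parent: rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM taxon").fetchone()[0]
print(rejected, count)
```

A constraint of this kind only rules out direct self-parents; longer
cycles would still need checking in the loader or the client code.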

Regards,

Peter


From biopython at maubp.freeserve.co.uk  Wed Nov 26 18:37:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Nov 2008 18:37:51 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>

On Tue, Nov 25, 2008 at 9:16 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
> Hope all is going well with you. I was glancing at the BioSQL
> mailing list archives last night and saw your messages earlier this
> month about using an ORM mapper with BioSQL.
>
> Some of my current work is using a BioSQL storage backend with a
> javascript web interface. The middleware uses Pylons and SQLAlchemy.
> This uses some parts of BioSQL not well represented via an object front
> end like bioentry_relationship, and so it has been convenient to work
> with these via SQLAlchemy directly.

I've been using TurboGears with SQLAlchemy, and so far everything has
been OK.  This was essentially independent of the Biopython BioSQL
mapping.

> To your initial question, SQLAlchemy can handle those non-primary key
> tables without a problem by setting "primary_key = True" for all of
> the unique columns.

Yes, SQLAlchemy seems pretty good.  The only catch was for a table
with no primary key defined at all (the taxon_name table) which
required a little more work setting up the ORM mapping, but which also
seems to work fine.

> What I have done thus far is definitely non-complete, and also
> includes some add-on tables for storing experimental data linked to
> BioSQL. However, I am attaching it here just to give you an idea
> (init.py is the __init__.py of the module). You would use it like:
>
> from Wherever.BioSQL import get_session, biosql
>
> session = get_session("production")
> entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345")
>
> If you, or anyone else, is developing something similar,
> I'd be happy to help with something generalized.
>
> Brad
>

I'll probably take a look at this next week - I'm on the road at the
moment.  Thanks for sharing,

Peter


From hlapp at gmx.net  Wed Nov 26 19:28:16 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Nov 2008 14:28:16 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
Message-ID: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>


On Nov 26, 2008, at 1:37 PM, Peter wrote:

>> To your initial question, SQLAlchemy can handle those non-primary key
>> tables without a problem by setting "primary_key = True" for all of
>> the unique columns.
>
> Yes, SQLAlchemy seems pretty good.  The only catch was for a table
> with no primary key defined at all (the taxon_name table) which
> required a little more work setting up the ORM mapping, but which also
> seems to work fine.


It has one unique key defined on (name, name_class, taxon_id). Is that  
not what you are seeing?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gabrielle_doan at gmx.net  Thu Nov 27 13:51:16 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Thu, 27 Nov 2008 14:51:16 +0100
Subject: [BioSQL-l] BioSQL schema of v1.0.1
Message-ID: <492EA5D4.7090809@gmx.net>

Hi Hilmar,

I recently noticed that BioSQL has a new release. But the BioSQL core
schema release v1.0.1 still contains the BioSQL schema of v1.0.0.
Could you update it, please?
Thanks a lot.

Cheers,
Gabrielle


From hlapp at gmx.net  Thu Nov 27 17:33:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 27 Nov 2008 12:33:25 -0500
Subject: [BioSQL-l] BioSQL schema of v1.0.1
In-Reply-To: <492EA5D4.7090809@gmx.net>
References: <492EA5D4.7090809@gmx.net>
Message-ID: <3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net>

Hi Gabrielle,

I'm not sure what you mean. The changes that v1.0.1 introduces are  
also in the main trunk, the biosql-release-1_0_1 tag, and in the  
distribution downloadable from the website.

It's easily possible though that I may have overlooked something -  
could you elaborate on what prompted your conclusion below?

	-hilmar

On Nov 27, 2008, at 8:51 AM, Gabrielle Doan wrote:

> Hi Hilmar,
>
> recently I've noticed that BioSQL has a new release. But in the  
> BioSQL core schema, Release v1.0.1 there is still the BioSQL schema  
> of v1.0.0 inside. Can you update it please?
> Thanks a lot.
>
> Cheers,
> Gabrielle

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Nov 27 18:07:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 27 Nov 2008 13:07:30 -0500
Subject: [BioSQL-l] BioSQL schema of v1.0.1
In-Reply-To: <492EDD4C.3090601@gmx.net>
References: <492EA5D4.7090809@gmx.net>
	<3ED93BDC-CD08-4589-A881-5E0CC28EFAA9@gmx.net>
	<492EDD4C.3090601@gmx.net>
Message-ID: <605BB761-BBFC-4683-AEBA-1F1BAB22617A@gmx.net>

Ahh - I see. The schema has not changed from 1.0.0 to 1.0.1 in terms  
of entity structure and their relations. The only schema changes were  
to widen the column width of bioentry.accession and dbxref.accession.  
A complete list of changes is in the file Changes in the root  
directory of the distribution.

So the schema diagram and all documentation still fully apply, except  
for the width of those two columns, which is necessary to deal with  
certain annotations such as pathways.

Does that resolve the issue for you?

	-hilmar

On Nov 27, 2008, at 12:47 PM, Gabrielle Doan wrote:

> Hi Hilmar,
>
> I downloaded v1.0.1 from the website. When I looked through the doc  
> folder I found a PDF which describes the BioSQL schema v1.0 and not  
> v1.0.1. Was the schema for v1.0.1 left out intentionally?
> I'd be grateful if you could give me a reply.
>
> cheers,
> Gabrielle
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Fri Nov 28 10:43:01 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 10:43:01 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
Message-ID: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>

On Wed, Nov 26, 2008 at 7:28 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> On Nov 26, 2008, at 1:37 PM, Peter wrote:
>> Yes, SQLAlchemy seems pretty good.  The only catch was for a table
>> with no primary key defined at all (the taxon_name table) which
>> required a little more work setting up the ORM mapping, but which also
>> seems to work fine.
>
> It has one unique key defined on (name, name_class, taxon_id). Is that not
> what you are seeing?
>
>        -hilmar

According to the MySQL schema, taxon_name has a unique constraint but
does NOT have a primary key:

CREATE TABLE taxon_name (
       taxon_id		INT(10) UNSIGNED NOT NULL,
       name		VARCHAR(255) BINARY NOT NULL,
       name_class	VARCHAR(32) BINARY NOT NULL,
       UNIQUE (taxon_id,name,name_class)
) TYPE=INNODB;

As you said, since (taxon_id,name,name_class) is unique, this tuple
can be used as a substitute primary key in the ORM mapping (which for
SQLAlchemy I seem to have to do manually).  SQLAlchemy would do this
automatically if the schema actually used (taxon_id,name,name_class)
as a primary key explicitly.  i.e. Why not this:

CREATE TABLE taxon_name (
       taxon_id		INT(10) UNSIGNED NOT NULL,
       name		VARCHAR(255) BINARY NOT NULL,
       name_class	VARCHAR(32) BINARY NOT NULL,
       PRIMARY KEY (taxon_id,name,name_class)
) TYPE=INNODB;
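
The difference an introspecting tool sees between the two definitions can
be sketched as follows. This is only an illustration under stated
assumptions: it uses Python's standard sqlite3 module as a stand-in for
MySQL, the column types are adapted for SQLite, and the table names
taxon_name_unique and taxon_name_pk and the helper primary_key_columns are
hypothetical names invented for the example.

```python
import sqlite3

# Simplified SQLite versions of the two definitions discussed above.
# Assumption: types are adapted for SQLite; the real schema targets MySQL.
DDL_UNIQUE = """
CREATE TABLE taxon_name_unique (
    taxon_id   INTEGER NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32) NOT NULL,
    UNIQUE (taxon_id, name, name_class)
)"""

DDL_PRIMARY = """
CREATE TABLE taxon_name_pk (
    taxon_id   INTEGER NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32) NOT NULL,
    PRIMARY KEY (taxon_id, name, name_class)
)"""


def primary_key_columns(conn, table):
    """Return the primary key column names of `table`, in key order."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk);
    # pk is 0 for non-key columns and the 1-based key position otherwise.
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]


conn = sqlite3.connect(":memory:")
conn.execute(DDL_UNIQUE)
conn.execute(DDL_PRIMARY)

unique_only = primary_key_columns(conn, "taxon_name_unique")
explicit_pk = primary_key_columns(conn, "taxon_name_pk")
print(unique_only)  # no primary key reported for the UNIQUE-only table
print(explicit_pk)  # the three columns of the composite key
```

With the UNIQUE-only definition the database reports no primary key
columns at all, which is why the key has to be supplied to the ORM by
hand.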

See also this thread where I wrote:
http://lists.open-bio.org/pipermail/biosql-l/2008-November/001386.html
> Was there a reason why tables like taxon_name never had a
> (composite/compound) primary key in the first place?

Thanks,

Peter


From n.j.loman at bham.ac.uk  Fri Nov 28 11:12:32 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 11:12:32 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
Message-ID: <492FD220.60505@bham.ac.uk>

Peter wrote:

>>> Yes, SQLAlchemy seems pretty good.  The only catch was for a table
>>> with no primary key defined at all (the taxon_name table) which
>>> required a little more work setting up the ORM mapping, but which also
>>> seems to work fine.
>> It has one unique key defined on (name, name_class, taxon_id). Is that not
>> what you are seeing?
>>
>>        -hilmar
> 
> According to the MySQL schema, taxon_name has a unique constraint but
> does NOT have a primary key:

Just to say I've had good results using the Django 
(http://www.djangoproject.com) ORM system with a BioSQL database.

You can get going quite quickly using the Django introspection feature 
(configure your settings.py file and run python manage.py inspectdb to 
get a models file).

However, the version of Django I use (not sure about latest) didn't 
support multi-column indexes as primary key, so I had to add another 
auto_increment column to use as a primary key.

Cheers,

Nick.


From biopython at maubp.freeserve.co.uk  Fri Nov 28 11:55:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 11:55:22 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <492FD220.60505@bham.ac.uk>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<492FD220.60505@bham.ac.uk>
Message-ID: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>

Nick Loman wrote:
>
>> According to the MySQL schema, taxon_name has a unique constraint but
>> does NOT have a primary key:
>
> Just to say I've had good results using the Django
> (http://www.djangoproject.com) ORM system with a BioSQL database.
>
> You can get going quite quickly using the Django introspection feature
> (configure your settings.py file and run python manage.py inspectdb to get a
> models file).
>
> However, the version of Django I use (not sure about latest) didn't support
> multi-column indexes as primary key, so I had to add another auto_increment
> column to use as a primary key.

When I started looking at web-frameworks and ORM, I didn't want to
modify a perfectly good schema (BioSQL) just to cope with a limited
tool.

I investigated Django earlier this month, and rejected it because it
doesn't yet support multi-column indices as primary keys.  They've had
an open bug on this for 3 years, with no expected date yet:
http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
http://code.djangoproject.com/ticket/373

My impression is that Django's philosophy is that they expect you to
define your objects which then automatically defines the database
schema.  Note the title of this FAQ page refers to an existing schema
as a "legacy" database:
http://docs.djangoproject.com/en/dev/howto/legacy-databases/
If Django can cope with an existing schema, then it does look like an
excellent package, and seems well documented.

I also rejected another Python ORM system, SQLObject, on similar
grounds.  Their documentation says "SQLObject does not support primary
keys made up of multiple columns (that probably won't change)".  In
fact, they currently are even less flexible in that SQLObject requires
an *integer* primary key on each table!

This left SQLAlchemy as the remaining python ORM candidate, which
seems to cope just fine with the unmodified BioSQL schema.  Brad and I
have reported using BioSQL with SQLAlchemy successfully (within the
python web-frameworks TurboGears and Pylons respectively).

Peter


From raoul.bonnal at itb.cnr.it  Fri Nov 28 13:51:02 2008
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri, 28 Nov 2008 14:51:02 +0100
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
Message-ID: <200811281451.03433.raoul.bonnal@itb.cnr.it>

On Friday 28 November 2008 12:55:22 Peter wrote:
> Nick Loman wrote:....
> My impression is that Django's philosophy is that they expect you to
> define your objects which then automatically defines the database
> schema.  Note the title of this FAQ page refers to an existing schema
> as a "legacy" database:
> http://docs.djangoproject.com/en/dev/howto/legacy-databases/
> If Django can cope with an existing schema, then it does look like an
> excellent package, and seems well documented.
You are right. This philosophy is shared by other ORMs such as ActiveRecord 
and DataMapper. In Ruby I prefer DataMapper because it is simpler and more 
configurable than AR. ORMs usually have their own conventions and 
requirements, so choose the one that best fits the BioSQL schema without 
modifying the schema. 
I wasted a lot of time digging into the API to understand how the ORM handles 
relationships.

Finally, check whether your ORM handles transactions out of the box.

Ciao.

--
Ra


From n.j.loman at bham.ac.uk  Fri Nov 28 15:59:53 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 15:59:53 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>	
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>	
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>	
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>	
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
Message-ID: <49301579.2060700@bham.ac.uk>

Peter wrote:

>> However, the version of Django I use (not sure about latest) didn't support
>> multi-column indexes as primary key, so I had to add another auto_increment
>> column to use as a primary key.
> 
> When I started looking at web-frameworks and ORM, I didn't want to
> modify a perfectly good schema (BioSQL) just to cope with a limited
> tool.

Fair enough.

> I investigated Django earlier this month, and rejected it because it
> doesn't yet support multi-column indices as primary keys.  They've had
> an open bug on this for 3 years, with no expected date yet:
> http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
> http://code.djangoproject.com/ticket/373
> 
> My impression is that Django's philosophy is that they expect you to
> define your objects which then automatically defines the database
> schema.  Note the title of this FAQ page refers to an existing schema
> as a "legacy" database:
> http://docs.djangoproject.com/en/dev/howto/legacy-databases/
> If Django can cope with an existing schema, then it does look like an
> excellent package, and seems well documented.

You can still use Django even if you don't want to modify your database, 
with the caveat that certain functions (e.g. adding a new taxon via the 
ORM) will not work correctly. If you are just querying data, that might 
still be sufficiently useful.

And Django will let you fall back to raw SQL if you should need to at 
any point.

I personally was extremely skeptical about an ORM because of the added 
level of complexity, and sometimes difficulty understanding the 
relationship between the generated models and the underlying database. 
However, Django's basic principles (like Python's) of DRY and "don't do 
anything magic" mean that I find the results acceptable enough for my 
applications.

I wouldn't make an argument to change BioSQL to suit Django, but I would 
commend Django to anyone using Python who wants an ORM - particularly if 
they are building a dynamic web site!

Cheers,

Nick.


From biopython at maubp.freeserve.co.uk  Fri Nov 28 16:57:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 16:57:47 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <49301579.2060700@bham.ac.uk>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
	<49301579.2060700@bham.ac.uk>
Message-ID: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>

>> I investigated Django earlier this month, and rejected it because it
>> doesn't yet support multi-column indices as primary keys.  They've had
>> an open bug on this for 3 years, with no expected date yet:
>> http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
>> http://code.djangoproject.com/ticket/373
>>
>> My impression is that Django's philosophy is that they expect you to
>> define your objects which then automatically defines the database
>> schema.  Note the title of this FAQ page refers to an existing schema
>> as a "legacy" database:
>> http://docs.djangoproject.com/en/dev/howto/legacy-databases/
>> If Django can cope with an existing schema, then it does look like an
>> excellent package, and seems well documented.
>
> You can still use Django even if you don't want to modify your database,
> with the caveat that certain functions (e.g. adding a new taxon via the ORM)
> will not work correctly. If you are just querying data that still might be
> sufficiently useful.

Maybe - but given so much of BioSQL uses composite primary keys etc it
was my impression that trying to use Django would be making life
difficult for myself.  If you already are familiar with Django, then
perhaps this wouldn't be so bad.

> I wouldn't make an argument to change BioSQL to suit Django, ...

Agreed.

> ... but I would commend Django to anyone using Python who wants an
> ORM - particularly if they are building a dynamic web site!

I agree but ONLY if you are not trying to use an existing schema with
composite primary keys and/or tables with no primary key.  For these
SQLAlchemy seems to be the current best bet with python, leading to
the choice of either TurboGears (which I went for) or Pylons (picked
by Brad).

Peter


From n.j.loman at bham.ac.uk  Fri Nov 28 17:00:58 2008
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Fri, 28 Nov 2008 17:00:58 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>	
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>	
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>	
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>	
	<492FD220.60505@bham.ac.uk>	
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>	
	<49301579.2060700@bham.ac.uk>
	<320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>
Message-ID: <493023CA.9030108@bham.ac.uk>

Peter wrote:

>> You can still use Django even if you don't want to modify your database,
>> with the caveat that certain functions (e.g. adding a new taxon via the ORM)
>> will not work correctly. If you are just querying data that still might be
>> sufficiently useful.
> 
> Maybe - but given so much of BioSQL uses composite primary keys etc it
> was my impression that trying to use Django would be making life
> difficult for myself.  If you already are familiar with Django, then
> perhaps this wouldn't be so bad.

Depends again on what you want to do. If you wanted to knock up a quick 
web-based genome viewer for example you might not need to amend (or even 
access) the taxon table as it would be pre-populated. And if you REALLY 
had to, you could just fashion some SQL to do it.

>> ... but I would commend Django to anyone using Python who wants an
>> ORM - particularly if they are building a dynamic web site!
> 
> I agree but ONLY if you are not trying to use an existing schema with
> composite primary keys and/or tables with no primary key.  For these
> SQLAlchemy seems to be the current best bet with python, leading to
> the choice of either TurboGears (which I went for) or Pylons (picked
> by Brad).

Well, I wouldn't be that prescriptive - I would say people can just use 
what they feel comfortable with and they can get good results from 
quickly. I've had good experiences with Django so I wouldn't put people 
off it just because of the primary key issue which can be partially 
solved easily enough with a single ALTER TABLE statement :)

Cheers,

Nick.


From biopython at maubp.freeserve.co.uk  Fri Nov 28 17:46:01 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 17:46:01 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <493023CA.9030108@bham.ac.uk>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<492FD220.60505@bham.ac.uk>
	<320fb6e00811280355r2c07df9bn9e4af21ce9e40dfc@mail.gmail.com>
	<49301579.2060700@bham.ac.uk>
	<320fb6e00811280857k572c44a5pce0ad35410898dca@mail.gmail.com>
	<493023CA.9030108@bham.ac.uk>
Message-ID: <320fb6e00811280946r391eaac9q8c54ae6a4a59595c@mail.gmail.com>

> Peter wrote:
>> I agree but ONLY if you are not trying to use an existing schema with
>> composite primary keys and/or tables with no primary key.  For these
>> SQLAlchemy seems to be the current best bet with python, leading to
>> the choice of either TurboGears (which I went for) or Pylons (picked
>> by Brad).

Nick wrote:
> Well, I wouldn't be that prescriptive - I would say people can just use what
> they feel comfortable with and they can get good results from quickly. I've
> had good experiences with Django so I wouldn't put people off it just
> because of the primary key issue which can be partially solved easily enough
> with a single ALTER TABLE statement :)

Maybe adding a few surrogate primary keys to tables to make Django (or
your ORM of choice) happy isn't such a big deal.  However, I put a lot
of value on the shared standard nature of the BioSQL schema, and
prefer to modify it as little as possible - not just in case I break
something another software package relies on, but also to reduce long
term maintenance and re-installation hassles.

In your case you had existing experience with Django, while I had no
prior investment in it or any other ORM tool.  I can therefore
understand your choice - and might even have done the same in your
position.

I'm not convinced that the BioSQL schema needs to be changed for
v1.1.x to help ORM software either (surrogate primary keys on all
tables - something mooted on the roadmap).
http://www.biosql.org/wiki/Enhancement_Requests

Peter


From hlapp at gmx.net  Fri Nov 28 18:31:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 28 Nov 2008 13:31:55 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
Message-ID: <C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>


On Nov 28, 2008, at 5:43 AM, Peter wrote:

> Why not this:
>
> CREATE TABLE taxon_name (
>       taxon_id		INT(10) UNSIGNED NOT NULL,
>       name		VARCHAR(255) BINARY NOT NULL,
>       name_class	VARCHAR(32) BINARY NOT NULL,
>       PRIMARY KEY (taxon_id,name,name_class)
> ) TYPE=INNODB;


It's part of the changes planned for the next release indeed. At the  
time this was written it didn't seem to matter much as they are really  
semantically equivalent, and ORM tools weren't around much at the  
time :-)
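
The equivalence can be checked with a small sketch. It assumes SQLite
(via Python's standard sqlite3 module) as a stand-in for MySQL, and the
helper rejects_duplicates and the sample row are made up for
illustration. Because all three columns are declared NOT NULL, a UNIQUE
constraint and a PRIMARY KEY over them reject exactly the same duplicate
rows; the two differ only in what the database reports as the table's
key.

```python
import sqlite3

# Assumption: simplified SQLite column definitions standing in for the
# MySQL ones quoted above.
COLUMNS = """
    taxon_id   INTEGER NOT NULL,
    name       VARCHAR(255) NOT NULL,
    name_class VARCHAR(32) NOT NULL,
"""


def rejects_duplicates(key_clause):
    """Create taxon_name with the given key clause and try a duplicate insert."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon_name (%s %s (taxon_id, name, name_class))"
        % (COLUMNS, key_clause)
    )
    row = (1, "Homo sapiens", "scientific name")  # illustrative sample row
    conn.execute("INSERT INTO taxon_name VALUES (?, ?, ?)", row)
    try:
        conn.execute("INSERT INTO taxon_name VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True  # the second, identical row was refused
    return False


unique_result = rejects_duplicates("UNIQUE")
primary_result = rejects_duplicates("PRIMARY KEY")
print(unique_result, primary_result)
```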

I do hope that no-one is using a dynamically configuring ORM at run  
time so that this change can be a drop-in replacement that's fully  
backwards compatible.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Fri Nov 28 18:41:52 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 18:41:52 +0000
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>
Message-ID: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>

On Fri, Nov 28, 2008 at 6:31 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Nov 28, 2008, at 5:43 AM, Peter wrote:
>
>> Why not this:
>>
>> CREATE TABLE taxon_name (
>>      taxon_id          INT(10) UNSIGNED NOT NULL,
>>      name              VARCHAR(255) BINARY NOT NULL,
>>      name_class        VARCHAR(32) BINARY NOT NULL,
>>      PRIMARY KEY (taxon_id,name,name_class)
>> ) TYPE=INNODB;
>
>
> It's part of the changes planned for the next release indeed.

By next release, do you mean BioSQL v1.0.2 or v1.1.0 here?

> At the time this was written it didn't seem to matter much as they are really
> semantically equivalent, and ORM tools weren't around much at the time :-)

I see - that kind of explains the reason why some tables have explicit
composite primary keys, while others just have a unique set of fields.

> I do hope that no-one is using a dynamically configuring ORM at run time so
> that this change can be a drop-in replacement that's fully backwards
> compatible.

Some dynamically configuring ORM code would never have coped with
these tables in the first place - so it doesn't matter here.  In other
cases the user can tell the ORM to treat the tuple
(taxon_id,name,name_class) as a primary key - and this should still be
fine even when this is explicit in the database schema.  I expect (and
hope) this will be a backwards compatible change.

Peter


From hlapp at gmx.net  Fri Nov 28 18:46:26 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 28 Nov 2008 13:46:26 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
In-Reply-To: <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>
References: <20081125211622.GE83220@sobchak.mgh.harvard.edu>
	<320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com>
	<161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net>
	<320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com>
	<C0450178-EE82-4EC8-B664-D3CC14B069AA@gmx.net>
	<320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com>
Message-ID: <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net>


On Nov 28, 2008, at 1:41 PM, Peter wrote:

>>
>> It's part of the changes planned for the next release indeed.
>
> By next release, do you mean BioSQL v1.0.2 or v1.1.0 here?


That would be 1.0.2. Otherwise there would be no need to worry about  
backward compatibility (as 1.1x won't be by definition).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Fri Nov 28 18:57:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 18:57:40 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
Message-ID: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>

Hi all,

The BioSQL schema allows multiple ontologies, so that things like
entries in seqfeature_qualifier_value can say what they mean by
"locus_tag".

Currently BioPerl and Biopython (and I assume the other projects but
haven't checked) use a couple of ad-hoc ontology names for storing
annotation.  In particular, if there is no predefined entry for a
novel ontology term, it gets added on the fly.  This is very
convenient as it means a BioSQL database can be used without first
importing a predefined ontology.  However there are downsides, for
example spelling errors in the keys of a GenBank file get treated as
ontology entries.

Have these ad-hoc ontologies ever been defined?  i.e. For table
bioentry_qualifier_value terms, which ad-hoc ontology name should be
used?  Biopython uses ad-hoc ontologies named 'SeqFeature Keys',
'SeqFeature Sources', 'Annotation Tags' for various different tables
(which I believe is the same for BioPerl).
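
The "added on the fly" behaviour described above can be sketched roughly
as follows. This is not the actual BioPerl or Biopython loader code: the
two tables are a cut-down, SQLite-flavoured approximation of the BioSQL
ontology and term tables, and the helper names get_or_create_ontology and
get_or_create_term are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a reduced subset of the ontology/term columns, for illustration.
conn.executescript("""
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        VARCHAR(32) NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology (ontology_id),
    UNIQUE (name, ontology_id)
);
""")


def get_or_create_ontology(conn, name):
    """Return the id of the named ontology, creating it if missing."""
    row = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO ontology (name) VALUES (?)", (name,)
    ).lastrowid


def get_or_create_term(conn, name, ontology_name):
    """Return the id of the term, adding it to the ontology if it is new."""
    ontology_id = get_or_create_ontology(conn, ontology_name)
    row = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
        (name, ontology_id),
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO term (name, ontology_id) VALUES (?, ?)",
        (name, ontology_id),
    ).lastrowid


first = get_or_create_term(conn, "locus_tag", "Annotation Tags")
again = get_or_create_term(conn, "locus_tag", "Annotation Tags")
typo = get_or_create_term(conn, "locus_tga", "Annotation Tags")  # misspelt key
term_count = conn.execute("SELECT COUNT(*) FROM term").fetchone()[0]
print(first == again, typo != first, term_count)
```

The misspelt key ends up as a second, separate term, which is the
downside mentioned above.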

On a related point, it might make more sense to use a predefined
ontology, like SOFA or SO from http://www.sequenceontology.org/ where
a novel term is treated as an error (or perhaps falls back on the
ad-hoc ontology).  How do the various Bio* projects cope with
annotations in the database for different or multiple ontologies?  Or
has this not been considered?

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 28 20:04:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 20:04:33 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
In-Reply-To: <49304392.4080908@eaglegenomics.com>
References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>
	<49304392.4080908@eaglegenomics.com>
Message-ID: <320fb6e00811281204i3bae31e4kc18f70121244b4d1@mail.gmail.com>

On Fri, Nov 28, 2008 at 7:16 PM, Richard Holland wrote:
>
> BioJava does what BioPerl does and pretty much makes it up as it goes
> along, using whatever the input files tell it.

OK, good.  But which ontology names do you use for which tables?  i.e.
Do you also use ad-hoc ontologies named  'SeqFeature Keys',
'SeqFeature Sources' and 'Annotation Tags'?

To be a little more specific, here are some examples - which I presume
(hope) are all copying BioPerl's conventions.

In recording a bioentry date, Biopython sets
bioentry_qualifier_value.term_id to point to a term table entry
"date_changed" which belongs to the ad-hoc "Annotation Tags" ontology.

In recording most bioentry annotations (a list of keywords), Biopython
sets bioentry_qualifier_value.term_id to point to a term table entry
for that annotation type (e.g. "keywords") which belongs to the ad-hoc
"Annotation Tags" ontology.

In recording a seqfeature, Biopython sets seqfeature.seqfeature_key_id
to point to a term table entry for that feature type (e.g. "CDS",
"misc_feature", "gene") which belongs to the ad-hoc "SeqFeature Keys"
ontology.  Biopython always sets seqfeature.type_term_id to point to a
term table entry for "EMBL/GenBank/SwissProt" within the ad-hoc
"SeqFeature Sources" ontology.

In recording most of a seqfeature's qualifiers (annotations),
Biopython sets seqfeature_qualifier_value.term_id to point to a term
table entry for the key (e.g. "locus_tag", "note", "translation")
which belongs to the ad-hoc "Annotation Tags" ontology.

Notice that the ad-hoc "Annotation Tags" ontology serves double duty,
doing both bioentry and seqfeature annotations.  This doesn't seem
entirely sensible.

On the other hand, when recording a seqfeature's location Biopython
and BioPerl leave location.term_id as NULL (rather than using any
particular ontology term).  This seems arbitrary.

Relating to this, if we want to record a composite location type
(typically "join"), we'd want to use the location_qualifier_value
table.  BioPerl seems to leave this table empty (presumably assuming
all composite locations are joins) which is what Biopython currently
does too.  Here we can't just set location_qualifier_value.term_id as
NULL (why not?) so we have to introduce something.  The BioSQL
projects should first agree what ontology term and what ontology this
should be stored with.

> The trouble with throwing exceptions when things don't meet standards is
> that people complain when their custom files don't work, and can't be
> made to work without editing the file itself. ...

I'm not sure if you are talking about parsing files, or loading them
into BioSQL.  I agree that when parsing sometimes some leeway is
required.

In terms of *optionally* enforcing a strict ontology, throwing an
error is a good thing if the input file doesn't follow the ontology -
this indicates a problem with the file (or perhaps an out of date
ontology).  I would certainly leave the default behaviour as is with
the ad-hoc ontologies extended on the fly.
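
One possible shape for such an optional check is sketched below, in plain
Python with no database involved. The function name resolve_term, the
strict flag and the UnknownTermError exception are all hypothetical and
not part of any Bio* project's API; the sample tags are the qualifier
keys mentioned earlier in this message.

```python
class UnknownTermError(ValueError):
    """Raised in strict mode when a tag is not in the ontology."""


def resolve_term(tag, ontology_terms, strict=False):
    """Look up `tag`; extend the ontology unless strict mode is on."""
    if tag in ontology_terms:
        return tag
    if strict:
        # Strict mode: an unknown tag points at a problem with the file
        # (or at an out-of-date ontology), so refuse it.
        raise UnknownTermError("unknown ontology term: %r" % tag)
    # Default mode: keep the existing behaviour and add the tag on the fly.
    ontology_terms.add(tag)
    return tag


lenient_terms = {"locus_tag", "note", "translation"}
resolve_term("locus_tga", lenient_terms)  # misspelt key is silently added
lenient_size = len(lenient_terms)

strict_terms = {"locus_tag", "note", "translation"}
try:
    resolve_term("locus_tga", strict_terms, strict=True)
    strict_raised = False
except UnknownTermError:
    strict_raised = True
print(lenient_size, strict_raised)
```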

> I think the best approach is always to use what the file says, and
> trust that it's accurate. What needs to be agreed between projects is
> any additional annotations that get introduced outside the context of
> file parsing, and the names of the ontologies used for the file
> annotations so that all projects use the same ontologies and don't
> replicate them inside the BioSQL database. It would be nice to
> standardise these names and the additional custom terms across the
> projects, in much the same way as people tried already to standardise
> the way general objects get mapped to BioSQL.

This is what I am trying to get at here - documenting the existing "ad
hoc" ontology usage.  My impression is that it has not been
documented, and that the BioPerl behaviour is the de facto BioSQL
standard.

I'd like to pin down this standard, and extend it for situations like
the location_qualifier_value.term_id and perhaps location.term_id
where BioPerl seems to ignore the ontology issue.

Peter


From holland at eaglegenomics.com  Fri Nov 28 19:16:34 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 28 Nov 2008 19:16:34 +0000
Subject: [BioSQL-l] BioSQL and ontology "standards".
In-Reply-To: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>
References: <320fb6e00811281057r2d3a1145j3072b6a537112e12@mail.gmail.com>
Message-ID: <49304392.4080908@eaglegenomics.com>

BioJava does what BioPerl does and pretty much makes it up as it goes
along, using whatever the input files tell it.

The trouble with throwing exceptions when things don't meet standards is
that people complain when their custom files don't work, and can't be
made to work without editing the file itself. By custom I mean not only
things they've written themselves, but also files coming from
established tools which don't follow the rules (NEXUS format is a
classic example of this - the most popular tools that output NEXUS
pretty much ignore the format specification). Even the standards
providers themselves often don't comply with their own rules (several
Genbank examples supplied from NCBI/Entrez break any parser which tries
to be completely strict with the declared format).

I think the best approach is always to use what the file says, and
trust that it's accurate. What needs to be agreed between projects is
any additional annotations that get introduced outside the context of
file parsing, and the names of the ontologies used for the file
annotations so that all projects use the same ontologies and don't
replicate them inside the BioSQL database. It would be nice to
standardise these names and the additional custom terms across the
projects, in much the same way as people tried already to standardise
the way general objects get mapped to BioSQL.

cheers,
Richard

Peter wrote:
> Hi all,
> 
> The BioSQL schema allows multiple ontologies, so that things like
> entries in seqfeature_qualifier_value can say what they mean by
> "locus_tag".
> 
> Currently BioPerl and Biopython (and I assume the other projects but
> haven't checked) use a couple of ad-hoc ontology names for storing
> annotation.  In particular, if there is no predefined entry for a
> novel ontology term, it gets added on the fly.  This is very
> convenient as it means a BioSQL database can be used without first
> importing a predefined ontology.  However there are downsides, for
> example spelling errors in the keys of a GenBank file get treated as
> ontology entries.
> 
> Have these ad-hoc ontologies ever been defined?  i.e. For table
> bioentry_qualifier_value terms, which ad-hoc ontology name should be
> used?  Biopython uses ad-hoc ontologies named 'SeqFeature Keys',
> 'SeqFeature Sources', 'Annotation Tags' for various different tables
> (which I believe is the same for BioPerl).
> 
> On a related point, it might make more sense to use a predefined
> ontology, like SOFA or SO from http://www.sequenceontology.org/ where
> a novel term is treated as an error (or perhaps falls back on the
> ad-hoc ontology).  How do the various Bio* projects cope with
> annotations in the database for different or multiple ontologies?  Or
> has this not been considered?
> 
> Thanks,
> 
> Peter

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From d.m.a.martin at dundee.ac.uk  Tue Nov 25 11:09:08 2008
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Tue, 25 Nov 2008 11:09:08 +0000
Subject: [BioSQL-l] Passwords on biosql databases
Message-ID: <492BDCF2.6F09.00E0.0@dundee.ac.uk>

I have set up a biosql database on Postgres. The Bio::DB::BioDB module croaks complaining that it needs the password. I have tried the obvious things (-password -passwd and reading what docs I could find) but to no avail.
 
Any clues?
 
Assuming the database is on postgres and is called biosql with user biosqluser and password biosqlpassword I have been trying:
 
 my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user => 'biosqluser',
                                -dbname => 'biosql',
                                -host => 'postgres',
                                -passwd => 'biosqlpassword',
                                -driver => 'Pg');
 
regards
 
..d
 
David Martin PhD
College of Life Sciences
University of Dundee 
The University of Dundee is a Scottish Registered Charity, No. SC015096.



From chapmanb at 50mail.com  Tue Nov 25 21:16:22 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 25 Nov 2008 16:16:22 -0500
Subject: [BioSQL-l] Python ORM mapping for BioSQL
Message-ID: <20081125211622.GE83220@sobchak.mgh.harvard.edu>

Hi Peter;
Hope all is going well with you. I was glancing at the BioSQL
mailing list archives last night and saw your messages earlier this
month about using an ORM mapper with BioSQL.

Some of my current work is using a BioSQL storage backend with a
javascript web interface. The middleware uses Pylons and SQLAlchemy.
This uses some parts of BioSQL not well represented via an object front
end like bioentry_relationship, and so it has been convenient to work
with these via SQLAlchemy directly.

To your initial question, SQLAlchemy can handle those non-primary key
tables without a problem by setting "primary_key = True" for all of
the unique columns.

What I have done thus far is definitely incomplete, and also
includes some add-on tables for storing experimental data linked to
BioSQL. However, I am attaching it here just to give you an idea
(init.py is the __init__.py of the module). You would use it like:

from Wherever.BioSQL import get_session, biosql

session = get_session("production")
entries = session.query(biosql.Bioentry).filter_by(identifier = "A12345")

If you, or anyone else, is developing something similar,
I'd be happy to help with something generalized.

Brad
-------------- next part --------------
A non-text attachment was scrubbed...
Name: init.py
Type: text/x-python
Size: 1116 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biosql-l/attachments/20081125/f4daa3a6/attachment-0004.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BioSQL.py
Type: text/x-python
Size: 6881 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biosql-l/attachments/20081125/f4daa3a6/attachment-0005.py>