From hlapp at gmx.net  Wed Jun  6 20:45:14 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Jun 2007 20:45:14 -0400
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
Message-ID: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>

I have added support to BioSQL and bioperl-db for schemas in  
PostgreSQL. A schema in PostgreSQL is more or less a namespace for  
database objects (tables, indexes, views, etc) within a database.

(A database in PostgreSQL is similar to the concept of a user in
Oracle, where a schema is synonymous with a user. In MySQL, a schema
is instead synonymous with a database.)
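
For readers who have not worked with PostgreSQL schemas before, here is a minimal sketch of the namespace idea. The schema name biosql and the table example_tbl are placeholders picked for illustration only; neither is something the BioSQL DDL defines.

```sql
-- Create a schema (namespace) inside the current database.
-- "biosql" is a placeholder name chosen for this example.
CREATE SCHEMA biosql;

-- Objects can be addressed with a schema-qualified name ...
CREATE TABLE biosql.example_tbl (id INTEGER PRIMARY KEY);
SELECT count(*) FROM biosql.example_tbl;

-- ... or unqualified, once the schema is on the session's search path.
SET search_path TO biosql, public;
SELECT count(*) FROM example_tbl;
```

Without an explicit schema, PostgreSQL places new objects in the first schema on the search path, which by default ends up being public.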

When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts,  
you specify the schema in which BioSQL resides using the --schema  
option.

If you are using bioperl-db as a library, there are several ways to
set the schema: the Bio::DB::BioDB->new() call accepts a -schema named
parameter, as does Bio::DB::SimpleDBContext->new(); Bio::DB::DBContextI
objects have a $dbc->schema() property for getting/setting the schema;
and you may also add the property to the .bioperldb connection
parameter file (-schema => 'yourschemahere').

Thanks to Brian Osborne for being the instigator (and tester, and
for adding the code to load_ncbi_taxonomy.pl - I came too late).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Jun  6 22:44:55 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Jun 2007 22:44:55 -0400
Subject: [BioSQL-l] Phylogeny module
Message-ID: <3A264479-2FD9-407B-BFB4-9CB78188CDA6@gmx.net>

(for some reason I forgot to post this earlier - apologies)

A couple of weeks ago I committed the phylogeny module that Bill Piel
and I created at the Phyloinformatics Hackathon
(http://hackathon.nescent.org) in December (biosql-phylodb-pg.sql).

This is an optional module - BioSQL will work perfectly well without  
it. (Unless - surprise - you want to store phylogenetic trees.)

Right now there is only a PostgreSQL version, but Jamie Estill, a  
student in our Google Summer of Code program, has created a MySQL  
version that he or I will commit too.

I've now also added comments and made a few rather small changes to  
the module's schema since the initial revision:

	- widened tree.identifier to 32 chars
	- added column tree.is_rooted of boolean type
	- renamed column node.gene_id to node.bioentry_id

If anyone was using this module already, here's the migration script  
in PostgreSQL:

ALTER TABLE tree ALTER COLUMN identifier TYPE VARCHAR(32);
ALTER TABLE tree ADD COLUMN is_rooted BOOLEAN DEFAULT TRUE;
ALTER TABLE node RENAME COLUMN gene_id TO bioentry_id;
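
One thing to keep in mind with the script above: in PostgreSQL, adding a column with a DEFAULT fills in that default for all existing rows, so every tree already in the table will be flagged as rooted. A possible follow-up, sketched under the assumption that trees can be picked out by tree.name (the name 'my_unrooted_tree' is a placeholder):

```sql
-- Correct the flag for a tree that is known to be unrooted.
-- 'my_unrooted_tree' is a placeholder, not a name from any real data set.
UPDATE tree SET is_rooted = FALSE WHERE name = 'my_unrooted_tree';

-- Review the result.
SELECT name, is_rooted FROM tree ORDER BY name;
```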

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at ebi.ac.uk  Thu Jun  7 03:33:25 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 07 Jun 2007 08:33:25 +0100
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
References: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
Message-ID: <4667B4C5.6070107@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sounds great.

BioJava users shouldn't need to change anything to get this to work as
PostgreSQL JDBC connection objects already require you to specify a schema.

cheers,
Richard


Hilmar Lapp wrote:
> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
> A schema in PostgreSQL is more or less a namespace for database objects
> (tables, indexes, views, etc) within a database.
> 
> (A database in PostgreSQL is similar to the concept of a user in Oracle
> or MySQL, and therefore for the latter two schemas are synonymous with a
> user. [Not sure I'm still up-to-date on this for MySQL, but at least
> that's what I recall.])
> 
> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
> specify the schema in which BioSQL resides using the --schema option.
> 
> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
> have a $dbc->schema() property for getting/setting the schema,
> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
> also add the property to the .bioperldb connection parameter file
> (-schema => 'yourschemahere').
> 
> Thanks for Brian Osborne for being the instigator (and tester, and for
> adding the code to load_ncbi_taxonomy.pl - I came too late).
> 
>     -hilmar
> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij
W/+0iO/ZsNDn1pLuf5yXbYA=
=asUn
-----END PGP SIGNATURE-----

From hlapp at gmx.net  Thu Jun  7 07:52:41 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 7 Jun 2007 07:52:41 -0400
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To: <4667B4C5.6070107@ebi.ac.uk>
References: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
	<4667B4C5.6070107@ebi.ac.uk>
Message-ID: <A33CC2FD-897C-4D13-8733-0F0D5BB50927@gmx.net>

I guess I'm behind the curve here a bit - schemas are optional in  
Postgres - if you say JDBC connection objects require a schema, does  
that mean it may also be null or empty?

	-hilmar

On Jun 7, 2007, at 3:33 AM, Richard Holland wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Sounds great.
>
> BioJava users shouldn't need to change anything to get this to work as
> PostgreSQL JDBC connection objects already require you to specify a  
> schema.
>
> cheers,
> Richard
>
>
> Hilmar Lapp wrote:
>> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
>> A schema in PostgreSQL is more or less a namespace for database objects
>> (tables, indexes, views, etc) within a database.
>>
>> (A database in PostgreSQL is similar to the concept of a user in Oracle
>> or MySQL, and therefore for the latter two schemas are synonymous with a
>> user. [Not sure I'm still up-to-date on this for MySQL, but at least
>> that's what I recall.])
>>
>> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
>> specify the schema in which BioSQL resides using the --schema option.
>>
>> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
>> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
>> have a $dbc->schema() property for getting/setting the schema,
>> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
>> also add the property to the .bioperldb connection parameter file
>> (-schema => 'yourschemahere').
>>
>> Thanks for Brian Osborne for being the instigator (and tester, and for
>> adding the code to load_ncbi_taxonomy.pl - I came too late).
>>
>>     -hilmar
>> --===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij
> W/+0iO/ZsNDn1pLuf5yXbYA=
> =asUn
> -----END PGP SIGNATURE-----

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at ebi.ac.uk  Thu Jun  7 08:22:11 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 07 Jun 2007 13:22:11 +0100
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To: <A33CC2FD-897C-4D13-8733-0F0D5BB50927@gmx.net>
References: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
	<4667B4C5.6070107@ebi.ac.uk>
	<A33CC2FD-897C-4D13-8733-0F0D5BB50927@gmx.net>
Message-ID: <4667F873.3080103@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

When I said JDBC, what I really meant to say was Hibernate... Hibernate
controls the mapping between BioJava and BioSQL via a set of mapping
files and a connection parameters file (hibernate.cfg.xml), the latter
of which is what I was referring to.

Hibernate will use public if you don't specify a schema in the
connection parameters file. If you want to use something else, do this
in your connection parameters file:

   <property name="default_schema">biosql</property>

(changing biosql to whatever your schema happens to be).

cheers,
Richard

Hilmar Lapp wrote:
> I guess I'm behind the curve here a bit - schemas are optional in
> Postgres - if you say JDBC connection objects require a schema, does
> that mean it may also be null or empty?
> 
>     -hilmar
> 
> On Jun 7, 2007, at 3:33 AM, Richard Holland wrote:
> 
> Sounds great.
> 
> BioJava users shouldn't need to change anything to get this to work as
> PostgreSQL JDBC connection objects already require you to specify a
> schema.
> 
> cheers,
> Richard
> 
> 
> Hilmar Lapp wrote:
>>>> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
>>>> A schema in PostgreSQL is more or less a namespace for database objects
>>>> (tables, indexes, views, etc) within a database.
>>>>
>>>> (A database in PostgreSQL is similar to the concept of a user in Oracle
>>>> or MySQL, and therefore for the latter two schemas are synonymous with a
>>>> user. [Not sure I'm still up-to-date on this for MySQL, but at least
>>>> that's what I recall.])
>>>>
>>>> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
>>>> specify the schema in which BioSQL resides using the --schema option.
>>>>
>>>> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
>>>> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
>>>> have a $dbc->schema() property for getting/setting the schema,
>>>> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
>>>> also add the property to the .bioperldb connection parameter file
>>>> (-schema => 'yourschemahere').
>>>>
>>>> Thanks for Brian Osborne for being the instigator (and tester, and for
>>>> adding the code to load_ncbi_taxonomy.pl - I came too late).
>>>>
>>>>     -hilmar
>>>> --===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>>

> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZ/hz4C5LeMEKA/QRAhwRAKCX1kNyn0UdknpyRjQr82jYe4Z6bgCeKMGl
/94ZBeUaNd4t+T5B7333b/4=
=wQL0
-----END PGP SIGNATURE-----

From hlapp at gmx.net  Thu Jun  7 20:06:21 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 7 Jun 2007 20:06:21 -0400
Subject: [BioSQL-l] adding a namespace for trees
Message-ID: <5F94A19C-D3F0-468A-AEFD-971D58495CFC@gmx.net>

We're doing some work for a small demonstration project here and we  
find that phylogenetic trees are data objects in their own right,
and in fact are often identifiable and come from a database, for  
example if they are from TreeBASE.

So I needed to add a namespace for trees in the form of a foreign key  
to biodatabase. Since namespaces that can't be relied upon are of  
little use, the foreign key is required, making this a fairly  
significant change.

Any thoughts or comments are welcome.

If anyone is using the phylogeny module already (the BioSQL core is  
completely unaffected by this), here's the migration path:

INSERT INTO biodatabase (name, description)
VALUES ('biosql_phylo','Default namespace for phylogenetic trees.');
ALTER TABLE tree ADD COLUMN biodatabase_id INTEGER;
UPDATE tree SET biodatabase_id = (
	SELECT biodatabase_id FROM biodatabase WHERE name = 'biosql_phylo'
);
ALTER TABLE tree ALTER COLUMN biodatabase_id SET NOT NULL;
ALTER TABLE tree ADD CONSTRAINT FKbiodatabase
        FOREIGN KEY (biodatabase_id) REFERENCES biodatabase (biodatabase_id);
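
As a rough sanity check once the migration has been applied, queries along the following lines could be used. This is only a sketch: it assumes that tree has a name column, and it uses nothing beyond the tables and columns touched by the migration.

```sql
-- List every tree together with the namespace it belongs to.
SELECT db.name AS namespace, t.name AS tree_name
FROM tree t
JOIN biodatabase db ON (db.biodatabase_id = t.biodatabase_id)
ORDER BY db.name, t.name;

-- The NOT NULL constraint should guarantee that this returns zero.
SELECT count(*) FROM tree WHERE biodatabase_id IS NULL;
```

Right after the migration, all pre-existing trees should show up under the biosql_phylo namespace.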

Cheers,

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Jun 10 10:41:11 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 10 Jun 2007 10:41:11 -0400
Subject: [BioSQL-l] Phylodb: unique key constraint on tree
Message-ID: <1A8148DB-D44B-4A1A-BC9E-EB6318F36EFD@gmx.net>

Hi all -

So far, the unique key constraint on tree has been on the name alone.
With the addition of a mandatory namespace for trees, it doesn't
really make sense to keep it that way.

Instead, I propose to change this to names having to be unique only  
within a namespace. I.e., the unique key constraint would be on  
(name, biodatabase_id).

Let me know if you have any comments, suggestions, or concerns.

As we are starting to use the module with real data, there are likely  
going to be a few more changes to the schema. For those who are using  
the schema already, feel free to wait it out until the module  
stabilizes, and I will also try to provide a migration path whenever  
possible. Feel free to apply these immediately, or to accumulate them.

The migration path for this change is:

-- this assumes the default naming scheme for constraints used by
-- PostgreSQL
ALTER TABLE tree DROP CONSTRAINT tree_name_key;
-- let's move towards named constraints to avoid having to rely on
-- whatever naming scheme an RDBMS employs (which may change anyway)
ALTER TABLE tree ADD CONSTRAINT tree_c1 UNIQUE (name, biodatabase_id);
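
To illustrate what the composite key permits and rejects, here is a self-contained sketch. It uses simplified temporary stand-in tables (demo_biodatabase, demo_tree) rather than the real ones, and the column types and sample names are assumptions made for the example.

```sql
-- Simplified stand-ins for the real tables; only the columns that matter
-- for the constraint are included (the real tables have more).
CREATE TEMPORARY TABLE demo_biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name VARCHAR(128) NOT NULL UNIQUE
);
CREATE TEMPORARY TABLE demo_tree (
    tree_id SERIAL PRIMARY KEY,
    name VARCHAR(32) NOT NULL,
    biodatabase_id INTEGER NOT NULL
        REFERENCES demo_biodatabase (biodatabase_id),
    CONSTRAINT demo_tree_c1 UNIQUE (name, biodatabase_id)
);

INSERT INTO demo_biodatabase (biodatabase_id, name) VALUES (1, 'TreeBASE');
INSERT INTO demo_biodatabase (biodatabase_id, name) VALUES (2, 'biosql_phylo');

-- Same tree name in two different namespaces: both inserts succeed.
INSERT INTO demo_tree (name, biodatabase_id) VALUES ('tree1', 1);
INSERT INTO demo_tree (name, biodatabase_id) VALUES ('tree1', 2);

-- Same tree name twice within one namespace: rejected with a
-- unique-constraint violation on demo_tree_c1.
INSERT INTO demo_tree (name, biodatabase_id) VALUES ('tree1', 1);
```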

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Jun 11 07:30:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 07:30:24 -0400
Subject: [BioSQL-l] script to load ITIS taxonomy
Message-ID: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>

Hi all -

I added a script to load the ITIS taxonomy (www.itis.gov) into the  
phylodb module. It is called load_itis_taxonomy.pl and is in the  
scripts/ directory.

It is independent of BioPerl right now (the ITIS download is either a  
MS SQL Server or an Informix dump - no kidding), but I'm hoping that  
at some point support for this can be integrated into Bio::TreeIO.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon Jun 11 08:24:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Jun 2007 07:24:50 -0500
Subject: [BioSQL-l] [Bioperl-l] script to load ITIS taxonomy
In-Reply-To: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
References: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
Message-ID: <99AC6C0F-10DD-4587-AFB3-32BC495CD2BD@uiuc.edu>


On Jun 11, 2007, at 6:30 AM, Hilmar Lapp wrote:

> Hi all -
>
> I added a script to load the ITIS taxonomy (www.itis.gov) into the
> phylodb module. It is called load_itis_taxonomy.pl and is in the
> scripts/ directory.
>
> It is independent of BioPerl right now (the ITIS download is either a
> MS SQL Server or an Informix dump - no kidding), but I'm hoping that
> at some point support for this can be integrated into Bio::TreeIO.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

I second the TreeIO support.  Anyone up for it?

chris

From hlapp at gmx.net  Mon Jun 11 20:04:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 20:04:24 -0400
Subject: [BioSQL-l] index changes on the phylodb module
Message-ID: <DEAD4D0C-3A62-4C03-B713-C933D7766236@gmx.net>

It turns out that ITIS has duplicate node labels (i.e., taxon names)
in their taxonomy (they don't all have the same validity attribute
though), which violates the unique key constraint on node labels. I
suppose that many other data providers for trees won't satisfy this
constraint either, so I propose to remove it by default. I'll leave it
in as a commented-out configuration option.

I also needed to add more indexes to efficiently support some  
queries, especially those needed in precomputing the optimization  
structures for trees.

The migration path is:

-- using the default naming scheme for Pg:
ALTER TABLE node DROP CONSTRAINT node_label_key;
-- simple index on label to support searching nodes by label
CREATE INDEX node_i1 ON node (label);

-- other indexes needed for better query performance:
CREATE INDEX node_i2 ON node (tree_id);
CREATE INDEX edge_i1 ON edge (parent_node_id);
CREATE INDEX node_path_i1 ON node_path (parent_node_id);
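
For illustration, these are the kinds of lookups the new indexes are meant to serve. The literal values are placeholders, and whether the planner actually picks an index depends on table size and statistics, so treat this as a sketch rather than a guarantee.

```sql
-- Find nodes by label (node_i1). Labels are no longer unique, so this
-- may return several rows. 'Homo sapiens' is just a sample label.
SELECT * FROM node WHERE label = 'Homo sapiens';

-- Count the nodes of one tree (node_i2); 1 is a placeholder tree_id.
SELECT count(*) FROM node WHERE tree_id = 1;

-- Edges leading away from one parent node (edge_i1);
-- 42 is a placeholder node id.
SELECT * FROM edge WHERE parent_node_id = 42;

-- Ask the planner which access path it would choose.
EXPLAIN SELECT * FROM node WHERE label = 'Homo sapiens';
```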

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at ebi.ac.uk  Wed Jun 13 11:15:48 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 13 Jun 2007 16:15:48 +0100
Subject: [BioSQL-l] BioJava 1.5 Released
Message-ID: <46700A24.4040305@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all.

BioJava 1.5 has been released and is available for download from our
website at http://biojava.org/

Thanks to everyone who has made contributions, and in particular to
those who have spent many hours testing our new file parsers with every
combination of scenarios under the sun.

In addition to numerous bugfixes and enhancements, the highlights of
this release are brand new parsers for the most common file formats
(GenBank, Fasta, etc.), and a brand new BioSQL persistence layer that
uses Hibernate to interact with sequence databases. There is also a new
set of classes for creating genetic algorithms.

These are all part of the new org.biojavax package which represents
extensions to BioJava that would not fit easily into the existing
package structure. The classes in org.biojavax mostly extend and improve
on existing classes which could not be removed or replaced in order to
maintain compatibility with older code.

As usual if anyone finds any bugs in this release, please do report them
to us using the BugZilla tool at http://bugzilla.open-bio.org/

Please also note that this will be the last release of BioJava that will
be able to compile and run on Java 1.4. The next release (1.6) will move
at least to Java 5 or maybe straight to Java 6 (decision not yet made).

cheers,
Richard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFGcAoj4C5LeMEKA/QRAvZiAJjhHGWvq5nrj8aanmUtCpA8U8dpAJ0bsxzy
tv5LVdSEtAuA7gp12nLMCA==
=/Wbu
-----END PGP SIGNATURE-----

From hlapp at gmx.net  Fri Jun 22 08:49:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 22 Jun 2007 08:49:37 -0400
Subject: [BioSQL-l] phylodb ERD
Message-ID: <BC978837-D92F-4DEB-B60B-AB2589448FF0@gmx.net>

FYI, I committed an OmniGraffle and PDF version of an ERD for the  
BioSQL phylodb module. They are in the doc/ directory.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Jun  7 00:45:14 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Jun 2007 20:45:14 -0400
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
Message-ID: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>

I have added support to BioSQL and bioperl-db for schemas in  
PostgreSQL. A schema in PostgreSQL is more or less a namespace for  
database objects (tables, indexes, views, etc) within a database.

(A database in PostgreSQL is similar to the concept of a user in  
Oracle or MySQL, and therefore for the latter two schemas are  
synonymous with a user. [Not sure I'm still up-to-date on this for  
MySQL, but at least that's what I recall.])

When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts,  
you specify the schema in which BioSQL resides using the --schema  
option.

If you are using bioperl-db as a library, the Bio::DB::BioDB->new()  
call also accepts a -schema named parameter, and Bio::DB::DBContextI  
objects have a $dbc->schema() property for getting/setting the  
schema, Bio::DB::SimpleDBContext->new() accepts a -schema parameter,  
and you may also add the property to the .bioperldb connection  
parameter file (-schema => 'yourschemahere').

Thanks for Brian Osborne for being the instigator (and tester, and  
for adding the code to load_ncbi_taxonomy.pl - I came too late).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Jun  7 02:44:55 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Jun 2007 22:44:55 -0400
Subject: [BioSQL-l] Phylogeny module
Message-ID: <3A264479-2FD9-407B-BFB4-9CB78188CDA6@gmx.net>

(for some reason I forgot to post this earlier - apologies)

I committed the phylogeny module a couple of weeks ago that Bill Piel  
and I created at Phyloinformatics Hackathon (http:// 
hackathon.nescent.org) in December (biosql-phylodb-pg.sql).

This is an optional module - BioSQL will work perfectly well without  
it. (Unless - surprise - you want to store phylogenetic trees.)

Right now there is only a PostgreSQL version, but Jamie Estill, a  
student in our Google Summer of Code program, has created a MySQL  
version that he or I will commit too.

I've now also added comments and made a few rather small changes to  
the module's schema since the initial revision:

	- widened width of of tree.identifier to 32 chars
	- added column tree.is_rooted of boolean type
	- renamed column node.gene_id to node.bioentry_id

If anyone was using this module already, here's the migration script  
in PostgreSQL:

ALTER TABLE tree ALTER COLUMN identifier TYPE VARCHAR(32);
ALTER TABLE tree ADD COLUMN is_rooted BOOLEAN DEFAULT TRUE;
ALTER TABLE node RENAME COLUMN gene_id TO bioentry_id;

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at ebi.ac.uk  Thu Jun  7 07:33:25 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 07 Jun 2007 08:33:25 +0100
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
References: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
Message-ID: <4667B4C5.6070107@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sounds great.

BioJava users shouldn't need to change anything to get this to work as
PostgreSQL JDBC connection objects already require you to specify a schema.

cheers,
Richard


Hilmar Lapp wrote:
> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
> A schema in PostgreSQL is more or less a namespace for database objects
> (tables, indexes, views, etc) within a database.
> 
> (A database in PostgreSQL is similar to the concept of a user in Oracle
> or MySQL, and therefore for the latter two schemas are synonymous with a
> user. [Not sure I'm still up-to-date on this for MySQL, but at least
> that's what I recall.])
> 
> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
> specify the schema in which BioSQL resides using the --schema option.
> 
> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
> have a $dbc->schema() property for getting/setting the schema,
> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
> also add the property to the .bioperldb connection parameter file
> (-schema => 'yourschemahere').
> 
> Thanks for Brian Osborne for being the instigator (and tester, and for
> adding the code to load_ncbi_taxonomy.pl - I came too late).
> 
>     -hilmar
> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij
W/+0iO/ZsNDn1pLuf5yXbYA=
=asUn
-----END PGP SIGNATURE-----


From hlapp at gmx.net  Thu Jun  7 11:52:41 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 7 Jun 2007 07:52:41 -0400
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To: <4667B4C5.6070107@ebi.ac.uk>
References: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
	<4667B4C5.6070107@ebi.ac.uk>
Message-ID: <A33CC2FD-897C-4D13-8733-0F0D5BB50927@gmx.net>

I guess I'm behind the curve here a bit - schemas are optional in  
Postgres - if you say JDBC connection objects require a schema, does  
that mean it may also be null or empty?

	-hilmar

On Jun 7, 2007, at 3:33 AM, Richard Holland wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Sounds great.
>
> BioJava users shouldn't need to change anything to get this to work as
> PostgreSQL JDBC connection objects already require you to specify a  
> schema.
>
> cheers,
> Richard
>
>
> Hilmar Lapp wrote:
>> I have added support to BioSQL and bioperl-db for schemas in  
>> PostgreSQL.
>> A schema in PostgreSQL is more or less a namespace for database  
>> objects
>> (tables, indexes, views, etc) within a database.
>>
>> (A database in PostgreSQL is similar to the concept of a user in  
>> Oracle
>> or MySQL, and therefore for the latter two schemas are synonymous  
>> with a
>> user. [Not sure I'm still up-to-date on this for MySQL, but at least
>> that's what I recall.])
>>
>> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl  
>> scripts, you
>> specify the schema in which BioSQL resides using the --schema option.
>>
>> If you are using bioperl-db as a library, the Bio::DB::BioDB->new 
>> () call
>> also accepts a -schema named parameter, and Bio::DB::DBContextI  
>> objects
>> have a $dbc->schema() property for getting/setting the schema,
>> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and  
>> you may
>> also add the property to the .bioperldb connection parameter file
>> (-schema => 'yourschemahere').
>>
>> Thanks for Brian Osborne for being the instigator (and tester, and  
>> for
>> adding the code to load_ncbi_taxonomy.pl - I came too late).
>>
>>     -hilmar
>> --===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij
> W/+0iO/ZsNDn1pLuf5yXbYA=
> =asUn
> -----END PGP SIGNATURE-----

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at ebi.ac.uk  Thu Jun  7 12:22:11 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 07 Jun 2007 13:22:11 +0100
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To: <A33CC2FD-897C-4D13-8733-0F0D5BB50927@gmx.net>
References: <DC1816E2-68C0-400C-A777-F6D14DE2B870@gmx.net>
	<4667B4C5.6070107@ebi.ac.uk>
	<A33CC2FD-897C-4D13-8733-0F0D5BB50927@gmx.net>
Message-ID: <4667F873.3080103@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

When I said JDBC, what I really meant to say was Hibernate... Hibernate
controls the mapping between BioJava and BioSQL via a set of mapping
files and a connection parameters file (hibernate.cfg.xml), the latter
of which is what I was referring to.

Hibernate will use public if you don't specify a schema in the
connection parameters file. If you want to use something else, do this
in your connection parameters file:

   <property name="default_schema">biosql</property>

(changing biosql to whatever your schema happens to be).

cheers,
Richard

Hilmar Lapp wrote:
> I guess I'm behind the curve here a bit - schemas are optional in
> Postgres - if you say JDBC connection objects require a schema, does
> that mean it may also be null or empty?
> 
>     -hilmar
> 
> On Jun 7, 2007, at 3:33 AM, Richard Holland wrote:
> 
> Sounds great.
> 
> BioJava users shouldn't need to change anything to get this to work as
> PostgreSQL JDBC connection objects already require you to specify a
> schema.
> 
> cheers,
> Richard
> 
> 
> Hilmar Lapp wrote:
>>>> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
>>>> A schema in PostgreSQL is more or less a namespace for database objects
>>>> (tables, indexes, views, etc) within a database.
>>>>
>>>> (A database in PostgreSQL is similar to the concept of a user in Oracle
>>>> or MySQL, and therefore for the latter two schemas are synonymous with a
>>>> user. [Not sure I'm still up-to-date on this for MySQL, but at least
>>>> that's what I recall.])
>>>>
>>>> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
>>>> specify the schema in which BioSQL resides using the --schema option.
>>>>
>>>> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
>>>> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
>>>> have a $dbc->schema() property for getting/setting the schema,
>>>> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
>>>> also add the property to the .bioperldb connection parameter file
>>>> (-schema => 'yourschemahere').
>>>>
>>>> Thanks for Brian Osborne for being the instigator (and tester, and for
>>>> adding the code to load_ncbi_taxonomy.pl - I came too late).
>>>>
>>>>     -hilmar
>>>> --===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>>

> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZ/hz4C5LeMEKA/QRAhwRAKCX1kNyn0UdknpyRjQr82jYe4Z6bgCeKMGl
/94ZBeUaNd4t+T5B7333b/4=
=wQL0
-----END PGP SIGNATURE-----


From hlapp at gmx.net  Fri Jun  8 00:06:21 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 7 Jun 2007 20:06:21 -0400
Subject: [BioSQL-l] adding a namespace for trees
Message-ID: <5F94A19C-D3F0-468A-AEFD-971D58495CFC@gmx.net>

We're doing some work for a small demonstration project here and we  
find that phylogenetic trees are data objects in their own rights,  
and in fact are often identifiable and come from a database, for  
example if they are from TreeBASE.

So I needed to add a namespace for trees in the form of a foreign key  
to biodatabase. Since namespaces that can't be relied upon are of  
little use, the foreign key is required, making this a fairly  
significant change.

Any thoughts or comments are welcome.

If anyone is using the phylogeny module already (the BioSQL core is  
completely unaffected by this), here's the migration path:

INSERT INTO biodatabase (name, description)
VALUES ('biosql_phylo','Default namespace for phylogenetic trees.');
ALTER TABLE tree ADD COLUMN biodatabase_id INTEGER;
UPDATE tree SET biodatabase_id = (
	SELECT biodatabase_id FROM biodatabase WHERE name = 'biosql_phylo'
);
ALTER TABLE tree ALTER COLUMN biodatabase_id SET NOT NULL;
ALTER TABLE tree ADD CONSTRAINT FKbiodatabase
        FOREIGN KEY (biodatabase_id) REFERENCES biodatabase (biodatabase_id);
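
Before applying the NOT NULL and foreign key steps, it may be worth checking that the backfill left no gaps. The following is only a sketch: it uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and the reduced table definitions and the sample trees ('tree A', 'tree B') are assumptions made for illustration, not the real phylodb schema.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the tables are reduced to the
# columns this check needs (an assumption, not the full schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT
);
CREATE TABLE tree (
    tree_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Hypothetical pre-existing trees.
INSERT INTO tree (name) VALUES ('tree A');
INSERT INTO tree (name) VALUES ('tree B');

-- First three migration steps, as given in the message.
INSERT INTO biodatabase (name, description)
VALUES ('biosql_phylo', 'Default namespace for phylogenetic trees.');
ALTER TABLE tree ADD COLUMN biodatabase_id INTEGER;
UPDATE tree SET biodatabase_id = (
    SELECT biodatabase_id FROM biodatabase WHERE name = 'biosql_phylo'
);
""")

# Rows that would violate SET NOT NULL; the backfill should leave none.
missing = conn.execute(
    "SELECT COUNT(*) FROM tree WHERE biodatabase_id IS NULL"
).fetchone()[0]

# Rows that would violate the foreign key to biodatabase.
orphans = conn.execute("""
    SELECT COUNT(*) FROM tree t
    LEFT JOIN biodatabase b ON b.biodatabase_id = t.biodatabase_id
    WHERE t.biodatabase_id IS NOT NULL AND b.biodatabase_id IS NULL
""").fetchone()[0]

print(missing, orphans)
```

If both counts are zero, the two remaining ALTER TABLE statements should apply cleanly. The two SELECT statements can be run unchanged against PostgreSQL.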

Cheers,

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Jun 10 14:41:11 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 10 Jun 2007 10:41:11 -0400
Subject: [BioSQL-l] Phylodb: unique key constraint on tree
Message-ID: <1A8148DB-D44B-4A1A-BC9E-EB6318F36EFD@gmx.net>

Hi all -

the unique key constraint on tree has so far been on the name alone.  
With the addition of a mandatory namespace for trees, it doesn't  
really make sense to keep it that way.

Instead, I propose to change this to names having to be unique only  
within a namespace. I.e., the unique key constraint would be on  
(name, biodatabase_id).

Let me know if you have any comments, suggestions, or concerns.

As we are starting to use the module with real data, there are likely  
going to be a few more changes to the schema. I will try to provide a  
migration path for each change whenever possible; feel free to apply  
these immediately, or to accumulate them. Those who are using the  
schema already may also prefer to wait it out until the module  
stabilizes.

The migration path for this change is:

-- this assumes the default naming scheme for constraints used by PostgreSQL
ALTER TABLE tree DROP CONSTRAINT tree_name_key;
-- let's move towards named constraints to avoid having to rely on whatever
-- naming scheme an RDBMS employs (which may change anyway)
ALTER TABLE tree ADD CONSTRAINT tree_c1 UNIQUE (name, biodatabase_id);
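
To illustrate the intended effect of a unique key on (name, biodatabase_id), here is a minimal sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL. The reduced table definitions, the 'TreeBASE' namespace row, the tree name 'tree 1', and the add_tree helper are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; only the columns needed for
# the illustration are created (the real tables have more).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tree (
    tree_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase (biodatabase_id),
    CONSTRAINT tree_c1 UNIQUE (name, biodatabase_id)
);
INSERT INTO biodatabase (biodatabase_id, name) VALUES (1, 'biosql_phylo');
-- Hypothetical second namespace.
INSERT INTO biodatabase (biodatabase_id, name) VALUES (2, 'TreeBASE');
""")


def add_tree(name, biodatabase_id):
    """Insert a tree; return True on success, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tree (name, biodatabase_id) VALUES (?, ?)",
                (name, biodatabase_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_tree("tree 1", 1)            # new name in namespace 1
other_namespace = add_tree("tree 1", 2)  # same name, different namespace
duplicate = add_tree("tree 1", 1)        # same name, same namespace

print(first, other_namespace, duplicate)
```

The same tree name is accepted in two different namespaces, while a second tree with that name in the same namespace is rejected by tree_c1.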

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Jun 11 11:30:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 07:30:24 -0400
Subject: [BioSQL-l] script to load ITIS taxonomy
Message-ID: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>

Hi all -

I added a script to load the ITIS taxonomy (www.itis.gov) into the  
phylodb module. It is called load_itis_taxonomy.pl and is in the  
scripts/ directory.

It is independent of BioPerl right now (the ITIS download is either a  
MS SQL Server or an Informix dump - no kidding), but I'm hoping that  
at some point support for this can be integrated into Bio::TreeIO.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon Jun 11 12:24:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Jun 2007 07:24:50 -0500
Subject: [BioSQL-l] [Bioperl-l] script to load ITIS taxonomy
In-Reply-To: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
References: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
Message-ID: <99AC6C0F-10DD-4587-AFB3-32BC495CD2BD@uiuc.edu>


On Jun 11, 2007, at 6:30 AM, Hilmar Lapp wrote:

> Hi all -
>
> I added a script to load the ITIS taxonomy (www.itis.gov) into the
> phylodb module. It is called load_itis_taxonomy.pl and is in the
> scripts/ directory.
>
> It is independent of BioPerl right now (the ITIS download is either a
> MS SQL Server or an Informix dump - no kidding), but I'm hoping that
> at some point support for this can be integrated into Bio::TreeIO.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

I second the TreeIO support.  Anyone up for it?

chris


From hlapp at gmx.net  Tue Jun 12 00:04:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 20:04:24 -0400
Subject: [BioSQL-l] index changes on the phylodb module
Message-ID: <DEAD4D0C-3A62-4C03-B713-C933D7766236@gmx.net>

It turns out that ITIS has duplicate node labels (i.e., taxon names)  
in their taxonomy (they don't all have the same validity attribute  
though), which violates the unique key constraint on node labels. I  
suppose that many other data providers for trees won't satisfy this  
constraint either, so I propose to remove it by default. I'll leave  
it in as a commented-out configuration option.

I also needed to add more indexes to efficiently support some  
queries, especially those needed in precomputing the optimization  
structures for trees.

The migration path is:

-- using the default naming scheme for Pg:
ALTER TABLE node DROP CONSTRAINT node_label_key;
-- simple index on label to support searching nodes by label
CREATE INDEX node_i1 ON node (label);

-- other indexes needed for better query performance:
CREATE INDEX node_i2 ON node (tree_id);
CREATE INDEX edge_i1 ON edge (parent_node_id);
CREATE INDEX node_path_i1 ON node_path (parent_node_id);
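
As a rough illustration of the lookups these indexes are meant to support, the sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL. The reduced table definitions, the child_node_id column name, and the sample labels are assumptions for the example. The query plan shown is SQLite's and need not match what PostgreSQL chooses.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; tables are reduced to the
# columns used here, and child_node_id is an assumed column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id INTEGER PRIMARY KEY,
    label TEXT,
    tree_id INTEGER NOT NULL
);
CREATE TABLE edge (
    edge_id INTEGER PRIMARY KEY,
    child_node_id INTEGER NOT NULL REFERENCES node (node_id),
    parent_node_id INTEGER NOT NULL REFERENCES node (node_id)
);
CREATE INDEX node_i1 ON node (label);
CREATE INDEX node_i2 ON node (tree_id);
CREATE INDEX edge_i1 ON edge (parent_node_id);

-- Hypothetical data: two nodes share a label, which is accepted once
-- there is no unique constraint on it.
INSERT INTO node (node_id, label, tree_id) VALUES (1, 'root', 1);
INSERT INTO node (node_id, label, tree_id) VALUES (2, 'Aus bus', 1);
INSERT INTO node (node_id, label, tree_id) VALUES (3, 'Aus bus', 1);
INSERT INTO edge (child_node_id, parent_node_id) VALUES (2, 1);
INSERT INTO edge (child_node_id, parent_node_id) VALUES (3, 1);
""")

# Search nodes by label (the lookup node_i1 is meant for).
by_label = [row[0] for row in conn.execute(
    "SELECT node_id FROM node WHERE label = ? ORDER BY node_id",
    ("Aus bus",),
)]

# Find the children of a node (the lookup edge_i1 is meant for).
children = [row[0] for row in conn.execute(
    "SELECT child_node_id FROM edge WHERE parent_node_id = ? "
    "ORDER BY child_node_id",
    (1,),
)]

# Ask SQLite for its plan; the detail text is the last column of each row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT node_id FROM node WHERE label = ?",
    ("Aus bus",),
).fetchall()
uses_index = any("node_i1" in row[-1] for row in plan)

print(by_label, children, uses_index)
```

On PostgreSQL, running EXPLAIN on the equivalent SELECT statements is the way to confirm whether the planner actually picks up the new indexes for a given data set.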

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at ebi.ac.uk  Wed Jun 13 15:15:48 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 13 Jun 2007 16:15:48 +0100
Subject: [BioSQL-l] BioJava 1.5 Released
Message-ID: <46700A24.4040305@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all.

BioJava 1.5 has been released and is available for download from our
website at http://biojava.org/

Thanks to everyone who has made contributions, and in particular to
those who have spent many hours testing our new file parsers with every
combination of scenarios under the sun.

In addition to numerous bugfixes and enhancements, the highlights of
this release are brand new parsers for the most common file formats
(GenBank, Fasta, etc.), and a brand new BioSQL persistence layer that
uses Hibernate to interact with sequence databases. There is also a new
set of classes for creating genetic algorithms.

These are all part of the new org.biojavax package which represents
extensions to BioJava that would not fit easily into the existing
package structure. The classes in org.biojavax mostly extend and improve
on existing classes which could not be removed or replaced in order to
maintain compatibility with older code.

As usual if anyone finds any bugs in this release, please do report them
to us using the BugZilla tool at http://bugzilla.open-bio.org/

Please also note that this will be the last release of BioJava that will
be able to compile and run on Java 1.4. The next release (1.6) will move
at least to Java 5 or maybe straight to Java 6 (decision not yet made).

cheers,
Richard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFGcAoj4C5LeMEKA/QRAvZiAJjhHGWvq5nrj8aanmUtCpA8U8dpAJ0bsxzy
tv5LVdSEtAuA7gp12nLMCA==
=/Wbu
-----END PGP SIGNATURE-----


From hlapp at gmx.net  Fri Jun 22 12:49:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 22 Jun 2007 08:49:37 -0400
Subject: [BioSQL-l] phylodb ERD
Message-ID: <BC978837-D92F-4DEB-B60B-AB2589448FF0@gmx.net>

FYI, I committed an OmniGraffle and PDF version of an ERD for the  
BioSQL phylodb module. They are in the doc/ directory.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================