From hlapp at gmx.net Wed Jun 6 20:45:14 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Jun 2007 20:45:14 -0400
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
Message-ID:

I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
A schema in PostgreSQL is more or less a namespace for database objects
(tables, indexes, views, etc.) within a database.

(A database in PostgreSQL is similar to the concept of a user in Oracle
or MySQL, and therefore for the latter two schemas are synonymous with a
user. [Not sure I'm still up-to-date on this for MySQL, but at least
that's what I recall.])

When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
specify the schema in which BioSQL resides using the --schema option.

If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
also accepts a -schema named parameter, Bio::DB::DBContextI objects
have a $dbc->schema() property for getting/setting the schema,
Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
also add the property to the .bioperldb connection parameter file
(-schema => 'yourschemahere').

Thanks to Brian Osborne for being the instigator (and tester, and for
adding the code to load_ncbi_taxonomy.pl - I came too late).

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Wed Jun 6 22:44:55 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Jun 2007 22:44:55 -0400
Subject: [BioSQL-l] Phylogeny module
Message-ID: <3A264479-2FD9-407B-BFB4-9CB78188CDA6@gmx.net>

(for some reason I forgot to post this earlier - apologies)

A couple of weeks ago I committed the phylogeny module that Bill Piel
and I created at the Phyloinformatics Hackathon
(http://hackathon.nescent.org) in December (biosql-phylodb-pg.sql).
This is an optional module - BioSQL will work perfectly well without it.
(Unless - surprise - you want to store phylogenetic trees.) Right now
there is only a PostgreSQL version, but Jamie Estill, a student in our
Google Summer of Code program, has created a MySQL version that he or I
will commit too.

I've now also added comments and made a few rather small changes to the
module's schema since the initial revision:

- widened tree.identifier to 32 chars
- added column tree.is_rooted of boolean type
- renamed column node.gene_id to node.bioentry_id

If anyone was using this module already, here's the migration script in
PostgreSQL:

ALTER TABLE tree ALTER COLUMN identifier TYPE VARCHAR(32);
ALTER TABLE tree ADD COLUMN is_rooted BOOLEAN DEFAULT TRUE;
ALTER TABLE node RENAME COLUMN gene_id TO bioentry_id;

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From holland at ebi.ac.uk Thu Jun 7 03:33:25 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 07 Jun 2007 08:33:25 +0100
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To:
References:
Message-ID: <4667B4C5.6070107@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sounds great.

BioJava users shouldn't need to change anything to get this to work as
PostgreSQL JDBC connection objects already require you to specify a
schema.

cheers,
Richard

Hilmar Lapp wrote:
> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
> A schema in PostgreSQL is more or less a namespace for database objects
> (tables, indexes, views, etc) within a database.
>
> (A database in PostgreSQL is similar to the concept of a user in Oracle
> or MySQL, and therefore for the latter two schemas are synonymous with a
> user.
> [Not sure I'm still up-to-date on this for MySQL, but at least
> that's what I recall.])
>
> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
> specify the schema in which BioSQL resides using the --schema option.
>
> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
> have a $dbc->schema() property for getting/setting the schema,
> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
> also add the property to the .bioperldb connection parameter file
> (-schema => 'yourschemahere').
>
> Thanks for Brian Osborne for being the instigator (and tester, and for
> adding the code to load_ncbi_taxonomy.pl - I came too late).
>
> -hilmar
> --===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij
W/+0iO/ZsNDn1pLuf5yXbYA=
=asUn
-----END PGP SIGNATURE-----

From hlapp at gmx.net Thu Jun 7 07:52:41 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 7 Jun 2007 07:52:41 -0400
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To: <4667B4C5.6070107@ebi.ac.uk>
References: <4667B4C5.6070107@ebi.ac.uk>
Message-ID:

I guess I'm behind the curve here a bit - schemas are optional in
Postgres - if you say JDBC connection objects require a schema, does
that mean it may also be null or empty?

	-hilmar

On Jun 7, 2007, at 3:33 AM, Richard Holland wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Sounds great.
>
> BioJava users shouldn't need to change anything to get this to work as
> PostgreSQL JDBC connection objects already require you to specify a
> schema.
>
> cheers,
> Richard
>
>
> Hilmar Lapp wrote:
>> I have added support to BioSQL and bioperl-db for schemas in
>> PostgreSQL.
>> A schema in PostgreSQL is more or less a namespace for database
>> objects
>> (tables, indexes, views, etc) within a database.
>>
>> (A database in PostgreSQL is similar to the concept of a user in
>> Oracle
>> or MySQL, and therefore for the latter two schemas are synonymous
>> with a
>> user. [Not sure I'm still up-to-date on this for MySQL, but at least
>> that's what I recall.])
>>
>> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl
>> scripts, you
>> specify the schema in which BioSQL resides using the --schema option.
>>
>> If you are using bioperl-db as a library, the Bio::DB::BioDB->new
>> () call
>> also accepts a -schema named parameter, and Bio::DB::DBContextI
>> objects
>> have a $dbc->schema() property for getting/setting the schema,
>> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and
>> you may
>> also add the property to the .bioperldb connection parameter file
>> (-schema => 'yourschemahere').
>>
>> Thanks for Brian Osborne for being the instigator (and tester, and
>> for
>> adding the code to load_ncbi_taxonomy.pl - I came too late).
>>
>> -hilmar
>> --===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij
> W/+0iO/ZsNDn1pLuf5yXbYA=
> =asUn
> -----END PGP SIGNATURE-----

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From holland at ebi.ac.uk Thu Jun 7 08:22:11 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 07 Jun 2007 13:22:11 +0100
Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db
In-Reply-To:
References: <4667B4C5.6070107@ebi.ac.uk>
Message-ID: <4667F873.3080103@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

When I said JDBC, what I really meant to say was Hibernate...

Hibernate controls the mapping between BioJava and BioSQL via a set of
mapping files and a connection parameters file (hibernate.cfg.xml), the
latter of which is what I was referring to.

Hibernate will use public if you don't specify a schema in the
connection parameters file. If you want to use something else, do this
in your connection parameters file:

biosql

(changing biosql to whatever your schema happens to be).

cheers,
Richard

Hilmar Lapp wrote:
> I guess I'm behind the curve here a bit - schemas are optional in
> Postgres - if you say JDBC connection objects require a schema, does
> that mean it may also be null or empty?
>
> -hilmar
>
> On Jun 7, 2007, at 3:33 AM, Richard Holland wrote:
>
> Sounds great.
>
> BioJava users shouldn't need to change anything to get this to work as
> PostgreSQL JDBC connection objects already require you to specify a
> schema.
>
> cheers,
> Richard
>
>
> Hilmar Lapp wrote:
>>>> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
>>>> A schema in PostgreSQL is more or less a namespace for database objects
>>>> (tables, indexes, views, etc) within a database.
>>>>
>>>> (A database in PostgreSQL is similar to the concept of a user in Oracle
>>>> or MySQL, and therefore for the latter two schemas are synonymous with a
>>>> user. [Not sure I'm still up-to-date on this for MySQL, but at least
>>>> that's what I recall.])
>>>>
>>>> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
>>>> specify the schema in which BioSQL resides using the --schema option.
>>>>
>>>> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
>>>> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
>>>> have a $dbc->schema() property for getting/setting the schema,
>>>> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
>>>> also add the property to the .bioperldb connection parameter file
>>>> (-schema => 'yourschemahere').
>>>>
>>>> Thanks for Brian Osborne for being the instigator (and tester, and for
>>>> adding the code to load_ncbi_taxonomy.pl - I came too late).
>>>>
>>>> -hilmar
>>>> --===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>>
> --===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZ/hz4C5LeMEKA/QRAhwRAKCX1kNyn0UdknpyRjQr82jYe4Z6bgCeKMGl
/94ZBeUaNd4t+T5B7333b/4=
=wQL0
-----END PGP SIGNATURE-----

From hlapp at gmx.net Thu Jun 7 20:06:21 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 7 Jun 2007 20:06:21 -0400
Subject: [BioSQL-l] adding a namespace for trees
Message-ID: <5F94A19C-D3F0-468A-AEFD-971D58495CFC@gmx.net>

We're doing some work for a small demonstration project here and we
find that phylogenetic trees are data objects in their own right, and
in fact are often identifiable and come from a database, for example if
they are from TreeBASE.

So I needed to add a namespace for trees in the form of a foreign key
to biodatabase. Since namespaces that can't be relied upon are of
little use, the foreign key is required, making this a fairly
significant change. Any thoughts or comments are welcome.
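
To illustrate what a required namespace means in practice, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the two trimmed-down tables and the helper try_insert are assumptions made for illustration; they are not the actual phylodb DDL. A tree row is accepted only if it points at an existing biodatabase row.

```python
import sqlite3


def make_db():
    """Build a toy database; the column lists are hypothetical and trimmed."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE biodatabase (
            biodatabase_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE tree (
            tree_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            biodatabase_id INTEGER NOT NULL
                REFERENCES biodatabase (biodatabase_id)
        );
    """)
    return conn


def try_insert(conn, name, biodatabase_id):
    """Return True if the tree row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tree (name, biodatabase_id) VALUES (?, ?)",
                (name, biodatabase_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
conn.execute(
    "INSERT INTO biodatabase (biodatabase_id, name) VALUES (1, 'biosql_phylo')"
)
conn.commit()

accepted = try_insert(conn, "tree A", 1)          # namespace exists
no_namespace = try_insert(conn, "tree B", None)   # rejected by NOT NULL
bad_namespace = try_insert(conn, "tree C", 99)    # rejected by the foreign key
print(accepted, no_namespace, bad_namespace)
```

The point of the sketch is only that both the NOT NULL and the foreign key are needed: the first rules out trees without a namespace, the second rules out trees that name a namespace which does not exist.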
If anyone is using the phylogeny module already (the BioSQL core is
completely unaffected by this), here's the migration path:

INSERT INTO biodatabase (name, description)
VALUES ('biosql_phylo','Default namespace for phylogenetic trees.');
ALTER TABLE tree ADD COLUMN biodatabase_id INTEGER;
UPDATE tree SET biodatabase_id = (
       SELECT biodatabase_id FROM biodatabase
       WHERE name = 'biosql_phylo'
);
ALTER TABLE tree ALTER COLUMN biodatabase_id SET NOT NULL;
ALTER TABLE tree ADD CONSTRAINT FKbiodatabase
      FOREIGN KEY (biodatabase_id)
      REFERENCES biodatabase (biodatabase_id);

Cheers,

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sun Jun 10 10:41:11 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 10 Jun 2007 10:41:11 -0400
Subject: [BioSQL-l] Phylodb: unique key constraint on tree
Message-ID: <1A8148DB-D44B-4A1A-BC9E-EB6318F36EFD@gmx.net>

Hi all - the unique key constraint on tree has been the name. With the
addition of a mandatory namespace for trees, it doesn't really make
sense to keep it that way. Instead, I propose to change this to names
having to be unique only within a namespace. I.e., the unique key
constraint would be on (name, biodatabase_id).

Let me know if you have any comments, suggestions, or concerns.

As we are starting to use the module with real data, there are likely
going to be a few more changes to the schema. For those who are using
the schema already, feel free to wait it out until the module
stabilizes, and I will also try to provide a migration path whenever
possible. Feel free to apply these immediately, or to accumulate them.
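
The effect of a unique key on (name, biodatabase_id) can be sketched as follows. This is a hedged illustration in Python against an in-memory SQLite database rather than PostgreSQL; the trimmed tree table and the helper add_tree are hypothetical, and only the constraint name tree_c1 and the two key columns come from the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down tree table carrying only the two key columns.
conn.execute("""
    CREATE TABLE tree (
        tree_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        biodatabase_id INTEGER NOT NULL,
        CONSTRAINT tree_c1 UNIQUE (name, biodatabase_id)
    )
""")


def add_tree(name, biodatabase_id):
    """Return True if the row is accepted, False if tree_c1 rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tree (name, biodatabase_id) VALUES (?, ?)",
                (name, biodatabase_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_tree("primates", 1),  # first use of the name in namespace 1
    add_tree("primates", 2),  # same name, different namespace
    add_tree("primates", 1),  # duplicate within namespace 1
]
print(results)  # [True, True, False]
```

Under the old single-column key the second insert would also have been rejected, which is what makes the per-namespace key the more useful one once every tree has a namespace.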
The migration path for this change is:

-- this assumes the default naming scheme for constraints used by PostgreSQL
ALTER TABLE tree DROP CONSTRAINT tree_name_key;
-- let's move towards named constraints to avoid having to rely on whatever
-- naming scheme an RDBMS employs (which may change anyway)
ALTER TABLE tree ADD CONSTRAINT tree_c1 UNIQUE (name, biodatabase_id);

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Mon Jun 11 07:30:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 07:30:24 -0400
Subject: [BioSQL-l] script to load ITIS taxonomy
Message-ID: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>

Hi all -

I added a script to load the ITIS taxonomy (www.itis.gov) into the
phylodb module. It is called load_itis_taxonomy.pl and is in the
scripts/ directory.

It is independent of BioPerl right now (the ITIS download is either a
MS SQL Server or an Informix dump - no kidding), but I'm hoping that
at some point support for this can be integrated into Bio::TreeIO.

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon Jun 11 08:24:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Jun 2007 07:24:50 -0500
Subject: [BioSQL-l] [Bioperl-l] script to load ITIS taxonomy
In-Reply-To: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
References: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
Message-ID: <99AC6C0F-10DD-4587-AFB3-32BC495CD2BD@uiuc.edu>

On Jun 11, 2007, at 6:30 AM, Hilmar Lapp wrote:

> Hi all -
>
> I added a script to load the ITIS taxonomy (www.itis.gov) into the
> phylodb module. It is called load_itis_taxonomy.pl and is in the
> scripts/ directory.
>
> It is independent of BioPerl right now (the ITIS download is either a
> MS SQL Server or an Informix dump - no kidding), but I'm hoping that
> at some point support for this can be integrated into Bio::TreeIO.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

I second the TreeIO support. Anyone up for it?

chris

From hlapp at gmx.net Mon Jun 11 20:04:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 20:04:24 -0400
Subject: [BioSQL-l] index changes on the phylodb module
Message-ID:

It turns out that ITIS has duplicate node labels (i.e., taxon names) in
their taxonomy (they don't all have the same validity attribute
though). I suppose that many other data providers for trees won't
satisfy a uniqueness constraint on node labels either, so I propose to
remove it by default. I'll leave it in as a commented out configuration
option.

I also needed to add more indexes to efficiently support some queries,
especially those needed in precomputing the optimization structures for
trees.
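
As a rough illustration of trading the unique constraint on node labels for a plain index, here is a small Python sketch using an in-memory SQLite database instead of PostgreSQL. The trimmed node table and the sample rows are hypothetical; only the index names node_i1 and node_i2 and the indexed columns follow the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down node table without a UNIQUE constraint on label.
conn.executescript("""
    CREATE TABLE node (
        node_id INTEGER PRIMARY KEY,
        label TEXT,
        tree_id INTEGER NOT NULL
    );
    CREATE INDEX node_i1 ON node (label);
    CREATE INDEX node_i2 ON node (tree_id);
""")

# Two nodes share a label, as can happen with duplicate taxon names.
conn.executemany(
    "INSERT INTO node (label, tree_id) VALUES (?, ?)",
    [("Morus", 1), ("Morus", 1), ("Quercus", 1)],
)
conn.commit()

# Searching by label still works and now returns every matching node.
matches = conn.execute(
    "SELECT node_id FROM node WHERE label = ? ORDER BY node_id", ("Morus",)
).fetchall()

# PRAGMA index_list reports each index name (column 1) and whether it is
# unique (column 2); both indexes here are non-unique.
index_info = {row[1]: row[2] for row in conn.execute("PRAGMA index_list(node)")}

print(matches)
print(index_info)
```

The non-unique index keeps lookups by label cheap while no longer rejecting data sets that reuse a label; anyone who does want uniqueness can still add the constraint back for their own installation.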
The migration path is:

-- using the default naming scheme for Pg:
ALTER TABLE node DROP CONSTRAINT node_label_key;
-- simple index on label to support searching nodes by label
CREATE INDEX node_i1 ON node (label);
-- other indexes needed for better query performance:
CREATE INDEX node_i2 ON node (tree_id);
CREATE INDEX edge_i1 ON edge (parent_node_id);
CREATE INDEX node_path_i1 ON node_path (parent_node_id);

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From holland at ebi.ac.uk Wed Jun 13 11:15:48 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 13 Jun 2007 16:15:48 +0100
Subject: [BioSQL-l] BioJava 1.5 Released
Message-ID: <46700A24.4040305@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all.

BioJava 1.5 has been released and is available for download from our
website at http://biojava.org/

Thanks to everyone who has made contributions, and in particular to
those who have spent many hours testing our new file parsers with every
combination of scenarios under the sun.

In addition to numerous bugfixes and enhancements, the highlights of
this release are brand new parsers for the most common file formats
(GenBank, Fasta, etc.), and a brand new BioSQL persistence layer that
uses Hibernate to interact with sequence databases. There is also a new
set of classes for creating genetic algorithms.

These are all part of the new org.biojavax package which represents
extensions to BioJava that would not fit easily into the existing
package structure. The classes in org.biojavax mostly extend and
improve on existing classes which could not be removed or replaced in
order to maintain compatibility with older code.
As usual if anyone finds any bugs in this release, please do report them to us using the BugZilla tool at http://bugzilla.open-bio.org/ Please also note that this will be the last release of BioJava that will be able to compile and run on Java 1.4. The next release (1.6) will move at least to Java 5 or maybe straight to Java 6 (decision not yet made). cheers, Richard -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD4DBQFGcAoj4C5LeMEKA/QRAvZiAJjhHGWvq5nrj8aanmUtCpA8U8dpAJ0bsxzy tv5LVdSEtAuA7gp12nLMCA== =/Wbu -----END PGP SIGNATURE----- From hlapp at gmx.net Fri Jun 22 08:49:37 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 22 Jun 2007 08:49:37 -0400 Subject: [BioSQL-l] phylodb ERD Message-ID: FYI, I committed an OmniGraffle and PDF version of an ERD for the BioSQL phylodb module. They are in the doc/ directory. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Jun 7 00:45:14 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 6 Jun 2007 20:45:14 -0400 Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db Message-ID: I have added support to BioSQL and bioperl-db for schemas in PostgreSQL. A schema in PostgreSQL is more or less a namespace for database objects (tables, indexes, views, etc) within a database. (A database in PostgreSQL is similar to the concept of a user in Oracle or MySQL, and therefore for the latter two schemas are synonymous with a user. [Not sure I'm still up-to-date on this for MySQL, but at least that's what I recall.]) When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you specify the schema in which BioSQL resides using the --schema option. 
If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call also accepts a -schema named parameter, and Bio::DB::DBContextI objects have a $dbc->schema() property for getting/setting the schema, Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may also add the property to the .bioperldb connection parameter file (-schema => 'yourschemahere'). Thanks for Brian Osborne for being the instigator (and tester, and for adding the code to load_ncbi_taxonomy.pl - I came too late). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Jun 7 02:44:55 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 6 Jun 2007 22:44:55 -0400 Subject: [BioSQL-l] Phylogeny module Message-ID: <3A264479-2FD9-407B-BFB4-9CB78188CDA6@gmx.net> (for some reason I forgot to post this earlier - apologies) I committed the phylogeny module a couple of weeks ago that Bill Piel and I created at Phyloinformatics Hackathon (http:// hackathon.nescent.org) in December (biosql-phylodb-pg.sql). This is an optional module - BioSQL will work perfectly well without it. (Unless - surprise - you want to store phylogenetic trees.) Right now there is only a PostgreSQL version, but Jamie Estill, a student in our Google Summer of Code program, has created a MySQL version that he or I will commit too. 
I've now also added comments and made a few rather small changes to the module's schema since the initial revision: - widened width of of tree.identifier to 32 chars - added column tree.is_rooted of boolean type - renamed column node.gene_id to node.bioentry_id If anyone was using this module already, here's the migration script in PostgreSQL: ALTER TABLE tree ALTER COLUMN identifier TYPE VARCHAR(32); ALTER TABLE tree ADD COLUMN is_rooted BOOLEAN DEFAULT TRUE; ALTER TABLE node RENAME COLUMN gene_id TO bioentry_id; -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at ebi.ac.uk Thu Jun 7 07:33:25 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Thu, 07 Jun 2007 08:33:25 +0100 Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db In-Reply-To: References: Message-ID: <4667B4C5.6070107@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Sounds great. BioJava users shouldn't need to change anything to get this to work as PostgreSQL JDBC connection objects already require you to specify a schema. cheers, Richard Hilmar Lapp wrote: > I have added support to BioSQL and bioperl-db for schemas in PostgreSQL. > A schema in PostgreSQL is more or less a namespace for database objects > (tables, indexes, views, etc) within a database. > > (A database in PostgreSQL is similar to the concept of a user in Oracle > or MySQL, and therefore for the latter two schemas are synonymous with a > user. [Not sure I'm still up-to-date on this for MySQL, but at least > that's what I recall.]) > > When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you > specify the schema in which BioSQL resides using the --schema option. 
> > If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call > also accepts a -schema named parameter, and Bio::DB::DBContextI objects > have a $dbc->schema() property for getting/setting the schema, > Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may > also add the property to the .bioperldb connection parameter file > (-schema => 'yourschemahere'). > > Thanks for Brian Osborne for being the instigator (and tester, and for > adding the code to load_ncbi_taxonomy.pl - I came too late). > > -hilmar > --=========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij W/+0iO/ZsNDn1pLuf5yXbYA= =asUn -----END PGP SIGNATURE----- From hlapp at gmx.net Thu Jun 7 11:52:41 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 7 Jun 2007 07:52:41 -0400 Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db In-Reply-To: <4667B4C5.6070107@ebi.ac.uk> References: <4667B4C5.6070107@ebi.ac.uk> Message-ID: I guess I'm behind the curve here a bit - schemas are optional in Postgres - if you say JDBC connection objects require a schema, does that mean it may also be null or empty? -hilmar On Jun 7, 2007, at 3:33 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Sounds great. > > BioJava users shouldn't need to change anything to get this to work as > PostgreSQL JDBC connection objects already require you to specify a > schema. > > cheers, > Richard > > > Hilmar Lapp wrote: >> I have added support to BioSQL and bioperl-db for schemas in >> PostgreSQL. >> A schema in PostgreSQL is more or less a namespace for database >> objects >> (tables, indexes, views, etc) within a database. 
>> >> (A database in PostgreSQL is similar to the concept of a user in >> Oracle >> or MySQL, and therefore for the latter two schemas are synonymous >> with a >> user. [Not sure I'm still up-to-date on this for MySQL, but at least >> that's what I recall.]) >> >> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl >> scripts, you >> specify the schema in which BioSQL resides using the --schema option. >> >> If you are using bioperl-db as a library, the Bio::DB::BioDB->new >> () call >> also accepts a -schema named parameter, and Bio::DB::DBContextI >> objects >> have a $dbc->schema() property for getting/setting the schema, >> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and >> you may >> also add the property to the .bioperldb connection parameter file >> (-schema => 'yourschemahere'). >> >> Thanks for Brian Osborne for being the instigator (and tester, and >> for >> adding the code to load_ncbi_taxonomy.pl - I came too late). >> >> -hilmar >> --=========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFGZ7TF4C5LeMEKA/QRApwUAJ48q46iX152pB6Xcc/717Ie8foUTQCgm3ij > W/+0iO/ZsNDn1pLuf5yXbYA= > =asUn > -----END PGP SIGNATURE----- -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at ebi.ac.uk Thu Jun 7 12:22:11 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Thu, 07 Jun 2007 13:22:11 +0100 Subject: [BioSQL-l] PostgreSQL schema support in BioSQL and bioperl-db In-Reply-To: References: <4667B4C5.6070107@ebi.ac.uk> Message-ID: <4667F873.3080103@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 When I said JDBC, what 
I really meant to say was Hibernate... Hibernate controls the mapping between BioJava and BioSQL via a set of mapping files and a connection parameters file (hibernate.cfg.xml), the latter of which is what I was referring to. Hibernate will use public if you don't specify a schema in the connection parameters file. If you want to use something else, do this in your connection parameters file: biosql (changing biosql to whatever your schema happens to be). cheers, Richard Hilmar Lapp wrote: > I guess I'm behind the curve here a bit - schemas are optional in > Postgres - if you say JDBC connection objects require a schema, does > that mean it may also be null or empty? > > -hilmar > > On Jun 7, 2007, at 3:33 AM, Richard Holland wrote: > > Sounds great. > > BioJava users shouldn't need to change anything to get this to work as > PostgreSQL JDBC connection objects already require you to specify a > schema. > > cheers, > Richard > > > Hilmar Lapp wrote: >>>> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL. >>>> A schema in PostgreSQL is more or less a namespace for database objects >>>> (tables, indexes, views, etc) within a database. >>>> >>>> (A database in PostgreSQL is similar to the concept of a user in Oracle >>>> or MySQL, and therefore for the latter two schemas are synonymous with a >>>> user. [Not sure I'm still up-to-date on this for MySQL, but at least >>>> that's what I recall.]) >>>> >>>> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you >>>> specify the schema in which BioSQL resides using the --schema option. 
>>>> >>>> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call >>>> also accepts a -schema named parameter, and Bio::DB::DBContextI objects >>>> have a $dbc->schema() property for getting/setting the schema, >>>> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may >>>> also add the property to the .bioperldb connection parameter file >>>> (-schema => 'yourschemahere'). >>>> >>>> Thanks for Brian Osborne for being the instigator (and tester, and for >>>> adding the code to load_ncbi_taxonomy.pl - I came too late). >>>> >>>> -hilmar >>>> --=========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> > --=========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGZ/hz4C5LeMEKA/QRAhwRAKCX1kNyn0UdknpyRjQr82jYe4Z6bgCeKMGl /94ZBeUaNd4t+T5B7333b/4= =wQL0 -----END PGP SIGNATURE----- From hlapp at gmx.net Fri Jun 8 00:06:21 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 7 Jun 2007 20:06:21 -0400 Subject: [BioSQL-l] adding a namespace for trees Message-ID: <5F94A19C-D3F0-468A-AEFD-971D58495CFC@gmx.net> We're doing some work for a small demonstration project here and we find that phylogenetic trees are data objects in their own rights, and in fact are often identifiable and come from a database, for example if they are from TreeBASE. So I needed to add a namespace for trees in the form of a foreign key to biodatabase. Since namespaces that can't be relied upon are of little use, the foreign key is required, making this a fairly significant change. Any thoughts or comments are welcome. 
If anyone is using the phylogeny module already (the BioSQL core is completely unaffected by this), here's the migration path: INSERT INTO biodatabase (name, description) VALUES ('biosql_phylo','Default namespace for phylogenetic trees.'); ALTER TABLE tree ADD COLUMN biodatabase_id INTEGER; UPDATE tree SET biodatabase_id = ( SELECT biodatabase_id FROM biodatabase WHERE name = 'biosql_phylo' ); ALTER TABLE tree ALTER COLUMN biodatabase SET NOT NULL; ALTER TABLE tree ADD CONSTRAINT FKbiodatabase FOREIGN KEY (biodatabase_id) REFERENCES biodatabase (biodatabase_id); Cheers, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Jun 10 14:41:11 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 10 Jun 2007 10:41:11 -0400 Subject: [BioSQL-l] Phylodb: unique key constraint on tree Message-ID: <1A8148DB-D44B-4A1A-BC9E-EB6318F36EFD@gmx.net> Hi all - the unique key constraint on tree has been the name. With the addition of a mandatory namespace for trees, this doesn't really make sense to keep that way. Instead, I propose to change this to names having to be unique only within a namespace. I.e., the unique key constraint would be on (name, biodatabase_id). Let me know if you have any comments, suggestions, or concerns. As we are starting to use the module with real data, there are likely going to be a few more changes to the schema. For those who are using the schema already, feel free to wait it out until the module stabilizes, and I will also try to provide a migration path whenever possible. Feel free to apply these immediately, or to accumulate them. 
The migration path for this change is:

-- this assumes the default naming scheme for constraints used by PostgreSQL
ALTER TABLE tree DROP CONSTRAINT tree_name_key;
-- let's move towards named constraints to avoid having to rely on whatever
-- naming scheme an RDBMS employs (which may change anyway)
ALTER TABLE tree ADD CONSTRAINT tree_c1 UNIQUE (name, biodatabase_id);

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Mon Jun 11 11:30:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 07:30:24 -0400
Subject: [BioSQL-l] script to load ITIS taxonomy
Message-ID: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>

Hi all -

I added a script to load the ITIS taxonomy (www.itis.gov) into the phylodb module. It is called load_itis_taxonomy.pl and is in the scripts/ directory.

It is independent of BioPerl right now (the ITIS download is either a MS SQL Server or an Informix dump - no kidding), but I'm hoping that at some point support for this can be integrated into Bio::TreeIO.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon Jun 11 12:24:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Jun 2007 07:24:50 -0500
Subject: [BioSQL-l] [Bioperl-l] script to load ITIS taxonomy
In-Reply-To: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
References: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
Message-ID: <99AC6C0F-10DD-4587-AFB3-32BC495CD2BD@uiuc.edu>

On Jun 11, 2007, at 6:30 AM, Hilmar Lapp wrote:

> Hi all -
>
> I added a script to load the ITIS taxonomy (www.itis.gov) into the
> phylodb module. It is called load_itis_taxonomy.pl and is in the
> scripts/ directory.
>
> It is independent of BioPerl right now (the ITIS download is either a
> MS SQL Server or an Informix dump - no kidding), but I'm hoping that
> at some point support for this can be integrated into Bio::TreeIO.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

I second the TreeIO support. Anyone up for it?

chris

From hlapp at gmx.net Tue Jun 12 00:04:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 20:04:24 -0400
Subject: [BioSQL-l] index changes on the phylodb module
Message-ID:

It turns out that ITIS has duplicate node labels (i.e., taxon names) in their taxonomy (they don't all have the same validity attribute though). I suppose that many other data providers for trees won't satisfy the unique key constraint on node labels either, so I propose to remove that constraint by default. I'll leave it in as a commented out configuration option.

I also needed to add more indexes to efficiently support some queries, especially those needed in precomputing the optimization structures for trees.
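
To make the effect of the change concrete, here is a minimal sketch of a node table that has a plain (non-unique) index on label in place of a unique key. It again uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL. The column types and the labels 'Taxon A' and 'Taxon B' are assumptions made up for the example, and demo_duplicate_labels is a hypothetical helper name. Only the index names node_i1 and node_i2 are taken from this thread.

```python
import sqlite3


def demo_duplicate_labels():
    conn = sqlite3.connect(":memory:")
    # Simplified node table (assumption: not the real phylodb DDL).
    # There is no UNIQUE constraint on label, only a plain index.
    conn.executescript("""
        CREATE TABLE node (
            node_id INTEGER PRIMARY KEY,
            label   TEXT,
            tree_id INTEGER NOT NULL
        );
        CREATE INDEX node_i1 ON node (label);
        CREATE INDEX node_i2 ON node (tree_id);
    """)
    # Two nodes share a label, which a unique key on label would reject.
    conn.executemany(
        "INSERT INTO node (label, tree_id) VALUES (?, ?)",
        [("Taxon A", 1), ("Taxon A", 1), ("Taxon B", 1)],
    )
    matches = conn.execute(
        "SELECT COUNT(*) FROM node WHERE label = ?", ("Taxon A",)
    ).fetchone()[0]
    # Ask SQLite how it would run a search by label; the last column of
    # each plan row is a human-readable description of the step.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT node_id FROM node WHERE label = ?",
        ("Taxon A",),
    ).fetchall()
    plan_text = " ".join(str(row[-1]) for row in plan)
    return matches, plan_text


if __name__ == "__main__":
    count, plan_text = demo_duplicate_labels()
    print(count)
    print(plan_text)
```

Duplicate labels are stored without complaint, and a search by label can still be served from the index. PostgreSQL's planner makes its own choices, so the plan text here only shows the idea and says nothing about actual PostgreSQL behaviour.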
The migration path is:

-- using the default naming scheme for Pg:
ALTER TABLE node DROP CONSTRAINT node_label_key;
-- simple index on label to support searching nodes by label
CREATE INDEX node_i1 ON node (label);
-- other indexes needed for better query performance:
CREATE INDEX node_i2 ON node (tree_id);
CREATE INDEX edge_i1 ON edge (parent_node_id);
CREATE INDEX node_path_i1 ON node_path (parent_node_id);

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From holland at ebi.ac.uk Wed Jun 13 15:15:48 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 13 Jun 2007 16:15:48 +0100
Subject: [BioSQL-l] BioJava 1.5 Released
Message-ID: <46700A24.4040305@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all.

BioJava 1.5 has been released and is available for download from our website at http://biojava.org/

Thanks to everyone who has made contributions, and in particular to those who have spent many hours testing our new file parsers with every combination of scenarios under the sun.

In addition to numerous bugfixes and enhancements, the highlights of this release are brand new parsers for the most common file formats (GenBank, Fasta, etc.), and a brand new BioSQL persistence layer that uses Hibernate to interact with sequence databases. There is also a new set of classes for creating genetic algorithms.

These are all part of the new org.biojavax package which represents extensions to BioJava that would not fit easily into the existing package structure. The classes in org.biojavax mostly extend and improve on existing classes which could not be removed or replaced in order to maintain compatibility with older code.
As usual, if anyone finds any bugs in this release, please do report them to us using the BugZilla tool at http://bugzilla.open-bio.org/

Please also note that this will be the last release of BioJava that will be able to compile and run on Java 1.4. The next release (1.6) will move at least to Java 5 or maybe straight to Java 6 (decision not yet made).

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFGcAoj4C5LeMEKA/QRAvZiAJjhHGWvq5nrj8aanmUtCpA8U8dpAJ0bsxzy
tv5LVdSEtAuA7gp12nLMCA==
=/Wbu
-----END PGP SIGNATURE-----

From hlapp at gmx.net Fri Jun 22 12:49:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 22 Jun 2007 08:49:37 -0400
Subject: [BioSQL-l] phylodb ERD
Message-ID:

FYI, I committed an OmniGraffle and PDF version of an ERD for the BioSQL phylodb module. They are in the doc/ directory.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================