From robert.roth at home.se Wed Oct 8 07:24:25 2003
From: robert.roth at home.se (Robert Roth)
Date: Wed Oct 8 07:21:59 2003
Subject: [BioSQL-l] Error using load_seqdatabase.pl
Message-ID:

Hello,

I'm using MySQL 4.0.15, BioPerl 1.2.3, bioperl-db from CVS and perl 5.6.1 on WinXP. After installation I pre-loaded the biosql database with the NCBI taxon database using load_ncbi_taxonomy.pl. Now when I try to load sequence data with load_seqdatabase.pl I get the following error:

perl load_seqdatabase.pl --host localhost --dbname biosql --dbuser root --namespace bioperl --debug --safe --testonly --format genbank test.genbank
Loading test.genbank ...
attempting to load adaptor class for Bio::Seq::RichSeq
attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::Root::Root
attempting to load module Bio::DB::BioSQL::RootAdaptor
Undefined subroutine &Bio::Root::Root::debug called at E:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1502, line 41.

I've tried different formats (swiss and genbank) and different files with the same result. The line number given in the error always corresponds to the end of the first record (and it doesn't matter if the file contains one or more records). Any ideas?

Many thanks in advance,

Robert

From hlapp at gnf.org Wed Oct 8 14:54:04 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Oct 8 14:51:43 2003
Subject: [BioSQL-l] Error using load_seqdatabase.pl
In-Reply-To:
Message-ID:

This has nothing to do with the input file but everything to do with run-time loading of modules in perl on Windows.
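[Editor's sketch] One way to look at run-time module loading on a given machine is to load a module by name, print the file perl actually picked up, and ask whether a method can be resolved on it. The helper name check_module is hypothetical and not part of BioPerl or bioperl-db; the sketch uses File::Spec only because it ships with perl, and on the affected machine one would presumably pass 'Bio::Root::Root' and 'debug' instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): load a module by name at
# run time, then report where it was loaded from and whether the named
# method can be resolved on it.
sub check_module {
    my ($module, $method) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Root::Root -> Bio/Root/Root.pm
    my $loaded = eval { require $file; 1 };
    return +{ loaded => 0, error => $@ } unless $loaded;
    return +{
        loaded => 1,
        path   => $INC{$file},                  # the file perl actually loaded
        can    => $module->can($method) ? 1 : 0,
    };
}

# File::Spec is a core module, so this runs anywhere. Assumption: on the
# machine showing the error one would call
# check_module('Bio::Root::Root', 'debug') instead.
my $report = check_module('File::Spec', 'catfile');
if ($report->{loaded}) {
    print "loaded from $report->{path}\n";
    print "method resolvable: $report->{can}\n";
}
else {
    print "could not load: $report->{error}\n";
}
```

If the printed path points somewhere unexpected, or the method is not resolvable, that would hint at more than one copy of the module being installed.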
The error you get is generated by the following line of code: $self->debug("attempting to load driver for adaptor class $class\n"); The class Bio::DB::BioSQL::BasePersistenceAdaptor.pm has a 'use Bio::Root::Root' statement, it inherits directly from Bio::Root::Root, and Bio::Root::Root does have a debug() method as you can (and should) easily convince yourself of. Have you run the bioperl-db test suite? If so, what was the result? If not, run it. So, I'm copying the bioperl list here in case somebody has more clues as to why you might be seeing this error on WinXP. We've had a case before on the list where it turned out eventually that there had been a version mix-up on the machine and everything worked fine after re-installing from scratch. -hilmar On 10/8/03 4:24 AM, "Robert Roth" wrote: > > Hello, > > I'm using Mysql 4.0.15, Bioperl 1.2.3, bioperl-db from the cvs and perl > 5.6.1 on WinXP. > After installation I pre-loaded the the biosql database with the NCBI taxon > database using > load_ncbi_taxonomy.pl. Now when I try to load sequence data with > load_seqdatabase.pl I get > the following error, > > perl load_seqdatabase.pl --host localhost --dbname biosql --dbus > er root --namespace bioperl --debug --safe --testonly --format > genbank test.genbank > Loading test.genbank ... > attempting to load adaptor class for Bio::Seq::RichSeq > attempting to load module Bio::DB::BioSQL::RichSeqAdaptor > attempting to load adaptor class for Bio::Seq > attempting to load module Bio::DB::BioSQL::SeqAdaptor > instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor > attempting to load adaptor class for Bio::SeqFeature::Generic > attempting to load module Bio::DB::BioSQL::GenericAdaptor > attempting to load adaptor class for Bio::Root::Root > attempting to load module Bio::DB::BioSQL::RootAdaptor > Undefined subroutine &Bio::Root::Root::debug called at > E:/Perl/site/lib/Bio/DB/B > ioSQL/BasePersistenceAdaptor.pm line 1502, line 41. 
>
> I've tried different formats (swiss and genbank) and different files with
> the same result. The line number given in the error always corresponds to
> the end of the first record (and it doesn't matter if the file contains one
> or more records).
> Any ideas?
>
> Many thanks in advance,
>
> Robert
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From arthur.fen at vitagenomics.com Thu Oct 9 04:00:52 2003
From: arthur.fen at vitagenomics.com (Arthur Fen)
Date: Thu Oct 9 08:22:10 2003
Subject: [BioSQL-l] How fast to execute load_seqdatabase.pl parsing swiss-prot data into Oracle database?
Message-ID: <2F146949A49BB34DADB689AC286698FE4D8063@exchange2k.vitagenomics.com>

Dear sir:

I tried to load decompressed SWISSPROT data (Release 41.13 of 21-Jun-2003, "sprot.dat") into an ORACLE database server (ver 9.2.0.1.0 on Linux 9.0) by using the load_seqdatabase.pl script.

Given the following setup, could you estimate a reasonable speed and total time for loading the data?

Environment:
2 AMD 2000+ CPU, 4 GB RAM, 80 GB HDD, Virtual swap 6G space.
RedHat Linux 8.0,
Perl v.5.8.1 (compiled by myself with default settings),
BioPerl v.1.22, DBI v.1.37, DBD::Oracle v.1.14 and newest bioperl-db

I get a relatively slow rate of about 5,000 records/hr compared to MySQL. Is that normal for Oracle, or do I need to do some tuning of my system?

Thanks for your answer...

Arthur

From hlapp at gnf.org Thu Oct 9 14:20:58 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Oct 9 14:18:32 2003
Subject: [BioSQL-l] How fast to execute load_seqdatabase.pl parsing swiss-prot data into Oracle database?
In-Reply-To: <2F146949A49BB34DADB689AC286698FE4D8063@exchange2k.vitagenomics.com> Message-ID: On 10/9/03 1:00 AM, "Arthur Fen" wrote: > Dear sir: > > I tried to load decompressed SWISSPROT data (Release 41.13 of > 21-Jun-2003,"sprot.dat") into ORACLE database Server (ver 9.2.0.1.0 on > Linux 9.0) by using the load_seqdatabase.pl script. > > If I use following status to run the program, could you estimate the > reasonable speed and time spanning for loading data? > Environments: > 2 AMD 2000+ CPU, 4 GB RAM, 80 GB HDD, Virtual swap 6G space. > RedHat Linux 8.0, > Perl v.5.8.1 (Compile on default setting by myself), > BioPerl v.1.22, DBI v.1.37, DBD::Oracle v.1.14 and newest bioperl-db > > > I got a relatively slow motion about 5,000 records/hr in comparing to > mySQL. Is it normal for using Oracle? Or I need to do some tuning for > my system? It should be able to go faster. Given your setup, if you want to use Oracle for anything else than just toying around, you need more than one disk. Two at minimum, three are better, most of the serious people have 4 or 5 mount points. Ideally you can distribute archive & transaction logs, data tablespace, and index tablespace each to separate disks. Also, for the first upload don't supply --lookup to load_seqdatabase.pl. -hilmar > > Thanks for your answer... > > > Arthur > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From matthew_pocock at yahoo.co.uk Thu Oct 16 10:33:20 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Thu Oct 16 10:34:32 2003
Subject: [BioSQL-l] Re: [Biojava-l] ontology exception, addSequence & BioSQLSequenceDB
In-Reply-To: <0192157C-FF2B-11D7-8842-000A959EB4C4@gnf.org>
References: <0192157C-FF2B-11D7-8842-000A959EB4C4@gnf.org>
Message-ID: <3F8EAC30.6010401@yahoo.co.uk>

Hi all,

I've added this table to my copy of biosql:

CREATE TABLE term_relationship_term (
    term_relationship_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL,
    PRIMARY KEY ( term_relationship_id, term_id ),
    UNIQUE ( term_relationship_id ),
    UNIQUE ( term_id )
);

This could be modelled more correctly by adding a nullable field to term_relationship that refers to a term_id, but doing it this way didn't break existing biosql schemas. This lets us associate a single term with a term_relationship, effectively allowing us to treat triples as 1st class terms.

* Why bother?

For storing simple facts, you just need terms and triples. For this case, the existing schema is fine. For storing inference rules, I think you need both variables (which currently we identify by leading underscores) and the ability to compose complex expressions from simple triples and terms, e.g. this is a rule for transitive closure:

transitive_closure as
implies(and(isa(_t, transitive), and(_t(_x, _y), _t(_y, _z))), _t(_x, _z))

And here's one for doing a dumb linking of SO terms to embl feature types by just appending "urn:so:" to the embl feature name:

so2ft { convert between so terms and feature table types }
equal(so2ft(_so, _ft), and(and(term_name(_so_name, _so), term_name(_ft_name, _ft)), concatenation(_so_name, ["urn:so:", _so_name])))

This would let us store the inference rules that are needed to interpret a biosql database actually in that database, which must be a good thing.
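[Editor's sketch] The idea of composing complex expressions from simple triples can be illustrated in a few lines of plain Perl. This is only a toy model, assuming a term is a string and a triple is an array reference of predicate, subject and object, where subject and object may themselves be triples. The helper names triple and to_string are hypothetical; nothing here is BioJava or BioSQL code.

```perl
use strict;
use warnings;

# Toy model: a term is a plain string; a triple is [predicate, subject,
# object]. Because a triple can appear wherever a term can, expressions nest.
sub triple {
    my ($pred, $subj, $obj) = @_;
    return [$pred, $subj, $obj];
}

# Render a term or a (possibly nested) triple in prolog-like notation.
sub to_string {
    my ($node) = @_;
    return $node unless ref $node;             # plain term
    my ($pred, $subj, $obj) = @$node;
    return sprintf '%s(%s, %s)', $pred, to_string($subj), to_string($obj);
}

# The transitive closure rule from the message, built out of triples.
my $rule = triple('implies',
    triple('and',
        triple('isa', '_t', 'transitive'),
        triple('and', triple('_t', '_x', '_y'), triple('_t', '_y', '_z'))),
    triple('_t', '_x', '_z'));

print to_string($rule), "\n";
# prints: implies(and(isa(_t, transitive), and(_t(_x, _y), _t(_y, _z))), _t(_x, _z))
```

In the schema proposal, the role played here by nesting array references would be played by giving each stored triple its own term_id.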
These expressions seem to be able to express pretty much everything. They are also explicit about how they can be interpreted. A large class of things which are represented in this form can be directly turned into prolog expressions (ok, prolog can't do higher order logic, but HiLog or something else could be used, you get the idea).

* what biojava does

Our ontology API has Terms and Triples. The Triples extend Term, so that every Triple is a 1st class Term that can be reasoned about. Triples that are not the subject, predicate or object of other triples are the things that look like prolog predicates. Terms that are not triples are atoms (the alphabet of your language) and simple triples are used as normal for basic rules like isa(x, y) and things.

* what we do in biosql

So, whenever a triple is persisted to biosql, we now write an entry to term_relationship_term and to term representing the triple. This could be simplified by adding a nullable field to term_relationship.

* open questions

Perhaps there is another way to represent complex constraints and inference rules that uses old-style triples. However, most of what I have seen introduces more ambiguity to the system.

Perhaps people think we should not be storing meta-data or inference rules and the like in the database. All I can say to this is that I believe that it should be possible to provide the relevant knowledge to reason over the data along with that data if we choose to do so.

Enough from me. Comments? If I don't hear anything back within some reasonable length of time, I will just add the extra foreign key to term_relationship.

Matthew

Hilmar Lapp wrote:

> Starts making sense. I in fact suspected that it is about being very
> explicit about what is otherwise implied behind the scenes, and I
> know you don't like that.
>
> Would you care to write this up like you did below and post to
> biosql-l?
Otherwise I'd do it and you may not like how I quote you ;-)
>
> An additional foreign key on either term_relationship or term
> shouldn't actually break anything unless you make it NOT NULL (it's
> just not going to be supported for a while outside of biojava). What
> would be significantly more involved is adding foreign keys to
> term_relationship subject and object pointing back to rel.ship and
> having them alternative by constraints to the term subject and object.
>
> -hilmar

From FredericP at DEVGEN.com Fri Oct 17 06:42:18 2003
From: FredericP at DEVGEN.com (Frederic Pecqueur)
Date: Fri Oct 17 06:39:37 2003
Subject: [BioSQL-l] Problem With the Bio::Seq::RichSeq object.
Message-ID:

Hi all,

Here is my problem: I want to insert a Bio::Seq::RichSeq object into my BioSQL database, but it doesn't work.

I use this source code:

my $db = Bio::DB::BioDB->new(-database => "biosql",
                             -host     => $host,
                             -dbname   => $dbname,
                             -driver   => $driver,
                             -user     => $dbuser,
                             -pass     => $dbpass,
                             );

my $pseq = $db->create_persistent($richSeq);
$pseq->namespace($namespace);
$pseq->store();

When I execute my script I get these errors:

DBD::Oracle::st execute failed: ORA-01400: cannot insert NULL into ("BIOSQL"."SG_SEQFEATURE"."SOURCE_TRM_OID") (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO seqfeature (display_name, rank, ent_oid, type_trm_oid, source_trm_oid) VALUES (?, ?, ?, ?, ?)'' with params: :p4=11, :p5=undef, :p1=undef, :p2=1, :p3='1909725']) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BaseDriver.pm line 1001.
Use of uninitialized value in join or string at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BaseDriver.pm line 1835.
-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","1") FKs (1909725,11,)
ORA-01400: cannot insert NULL into ("BIOSQL"."SG_SEQFEATURE"."SOURCE_TRM_OID") (DBD ERROR: OCIStmtExecute)
---------------------------------------------------

------------- EXCEPTION -------------
MSG: create: object (Bio::SeqFeature::Generic) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
STACK Bio::DB::BioSQL::SeqAdaptor::store_children /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/SeqAdaptor.pm:246
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:215
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
STACK toplevel ./test.pl:60

When I transform my RichSeq object to a Bio::SeqIO object with the genbank format, I can use the load_seqdatabase.pl script to load my information without any errors.

But I would like to insert my RichSeq object directly into the BioSQL database. (I don't need a GenBank file.)

Thanks for your help.
Frédéric.

From hlapp at gnf.org Fri Oct 17 13:55:42 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Oct 17 13:52:51 2003
Subject: [BioSQL-l] Problem With the Bio::Seq::RichSeq object.
In-Reply-To:
Message-ID:

It looks like you have at least one seqfeature that doesn't have the source tag set ($feature->source_tag()).
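[Editor's sketch] A defensive check for unset source tags could be run before calling store(). The sketch below is self-contained and does not need BioPerl: My::MockFeature is a hypothetical stand-in that models only the source_tag() getter/setter, and fill_missing_source_tags and the default value 'my_pipeline' are likewise made up for illustration. With real BioPerl objects one would presumably pass the features returned by the sequence object's get_SeqFeatures() method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence feature so the sketch runs without
# BioPerl. Only the source_tag() getter/setter is modelled.
package My::MockFeature;

sub new {
    my ($class, %args) = @_;
    return bless { source_tag => $args{source_tag} }, $class;
}

sub source_tag {
    my $self = shift;
    $self->{source_tag} = shift if @_;
    return $self->{source_tag};
}

package main;

# Give every feature that lacks a source tag a default one, and return
# the number of features that had to be fixed.
sub fill_missing_source_tags {
    my ($features, $default) = @_;
    my $fixed = 0;
    for my $feat (@$features) {
        my $tag = $feat->source_tag();
        next if defined $tag && length $tag;
        $feat->source_tag($default);
        $fixed++;
    }
    return $fixed;
}

my @features = (
    My::MockFeature->new(source_tag => 'curated'),
    My::MockFeature->new(),    # no source tag: the case behind ORA-01400
);
my $fixed = fill_missing_source_tags(\@features, 'my_pipeline');
print "set a default source tag on $fixed feature(s)\n";
```

Whether a blanket default is appropriate depends on what the source tag is meant to record for the data set at hand.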
The source tag is mandatory for biosql. Let me know if that's not the problem. BTW round-tripping through genbank format solves this because on re-import the genbank parser sets a default source tag. -hilmar On 10/17/03 3:42 AM, "Frederic Pecqueur" wrote: > Hi all, > > Here is my problem : > > I wish insert a Bio::Seq::RichSeq object in my BioSQL database, but it doesn't > work. > > I use this code source : > > my $db = Bio::DB::BioDB->new(-database => "biosql", > -host => $host, > -dbname => $dbname, > -driver => $driver, > -user => $dbuser, > -pass => $dbpass, > ); > > my $pseq = $db->create_persistent($richSeq); > $pseq->namespace($namespace); > $pseq->store(); > > When I execute my script I find these errors : > > DBD::Oracle::st execute failed: ORA-01400: cannot insert NULL into > ("BIOSQL"."SG_SEQFEATURE"."SOURCE_TRM_OID") (DBD ERROR: OCIStmtExecute) [for > statement ``INSERT INTO seqfeature (display_name, rank, ent_oid, type_trm_oid, > source_trm_oid) VALUES (?, ?, ?, ?, ?)'' with params: :p4=11, :p5=undef, > :p1=undef, :p2=1, :p3='1909725']) at > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BaseDriver.pm line 1001. > Use of uninitialized value in join or string at > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BaseDriver.pm line 1835. 
> > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were > ("","1") FKs (1909725,11,) > ORA-01400: cannot insert NULL into ("BIOSQL"."SG_SEQFEATURE"."SOURCE_TRM_OID") > (DBD ERROR: OCIStmtExecute) > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::SeqFeature::Generic) failed to insert or to be found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270 > STACK Bio::DB::BioSQL::SeqAdaptor::store_children > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/SeqAdaptor.pm:246 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:215 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270 > STACK toplevel ./test.pl:60 > > When I tranform my RichSeq object to a Bio::SeqIO object with the genbank > format, I can use the load_seqdatabase.pl script to load my Information with > none errors. > > But I would wish to insert my RichSeq object directly in the BioSQL database. > (I don't need of Genbank file) > > Thanks for your help. > Fr?d?ric. 
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From FredericP at DEVGEN.com Mon Oct 20 10:48:28 2003
From: FredericP at DEVGEN.com (Frederic Pecqueur)
Date: Mon Oct 20 10:56:20 2003
Subject: [BioSQL-l] Link between a gene and a protein?
Message-ID:

Hi all,

I have a question about the load_seqdatabase.pl script.

Is it possible to include in a genbank file a special feature, or something else, to create a link between two bioentries?

My idea is that this feature would be recognized by the load_seqdatabase.pl script in order to create the links between the bioentry table and the bioentry_relationship table.

Maybe it's not possible... In that case, must I create the links between the bioentry table and the bioentry_relationship table myself?

Thanks for your help.
Frédéric.

PS: thanks Hilmar for your solution to my RichSeq problem.

From hlapp at gnf.org Mon Oct 20 15:47:44 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Oct 20 15:44:42 2003
Subject: [BioSQL-l] Link between a gene and a protein?
In-Reply-To:
Message-ID:

How you populate the bioentry_relationship table is entirely up to you at this point ... I'd consider it sort of research to figure out the best, or even just one reasonable, way.

The system that we've been building here at GNF does that a lot. We're on the way to releasing the implementation to the public, which for that table is all Oracle sql-scripts taking advantage of knowledge materialized as meta-information (i.e., a custom ontology). Basically, we're taking two approaches: harvesting relationships off of dbxrefs that can be linked back to bioentries, and harvesting relationships off of genomic co-location.
The former is working relatively well. Let me know if you think the former is what you want and I'll ask for release clearance. You could obviously also invent your system that works off of 'special' features. Sorry for not being able to be more helpful at this point. -hilmar On 10/20/03 7:48 AM, "Frederic Pecqueur" wrote: > Hi all, > > I have a question about the load_seqdatabase.pl script. > > Is it possible to include in a genbank file a special feature or something > else to create a link between two bioentry. > > My idea is : this feature must be recognized by the load_seqdatabase.pl script > in the aim of create the links between the bioentry table and the bioentry > relationship table. > > may be It's not possible........ > So, I must create the links between the bioentry table and the bioentry > relationalship table myself? > Thanks for your help. > Fr?d?ric. > > PS: thanks Hilmar for your solution about my RichSeq problem. > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Tue Oct 21 22:45:48 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Oct 21 22:42:43 2003 Subject: [BioSQL-l] Re: [Biojava-l] ontology exception, addSequence & BioSQLSequenceDB In-Reply-To: <3F8EBC39.3020600@nrc-cnrc.gc.ca> Message-ID: On 10/16/03 8:41 AM, "Simon Foote" wrote: > > I also cc'd Hilmar, for possibly changing this in the biosql schema for > MySQL. I far as I can tell it doesn't cause any problems with existing > databases as it only effects sorting and comparing of the field. > You mean making all VARCHAR columns BINARY in the mysql version that constitute or are part of an alternative key? Sounds reasonable to me. 
Also, it would be consistent with the other 2 supported platforms.

Does anybody on this list have an opinion or comment for or against doing so?

-hilmar

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From len at reeltwo.com Tue Oct 21 22:58:55 2003
From: len at reeltwo.com (Len Trigg)
Date: Tue Oct 21 23:45:10 2003
Subject: [BioSQL-l] Re: HSQLDB support...
In-Reply-To:
References: <3F8EBC39.3020600@nrc-cnrc.gc.ca>
Message-ID:

Hilmar Lapp wrote:
> Sounds reasonable to me. Also, would be consistent with the other 2
> supported platforms.

This reminds me, I ported the BioSQL schema to support HSQLDB, and added the necessary support to BioJava's BioSQL binding. We use HSQLDB in our BioSQL unit tests, and I have seen references to other people making use of it. I have attached the appropriate schema files, if you wish to include them as part of the BioSQL project (in which case BioSQL CVS can become the canonical location).

Cheers,
Len.

-------------- next part --------------
-- biosqldb-hsqldb.sql
-- Authors: Ewan Birney, Elia Stupka
-- Contributors: Hilmar Lapp, Aaron Mackey
--
-- Copyright Ewan Birney. You may use, modify, and distribute this code under
-- the same terms as Perl. See the Perl Artistic License.
--
-- comments to biosql - biosql-l@open-bio.org
--
-- Migration of the MySQL schema to InnoDB by Hilmar Lapp
-- Post-Cape Town changes by Hilmar Lapp.
-- Singapore changes by Hilmar Lapp and Aaron Mackey.
-- Migration to HSQLDB by Len Trigg
-- See biosql schema documentation for general documentation regarding the
-- schema. This file contains documentation specific to the hsqldb schema.
-- HSQLDB Version compatibility notes:
-- HSQLDB 1.7.1 has problems with null values in columns with UNIQUE constraints, several of these
-- constraints have been commented out for compatibility.
-- HSQLDB 1.7.2 alpha N.
has problem with PreparedStatements that affect the BioJava binding (these -- will apparently be addressed by the 1.7.2 release) CREATE TABLE biodatabase ( biodatabase_id INT NOT NULL IDENTITY, name VARCHAR(128) NOT NULL, authority VARCHAR(128), description LONGVARCHAR, UNIQUE (name) ); CREATE INDEX db_auth on biodatabase(authority); CREATE TABLE taxon ( taxon_id INT NOT NULL IDENTITY, ncbi_taxon_id INT, parent_taxon_id INT, node_rank VARCHAR(32), genetic_code TINYINT, mito_genetic_code TINYINT, left_value INT, right_value INT, UNIQUE (ncbi_taxon_id) ); -- HSQLDB 1.7.1 UNIQUE BUG -- UNIQUE (left_value), -- UNIQUE (right_value) CREATE INDEX taxparent ON taxon(parent_taxon_id); CREATE TABLE taxon_name ( taxon_id INT NOT NULL, name VARCHAR(255) NOT NULL, name_class VARCHAR(32) NOT NULL, UNIQUE (taxon_id,name,name_class) ); CREATE INDEX taxnametaxonid ON taxon_name(taxon_id); CREATE INDEX taxnamename ON taxon_name(name); CREATE TABLE ontology ( ontology_id INT NOT NULL IDENTITY, name VARCHAR(32) NOT NULL, definition LONGVARCHAR, UNIQUE (name) ); CREATE TABLE term ( term_id INT NOT NULL IDENTITY, name VARCHAR(255) NOT NULL, definition LONGVARCHAR, identifier VARCHAR(40), is_obsolete CHAR(1), ontology_id INT NOT NULL, UNIQUE (name,ontology_id) ); -- HSQLDB 1.7.1 UNIQUE BUG -- UNIQUE (identifier) CREATE INDEX term_ont ON term(ontology_id); -- We use the field name "name" instead of "synonym" (which is a reserved word in some RDBMS) CREATE TABLE term_synonym ( name VARCHAR(255) NOT NULL, term_id INT NOT NULL, PRIMARY KEY (term_id,name) ); CREATE TABLE term_dbxref ( term_id INT NOT NULL, dbxref_id INT NOT NULL, rank SMALLINT, PRIMARY KEY (term_id, dbxref_id) ); CREATE INDEX trmdbxref_dbxrefid ON term_dbxref(dbxref_id); CREATE TABLE term_relationship ( term_relationship_id INT NOT NULL IDENTITY, subject_term_id INT NOT NULL, predicate_term_id INT NOT NULL, object_term_id INT NOT NULL, ontology_id INT NOT NULL, UNIQUE 
(subject_term_id,predicate_term_id,object_term_id,ontology_id) ); CREATE INDEX trmrel_predicateid ON term_relationship(predicate_term_id); CREATE INDEX trmrel_objectid ON term_relationship(object_term_id); CREATE INDEX trmrel_ontid ON term_relationship(ontology_id); CREATE TABLE term_path ( term_path_id INT NOT NULL IDENTITY, subject_term_id INT NOT NULL, predicate_term_id INT NOT NULL, object_term_id INT NOT NULL, ontology_id INT NOT NULL, distance INT, UNIQUE (subject_term_id,predicate_term_id,object_term_id,ontology_id,distance) ); CREATE INDEX trmpath_predicateid ON term_path(predicate_term_id); CREATE INDEX trmpath_objectid ON term_path(object_term_id); CREATE INDEX trmpath_ontid ON term_path(ontology_id); CREATE TABLE bioentry ( bioentry_id INT NOT NULL IDENTITY, biodatabase_id INT NOT NULL, taxon_id INT, name VARCHAR(40) NOT NULL, accession VARCHAR(40) NOT NULL, identifier VARCHAR(40), division VARCHAR(6), description LONGVARCHAR, version SMALLINT NOT NULL, UNIQUE (accession,biodatabase_id,version) ); -- HSQLDB 1.7.1 UNIQUE BUG -- UNIQUE (identifier) CREATE INDEX bioentry_name ON bioentry(name); CREATE INDEX bioentry_db ON bioentry(biodatabase_id); CREATE INDEX bioentry_tax ON bioentry(taxon_id); CREATE TABLE bioentry_relationship ( bioentry_relationship_id INT NOT NULL IDENTITY, object_bioentry_id INT NOT NULL, subject_bioentry_id INT NOT NULL, term_id INT NOT NULL, rank INT, UNIQUE (object_bioentry_id,subject_bioentry_id,term_id) ); CREATE INDEX bioentryrel_trm ON bioentry_relationship(term_id); CREATE INDEX bioentryrel_child ON bioentry_relationship(subject_bioentry_id); CREATE TABLE bioentry_path ( object_bioentry_id INT NOT NULL, subject_bioentry_id INT NOT NULL, term_id INT NOT NULL, distance INT, UNIQUE (object_bioentry_id,subject_bioentry_id,term_id,distance) ); CREATE INDEX bioentrypath_trm ON bioentry_path(term_id); CREATE INDEX bioentrypath_child ON bioentry_path(subject_bioentry_id); CREATE TABLE biosequence ( bioentry_id INT NOT NULL, version 
SMALLINT, length INT, alphabet VARCHAR(10), seq LONGVARCHAR, PRIMARY KEY (bioentry_id) ); -- add these only if you want them: -- ALTER TABLE biosequence ADD COLUMN ( isoelec_pt NUMERIC(4,2) ); -- ALTER TABLE biosequence ADD COLUMN ( mol_wgt DOUBLE PRECISION ); -- ALTER TABLE biosequence ADD COLUMN ( perc_gc DOUBLE PRECISION ); CREATE TABLE dbxref ( dbxref_id INT NOT NULL IDENTITY, dbname VARCHAR(40) NOT NULL, accession VARCHAR(40) NOT NULL, version SMALLINT NOT NULL, UNIQUE(accession, dbname, version) ); CREATE INDEX dbxref_db ON dbxref(dbname); CREATE TABLE dbxref_qualifier_value ( dbxref_id INT NOT NULL, term_id INT NOT NULL, rank SMALLINT DEFAULT 0 NOT NULL, value LONGVARCHAR, PRIMARY KEY (dbxref_id,term_id,rank) ); CREATE INDEX dbxrefqual_dbx ON dbxref_qualifier_value(dbxref_id); CREATE INDEX dbxrefqual_trm ON dbxref_qualifier_value(term_id); CREATE TABLE bioentry_dbxref ( bioentry_id INT NOT NULL, dbxref_id INT NOT NULL, rank SMALLINT, PRIMARY KEY (bioentry_id,dbxref_id) ); CREATE INDEX dblink_dbx ON bioentry_dbxref(dbxref_id); CREATE TABLE reference ( reference_id INT NOT NULL IDENTITY, dbxref_id INT, location LONGVARCHAR NOT NULL, title LONGVARCHAR, authors LONGVARCHAR NOT NULL, crc VARCHAR(32), UNIQUE (dbxref_id), UNIQUE (crc) ); CREATE TABLE bioentry_reference ( bioentry_id INT NOT NULL, reference_id INT NOT NULL, start_pos INT, end_pos INT, rank SMALLINT DEFAULT 0 NOT NULL, PRIMARY KEY(bioentry_id,reference_id,rank) ); CREATE INDEX bioentryref_ref ON bioentry_reference(reference_id); -- We use the table name "anncomment" instead of "comment" (which is a reserved word in some RDBMS) CREATE TABLE anncomment ( comment_id INT NOT NULL IDENTITY, bioentry_id INT NOT NULL, comment_text LONGVARCHAR NOT NULL, rank SMALLINT DEFAULT 0 NOT NULL, UNIQUE(bioentry_id, rank) ); CREATE TABLE bioentry_qualifier_value ( bioentry_id INT NOT NULL, term_id INT NOT NULL, value LONGVARCHAR, rank INT DEFAULT 0 NOT NULL, UNIQUE (bioentry_id,term_id,rank) ); CREATE INDEX 
bioentryqual_trm ON bioentry_qualifier_value(term_id); CREATE TABLE seqfeature ( seqfeature_id INT NOT NULL IDENTITY, bioentry_id INT NOT NULL, type_term_id INT NOT NULL, source_term_id INT NOT NULL, display_name VARCHAR(64), rank SMALLINT DEFAULT 0 NOT NULL, UNIQUE (bioentry_id,type_term_id,source_term_id,rank) ); CREATE INDEX seqfeature_trm ON seqfeature(type_term_id); CREATE INDEX seqfeature_fsrc ON seqfeature(source_term_id); CREATE TABLE seqfeature_relationship ( seqfeature_relationship_id INT NOT NULL IDENTITY, object_seqfeature_id INT NOT NULL, subject_seqfeature_id INT NOT NULL, term_id INT NOT NULL, rank INT, UNIQUE (object_seqfeature_id,subject_seqfeature_id,term_id) ); CREATE INDEX seqfeaturerel_trm ON seqfeature_relationship(term_id); CREATE INDEX seqfeaturerel_child ON seqfeature_relationship(subject_seqfeature_id); CREATE TABLE seqfeature_path ( object_seqfeature_id INT NOT NULL, subject_seqfeature_id INT NOT NULL, term_id INT NOT NULL, distance INT, UNIQUE (object_seqfeature_id,subject_seqfeature_id,term_id,distance) ); CREATE INDEX seqfeaturepath_trm ON seqfeature_path(term_id); CREATE INDEX seqfeaturepath_child ON seqfeature_path(subject_seqfeature_id); CREATE TABLE seqfeature_qualifier_value ( seqfeature_id INT NOT NULL, term_id INT NOT NULL, rank SMALLINT DEFAULT 0 NOT NULL, value LONGVARCHAR NOT NULL, PRIMARY KEY (seqfeature_id,term_id,rank) ); CREATE INDEX seqfeaturequal_trm ON seqfeature_qualifier_value(term_id); CREATE TABLE seqfeature_dbxref ( seqfeature_id INT NOT NULL, dbxref_id INT NOT NULL, rank SMALLINT, PRIMARY KEY (seqfeature_id,dbxref_id) ); CREATE INDEX feadblink_dbx ON seqfeature_dbxref(dbxref_id); CREATE TABLE location ( location_id INT NOT NULL IDENTITY, seqfeature_id INT NOT NULL, dbxref_id INT, term_id INT, start_pos INT, end_pos INT, strand TINYINT NOT NULL, rank SMALLINT DEFAULT 0 NOT NULL, UNIQUE (seqfeature_id, rank) ); CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); CREATE INDEX seqfeatureloc_dbx ON 
location(dbxref_id);
CREATE INDEX seqfeatureloc_trm ON location(term_id);

CREATE TABLE location_qualifier_value (
	location_id INT NOT NULL,
	term_id INT NOT NULL,
	value VARCHAR(255) NOT NULL,
	int_value INT,
	PRIMARY KEY (location_id,term_id)
);

CREATE INDEX locationqual_trm ON location_qualifier_value(term_id);

--
-- Create the foreign key constraints
--

-- ontology term
ALTER TABLE term ADD CONSTRAINT FKont_term
	FOREIGN KEY (ontology_id) REFERENCES ontology(ontology_id)
	ON DELETE CASCADE;

-- term synonyms
ALTER TABLE term_synonym ADD CONSTRAINT FKterm_syn
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;

-- term_dbxref
ALTER TABLE term_dbxref ADD CONSTRAINT FKdbxref_trmdbxref
	FOREIGN KEY (dbxref_id) REFERENCES dbxref(dbxref_id)
	ON DELETE CASCADE;
ALTER TABLE term_dbxref ADD CONSTRAINT FKterm_trmdbxref
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;

-- term_relationship
ALTER TABLE term_relationship ADD CONSTRAINT FKtrmsubject_trmrel
	FOREIGN KEY (subject_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE term_relationship ADD CONSTRAINT FKtrmpredicate_trmrel
	FOREIGN KEY (predicate_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE term_relationship ADD CONSTRAINT FKtrmobject_trmrel
	FOREIGN KEY (object_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE term_relationship ADD CONSTRAINT FKterm_trmrel
	FOREIGN KEY (ontology_id) REFERENCES ontology(ontology_id)
	ON DELETE CASCADE;

-- term_path
ALTER TABLE term_path ADD CONSTRAINT FKtrmsubject_trmpath
	FOREIGN KEY (subject_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE term_path ADD CONSTRAINT FKtrmpredicate_trmpath
	FOREIGN KEY (predicate_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE term_path ADD CONSTRAINT FKtrmobject_trmpath
	FOREIGN KEY (object_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE term_path ADD CONSTRAINT FKontology_trmpath
	FOREIGN KEY (ontology_id) REFERENCES ontology(ontology_id)
	ON DELETE CASCADE;

-- taxon, taxon_name
-- unfortunately, we can't constrain parent_taxon_id as it is violated
-- occasionally by the downloads available from NCBI
-- ALTER TABLE taxon ADD CONSTRAINT FKtaxon_taxon
--	FOREIGN KEY (parent_taxon_id) REFERENCES taxon(taxon_id);
ALTER TABLE taxon_name ADD CONSTRAINT FKtaxon_taxonname
	FOREIGN KEY (taxon_id) REFERENCES taxon(taxon_id)
	ON DELETE CASCADE;

-- bioentry
ALTER TABLE bioentry ADD CONSTRAINT FKtaxon_bioentry
	FOREIGN KEY (taxon_id) REFERENCES taxon(taxon_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry ADD CONSTRAINT FKbiodatabase_bioentry
	FOREIGN KEY (biodatabase_id) REFERENCES biodatabase(biodatabase_id)
	ON DELETE CASCADE;

-- bioentry_relationship
ALTER TABLE bioentry_relationship ADD CONSTRAINT FKterm_bioentryrel
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry_relationship ADD CONSTRAINT FKparentent_bioentryrel
	FOREIGN KEY (object_bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry_relationship ADD CONSTRAINT FKchildent_bioentryrel
	FOREIGN KEY (subject_bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;

-- bioentry_path
ALTER TABLE bioentry_path ADD CONSTRAINT FKterm_bioentrypath
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry_path ADD CONSTRAINT FKparentent_bioentrypath
	FOREIGN KEY (object_bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry_path ADD CONSTRAINT FKchildent_bioentrypath
	FOREIGN KEY (subject_bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;

-- biosequence
ALTER TABLE biosequence ADD CONSTRAINT FKbioentry_bioseq
	FOREIGN KEY (bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;

-- comment
ALTER TABLE anncomment ADD CONSTRAINT FKbioentry_comment
	FOREIGN KEY(bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;

-- bioentry_dbxref
ALTER TABLE bioentry_dbxref ADD CONSTRAINT FKbioentry_dblink
	FOREIGN KEY (bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry_dbxref ADD CONSTRAINT FKdbxref_dblink
	FOREIGN KEY (dbxref_id) REFERENCES dbxref(dbxref_id)
	ON DELETE CASCADE;

-- dbxref_qualifier_value
ALTER TABLE dbxref_qualifier_value ADD CONSTRAINT FKtrm_dbxrefqual
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE dbxref_qualifier_value ADD CONSTRAINT FKdbxref_dbxrefqual
	FOREIGN KEY (dbxref_id) REFERENCES dbxref(dbxref_id)
	ON DELETE CASCADE;

-- bioentry_reference
ALTER TABLE bioentry_reference ADD CONSTRAINT FKbioentry_entryref
	FOREIGN KEY (bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry_reference ADD CONSTRAINT FKreference_entryref
	FOREIGN KEY (reference_id) REFERENCES reference(reference_id)
	ON DELETE CASCADE;

-- bioentry_qualifier_value
ALTER TABLE bioentry_qualifier_value ADD CONSTRAINT FKbioentry_entqual
	FOREIGN KEY (bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;
ALTER TABLE bioentry_qualifier_value ADD CONSTRAINT FKterm_entqual
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;

-- reference
ALTER TABLE reference ADD CONSTRAINT FKdbxref_reference
	FOREIGN KEY ( dbxref_id ) REFERENCES dbxref ( dbxref_id ) ;

-- seqfeature
ALTER TABLE seqfeature ADD CONSTRAINT FKterm_seqfeature
	FOREIGN KEY (type_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature ADD CONSTRAINT FKsourceterm_seqfeature
	FOREIGN KEY (source_term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature ADD CONSTRAINT FKbioentry_seqfeature
	FOREIGN KEY (bioentry_id) REFERENCES bioentry(bioentry_id)
	ON DELETE CASCADE;

-- seqfeature_relationship
ALTER TABLE seqfeature_relationship ADD CONSTRAINT FKterm_seqfeatrel
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature_relationship ADD CONSTRAINT FKparentfeat_seqfeatrel
	FOREIGN KEY (object_seqfeature_id) REFERENCES seqfeature(seqfeature_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature_relationship ADD CONSTRAINT FKchildfeat_seqfeatrel
	FOREIGN KEY (subject_seqfeature_id) REFERENCES seqfeature(seqfeature_id)
	ON DELETE CASCADE;

-- seqfeature_path
ALTER TABLE seqfeature_path ADD CONSTRAINT FKterm_seqfeatpath
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature_path ADD CONSTRAINT FKparentfeat_seqfeatpath
	FOREIGN KEY (object_seqfeature_id) REFERENCES seqfeature(seqfeature_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature_path ADD CONSTRAINT FKchildfeat_seqfeatpath
	FOREIGN KEY (subject_seqfeature_id) REFERENCES seqfeature(seqfeature_id)
	ON DELETE CASCADE;

-- seqfeature_qualifier_value
ALTER TABLE seqfeature_qualifier_value ADD CONSTRAINT FKterm_featqual
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature_qualifier_value ADD CONSTRAINT FKseqfeature_featqual
	FOREIGN KEY (seqfeature_id) REFERENCES seqfeature(seqfeature_id)
	ON DELETE CASCADE;

-- seqfeature_dbxref
ALTER TABLE seqfeature_dbxref ADD CONSTRAINT FKseqfeature_feadblink
	FOREIGN KEY (seqfeature_id) REFERENCES seqfeature(seqfeature_id)
	ON DELETE CASCADE;
ALTER TABLE seqfeature_dbxref ADD CONSTRAINT FKdbxref_feadblink
	FOREIGN KEY (dbxref_id) REFERENCES dbxref(dbxref_id)
	ON DELETE CASCADE;

-- location
ALTER TABLE location ADD CONSTRAINT FKseqfeature_location
	FOREIGN KEY (seqfeature_id) REFERENCES seqfeature(seqfeature_id)
	ON DELETE CASCADE;
ALTER TABLE location ADD CONSTRAINT FKdbxref_location
	FOREIGN KEY (dbxref_id) REFERENCES dbxref(dbxref_id)
	ON DELETE CASCADE;
ALTER TABLE location ADD CONSTRAINT FKterm_featloc
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;

-- location_qualifier_value
ALTER TABLE location_qualifier_value ADD CONSTRAINT FKfeatloc_locqual
	FOREIGN KEY (location_id) REFERENCES location(location_id)
	ON DELETE CASCADE;
ALTER TABLE location_qualifier_value ADD CONSTRAINT FKterm_locqual
	FOREIGN KEY (term_id) REFERENCES term(term_id)
	ON DELETE CASCADE;

-------------- next part --------------
-- For some reason we need to drop the constraints explicitly
-- There's probably a correct way to do this.

ALTER TABLE term DROP CONSTRAINT FKont_term;
ALTER TABLE term_synonym DROP CONSTRAINT FKterm_syn;
ALTER TABLE term_dbxref DROP CONSTRAINT FKdbxref_trmdbxref;
ALTER TABLE term_dbxref DROP CONSTRAINT FKterm_trmdbxref;
ALTER TABLE term_relationship DROP CONSTRAINT FKtrmsubject_trmrel;
ALTER TABLE term_relationship DROP CONSTRAINT FKtrmpredicate_trmrel;
ALTER TABLE term_relationship DROP CONSTRAINT FKtrmobject_trmrel;
ALTER TABLE term_relationship DROP CONSTRAINT FKterm_trmrel;
ALTER TABLE term_path DROP CONSTRAINT FKtrmsubject_trmpath;
ALTER TABLE term_path DROP CONSTRAINT FKtrmpredicate_trmpath;
ALTER TABLE term_path DROP CONSTRAINT FKtrmobject_trmpath;
ALTER TABLE term_path DROP CONSTRAINT FKontology_trmpath;
-- ALTER TABLE taxon DROP CONSTRAINT FKtaxon_taxon;
ALTER TABLE taxon_name DROP CONSTRAINT FKtaxon_taxonname;
ALTER TABLE bioentry DROP CONSTRAINT FKtaxon_bioentry;
ALTER TABLE bioentry DROP CONSTRAINT FKbiodatabase_bioentry;
ALTER TABLE bioentry_relationship DROP CONSTRAINT FKterm_bioentryrel;
ALTER TABLE bioentry_relationship DROP CONSTRAINT FKparentent_bioentryrel;
ALTER TABLE bioentry_relationship DROP CONSTRAINT FKchildent_bioentryrel;
ALTER TABLE bioentry_path DROP CONSTRAINT FKterm_bioentrypath;
ALTER TABLE bioentry_path DROP CONSTRAINT FKparentent_bioentrypath;
ALTER TABLE bioentry_path DROP CONSTRAINT FKchildent_bioentrypath;
ALTER TABLE biosequence DROP CONSTRAINT FKbioentry_bioseq;
ALTER TABLE anncomment DROP CONSTRAINT FKbioentry_comment;
ALTER TABLE bioentry_dbxref DROP CONSTRAINT FKbioentry_dblink;
ALTER TABLE bioentry_dbxref DROP CONSTRAINT FKdbxref_dblink;
ALTER TABLE dbxref_qualifier_value DROP CONSTRAINT FKtrm_dbxrefqual;
ALTER TABLE dbxref_qualifier_value DROP CONSTRAINT FKdbxref_dbxrefqual;
ALTER TABLE bioentry_reference DROP CONSTRAINT FKbioentry_entryref;
ALTER TABLE bioentry_reference DROP CONSTRAINT FKreference_entryref;
ALTER TABLE bioentry_qualifier_value DROP CONSTRAINT FKbioentry_entqual;
ALTER TABLE bioentry_qualifier_value DROP CONSTRAINT FKterm_entqual;
ALTER TABLE reference DROP CONSTRAINT FKdbxref_reference;
ALTER TABLE seqfeature DROP CONSTRAINT FKterm_seqfeature;
ALTER TABLE seqfeature DROP CONSTRAINT FKsourceterm_seqfeature;
ALTER TABLE seqfeature DROP CONSTRAINT FKbioentry_seqfeature;
ALTER TABLE seqfeature_relationship DROP CONSTRAINT FKterm_seqfeatrel;
ALTER TABLE seqfeature_relationship DROP CONSTRAINT FKparentfeat_seqfeatrel;
ALTER TABLE seqfeature_relationship DROP CONSTRAINT FKchildfeat_seqfeatrel;
ALTER TABLE seqfeature_path DROP CONSTRAINT FKterm_seqfeatpath;
ALTER TABLE seqfeature_path DROP CONSTRAINT FKparentfeat_seqfeatpath;
ALTER TABLE seqfeature_path DROP CONSTRAINT FKchildfeat_seqfeatpath;
ALTER TABLE seqfeature_qualifier_value DROP CONSTRAINT FKterm_featqual;
ALTER TABLE seqfeature_qualifier_value DROP CONSTRAINT FKseqfeature_featqual;
ALTER TABLE seqfeature_dbxref DROP CONSTRAINT FKseqfeature_feadblink;
ALTER TABLE seqfeature_dbxref DROP CONSTRAINT FKdbxref_feadblink;
ALTER TABLE location DROP CONSTRAINT FKseqfeature_location;
ALTER TABLE location DROP CONSTRAINT FKdbxref_location;
ALTER TABLE location DROP CONSTRAINT FKterm_featloc;
ALTER TABLE location_qualifier_value DROP CONSTRAINT FKfeatloc_locqual;
ALTER TABLE location_qualifier_value DROP CONSTRAINT FKterm_locqual;

DROP TABLE anncomment;
DROP TABLE biodatabase;
DROP TABLE bioentry;
DROP TABLE bioentry_dbxref;
DROP TABLE bioentry_path;
DROP TABLE bioentry_qualifier_value;
DROP TABLE bioentry_reference;
DROP TABLE bioentry_relationship;
DROP TABLE biosequence;
DROP TABLE dbxref;
DROP TABLE dbxref_qualifier_value;
DROP TABLE location;
DROP TABLE location_qualifier_value;
DROP TABLE ontology;
DROP TABLE reference;
DROP TABLE seqfeature;
DROP TABLE seqfeature_dbxref;
DROP TABLE seqfeature_path;
DROP TABLE seqfeature_qualifier_value;
DROP TABLE seqfeature_relationship;
DROP TABLE taxon;
DROP TABLE taxon_name;
DROP TABLE term;
DROP TABLE term_dbxref;
DROP TABLE term_path;
DROP TABLE term_relationship;
DROP TABLE term_synonym;

-- Hsqldb complains about indexes not existing -- maybe it automatically
-- deletes them when you remove the table?
--DROP INDEX bioentrypath_parent ON bioentry_pathobject_bioentry_id);
--DROP INDEX bioentryrel_parent;
--DROP INDEX ontrel_subjectid ON term_relationshipsubject_term_id);
--DROP INDEX seqfeature_bioentryid ON seqfeaturebioentry_id);
--DROP INDEX seqfeaturerel_parent ON seqfeature_pathobject_seqfeature_id);
--DROP INDEX seqfeaturerel_parent ON seqfeature_relationshipobject_seqfeature_id);
--DROP INDEX trmpath_subjectid ON term_pathsubject_term_id);
-- DROP INDEX bioentry_db;
-- DROP INDEX bioentry_name;
-- DROP INDEX bioentry_tax;
-- DROP INDEX bioentrypath_child;
-- DROP INDEX bioentrypath_trm;
-- DROP INDEX bioentryqual_trm;
-- DROP INDEX bioentryref_ref;
-- DROP INDEX bioentryrel_child;
-- DROP INDEX bioentryrel_trm;
-- DROP INDEX db_auth;
-- DROP INDEX dblink_dbx;
-- DROP INDEX dbxref_db;
-- DROP INDEX dbxrefqual_dbx;
-- DROP INDEX dbxrefqual_trm;
-- DROP INDEX feadblink_dbx;
-- DROP INDEX locationqual_trm;
-- DROP INDEX seqfeature_fsrc;
-- DROP INDEX seqfeature_trm;
-- DROP INDEX seqfeatureloc_dbx;
-- DROP INDEX seqfeatureloc_start;
-- DROP INDEX seqfeatureloc_trm;
-- DROP INDEX seqfeaturepath_child;
-- DROP INDEX seqfeaturepath_trm;
-- DROP INDEX seqfeaturequal_trm;
-- DROP INDEX seqfeaturerel_child;
-- DROP INDEX seqfeaturerel_trm;
-- DROP INDEX taxnamename;
-- DROP INDEX taxnametaxonid;
-- DROP INDEX taxparent;
-- DROP INDEX term_ont;
-- DROP INDEX trmdbxref_dbxrefid;
-- DROP INDEX trmpath_objectid;
-- DROP INDEX trmpath_ontid;
-- DROP INDEX trmpath_predicateid;
-- DROP INDEX trmrel_objectid;
-- DROP INDEX trmrel_ontid;
-- DROP INDEX trmrel_predicateid;

From hlapp at gnf.org Wed Oct 22 01:55:25 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Oct 22 01:52:20 2003
Subject: [BioSQL-l] Re: HSQLDB support...
In-Reply-To:
Message-ID: <550F2B16-0454-11D8-B5E7-000A959EB4C4@gnf.org>

Is HSQLDB the HyperSonic engine? Out of curiosity, is it good for any
real use?

Also, Len, would you be willing to provide maintenance for the hsqldb
schema upon schema changes or fixes?

In any case, thanks for your submission.

	-hilmar

On Tuesday, October 21, 2003, at 07:58 PM, Len Trigg wrote:

> Hilmar Lapp wrote:
>> Sounds reasonable to me. Also, would be consistent with the other 2
>> supported platforms.
>
> This reminds me, I ported the BioSQL schema to support HSQLDB, and
> added the necessary support to BioJava's BioSQL binding. We use
> HSQLDB in our BioSQL unit tests, and I have seen references to other
> people making use of it. I have attached the appropriate schema files,
> if you wish to include them as part of the BioSQL project (in which case
> BioSQL CVS can become the canonical location).
>
>
> Cheers,
> Len.
>
>

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From len at reeltwo.com Wed Oct 22 18:21:24 2003
From: len at reeltwo.com (Len Trigg)
Date: Wed Oct 22 18:18:55 2003
Subject: [BioSQL-l] Re: HSQLDB support...
In-Reply-To: <550F2B16-0454-11D8-B5E7-000A959EB4C4@gnf.org>
References: <550F2B16-0454-11D8-B5E7-000A959EB4C4@gnf.org>
Message-ID:

Hilmar Lapp wrote:
> Is HSQLDB the HyperSonic engine?

I believe it's the successor to the HyperSonic engine. See
http://hsqldb.sourceforge.net/

> Out of curiosity, is it good for any
> real use?

Depends on what you consider real use :-). It provides an easy way to
create unit tests for BioJava/BioSQL, creating, manipulating and
destroying small databases (and I've found it extremely useful for
tracking down and fixing bugs in the BioJava/BioSQL binding).

Secondly, it makes it possible to develop self-contained,
cross-platform applications (e.g.: distribute an application
containing data in a BioSQL database on a CD). I haven't had a chance to
actually do this yet for a BioJava application, but we (Reel Two) have
used HSQLDB for another unrelated application on a CD and it was fine.

However, I haven't tried loading large amounts of data into it. I
would imagine it is not the most efficient, but don't have any data to
back this up. It may also be of limited use for non-java people (other
than the fact that it's probably a lot easier to use for individual
users who don't have administrative access to install "regular"
databases on their system).

> Also, Len, would you be willing to provide maintenance for the hsqldb
> schema upon schema changes or fixes?

Sure.

Cheers,
Len.
From ericp at genoscope.cns.fr Thu Oct 23 12:22:01 2003
From: ericp at genoscope.cns.fr (Eric Pelletier)
Date: Thu Oct 23 12:19:09 2003
Subject: [BioSQL-l] Sequence loading in biosql MySQL sooooo long
Message-ID:

Hi all,

I deployed biosql from cvs a few days ago and populated it with
taxonomy and ontology data in a really decent time, but I am
experiencing a very long loading time for sequence data.

The command used is :

   perl bioperl/db/scripts/biosql/load_seqdatabase.pl --host masaya7
   --dbname biosql --namespace bioperl --format swiss --dbuser ericp
   --dbpass xxxxxx --lookup --noupdate Tmp/sprot.dat

Loading proceeds at about 1 complete entry every 25 seconds...

I logged the queries on the MySQL server side, and observed that
a single query blocks the system for each entry :

   SELECT name.name, node.node_rank, node.left_value FROM taxon
   node, taxon taxon, taxon_name name WHERE name.taxon_id =
   node.taxon_id AND taxon.left_value > node.left_value and
   taxon.left_value < node.right_value AND taxon.taxon_id = '1056'
   AND name.name_class = 'scientific name' ORDER by node.left_value;

However, the same query without the ORDER BY clause is quite fast.
And the indexes seem to be OK for this table :

mysql> show index from taxon;
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name      | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| taxon |          0 | PRIMARY       |            1 | taxon_id        | A         |      193316 |     NULL | NULL   |      | BTREE      |         |
| taxon |          0 | ncbi_taxon_id |            1 | ncbi_taxon_id   | A         |      193316 |     NULL | NULL   | YES  | BTREE      |         |
| taxon |          0 | left_value    |            1 | left_value      | A         |      193316 |     NULL | NULL   | YES  | BTREE      |         |
| taxon |          0 | right_value   |            1 | right_value     | A         |      193316 |     NULL | NULL   | YES  | BTREE      |         |
| taxon |          1 | taxparent     |            1 | parent_taxon_id | A         |       96658 |     NULL | NULL   | YES  | BTREE      |         |
| taxon |          1 | test          |            1 | taxon_id        | A         |      193316 |     NULL | NULL   |      | BTREE      |         |
| taxon |          1 | test          |            2 | left_value      | A         |      193316 |     NULL | NULL   | YES  | BTREE      |         |
| taxon |          1 | test          |            3 | right_value     | A         |      193316 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+

If I understand the benchmarks correctly, I would expect a speed of
about 50 entries per second.

So,
 . Has anyone experienced the same effect ?
 . Does anyone have any idea of what may be happening here ?

Thanks a lot.
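One way to narrow this down is to compare the plans MySQL chooses for
the two variants of the query. The sketch below assumes nothing beyond
MySQL's EXPLAIN statement and the table and column names from the logged
query above; what it prints will depend on the server version and on the
current index statistics.

```sql
-- Sketch only: plan for the slow variant (with ORDER BY).
EXPLAIN
SELECT name.name, node.node_rank, node.left_value
FROM taxon node, taxon taxon, taxon_name name
WHERE name.taxon_id = node.taxon_id
  AND taxon.left_value > node.left_value
  AND taxon.left_value < node.right_value
  AND taxon.taxon_id = '1056'
  AND name.name_class = 'scientific name'
ORDER BY node.left_value;

-- Plan for the fast variant (identical, but without ORDER BY).
EXPLAIN
SELECT name.name, node.node_rank, node.left_value
FROM taxon node, taxon taxon, taxon_name name
WHERE name.taxon_id = node.taxon_id
  AND taxon.left_value > node.left_value
  AND taxon.left_value < node.right_value
  AND taxon.taxon_id = '1056'
  AND name.name_class = 'scientific name';
```

Differences in the join order, the key chosen for the node alias, or
the Extra column (for example "Using filesort" or "Using temporary")
between the two outputs would show where the extra time is spent.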
--
Eric Pelletier

From hlapp at gnf.org Thu Oct 23 14:05:06 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Oct 23 14:01:55 2003
Subject: [BioSQL-l] Sequence loading in biosql MySQL sooooo long
In-Reply-To:
Message-ID:

On 10/23/03 9:22 AM, "Eric Pelletier" wrote:

> I logged the queries on the MySQL server side, and observed that
> only a single query blocks the system for each entry :
>
> SELECT name.name, node.node_rank, node.left_value FROM taxon
> node, taxon taxon, taxon_name name WHERE name.taxon_id =
> node.taxon_id AND taxon.left_value > node.left_value and
> taxon.left_value < node.right_value AND taxon.taxon_id = '1056'
> AND name.name_class = 'scientific name' ORDER by node.left_value;
>
>
> However, the same query, without the ORDER BY statement is quite
> fast.

Hmm - MySQL's query planner screws up completely. The ORDER BY should add a
millisecond in a very poor implementation, since it is about sorting 10-30
rows. So much for the world's fastest database ...

Which version of MySQL are you running? I suppose it's a rather recent one,
as I haven't had that problem with mine (3.23.54).

Are you supposed to run index and/or table statistics for that version once
in a while, and you forgot to do that? If you had been talking about Pg,
that would have been the most probable cause.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From simon.foote at nrc-cnrc.gc.ca Thu Oct 23 16:09:21 2003
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Thu Oct 23 16:09:46 2003
Subject: [BioSQL-l] Re: [Biojava-l] ontology exception, addSequence &
	BioSQLSequenceDB
In-Reply-To:
References:
Message-ID: <3F983571.5030909@nrc-cnrc.gc.ca>

Hilmar Lapp wrote:

>On 10/16/03 8:41 AM, "Simon Foote" wrote:
>
>
>
>>I also cc'd Hilmar, for possibly changing this in the biosql schema for
>>MySQL.
>>As far as I can tell it doesn't cause any problems with existing
>>databases as it only affects sorting and comparing of the field.
>>
>>
>>
>
>You mean making all VARCHAR columns BINARY in the mysql version that
>constitute or are part of an alternative key?
>
>
>
Yep, any VARCHAR field that is part of a key where case sensitivity comes
into play should be BINARY.

Simon

>Sounds reasonable to me. Also, would be consistent with the other 2
>supported platforms.
>
>Does anybody on this list have an opinion or comment for or against doing
>so?
>
>	-hilmar
>
>

From ericp at genoscope.cns.fr Fri Oct 24 05:53:27 2003
From: ericp at genoscope.cns.fr (Eric Pelletier)
Date: Fri Oct 24 05:50:36 2003
Subject: [BioSQL-l] Sequence loading in biosql MySQL sooooo long
In-Reply-To:
References:
Message-ID:

On Thu, 23 Oct 2003, Hilmar Lapp wrote:

|Hmm - MySQL's query planner screws up completely. The ORDER BY should add a
|millisecond in a very poor implementation, since it is about sorting 10-30
|rows. So much for the world's fastest database ...

I do agree ;->

|Which version of MySQL are you running? I suppose it's a rather recent one,
|as I haven't had that problem with mine (3.23.54).

Server version:  4.0.15a-log
Client version:  Ver 11.18 Distrib 3.23.54, for dec-osf4.0f (alphaev6)

--
Eric Pelletier
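
The question about index and table statistics raised earlier in this
thread can be tested directly on the server. A minimal sketch, assuming
the two tables are of a type for which this MySQL version supports
ANALYZE TABLE, and that taxon and taxon_name are the only tables
involved, as in the logged query:

```sql
-- Sketch only: refresh the key distribution statistics the optimizer uses.
ANALYZE TABLE taxon, taxon_name;

-- Inspect the refreshed cardinalities.
SHOW INDEX FROM taxon;
SHOW INDEX FROM taxon_name;

-- Re-check the plan for the slow query afterwards.
EXPLAIN
SELECT name.name, node.node_rank, node.left_value
FROM taxon node, taxon taxon, taxon_name name
WHERE name.taxon_id = node.taxon_id
  AND taxon.left_value > node.left_value
  AND taxon.left_value < node.right_value
  AND taxon.taxon_id = '1056'
  AND name.name_class = 'scientific name'
ORDER BY node.left_value;
```

Whether refreshed statistics change the plan on 4.0.15a is not
established anywhere in this thread; the sketch only shows how to check.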