From ah at firewall.biotec.tu-dresden.de Thu Jul 1 09:29:03 2004
From: ah at firewall.biotec.tu-dresden.de (Andreas Henschel)
Date: Thu Jul 1 09:33:25 2004
Subject: [BioSQL-l] GO dbxrefs in swissprot
Message-ID: <40E4119F.2040401@biotec.tu-dresden.de>

Hi all,

We just installed swissprot as the first database in the bioseqdb schema
(using load_seqdatabase.pl), and the load appeared to be successful (using
--safe). However, a lot of swissprot entries don't get the Gene Ontology
(GO) cross references at all, although they are contained in the original
flatfile. Strangely enough, it's not that all GO dbxrefs get ignored: the
bioseqdb.dbxref table contains more than 9000 GO entries. The swissprot
flatfile contains more than 52000, though (tested with grep).

We then tested loading a single swissprot entry (P53396, containing GO
dbxrefs), which did not give any error message. But again, the GO
crossrefs did not appear (InterPro, EMBL etc. worked fine).

Any idea?

Cheers,
Andreas Henschel
PhD student, BioTec Dresden

From hlapp at gnf.org Thu Jul 1 12:05:13 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Jul 1 12:07:21 2004
Subject: [BioSQL-l] GO dbxrefs in swissprot
In-Reply-To: <40E4119F.2040401@biotec.tu-dresden.de>
Message-ID: <6FA30A4F-CB78-11D8-84DF-000A959EB4C4@gnf.org>

On Thursday, July 1, 2004, at 06:29 AM, Andreas Henschel wrote:

> Strangely enough, it's not that all GO dbxrefs get ignored: the
> bioseqdb.dbxref table contains more than 9000 GO entries. The
> swissprot flatfile contains more than 52000, though (tested with
> grep).

How many unique entries are these though? Keep in mind that the dbxref
table is normalized.

When you say the GO dbxrefs did not appear, how do you mean? Are you
referring to dbxrefs present in the source file but absent as
association rows in bioentry_dbxref?
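
The two numbers being compared above measure different things: grep counts every GO DR line, while a normalized dbxref table holds each distinct accession only once. A minimal Python sketch of that check follows. It assumes the Swiss-Prot flat-file convention that cross-references sit on lines beginning with DR, and the function name count_go_dbxrefs is made up for illustration; it is not part of bioperl or BioSQL.

```python
import re

# Matches Swiss-Prot cross-reference lines such as
# "DR   GO; GO:0007399; P:neurogenesis; TAS."
GO_DR_LINE = re.compile(r"^DR\s+GO;\s*(GO:\d+);")


def count_go_dbxrefs(lines):
    """Return (number of GO DR lines, number of distinct GO accessions).

    `lines` is any iterable of strings, for example an open file handle.
    """
    total = 0
    unique = set()
    for line in lines:
        match = GO_DR_LINE.match(line)
        if match:
            total += 1
            unique.add(match.group(1))
    return total, len(unique)
```

Called on an open handle to the flatfile, the first number corresponds to the grep count and the second is the one to compare with the number of GO rows in dbxref.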
If you have a swissprot entry that has GO dbxrefs in the source file
but fails to have those associated in bioentry_dbxref, check whether
the Bio::Seq object that's coming from the parser has them as
annotation. It would sound strange if some entries get the
associations whereas others don't.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From henschel at mpi-cbg.de Fri Jul 2 06:16:06 2004
From: henschel at mpi-cbg.de (Andreas Henschel)
Date: Fri Jul 2 06:18:57 2004
Subject: [BioSQL-l] GO dbxrefs in swissprot
In-Reply-To: <6FA30A4F-CB78-11D8-84DF-000A959EB4C4@gnf.org>
References: <6FA30A4F-CB78-11D8-84DF-000A959EB4C4@gnf.org>
Message-ID: <40E535E6.8050608@mpi-cbg.de>

Hi Hilmar,

Thanks for your reply. I was wondering if it is due to my patched
bioperl 1.2.1?

Hilmar Lapp wrote:

> When you say the GO dbxrefs did not appear, how do you mean? Are you
> referring to dbxrefs present in the source file but absent as
> association rows in bioentry_dbxref?
>
Yes!

> If you have a swissprot entry that has GO dbxrefs in the source file
> but fails to have those associated in bioentry_dbxref, check whether
> the Bio::Seq object that's coming from the parser has them as
> annotation. It would sound strange if some entries get the
> associations whereas others don't.
>
OK, here is what I did: I modified load_seqdatabase.pl to print out the
annotations. I ran it on two small flatfiles, both containing GO
annotations (according to the flatfile and the swissprot website).
For the first, the parser detected no GO annotation, whereas for the
second it did:

$prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname bioseqdb --namespace swissprot --format swiss --lookup --remove --testonly P53396.dat

Annotation dblink stringified value Direct database link to X64330 in database EMBL
Annotation dblink stringified value Direct database link to U18197 in database EMBL
Annotation dblink stringified value Direct database link to BC006195 in database EMBL
Annotation dblink stringified value Direct database link to S21173 in database PIR
Annotation dblink stringified value Direct database link to P07459 in database HSSP
Annotation dblink stringified value Direct database link to HGNC:115 in database Genew
Annotation dblink stringified value Direct database link to P53396 in database GK
Annotation dblink stringified value Direct database link to 108728 in database MIM
Annotation dblink stringified value Direct database link to IPR002020 in database InterPro
Annotation dblink stringified value Direct database link to IPR003781 in database InterPro
Annotation dblink stringified value Direct database link to IPR005811 in database InterPro
Annotation dblink stringified value Direct database link to IPR005810 in database InterPro
Annotation dblink stringified value Direct database link to IPR005809 in database InterPro
Annotation dblink stringified value Direct database link to PF02629 in database Pfam
Annotation dblink stringified value Direct database link to PF00549 in database Pfam
Annotation dblink stringified value Direct database link to PS01216 in database PROSITE
Annotation dblink stringified value Direct database link to PS00399 in database PROSITE
Annotation dblink stringified value Direct database link to PS01217 in database PROSITE

$prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname bioseqdb --namespace swissprot --format swiss --lookup --remove --testonly Q15777.dat
Loading Q15777.dat ...

Annotation dblink stringified value Direct database link to U57911 in database EMBL
Annotation dblink stringified value Direct database link to BC031582 in database EMBL
Annotation dblink stringified value Direct database link to HGNC:1180 in database Genew
Annotation dblink stringified value Direct database link to 600911 in database MIM
Annotation dblink stringified value Direct database link to GO:0007399 in database GO
Annotation dblink stringified value Direct database link to IPR004843 in database InterPro
Annotation dblink stringified value Direct database link to PF00149 in database Pfam

The corresponding DR entries in the two flat files are:

P53396.dat:
DR EMBL; X64330; CAA45614.1; -.
DR EMBL; U18197; AAB60340.1; -.
DR EMBL; BC006195; AAH06195.1; -.
DR PIR; S21173; S21173.
DR HSSP; P07459; 1JKJ.
DR Genew; HGNC:115; ACLY.
DR GK; P53396; -.
DR MIM; 108728; -.
DR GO; GO:0009346; C:citrate lyase complex; TAS.
DR GO; GO:0003878; F:ATP citrate synthase activity; TAS.
DR GO; GO:0006200; P:ATP catabolism; TAS.
DR GO; GO:0006101; P:citrate metabolism; TAS.
DR GO; GO:0015936; P:coenzyme A metabolism; TAS.
DR InterPro; IPR002020; Citrate_synth.
DR InterPro; IPR003781; CoA_binding.
DR InterPro; IPR005811; CoA_ligase.
DR InterPro; IPR005810; CoA_lig_alpha.
DR InterPro; IPR005809; CoA_lig_beta.
DR Pfam; PF02629; CoA_binding; 1.
DR Pfam; PF00549; Ligase_CoA; 1.
DR PROSITE; PS01216; SUCCINYL_COA_LIG_1; 1.
DR PROSITE; PS00399; SUCCINYL_COA_LIG_2; 1.
DR PROSITE; PS01217; SUCCINYL_COA_LIG_3; 1.

Q15777.dat:
DR EMBL; U57911; AAC50564.1; -.
DR EMBL; BC031582; AAH31582.1; -.
DR Genew; HGNC:1180; C11orf8.
DR MIM; 600911; -.
DR GO; GO:0007399; P:neurogenesis; TAS.
DR InterPro; IPR004843; M-ppestrase.
DR Pfam; PF00149; Metallophos; 1.
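
The by-eye comparison between the DR lines and the parser output above could also be scripted. The Python sketch below is only an illustration under two assumptions: that DR lines have the form "DR <database>; <primary id>; ..." as in the excerpts above, and that the loader output uses the "Direct database link to X in database Y" wording printed by the modified script. All function names are hypothetical.

```python
import re

# "DR GO; GO:0009346; C:citrate lyase complex; TAS." -> ("GO", "GO:0009346")
DR_LINE = re.compile(r"^DR\s+([^;]+);\s*([^;]+);")
# "... Direct database link to X64330 in database EMBL" -> ("EMBL", "X64330")
LINK_TEXT = re.compile(r"Direct database link to (\S+) in database (\S+)")


def flatfile_pairs(flatfile_text):
    """Return the set of (database, primary id) pairs found on DR lines."""
    pairs = set()
    for line in flatfile_text.splitlines():
        match = DR_LINE.match(line)
        if match:
            pairs.add((match.group(1).strip(), match.group(2).strip()))
    return pairs


def reported_pairs(loader_output):
    """Return the (database, primary id) pairs mentioned in the loader output."""
    return {(db, acc) for acc, db in LINK_TEXT.findall(loader_output)}


def dropped_links(flatfile_text, loader_output):
    """List the DR cross-references that never show up in the loader output."""
    return sorted(flatfile_pairs(flatfile_text) - reported_pairs(loader_output))
```

Applied to the P53396 listings above, such a check would report exactly the five GO cross-references that the parser failed to pick up.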
Cheers
Andreas

From hlapp at gnf.org Fri Jul 2 12:48:16 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jul 2 12:50:21 2004
Subject: [BioSQL-l] Re: GO dbxrefs in swissprot
In-Reply-To: <40E535E6.8050608@mpi-cbg.de>
Message-ID: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org>

What you describe is pretty weird if it works for one entry but not for
another. Also, the DR lines don't look suspiciously different.

If there's no direct reason that prevents you from doing so, you should
definitely upgrade to the 1.4.x series, possibly even to the latest
version of the stable branch from CVS. There have been quite a few fixes
in the meantime, some of which do affect how sequences get loaded into
biosql because they affect the annotation bundle.

Let me know if the problem persists after the upgrade, and if it does,
send me the two files. I'm also cc'ing this to the bioperl list because
it is really a bioperl problem, not a biosql-related one.

-hilmar

On Friday, July 2, 2004, at 03:16 AM, Andreas Henschel wrote:

> Hi Hilmar,
>
> Thanks for your reply. I was wondering if it is due to my patched
> bioperl 1.2.1?
> Hilmar Lapp wrote:
>
>> When you say the GO dbxrefs did not appear, how do you mean? Are you
>> referring to dbxrefs present in the source file but absent as
>> association rows in bioentry_dbxref?
>>
> Yes!
>
>> If you have a swissprot entry that has GO dbxrefs in the source file
>> but fails to have those associated in bioentry_dbxref, check whether
>> the Bio::Seq object that's coming from the parser has them as
>> annotation. It would sound strange if some entries get the
>> associations whereas others don't.
>>
> Ok, here is what I did: I modified load_seqdatabase.pl to print out
> the annotations. I ran it, comparing two small flatfiles, both
> containing GO annotations (according to flatfile and swissprot
> website).
>
> Cheers
> Andreas
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From henschel at mpi-cbg.de Tue Jul 6 09:14:42 2004
From: henschel at mpi-cbg.de (Andreas Henschel)
Date: Tue Jul 6 09:17:26 2004
Subject: [BioSQL-l] Re: GO dbxrefs in swissprot
In-Reply-To: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org>
References: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org>
Message-ID: <40EAA5C2.8090801@mpi-cbg.de>

Hilmar Lapp wrote:

> Pretty weird what you describe if it works for one entry but not
> another. Also, the DR lines don't look suspiciously different.
>
> If there's no direct reason that prevents you from doing so you should
> definitely upgrade to the 1.4.x series, possibly even to the latest
> version of the stable branch from CVS. There were quite some fixes
> meanwhile, some of which do affect how sequences get loaded into
> biosql because they affect the annotation bundle.

Hi Hilmar,

I installed bioperl from CVS and reloaded the swissprot db into BioSQL.
The entries I checked so far are apparently correct. With the particular
example I found that it was obviously a bug in the sequence annotation
parsing of the 1.2.1 version.

Sorry for having bothered you with versioning, I simply trusted the
biosql installation instructions that claimed a patched 1.2.1 would do.

What still puzzles me is the size of the database: starting with a
543MB flatfile, the first run (with the faulty parser) gave me a 600MB
database and 9100 GO annotations. After the rerun with load_seqdatabase
(...) --lookup --remove I get a 1.1GB database but only 5100 GO
annotations in the dbxref table. Is this due to the normalization?

Is there a full list of parseable databases (GenBank, EMBL, ENSEMBL?,
PDB? etc) and the respective places to download them?

Thanks again.
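
What normalization does to row counts can be tried out on a toy schema. The Python sketch below uses the standard sqlite3 module with two heavily simplified tables that only borrow the names dbxref and bioentry_dbxref; the real BioSQL tables have more columns, and the helper functions and entry ids are invented for this example.

```python
import sqlite3


def make_toy_db():
    """Create an in-memory database with simplified cross-reference tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dbxref (
            dbxref_id INTEGER PRIMARY KEY,
            dbname    TEXT NOT NULL,
            accession TEXT NOT NULL,
            UNIQUE (dbname, accession)   -- one row per distinct cross-reference
        );
        CREATE TABLE bioentry_dbxref (
            bioentry_id INTEGER NOT NULL,
            dbxref_id   INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
            PRIMARY KEY (bioentry_id, dbxref_id)
        );
        """
    )
    return conn


def add_link(conn, bioentry_id, dbname, accession):
    """Associate an entry with a dbxref, reusing the dbxref row if it exists."""
    conn.execute(
        "INSERT OR IGNORE INTO dbxref (dbname, accession) VALUES (?, ?)",
        (dbname, accession),
    )
    (dbxref_id,) = conn.execute(
        "SELECT dbxref_id FROM dbxref WHERE dbname = ? AND accession = ?",
        (dbname, accession),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO bioentry_dbxref (bioentry_id, dbxref_id) VALUES (?, ?)",
        (bioentry_id, dbxref_id),
    )


def row_counts(conn):
    """Return (rows in dbxref, rows in bioentry_dbxref)."""
    n_dbxref = conn.execute("SELECT COUNT(*) FROM dbxref").fetchone()[0]
    n_links = conn.execute("SELECT COUNT(*) FROM bioentry_dbxref").fetchone()[0]
    return n_dbxref, n_links


conn = make_toy_db()
add_link(conn, 1, "GO", "GO:0007399")  # entry 1
add_link(conn, 2, "GO", "GO:0007399")  # entry 2 shares the same GO term
add_link(conn, 2, "GO", "GO:0006101")
print(row_counts(conn))
```

Three associations end up as two dbxref rows, so the dbxref count reflects distinct GO terms rather than the number of DR lines in the flatfile.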
Cheers,
Andreas

From hlapp at gnf.org Tue Jul 6 15:17:10 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jul 6 15:19:20 2004
Subject: [BioSQL-l] Re: GO dbxrefs in swissprot
In-Reply-To: <40EAA5C2.8090801@mpi-cbg.de>
References: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org> <40EAA5C2.8090801@mpi-cbg.de>
Message-ID: <1417F5C6-CF81-11D8-87B1-000A95AE92B0@gnf.org>

On Jul 6, 2004, at 6:14 AM, Andreas Henschel wrote:

> Sorry for having bothered you with versioning, I simply trusted the
> biosql installation instructions that claimed a patched 1.2.1 would
> do.

Sorry - the documentation needs to be updated.

> What still puzzles me is the size of the database: starting with a
> 543MB flatfile, the first run (with the faulty parser) gave me 600MB
> database and 9100 GO annotations. After the rerun with
> load_seqdatabase (...) --lookup --remove I get 1.1GB database but
> only 5100 GO annotations in the dbxref table. Is this due to the
> normalization?

I'm confused. Did you start with a scratch biosql instance, or did you
re-use the one loaded with swissprot before? If re-loading an existing
one, the number of rows in dbxref should *not* go down, regardless of
what you do to bioentries. The number of rows in the association table
bioentry_dbxref will be affected though.

Did you do a grep on the GO dbxrefs in the swissprot files followed by
sort unique? How many did you get? You should have at least as many rows
in dbxref. If you find a discrepancy, i.e., if you can identify a GO
dbxref that's present in your swissprot file but not in the database,
check out an entry that is (or should be) associated with that dbxref.

> Is there a full list of parseable databases (GenBank, EMBL, ENSEMBL?,
> PDB? etc) and the resp. place to download?

This list is more or less identical with the list of formats readable
by the Bio::SeqIO system in bioperl, because this is what
load_seqdatabase.pl uses for parsing files. Genbank and Embl are among
those formats.
Ensembl used to come in an Embl-formatted flatfile dump, but I don't
know whether it still does.

Note that without any post-processing the bioentries resulting from a
file upload will represent the entries found in the source file. E.g.,
if the source file contains an annotated whole chromosome entry, that's
what you'll get (but not necessarily want) in biosql as well. As an
example for integrated post-processing, I used to use a
Bio::Factory::SequenceProcessorI implementation to split Ensembl whole
chromosomes into predicted genes, transcripts, and proteins, which would
then get loaded into biosql. (Check out the documentation for the
--pipeline option in load_seqdatabase.pl for how to make the script
invoke a given post-processor.)

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney at ebi.ac.uk Tue Jul 6 15:43:07 2004
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Jul 6 20:14:32 2004
Subject: [BioSQL-l] Re: [Bioperl-l] Re: GO dbxrefs in swissprot
In-Reply-To: <1417F5C6-CF81-11D8-87B1-000A95AE92B0@gnf.org>
Message-ID:

> > Is there a full list of parseable databases (GenBank, EMBL, ENSEMBL?,
> > PDB? etc) and the resp. place to download?
>

Ensembl is best accessed through the Ensembl Perl API, parts of which
still comply with the Bioperl Bio::SeqI interface (i.e., they can be
dumped by SeqIO, and therefore in theory read into BioSQL).

Ensembl does make EMBL dumps *BUT*....

... all we now put in the EMBL dumps are the genes. It is bad enough
trying to keep everything tied down in place inside the Ensembl system
correctly without also agonising about how data should be represented
inside EMBL/GenBank flat files (or Bio::SeqI objects more clearly) --
and we clearly can't dump all the SNPs, Features, Genes, Exon, Affy
probe mappings, etc etc on our ftp site.
We'd simply run out of space by February each year.

A low priority project inside Ensembl has been to set up a more
functional ensembl<->bioperl bridge that would give good access to
Ensembl objects through a Bio::SeqI wrapper, presumably using the
AnnotationI interface to its absolute max. This is in the "would be
nice to do" category, but we always have things far higher on the
priority stack (e.g., this month's fun was dealing with
selenocysteines).

For more info on the ensembl perl API check out:

http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/CodeTutorial.html

From Hegedus.Tamas at mayo.edu Fri Jul 9 12:14:20 2004
From: Hegedus.Tamas at mayo.edu (Hegedus, Tamas .)
Date: Fri Jul 9 12:23:05 2004
Subject: [BioSQL-l] BioSQL/SwissProt
Message-ID:

Dear All,
Dear Hilmar,

I am trying to load the newest SwissProt release, or rel. 43, into the
BioSQL schema (I have the newest version of all the perl packages
(BioPerl, BioDB)).
(Note: I am not familiar with perl, but I could not do it with
BioPython, so I wanted to try it with perl.)

I am trying to use load_seqdatabase.pl:

perl /home/src/bioperl/bioperl-db-0.1/scripts/load_seqdatabase.pl --driver mysql --format swiss /home/src/uniprot/uniprot_sprot.dat

- If I specified more than two command line arguments (e.g. --namespace
  and --dbuser, too), the script tried to open one of the options as a
  sequence file.
- So I set the user name, password, and dbname in load_seqdatabase.pl
  and ran the script, and I received the following errors (I think for
  all the entries parsed from the dat file):

DBD::mysql::st execute failed: Unknown column 'display_id' in 'field list' at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/SQL/SeqAdaptor.pm line 427, line 37864.
Could not store P46952 because of DBD::mysql::st execute failed: Unknown column 'display_id' in 'field list' at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/SQL/SeqAdaptor.pm line 427, line 37864.
Thanks for your help and suggestions,
Tamas

From hlapp at gnf.org Fri Jul 9 12:36:50 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jul 9 12:38:52 2004
Subject: [BioSQL-l] BioSQL/SwissProt
In-Reply-To:
Message-ID: <2DB331E2-D1C6-11D8-A98F-000A959EB4C4@gnf.org>

It sounds like you're using an old release of bioperl-db; 0.1 is very
old and doesn't support the current version of biosql. Download
bioperl-db from cvs and let me know if that doesn't solve the problem.

If you check at the top of the load_seqdatabase.pl file, there should
be a CVS Id tag that has a date stamp of no earlier than January this
year (later is fine).

-hilmar

On Friday, July 9, 2004, at 09:14 AM, Hegedus, Tamas . wrote:

> Dear All,
> Dear Hilmar,
>
> I try to load SwissProt newest release or rel.43 into the BioSQL
> schema (I have the newest version from all perl package (BioPerl,
> BioDB)).
> (Note: I am not familiar with perl; but I could not do it with
> BioPython, so I wanted to try it with perl)
>
> I try to use the load_seqdatabase.pl:
> perl /home/src/bioperl/bioperl-db-0.1/scripts/load_seqdatabase.pl
> --driver mysql --format swiss /home/src/uniprot/uniprot_sprot.dat
>
> - If I specified more command line arguments than two (like
> --namespace --dbuser, too) the script wanted to open one of the
> options as a sequence file.
> - So I set the user name, password, dbname in the load_seqdatabase.pl,
> run the script,
> and I received the following errors (I think for all the entries
> parsed from the dat_file)
>
> DBD::mysql::st execute failed: Unknown column 'display_id' in 'field
> list' at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/SQL/SeqAdaptor.pm line
> 427, line 37864.
> Could not store P46952 because of DBD::mysql::st execute failed:
> Unknown column 'display_id' in 'field list' at
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/SQL/SeqAdaptor.pm line 427,
> line 37864.
>
> Thanks for your help and suggestions,
> Tamas
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Hegedus.Tamas at mayo.edu Thu Jul 15 14:18:04 2004
From: Hegedus.Tamas at mayo.edu (Hegedus, Tamas .)
Date: Thu Jul 15 14:27:00 2004
Subject: [BioSQL-l] ModBioSQL release 0.12
Message-ID:

Dear All,
Dear Hilmar,

During my work I had to use UniProt in an RDBMS. Because of performance
problems I had to build my system without BioSQL. I collected my
scripts and experiments into a package called Modular BioSQL, which has
the following features:

-- Modular RDB realization of different biological databases allows
   fine-tuning with increased performance.
-- Storing result sets in the RDBMS allows more accurate, more
   comfortable analysis using SQL.
-- User interaction with the RDBMS (installation, loading and querying
   data) does not need programming skills.
-- Lightweight RDB interaction with analysis packages (only EMBOSS is
   implemented).
-- Optimized loading of flat files into the RDBMS.
-- Using 'fixed value arrays' (*_ref tables) results in both smaller
   data size (smaller than the flat file) and smaller index size,
   increasing the performance (theoretically both the uploading and
   querying performance).
-- Relatively easily extendable to implement and handle databases other
   than the currently realized ones.

You may think I suggest Modular BioSQL as a replacement for BioSQL. I do
not think so!

For details, please visit my web site, and please send comments and
suggestions:
http://www.biomembrane.hu/~hegedus/modbiosql/

Best regards,
Tamas
--
Tamas Hegedus, Research Fellow | phone: 480-301-6041
Mayo Clinic Scottsdale | fax: 480-301-7017
13000 E.
Shea Blvd | mailto:hegedus.tamas@mayo.edu
Scottsdale, AZ, 85259 | http://www.biomembrane.hu/~hegedus

From mg at guerrilla-tech.com Thu Jul 22 11:04:29 2004
From: mg at guerrilla-tech.com (Michael Griffith)
Date: Fri Jul 23 16:16:30 2004
Subject: [BioSQL-l] biosql-ora BS-prepopulate-db errors
Message-ID:

Hi,

I am trying to set up BioSQL to work with Oracle 9i. I was able to
successfully create the database and all of the objects in it.
Everything seems to be OK, but when I try to execute the prepopulate-db
script, I get an exception when the script tries to insert into
SGLD_Terms.

The SQL that causes the exception is:

--
-- Ontology terms: relationship type ontology
--
INSERT INTO SGLD_Terms (Trm_Name, Trm_Identifier, Cat_Name)
VALUES ('EST','REO:0000345','Relationship Type Ontology');

The exception is:

ORA-20101: failed to lookup Ont <>
ORA-06512: at "BIOSQL_APP.TRM" line 102
ORA-06512: at "BIOSQL_APP.BIR_TERMS" line 6
ORA-04088: error during execution of trigger "BIOSQL_APP.BIR_TERMS"

Where BIOSQL_APP is my user. I have looked at the procedures and
triggers involved, and I see that the exception is being raised from the
procedure TRM, because of what I think is a broken relationship.

What is confusing to me is that the BS-prepopulate-db.sql script has a
comment that reads:

-- The following will be created automatically upon inserting their terms.
-- 'Relationship Type Ontology'
-- 'Bioentry Type Ontology'
-- 'Qualifier Type Ontology'
--

Evidently, this did not happen. Am I missing something? Any help would
be greatly appreciated!

Cheers!

Michael Griffith

From hlapp at gnf.org Mon Jul 26 15:10:13 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jul 26 15:12:45 2004
Subject: [BioSQL-l] biosql-ora BS-prepopulate-db errors
In-Reply-To:
References:
Message-ID: <6BB59942-DF37-11D8-A4CA-000A95AE92B0@gnf.org>

Sorry, the prepopulate script is old and I believe out of date w.r.t.
some schema changes. 'Cat_Name' should read 'Ont_Name'. It may work
after that change.
You will want to carefully review what this script does; it's entirely
custom and nothing in a plain biosql installation depends on it, nor
does load_seqdatabase.pl. It populates some early versions of custom
and small ontologies that make up part of the glue in a Symgene
database, and may be completely useless for other applications.

Let me know if you really want that content and I'll update the script.
I also have dagflat versions of some of those ontologies.

-hilmar

On Jul 22, 2004, at 8:04 AM, Michael Griffith wrote:

> Hi,
>
> I am trying to setup BIOSQL to work with Oracle 9i. I was able to
> successfully create the database and all of the objects in it.
> Everything seems to be ok, but when I try to execute the
> prepopulate-db script, I get an exception when the script tries to
> insert into SGLD_Terms.
>
> The sql that causes the exception is:
>
> --
> -- Ontology terms: relationship type ontology
> --
> INSERT INTO SGLD_Terms (Trm_Name, Trm_Identifier, Cat_Name)
> VALUES ('EST','REO:0000345','Relationship Type Ontology');
>
> The exception is:
> ORA-20101: failed to lookup Ont <>
> ORA-06512: at "BIOSQL_APP.TRM" line 102
> ORA-06512: at "BIOSQL_APP.BIR_TERMS" line 6
> ORA-04088: error during execution of trigger "BIOSQL_APP.BIR_TERMS"
>
> Where BIOSQL_APP is my user. I have looked at the procedures and
> triggers involved, and I see that the exception is being raised from
> the procedure TRM, because of what I think is a broken relationship.
>
> What is confusing to me is that the BS-prepopulate-db.sql script has a
> comment that reads:
>
> -- The following will be created automatically upon inserting their
> terms.
> -- 'Relationship Type Ontology'
> -- 'Bioentry Type Ontology'
> -- 'Qualifier Type Ontology'
> --
>
> Evidently, this did not happen. Am I missing something? Any help
> would be greatly appreciated!
>
> Cheers!
>
> Michael Griffith
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Mon Jul 26 16:21:20 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jul 26 16:22:52 2004
Subject: [BioSQL-l] biosql-ora BS-prepopulate-db errors
In-Reply-To:
References:
Message-ID: <5B35823E-DF41-11D8-A4CA-000A95AE92B0@gnf.org>

I see. You didn't say Biojava and Biosql Oracle in the same sentence
before, did you?

Biojava can't deal with different 'versions' of the same schema even if
it's only column names, and the Oracle version differs in naming
conventions. There is an adaptor API in the repository that maps table
names, but you'll need another one that I've written that maps both
tables and column names through views. I'll dig that up and post it.

Also, the Biojava folks I believe added a table or a column to one of
the ontology tables. You'll have to inquire about that on the biojava
mailing list.

-hilmar

On Jul 26, 2004, at 12:24 PM, Michael Griffith wrote:

> Hilmar,
>
> Thanks for the reply.
>
> I guess I don't really need it, I was just trying to put some data in
> the DB, so I could play with it. My bigger problem is that once I
> built the DB and tried to insert a record using BioJava 1.4, I got a
> message that the schema was old and I should have used a new schema.
>
> I posted that message on BioJava mail list, but have received no
> reply. Is this a question you can help me with?
>
> Where can I download the bio-sql-ora scripts that will work with
> BioJava 1.4?
>
> Thanks in advance!
>
> MG
>
>
> On 7/26/04 2:10 PM, "Hilmar Lapp" wrote:
>
>> Sorry, the prepopulate script is old and I believe out of date w.r.t.
>> some schema changes.
>> 'Cat_Name' should read 'Ont_Name'. It may work
>> after that change.
>>
>> You will want to carefully review what this script does; it's entirely
>> custom and nothing in a plain biosql installation depends on it, nor
>> does load_seqdatabase.pl. It populates some early versions of custom
>> and small ontologies that make up part of the glue in a Symgene
>> database, and may be completely useless for other applications.
>>
>> Let me know if you really want that content and I'll update the
>> script.
>> I also have dagflat versions of some of those ontologies.
>>
>> -hilmar
>>
>> On Jul 22, 2004, at 8:04 AM, Michael Griffith wrote:
>>
>>> Hi,
>>>
>>> I am trying to setup BIOSQL to work with Oracle 9i. I was able to
>>> successfully create the database and all of the objects in it.
>>> Everything seems to be ok, but when I try to execute the
>>> prepopulate-db script, I get an exception when the script tries to
>>> insert into SLGD_Terms.
>>>
>>> The sql that causes the exception is:
>>>
>>> --
>>> -- Ontology terms: relationship type ontology
>>> --
>>> INSERT INTO SGLD_Terms (Trm_Name, Trm_Identifier, Cat_Name)
>>> VALUES ('EST','REO:0000345','Relationship Type Ontology');
>>>
>>> The exception is:
>>> ORA-20101: failed to lookup Ont <>
>>> ORA-06512: at "BIOSQL_APP.TRM" line 102
>>> ORA-06512: at "BIOSQL_APP.BIR_TERMS" line 6
>>> ORA-04088: error during execution of trigger "BIOSQL_APP.BIR_TERMS"
>>>
>>> Where BIOSQL_APP is my user. I have looked at the procedures and
>>> triggers involved, and I see that the exception is being raised from
>>> the procedure TRM, because of what I think is a broken relationship.
>>>
>>> What is confusing to me is that the BS-prepopulate-db.sql script has
>>> a comment that reads:
>>>
>>> -- The following will be created automatically upon inserting their
>>> terms.
>>> -- 'Relationship Type Ontology'
>>> -- 'Bioentry Type Ontology'
>>> -- 'Qualifier Type Ontology'
>>> --
>>>
>>> Evidently, this did not happen.
>>> Am I missing something? Any help would be
>>> greatly appreciated!
>>>
>>> Cheers!
>>>
>>> Michael Griffith
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Mon Jul 26 17:58:43 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jul 26 18:00:15 2004
Subject: [BioSQL-l] biosql-ora BS-prepopulate-db errors
In-Reply-To:
References:
Message-ID:

It's not really broken, but it's not perfectly packaged either. I'll
commit myself in Glasgow to do my share to improve the situation.

The Oracle version of the schema wasn't originally part of biosql; I
added it later. Worse yet, it's not identical to, and hence regularly
goes out of sync with, our in-house repository of the schema DDL
scripts. Not nice, I know.

-hilmar

On Jul 26, 2004, at 1:54 PM, Michael Griffith wrote:

> Hilmar,
>
> Thank you.
>
> From an outsider's view, this all seems a little broken. Wouldn't it
> benefit the community to bundle all these things together?
>
> Thanks in advance!
>
> MG
>
>
> On 7/26/04 3:21 PM, "Hilmar Lapp" wrote:
>
>> I see. You didn't say Biojava and Biosql Oracle in the same sentence
>> before, did you?
>>
>> Biojava can't deal with different 'versions' of the same schema even if
>> it's only column names, and the Oracle version differs in naming
>> conventions. There is an adaptor API in the repository that maps table
>> names, but you'll need another one that I've written that maps both
>> table and column names through views. I'll dig that up and post it.
>>
>> Also, I believe the Biojava folks added a table or a column to one of
>> the ontology tables. You'll have to inquire about that on the biojava
>> mailing list.
>>
>> -hilmar

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Mon Jul 26 19:22:32 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jul 26 19:24:02 2004
Subject: [BioSQL-l] biosql-ora BS-prepopulate-db errors
In-Reply-To:
References:
Message-ID:

I committed the other API to the repository, BS-create-Biosql-API2.sql.
Note that some names are still different because of reserved word issues.
E.g., comment and synonym are reserved in Oracle, so the Comment table is
named Anncomment, and Term_Synonym.Synonym becomes Term_Synonym.Name.

-hilmar

On Jul 26, 2004, at 1:54 PM, Michael Griffith wrote:

> Hilmar,
>
> Thank you.
>
> From an outsider's view, this all seems a little broken. Wouldn't it
> benefit the community to bundle all these things together?
>
> Thanks in advance!
>
> MG

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From mg at guerrilla-tech.com Mon Jul 26 15:24:29 2004
From: mg at guerrilla-tech.com (Michael Griffith)
Date: Tue Jul 27 00:01:47 2004
Subject: [BioSQL-l] biosql-ora BS-prepopulate-db errors
In-Reply-To: <6BB59942-DF37-11D8-A4CA-000A95AE92B0@gnf.org>
Message-ID:

Hilmar,

Thanks for the reply.

I guess I don't really need it, I was just trying to put some data in the
DB, so I could play with it. My bigger problem is that once I built the DB
and tried to insert a record using BioJava 1.4, I got a message that the
schema was old and I should have used a new schema.

I posted that message on BioJava mail list, but have received no reply. Is
this a question you can help me with?

Where can I download the bio-sql-ora scripts that will work with BioJava
1.4?

Thanks in advance!

MG


On 7/26/04 2:10 PM, "Hilmar Lapp" wrote:

> Sorry, the prepopulate script is old and I believe out of date w.r.t.
> some schema changes. 'Cat_Name' should read 'Ont_Name'. It may work
> after that change.
>
> You will want to carefully review what this script does; it's entirely
> custom and nothing in a plain biosql installation depends on it, nor
> does load_seqdatabase.pl.
> It populates some early versions of custom
> and small ontologies that make up part of the glue in a Symgene
> database, and may be completely useless for other applications.
>
> Let me know if you really want that content and I'll update the script.
> I also have dagflat versions of some of those ontologies.
>
> -hilmar
>
> On Jul 22, 2004, at 8:04 AM, Michael Griffith wrote:
>
>> Hi,
>>
>> I am trying to set up BIOSQL to work with Oracle 9i. I was able to
>> successfully create the database and all of the objects in it. Everything
>> seems to be ok, but when I try to execute the prepopulate-db script, I get
>> an exception when the script tries to insert into SGLD_Terms.
>>
>> The sql that causes the exception is:
>>
>> --
>> -- Ontology terms: relationship type ontology
>> --
>> INSERT INTO SGLD_Terms (Trm_Name, Trm_Identifier, Cat_Name)
>> VALUES ('EST','REO:0000345','Relationship Type Ontology');
>>
>> The exception is:
>> ORA-20101: failed to lookup Ont <>
>> ORA-06512: at "BIOSQL_APP.TRM" line 102
>> ORA-06512: at "BIOSQL_APP.BIR_TERMS" line 6
>> ORA-04088: error during execution of trigger "BIOSQL_APP.BIR_TERMS"
>>
>> Where BIOSQL_APP is my user. I have looked at the procedures and triggers
>> involved, and I see that the exception is being raised from the procedure
>> TRM, because of what I think is a broken relationship.
>>
>> What is confusing to me is that the BS-prepopulate-db.sql script has a
>> comment that reads:
>>
>> -- The following will be created automatically upon inserting their terms.
>> -- 'Relationship Type Ontology'
>> -- 'Bioentry Type Ontology'
>> -- 'Qualifier Type Ontology'
>> --
>>
>> Evidently, this did not happen. Am I missing something? Any help would be
>> greatly appreciated!
>>
>> Cheers!
>>
>> Michael Griffith
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
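
For reference, the column rename Hilmar suggests, applied to the failing
statement from the original report, might look like the sketch below. It is
untested. It assumes that the SGLD_Terms view in the current biosql-ora
scripts exposes the ontology name as Ont_Name, as his reply indicates, and
that nothing else in the statement needs to change.

```sql
-- Sketch only: the INSERT from the report with Cat_Name renamed to Ont_Name,
-- per Hilmar's reply. The column name is an assumption taken from that reply
-- and has not been checked against the schema scripts.
INSERT INTO SGLD_Terms (Trm_Name, Trm_Identifier, Ont_Name)
VALUES ('EST', 'REO:0000345', 'Relationship Type Ontology');
```

Hilmar only says the script "may work" after this change, so the remaining
INSERT statements in BS-prepopulate-db.sql would presumably need the same
rename, and the result should be checked before relying on it.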