From mg at base-pair.com Wed Aug 4 10:34:35 2004
From: mg at base-pair.com (Michael Griffith)
Date: Wed Aug 4 10:38:07 2004
Subject: [BioSQL-l] Oracle BIoSQL Schema + BioJava
Message-ID: 

Hi,

I am able to get the latest BioJava 1.4 release to work with the BioSQL
schema on MySQL, but I am having problems setting up the Oracle version.

I have downloaded and installed the BS-create-Biosql-API2.sql file posted a
week ago by Hilmar Lapp. Hilmar also informed me that there was a table
missing -- which I believe is the term_relationship_term table. This table
was also missing from the MySQL schema.

I've recreated the DB with this table and the new, improved API/views.
However, when I try to load a GenBank record into the database, I get the
following exception stack:

    [java] org.biojava.bio.BioRuntimeException: Error adding sequence: AA000001 (rolled back successfully)
    [java]     at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:498)
    [java]     at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:365)
    [java]     at com.gts.genebank.GeneralReader.main(GeneralReader.java:74)
    [java] Caused by: java.sql.SQLException: ORA-02290: check constraint (BIOSQL_APP.ALPHABET4) violated
    [java]     at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
    [java]     at oracle.jdbc.ttc7.TTIoer.processError(TTIoer.java:289)

My code is simply doing the following:

    SequenceDB db = new BioSQLSequenceDB(dbDriver, dbURL, dbUser, dbPass,
                                         biodatabase, create);

    SequenceIterator iter =
        (SequenceIterator) SeqIOTools.fileToBiojava(format, alpha, br);

    while (iter.hasNext()) {
        Sequence seq = iter.nextSequence();
        try {
            db.addSequence(seq);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

This code works successfully against the MySQL version of the current BioSQL
schema, although there was a problem with the term.name field and case
sensitivity in that version as well. To get that to work, the field had to
be marked as binary.
I am wondering if I am running into a similar problem here?

Any help would be greatly appreciated!

Cheers!

Michael Griffith

From hlapp at gnf.org Sat Aug 7 20:17:40 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sat Aug 7 20:18:51 2004
Subject: [BioSQL-l] Oracle BIoSQL Schema + BioJava
In-Reply-To: 
Message-ID: <5C334D7D-E8D0-11D8-A694-000A959EB4C4@gnf.org>

Sorry for the late reply, Michael. I was trapped at Glasgow airport for
quite a while ...

It looks like what it's complaining about is a violation of the constraint
on Biosequence.Alphabet. The constraint limits the allowed values to one of
'dna', 'protein', or 'rna'. You could check the code, or check with the
BioJava folks, for what they are inserting there; or, if you trust that
what they do still makes sense, simply remove the constraint:

SQL> ALTER TABLE SG_Biosequence DROP CONSTRAINT Alphabet4;

If that gets you going, you could check later which values caused the
complaint:

SQL> SELECT DISTINCT Alphabet FROM Biosequence WHERE Alphabet NOT IN ('dna','protein','rna');

Hth,

	-hilmar

On Wednesday, August 4, 2004, at 07:34 AM, Michael Griffith wrote:

> [...]
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121                                  phone: +1-858-812-1757
-------------------------------------------------------------

From mg at base-pair.com Mon Aug 9 12:03:21 2004
From: mg at base-pair.com (Michael Griffith)
Date: Mon Aug 9 12:06:51 2004
Subject: [BioSQL-l] MORE Oracle BioSQL & BioJava problems
Message-ID: 

Hi,

I have been trying to get the latest BioSQL (Oracle) and BioJava to play
nicely -- and I feel that I am close, but I am still getting errors. I am
trying to load a GenBank file into the Oracle BioSQL schema with the
following code:

    SequenceDB db = new BioSQLSequenceDB(dbDriver, dbURL, dbUser, dbPass,
                                         biodatabase, create);

    SequenceIterator iter =
        (SequenceIterator) SeqIOTools.fileToBiojava(format, alpha, br);
    int counter = 0;

    while (iter.hasNext()) {
        Sequence seq = iter.nextSequence();
        try {
            db.addSequence(seq);
        } catch (Exception e) {
            e.printStackTrace();
        }
        ...
    }

This code works perfectly well with the MySQL version of the BioSQL schema;
with the Oracle version, however, I get the following SQLException stack.

The loop loads about 65 of the first 70 records and hangs on record #71,
every time. What is puzzling is that I have never had these kinds of errors
with any other Java/Oracle application.

    [java] org.biojava.bio.BioRuntimeException: Error adding sequence: NM_019764
    [java]     at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:498)
    [java] Trying to add: NM_021274 to the database -- insert attemp #:71
    [java]     at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:365)
    [java]     at com.gts.genebank.GeneralReader.main(GeneralReader.java:74)
    [java] Caused by: java.sql.SQLException: No more data to read from socket
    [java]     at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
    [java]     at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)
    [java]     at oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160)
    [java]     at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:963)
    [java]     at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893)
    [java]     at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:369)
    [java]     at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1891)
    [java]     at oracle.jdbc.ttc7.TTC7Protocol.parseExecuteFetch(TTC7Protocol.java:1093)
    [java]     at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:2047)
    [java]     at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1940)
    [java]     at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2709)
    [java]     at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:589)
    [java]     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233)
    [java]     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233)
    [java]     at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:455)
    [java]     ... 2 more

Any help would be greatly appreciated!

Cheers!

MG

From hlapp at gnf.org Mon Aug 9 12:25:17 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Aug 9 12:26:01 2004
Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems
In-Reply-To: 
Message-ID: 

This smells like a problem with one of the LOB columns, which are
Anncomment.Comment_Text and Biosequence.Seq, and the stack trace looks like
it's the Seq column (which holds the sequence).

LOB columns in Oracle need to be streamed if they are over 4000 chars
(otherwise the server can do the conversion).
I believe the more recent versions of the Oracle JDBC driver do that
transparently behind the scenes if you call {set,get}String() on a column
that in reality is a LOB.

Are you by any chance trying to communicate with a 9i+ database using an 8i
driver?

	-hilmar

On Monday, August 9, 2004, at 09:03 AM, Michael Griffith wrote:

> [...]
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121                                  phone: +1-858-812-1757
-------------------------------------------------------------

From kdj at sanger.ac.uk Mon Aug 9 13:21:20 2004
From: kdj at sanger.ac.uk (Keith James)
Date: Mon Aug 9 13:28:05 2004
Subject: [BioSQL-l] Re: [Biojava-dev] MORE Oracle BioSQL & BioJava problems
In-Reply-To: 
References: 
Message-ID: 

>>>>> "Michael" == Michael Griffith writes:

[...]

    Michael> The loop loads about 65 of the first 70 records, and
    Michael> hangs on record #71, every time. What is puzzling, is I
    Michael> have never had any sort of these kinds of errors with any
    Michael> other Java/Oracle application.

    Michael> [java] org.biojava.bio.BioRuntimeException: Error adding sequence: NM_019764
    Michael> [java]     at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:498)
    Michael> [java] Trying to add: NM_021274 to the database -- insert attemp #:71
    Michael> [java]     at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:365)
    Michael> [java]     at com.gts.genebank.GeneralReader.main(GeneralReader.java:74)
    Michael> [java] Caused by: java.sql.SQLException: No more data to read from socket
    Michael> [java]     at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)

Firstly, IANABU (I am not a BioSQL user), but here are some things to try
anyway.

What version of Oracle and drivers? What DBCP version?

It looks like the BioSQL code is using streams for the sequence, so it
shouldn't be a data size problem.

It looks like your connection may be getting closed too early. This could
be a bad interaction with the Oracle driver, or it could be DBCP, which in
its 1.1 incarnation is quite flaky and prone to allowing shared
connections. Is autocommit getting turned on accidentally? Anywhere a LOB
gets populated, it must be off in case multiple reads/writes are required
(again, this could be a symptom of a shared connection).

Try a more recent DBCP.

Use P6Spy (http://www.p6spy.com) to get the exact insert statements sent
to the server and post them to the list.

Keith

-- 
-   Keith James   Microarray Facility, Team 65   -
-  The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK  -

From mg at guerrilla-tech.com Mon Aug 9 13:08:36 2004
From: mg at guerrilla-tech.com (Michael Griffith)
Date: Mon Aug 9 13:28:40 2004
Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems
In-Reply-To: 
Message-ID: 

Hilmar,

Thanks for the reply.

Just to make sure I had the latest and greatest JDBC driver, I downloaded
9.2.0.3 from Oracle's web site. I got exactly the same error, in exactly
the same order.

I am still puzzled as to what is going on.

MG

On 8/9/04 11:25 AM, "Hilmar Lapp" wrote:

> [...]

From hlapp at gnf.org Mon Aug 9 15:28:57 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Aug 9 15:30:19 2004
Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems
In-Reply-To: 
References: 
Message-ID: <5BC16838-EA3A-11D8-A842-000A95AE92B0@gnf.org>

Which release is the target Oracle DB?

What is the length of the sequence string causing trouble? If it is indeed
longer than 4000 chars, does the problem disappear when you make the
sequence shorter than 4000 chars? Which JDBC API call is used to set the
sequence string in the BioJava language binding? If it is indeed
setString(), what happens if you change that to the streaming API?

	-hilmar

On Aug 9, 2004, at 10:08 AM, Michael Griffith wrote:

> [...]

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From mg at base-pair.com Mon Aug 9 15:28:40 2004
From: mg at base-pair.com (Michael Griffith)
Date: Mon Aug 9 15:32:08 2004
Subject: [BioSQL-l] Re: [Biojava-dev] MORE Oracle BioSQL & BioJava problems
In-Reply-To: 
Message-ID: 

Keith,

Thanks for the reply.

The Oracle server is 9.2.0.4, running on SuSE Linux.
The JDBC client is 9.2.0.3, running on OS X.
I've tried DBCP 1.1 and 1.2.

According to what I've been able to trace in the source, autocommit is off.

I've never seen P6Spy before, so it will take some time to figure out how
it works. Once I have the SQL statements, I'll post them to this thread.

Thanks for your suggestions -- any other help is greatly appreciated!

MG

On 8/9/04 12:21 PM, "Keith James" wrote:

> [...]
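Keith's point that autocommit must be off wherever a LOB gets populated can be enforced in one place instead of being traced through the source. The following is a minimal sketch of a hypothetical helper (`LobTransaction` is not part of BioJava or DBCP) that uses only `java.sql.Connection`: it switches autocommit off, runs a unit of work, commits or rolls back, and restores whatever autocommit setting the connection had before.

```java
import java.sql.Connection;
import java.sql.SQLException;

/**
 * Hypothetical helper (not part of BioJava or DBCP): runs a unit of JDBC
 * work with autocommit off, so a LOB insert and any follow-up writes happen
 * in one transaction on one connection.
 */
final class LobTransaction {

    /** A unit of JDBC work executed on the supplied connection. */
    @FunctionalInterface
    interface Work {
        void run(Connection conn) throws SQLException;
    }

    private LobTransaction() {}

    static void run(Connection conn, Work work) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);          // no implicit commit part-way through the LOB write
        try {
            work.run(conn);
            conn.commit();                  // make the whole unit visible at once
        } catch (SQLException | RuntimeException e) {
            conn.rollback();                // undo a partially written row
            throw e;
        } finally {
            conn.setAutoCommit(previous);   // hand the connection back as we found it
        }
    }
}
```

The biosequence insert would then run inside `LobTransaction.run(conn, c -> { ... })`. Restoring the previous setting matters with a pool such as DBCP, since the same physical connection is handed to the next borrower.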

From mg at base-pair.com Tue Aug 10 16:29:03 2004
From: mg at base-pair.com (Michael Griffith)
Date: Tue Aug 10 16:32:25 2004
Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems
In-Reply-To: <5BC16838-EA3A-11D8-A842-000A95AE92B0@gnf.org>
Message-ID: 

Hi Hilmar,

The target DB is Oracle 9i (9.2.0.4), running on SUSE Linux 9.x.

I do believe the problem occurs on sequences that are > 4000 chars.

The offending Java code appears to be:

    PreparedStatement create_biosequence = conn.prepareStatement(
        "insert into biosequence "
        + "(bioentry_id, version, length, seq, alphabet) "
        + "values (?, ?, ?, ?, ?)");

    String seqstr = seqToke.tokenizeSymbolList(seq);
    create_biosequence.setCharacterStream(4, new StringReader(seqstr),
                                          seqstr.length());

In all Java/Oracle applications we've developed, we've always inserted an
empty_clob() and then updated the CLOB separately using the LOB locator.

I am a little apprehensive about hacking the open-source code, just because
I want to stay in sync with the BioJava releases...

Has anyone else using BioJava/BioSQL in Oracle run into this problem?

Thanks in advance!

MG

On 8/9/04 2:28 PM, "Hilmar Lapp" wrote:

> [...]
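The insert-empty-then-update pattern Michael describes can be sketched in plain JDBC as follows. This is an illustration under stated assumptions, not BioJava's code: it reuses the column names from the `insert into biosequence` statement quoted above, assumes the caller has turned autocommit off (the `FOR UPDATE` lock needs an open transaction), and assumes a driver that implements the JDBC 3.0 `Clob.setCharacterStream(long)` method; older Oracle drivers offered a vendor-specific equivalent on `oracle.sql.CLOB` instead. Whether a LOB locator can be obtained through the BioSQL `biosequence` view, rather than the underlying table, is not verified here. `EmptyClobInsert` and its chunk size are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical two-step CLOB insert: EMPTY_CLOB() first, then write through the locator. */
final class EmptyClobInsert {

    /** Largest piece handed to the writer in one call; 4000 is an arbitrary choice. */
    static final int CHUNK = 4000;

    private EmptyClobInsert() {}

    /** Copies text to the writer in pieces of at most {@code chunk} characters. */
    static void writeChunked(Writer out, String text, int chunk) throws IOException {
        for (int i = 0; i < text.length(); i += chunk) {
            out.write(text, i, Math.min(chunk, text.length() - i));
        }
        out.flush();
    }

    /** The caller must have autocommit off and commit after this returns. */
    static void insert(Connection conn, int bioentryId, int version, String seq,
                       String alphabet) throws SQLException, IOException {
        // Step 1: create the row with an empty LOB so that a locator exists.
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO biosequence (bioentry_id, version, length, seq, alphabet) "
                + "VALUES (?, ?, ?, EMPTY_CLOB(), ?)")) {
            ins.setInt(1, bioentryId);
            ins.setInt(2, version);
            ins.setInt(3, seq.length());
            ins.setString(4, alphabet);
            ins.executeUpdate();
        }
        // Step 2: lock the new row, fetch its locator, and stream the sequence into it.
        try (PreparedStatement sel = conn.prepareStatement(
                "SELECT seq FROM biosequence WHERE bioentry_id = ? FOR UPDATE")) {
            sel.setInt(1, bioentryId);
            try (ResultSet rs = sel.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("no biosequence row for bioentry_id " + bioentryId);
                }
                Clob clob = rs.getClob(1);
                try (Writer w = clob.setCharacterStream(1)) {   // JDBC 3.0; positions start at 1
                    writeChunked(w, seq, CHUNK);
                }
            }
        }
    }
}
```

The caller commits after `insert` returns, so other sessions never see the row with an empty sequence.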
>>>> >>>> MG >>>> >>>> >> >> From hlapp at gnf.org Tue Aug 10 16:48:27 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Aug 10 16:49:44 2004 Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems In-Reply-To: References: Message-ID: Can you reduce this to a simple test case using a dummy table with e.g. a single column of type CLOB, and then try to insert a value > 4000 chars through the JDBC driver? There's a remote possibility that the driver is trying to do behind the scenes what you describe if it determines the target column to be of a LOB type, and it may fail to determine that when the table is in fact a view (which it is here). To exclude that possibility, do the following: SQL> DROP VIEW biosequence; SQL> RENAME SG_Biosequence TO biosequence; SQL> ALTER TABLE biosequence RENAME ent_oid TO bioentry_id; and then try again the offending sequence. (I don't give this a high likelihood though. What you describe as the way to insert LOBs is e.g. the same way Tim Bunce wrote it for DBD::Oracle, so chances are this is where the problem is.) -hilm On Aug 10, 2004, at 1:29 PM, Michael Griffith wrote: > Hi Hilmar, > > The DB target is Oracle 9i (9.2.0.4) running on SUSE Linux 9x. > > I do believe the problem occurs on sequences that are > 4000 chars. > > The offending Java code appears to be: > > PreparedStatement create_biosequence = conn.prepareStatement("insert > into > biosequence " + "(bioentry_id, version, length, seq, alphabet) " + > "values > (?, ?, ?, ?, ?)"); > > String seqstr = seqToke.tokenizeSymbolList(seq); > create_biosequence.setCharacterStream(4, new StringReader(seqstr), > seqstr.length()); > > In all Java/Oracle applications we've developed, we've always inserted > an > empty_clob(), and then updated the clob separately using the record > locator. > > I am a little apprehensive to hack the Opensource code, just because I > want > to stay in sync with the BioJava releases... 
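Hilmar's suggested reduction (a dummy table with a single CLOB column and a value longer than 4000 characters pushed through the JDBC driver) could look roughly like the sketch below. The class name, the throwaway table name `clob_test`, the 5000-character length and the command-line connection arguments are all assumptions for illustration; it only needs `java.sql` to compile, but an Oracle JDBC driver on the classpath to actually run.

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical stand-alone reproduction: one CLOB column, one value > 4000 chars.
class ClobRepro {

    // Build a dummy nucleotide string of the requested length ("acgtacgt...").
    static String dummySequence(int length) {
        StringBuilder sb = new StringBuilder(length);
        String residues = "acgt";
        for (int i = 0; i < length; i++) {
            sb.append(residues.charAt(i % residues.length()));
        }
        return sb.toString();
    }

    // Usage: java ClobRepro <jdbcUrl> <user> <password>
    public static void main(String[] args) throws SQLException {
        String seq = dummySequence(5000); // deliberately past the 4000-char boundary
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("CREATE TABLE clob_test (seq CLOB)"); // throwaway table
            }
            try {
                try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO clob_test (seq) VALUES (?)")) {
                    // Same streaming call as the quoted BioJava insert into biosequence.seq
                    ps.setCharacterStream(1, new StringReader(seq), seq.length());
                    ps.executeUpdate();
                }
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                         "SELECT DBMS_LOB.GETLENGTH(seq) FROM clob_test")) {
                    while (rs.next()) {
                        System.out.println("stored length: " + rs.getLong(1));
                    }
                }
            } finally {
                try (Statement st = conn.createStatement()) {
                    st.executeUpdate("DROP TABLE clob_test"); // clean up the dummy table
                }
            }
        }
    }
}
```

If this small program fails with the same "No more data to read from socket" error, the problem lies between the driver and the server rather than in BioJava or the BioSQL views.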
> > Has anyone else using BioJava/BioSQL in Oracle run into this problem? > > Thanks in advance! > > MG > > > > On 8/9/04 2:28 PM, "Hilmar Lapp" wrote: > >> Which rel. is the target Oracle DB? >> >> What is the length of the sequence string causing trouble? If it is >> indeed longer than 4000 chars, does the problem disappear when you >> make >> the sequence shorter than 4000 chars? Which JDBC API call is used to >> set the sequence string in the biojava language binding? If it is >> indeed setString(), what happens if you change that to the streaming >> API? >> >> -hilmar >> >> On Aug 9, 2004, at 10:08 AM, Michael Griffith wrote: >> >>> Hilmar, >>> >>> Thanks for the reply. >>> >>> Just to make sure I had the latest and greatest JDBC driver, I >>> downloaded >>> 9.2.0.3 from Oracle's web site. I got the same exact error, in the >>> same >>> exact order. >>> >>> I am still puzzled as to what is going on. >>> >>> MG >>> >>> >>> On 8/9/04 11:25 AM, "Hilmar Lapp" wrote: >>> >>>> This smells like a problem with one of the LOB columns, which is >>>> Anncomment.Comment_Text and Biosequence.Seq, and the stack trace >>>> looks >>>> like it's the Seq column (which holds the sequence). >>>> >>>> LOB columns in Oracle need to be streamed if they are over 4000 >>>> chars >>>> (otherwise the server can do the conversion). I believe the more >>>> recent >>>> versions of the Oracle JDBC driver do that transparently behind the >>>> scenes if you call {set,get}String() on a column that in reality is >>>> a >>>> LOB. >>>> >>>> Are you by any chance trying to communicate with a 9i+ database >>>> using >>>> an 8i driver? >>>> >>>> -hilmar >>>> >>>> On Monday, August 9, 2004, at 09:03 AM, Michael Griffith wrote: >>>> >>>>> Hi, >>>>> >>>>> I have been trying to get the latest BioSQL (Oracle) and BioJava to >>>>> play >>>>> nicely -- and I feel that I am close, but I am still getting >>>>> errors. 
>>>>> I am >>>>> trying to read a GenBank file to the Oracle BioSQL schema with the >>>>> following >>>>> code: >>>>> >>>>> SequenceDB db = new BioSQLSequenceDB(dbDriver, dbURL, dbUser, >>>>> dbPass >>>>> biodatabase, create); >>>>> >>>>> SequenceIterator iter = >>>>> (SequenceIterator)SeqIOTools.fileToBiojava(format, >>>>> alpha, >>>>> br); >>>>> int counter= 0; >>>>> >>>>> while (iter.hasNext()) { >>>>> >>>>> Sequence seq = iter.nextSequence(); >>>>> >>>>> try { >>>>> db.addSequence(seq); >>>>> } >>>>> catch (Exception e) { >>>>> e.printStackTrace(); >>>>> } >>>>> ... >>>>> } >>>>> >>>>> This code works perfectly well with the mySQL version of the >>>>> bio-sql >>>>> schema, >>>>> however with the oracle version, I get the following SQLException >>>>> stack. >>>>> >>>>> The loop loads about 65 of the first 70 records, and hangs on >>>>> record >>>>> #71, >>>>> every time. What is puzzling, is I have never had any sort of >>>>> these >>>>> kinds >>>>> of errors with any other Java/Oracle application. >>>>> >>>>> [java] org.biojava.bio.BioRuntimeException: Error adding sequence: NM_019764 >>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:498) >>>>> [java] Trying to add: NM_021274 to the database -- insert attemp #:71 >>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:365)
>>>>> [java] at com.gts.genebank.GeneralReader.main(GeneralReader.java:74) >>>>> [java] Caused by: java.sql.SQLException: No more data to read from socket >>>>> [java] at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134) >>>>> [java] at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179) >>>>> [java] at oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160) >>>>> [java] at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:963) >>>>> [java] at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893) >>>>> [java] at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:369) >>>>> [java] at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1891) >>>>> [java] at oracle.jdbc.ttc7.TTC7Protocol.parseExecuteFetch(TTC7Protocol.java:1093) >>>>> [java] at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:2047) >>>>> [java] at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1940) >>>>> [java] at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2709) >>>>> [java] at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:589) >>>>> [java] at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233) >>>>> [java] at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233) >>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:455) >>>>> [java] ... 2 more >>>>> >>>>> Any help would be greatly appreciated! >>>>> >>>>> Cheers!
>>>>> >>>>> MG >>>>> >>>>> >>> >>> > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From mg at base-pair.com Tue Aug 17 11:39:18 2004 From: mg at base-pair.com (Michael Griffith) Date: Tue Aug 17 11:42:46 2004 Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems In-Reply-To: Message-ID: Hilmar, The problem is definitely related to clob handling. I adjusted the schema with the sql statements below, and ran the test. It made no difference. I think my only hope of getting this to work is if the BioJava group recognizes this as a bug and changes the code to first insert an empty_clob(), then to update the clob using the record locator. MG On 8/10/04 3:48 PM, "Hilmar Lapp" wrote: > Can you reduce this to a simple test case using a dummy table with e.g. > a single column of type CLOB, and then try to insert a value > 4000 > chars through the JDBC driver? > > There's a remote possibility that the driver is trying to do behind the > scenes what you describe if it determines the target column to be of a > LOB type, and it may fail to determine that when the table is in fact a > view (which it is here). To exclude that possibility, do the following: > > SQL> DROP VIEW biosequence; > SQL> RENAME SG_Biosequence TO biosequence; > SQL> ALTER TABLE biosequence RENAME ent_oid TO bioentry_id; > > and then try again the offending sequence. (I don't give this a high > likelihood though. What you describe as the way to insert LOBs is e.g. > the same way Tim Bunce wrote it for DBD::Oracle, so chances are this is > where the problem is.) > > -hilm > > On Aug 10, 2004, at 1:29 PM, Michael Griffith wrote: > >> Hi Hilmar, >> >> The DB target is Oracle 9i (9.2.0.4) running on SUSE Linux 9x. >> >> I do believe the problem occurs on sequences that are > 4000 chars. 
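The empty_clob()-then-update pattern Michael describes could be expressed with plain JDBC roughly as follows. This is a sketch, not BioJava's code: the class and method names are made up, it assumes `biosequence` is a real table (as in Hilmar's DROP VIEW/RENAME experiment) rather than a view, and it relies on the JDBC 3.0 method `Clob.setCharacterStream(long)`; an older driver that does not implement that method may require Oracle's vendor-specific LOB classes for the write step instead.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: insert an empty CLOB, then stream the sequence through its locator.
class EmptyClobInsert {

    // Same columns as the quoted BioJava statement, but seq starts out as an empty CLOB.
    static final String INSERT_SQL =
        "INSERT INTO biosequence (bioentry_id, version, length, seq, alphabet) "
        + "VALUES (?, ?, ?, EMPTY_CLOB(), ?)";

    // Fetch the locator and lock the row so it can be written to.
    static final String LOCATOR_SQL =
        "SELECT seq FROM biosequence WHERE bioentry_id = ? FOR UPDATE";

    static void insertSequence(Connection conn, int bioentryId, int version,
                               String alphabet, String seqstr)
            throws SQLException, IOException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // SELECT ... FOR UPDATE needs an open transaction
        try {
            try (PreparedStatement ins = conn.prepareStatement(INSERT_SQL)) {
                ins.setInt(1, bioentryId);
                ins.setInt(2, version);
                ins.setInt(3, seqstr.length());
                ins.setString(4, alphabet); // parameter 4 is alphabet; seq is the literal
                ins.executeUpdate();
            }
            try (PreparedStatement sel = conn.prepareStatement(LOCATOR_SQL)) {
                sel.setInt(1, bioentryId);
                try (ResultSet rs = sel.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("no biosequence row for bioentry_id " + bioentryId);
                    }
                    Clob clob = rs.getClob(1);
                    try (Writer w = clob.setCharacterStream(1)) { // LOB positions are 1-based
                        w.write(seqstr);
                    }
                }
            }
            conn.commit();
        } catch (SQLException | IOException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Oracle only accepts writes through a LOB locator when the row is locked, which is why the sketch disables auto-commit and selects the row FOR UPDATE before writing.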
>> >> The offending Java code appears to be: >> >> PreparedStatement create_biosequence = conn.prepareStatement("insert >> into >> biosequence " + "(bioentry_id, version, length, seq, alphabet) " + >> "values >> (?, ?, ?, ?, ?)"); >> >> String seqstr = seqToke.tokenizeSymbolList(seq); >> create_biosequence.setCharacterStream(4, new StringReader(seqstr), >> seqstr.length()); >> >> In all Java/Oracle applications we've developed, we've always inserted >> an >> empty_clob(), and then updated the clob separately using the record >> locator. >> >> I am a little apprehensive to hack the Opensource code, just because I >> want >> to stay in sync with the BioJava releases... >> >> Has anyone else using BioJava/BioSQL in Oracle run into this problem? >> >> Thanks in advance! >> >> MG >> >> >> >> On 8/9/04 2:28 PM, "Hilmar Lapp" wrote: >> >>> Which rel. is the target Oracle DB? >>> >>> What is the length of the sequence string causing trouble? If it is >>> indeed longer than 4000 chars, does the problem disappear when you >>> make >>> the sequence shorter than 4000 chars? Which JDBC API call is used to >>> set the sequence string in the biojava language binding? If it is >>> indeed setString(), what happens if you change that to the streaming >>> API? >>> >>> -hilmar >>> >>> On Aug 9, 2004, at 10:08 AM, Michael Griffith wrote: >>> >>>> Hilmar, >>>> >>>> Thanks for the reply. >>>> >>>> Just to make sure I had the latest and greatest JDBC driver, I >>>> downloaded >>>> 9.2.0.3 from Oracle's web site. I got the same exact error, in the >>>> same >>>> exact order. >>>> >>>> I am still puzzled as to what is going on. >>>> >>>> MG >>>> >>>> >>>> On 8/9/04 11:25 AM, "Hilmar Lapp" wrote: >>>> >>>>> This smells like a problem with one of the LOB columns, which is >>>>> Anncomment.Comment_Text and Biosequence.Seq, and the stack trace >>>>> looks >>>>> like it's the Seq column (which holds the sequence). 
>>>>> >>>>> LOB columns in Oracle need to be streamed if they are over 4000 >>>>> chars >>>>> (otherwise the server can do the conversion). I believe the more >>>>> recent >>>>> versions of the Oracle JDBC driver do that transparently behind the >>>>> scenes if you call {set,get}String() on a column that in reality is >>>>> a >>>>> LOB. >>>>> >>>>> Are you by any chance trying to communicate with a 9i+ database >>>>> using >>>>> an 8i driver? >>>>> >>>>> -hilmar >>>>> >>>>> On Monday, August 9, 2004, at 09:03 AM, Michael Griffith wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have been trying to get the latest BioSQL (Oracle) and BioJava to >>>>>> play >>>>>> nicely -- and I feel that I am close, but I am still getting >>>>>> errors. >>>>>> I am >>>>>> trying to read a GenBank file to the Oracle BioSQL schema with the >>>>>> following >>>>>> code: >>>>>> >>>>>> SequenceDB db = new BioSQLSequenceDB(dbDriver, dbURL, dbUser, >>>>>> dbPass >>>>>> biodatabase, create); >>>>>> >>>>>> SequenceIterator iter = >>>>>> (SequenceIterator)SeqIOTools.fileToBiojava(format, >>>>>> alpha, >>>>>> br); >>>>>> int counter= 0; >>>>>> >>>>>> while (iter.hasNext()) { >>>>>> >>>>>> Sequence seq = iter.nextSequence(); >>>>>> >>>>>> try { >>>>>> db.addSequence(seq); >>>>>> } >>>>>> catch (Exception e) { >>>>>> e.printStackTrace(); >>>>>> } >>>>>> ... >>>>>> } >>>>>> >>>>>> This code works perfectly well with the mySQL version of the >>>>>> bio-sql >>>>>> schema, >>>>>> however with the oracle version, I get the following SQLException >>>>>> stack. >>>>>> >>>>>> The loop loads about 65 of the first 70 records, and hangs on >>>>>> record >>>>>> #71, >>>>>> every time. What is puzzling, is I have never had any sort of >>>>>> these >>>>>> kinds >>>>>> of errors with any other Java/Oracle application. 
>>>>>> >>>>>> [java] org.biojava.bio.BioRuntimeException: Error adding sequence: NM_019764 >>>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:498) >>>>>> [java] Trying to add: NM_021274 to the database -- insert attemp #:71 >>>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:365) >>>>>> [java] at com.gts.genebank.GeneralReader.main(GeneralReader.java:74) >>>>>> [java] Caused by: java.sql.SQLException: No more data to read from socket >>>>>> [java] at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134) >>>>>> [java] at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179) >>>>>> [java] at oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160) >>>>>> [java] at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:963) >>>>>> [java] at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893) >>>>>> [java] at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:369) >>>>>> [java] at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1891) >>>>>> [java] at oracle.jdbc.ttc7.TTC7Protocol.parseExecuteFetch(TTC7Protocol.java:1093) >>>>>> [java] at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:2047)
>>>>>> [java] at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1940) >>>>>> [java] at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2709) >>>>>> [java] at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:589) >>>>>> [java] at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233) >>>>>> [java] at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233) >>>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:455) >>>>>> [java] ... 2 more >>>>>> >>>>>> Any help would be greatly appreciated! >>>>>> >>>>>> Cheers! >>>>>> >>>>>> MG >>>>>> >>>>>> >>>> >>>> >> >> From hlapp at gnf.org Tue Aug 17 12:07:24 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Aug 17 12:08:06 2004 Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems In-Reply-To: Message-ID: <874AC3D7-F067-11D8-8FB9-000A959EB4C4@gnf.org> I can imagine that the biojava folks will be very appreciative if you can submit a patch, also given that you successfully solved this very problem before :-) I believe the biojava folks don't have a notion of different driver code for different RDBMSs, so that fix may need to be enclosed in something that tests for Oracle being the JDBC driver. -hilmar On Tuesday, August 17, 2004, at 08:39 AM, Michael Griffith wrote: > Hilmar, > > > The problem is definitely related to clob handling. I adjusted the > schema > with the sql statements below, and ran the test. It made no difference.
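Hilmar's idea of enclosing an Oracle-only fix in a test for the JDBC driver, combined with the per-driver helper class Matthew mentions later in the thread, might be organised along these lines. The `LobStrategy` interface, the class names and the URL-prefix check are hypothetical and are not BioJava's actual DBHelper API; the point is only that the MySQL path keeps its single-statement insert while Oracle gets the two-step LOB write.

```java
import java.util.Locale;

// Hypothetical per-driver hook; not BioJava's real DBHelper interface.
interface LobStrategy {
    // SQL used to insert a biosequence row.
    String insertSql();

    // True if the sequence must be written afterwards through a LOB locator.
    boolean needsLocatorWrite();
}

// Current behaviour: bind the sequence directly (works on MySQL).
class DirectLobStrategy implements LobStrategy {
    public String insertSql() {
        return "INSERT INTO biosequence (bioentry_id, version, length, seq, alphabet) "
             + "VALUES (?, ?, ?, ?, ?)";
    }

    public boolean needsLocatorWrite() {
        return false;
    }
}

// Oracle: insert EMPTY_CLOB() first, then update through the locator.
class OracleLobStrategy implements LobStrategy {
    public String insertSql() {
        return "INSERT INTO biosequence (bioentry_id, version, length, seq, alphabet) "
             + "VALUES (?, ?, ?, EMPTY_CLOB(), ?)";
    }

    public boolean needsLocatorWrite() {
        return true;
    }
}

class LobStrategies {
    // Choose a strategy from the JDBC URL, e.g. "jdbc:oracle:thin:@host:1521:SID".
    static LobStrategy forJdbcUrl(String url) {
        if (url != null && url.toLowerCase(Locale.ROOT).startsWith("jdbc:oracle:")) {
            return new OracleLobStrategy();
        }
        return new DirectLobStrategy();
    }
}
```

Keying the choice on the JDBC URL is only one option; a real patch could just as well hang the hook off whichever per-driver helper object the library already selects.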
> > I think my only hope of getting this to work is if the BioJava group > recognizes this as a bug and changes the code to first insert an > empty_clob(), then to update the clob using the record locator. > > MG > > > On 8/10/04 3:48 PM, "Hilmar Lapp" wrote: > >> Can you reduce this to a simple test case using a dummy table with >> e.g. >> a single column of type CLOB, and then try to insert a value > 4000 >> chars through the JDBC driver? >> >> There's a remote possibility that the driver is trying to do behind >> the >> scenes what you describe if it determines the target column to be of a >> LOB type, and it may fail to determine that when the table is in fact >> a >> view (which it is here). To exclude that possibility, do the >> following: >> >> SQL> DROP VIEW biosequence; >> SQL> RENAME SG_Biosequence TO biosequence; >> SQL> ALTER TABLE biosequence RENAME ent_oid TO bioentry_id; >> >> and then try again the offending sequence. (I don't give this a high >> likelihood though. What you describe as the way to insert LOBs is e.g. >> the same way Tim Bunce wrote it for DBD::Oracle, so chances are this >> is >> where the problem is.) >> >> -hilm >> >> On Aug 10, 2004, at 1:29 PM, Michael Griffith wrote: >> >>> Hi Hilmar, >>> >>> The DB target is Oracle 9i (9.2.0.4) running on SUSE Linux 9x. >>> >>> I do believe the problem occurs on sequences that are > 4000 chars. >>> >>> The offending Java code appears to be: >>> >>> PreparedStatement create_biosequence = conn.prepareStatement("insert >>> into >>> biosequence " + "(bioentry_id, version, length, seq, alphabet) " + >>> "values >>> (?, ?, ?, ?, ?)"); >>> >>> String seqstr = seqToke.tokenizeSymbolList(seq); >>> create_biosequence.setCharacterStream(4, new StringReader(seqstr), >>> seqstr.length()); >>> >>> In all Java/Oracle applications we've developed, we've always >>> inserted >>> an >>> empty_clob(), and then updated the clob separately using the record >>> locator. 
>>> >>> I am a little apprehensive to hack the Opensource code, just because >>> I >>> want >>> to stay in sync with the BioJava releases... >>> >>> Has anyone else using BioJava/BioSQL in Oracle run into this problem? >>> >>> Thanks in advance! >>> >>> MG >>> >>> >>> >>> On 8/9/04 2:28 PM, "Hilmar Lapp" wrote: >>> >>>> Which rel. is the target Oracle DB? >>>> >>>> What is the length of the sequence string causing trouble? If it is >>>> indeed longer than 4000 chars, does the problem disappear when you >>>> make >>>> the sequence shorter than 4000 chars? Which JDBC API call is used to >>>> set the sequence string in the biojava language binding? If it is >>>> indeed setString(), what happens if you change that to the streaming >>>> API? >>>> >>>> -hilmar >>>> >>>> On Aug 9, 2004, at 10:08 AM, Michael Griffith wrote: >>>> >>>>> Hilmar, >>>>> >>>>> Thanks for the reply. >>>>> >>>>> Just to make sure I had the latest and greatest JDBC driver, I >>>>> downloaded >>>>> 9.2.0.3 from Oracle's web site. I got the same exact error, in the >>>>> same >>>>> exact order. >>>>> >>>>> I am still puzzled as to what is going on. >>>>> >>>>> MG >>>>> >>>>> >>>>> On 8/9/04 11:25 AM, "Hilmar Lapp" wrote: >>>>> >>>>>> This smells like a problem with one of the LOB columns, which is >>>>>> Anncomment.Comment_Text and Biosequence.Seq, and the stack trace >>>>>> looks >>>>>> like it's the Seq column (which holds the sequence). >>>>>> >>>>>> LOB columns in Oracle need to be streamed if they are over 4000 >>>>>> chars >>>>>> (otherwise the server can do the conversion). I believe the more >>>>>> recent >>>>>> versions of the Oracle JDBC driver do that transparently behind >>>>>> the >>>>>> scenes if you call {set,get}String() on a column that in reality >>>>>> is >>>>>> a >>>>>> LOB. >>>>>> >>>>>> Are you by any chance trying to communicate with a 9i+ database >>>>>> using >>>>>> an 8i driver? 
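Hilmar's question about talking to a 9i+ database through an 8i driver can be answered directly from standard `DatabaseMetaData` calls, which report both the client driver and the server. A minimal sketch; the class name and the simple major-version comparison are assumptions, and `getDatabaseMajorVersion()` is a JDBC 3.0 method that an older driver may not implement.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical diagnostic: print driver and server versions, warn on an older driver.
class DriverVersionCheck {

    // True when the client driver's major version is older than the server's.
    static boolean driverOlderThanServer(int driverMajor, int serverMajor) {
        return driverMajor < serverMajor;
    }

    // Usage: java DriverVersionCheck <jdbcUrl> <user> <password>
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Driver:   " + md.getDriverName() + " " + md.getDriverVersion());
            System.out.println("Database: " + md.getDatabaseProductVersion());
            if (driverOlderThanServer(md.getDriverMajorVersion(), md.getDatabaseMajorVersion())) {
                System.out.println("Warning: JDBC driver is older than the database server");
            }
        }
    }
}
```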
>>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Monday, August 9, 2004, at 09:03 AM, Michael Griffith wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have been trying to get the latest BioSQL (Oracle) and BioJava >>>>>>> to >>>>>>> play >>>>>>> nicely -- and I feel that I am close, but I am still getting >>>>>>> errors. >>>>>>> I am >>>>>>> trying to read a GenBank file to the Oracle BioSQL schema with >>>>>>> the >>>>>>> following >>>>>>> code: >>>>>>> >>>>>>> SequenceDB db = new BioSQLSequenceDB(dbDriver, dbURL, dbUser, >>>>>>> dbPass >>>>>>> biodatabase, create); >>>>>>> >>>>>>> SequenceIterator iter = >>>>>>> (SequenceIterator)SeqIOTools.fileToBiojava(format, >>>>>>> alpha, >>>>>>> br); >>>>>>> int counter= 0; >>>>>>> >>>>>>> while (iter.hasNext()) { >>>>>>> >>>>>>> Sequence seq = iter.nextSequence(); >>>>>>> >>>>>>> try { >>>>>>> db.addSequence(seq); >>>>>>> } >>>>>>> catch (Exception e) { >>>>>>> e.printStackTrace(); >>>>>>> } >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> This code works perfectly well with the mySQL version of the >>>>>>> bio-sql >>>>>>> schema, >>>>>>> however with the oracle version, I get the following SQLException >>>>>>> stack. >>>>>>> >>>>>>> The loop loads about 65 of the first 70 records, and hangs on >>>>>>> record >>>>>>> #71, >>>>>>> every time. What is puzzling, is I have never had any sort of >>>>>>> these >>>>>>> kinds >>>>>>> of errors with any other Java/Oracle application. >>>>>>> >>>>>>> [java] org.biojava.bio.BioRuntimeException: Error adding sequence: NM_019764 >>>>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:498) >>>>>>> [java] Trying to add: NM_021274 to the database -- insert attemp #:71 >>>>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:365)
>>>>>>> [java] at com.gts.genebank.GeneralReader.main(GeneralReader.java:74) >>>>>>> [java] Caused by: java.sql.SQLException: No more data to read from socket >>>>>>> [java] at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134) >>>>>>> [java] at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179) >>>>>>> [java] at oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160) >>>>>>> [java] at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:963) >>>>>>> [java] at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893) >>>>>>> [java] at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:369) >>>>>>> [java] at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1891) >>>>>>> [java] at oracle.jdbc.ttc7.TTC7Protocol.parseExecuteFetch(TTC7Protocol.java:1093) >>>>>>> [java] at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:2047)
>>>>>>> [java] at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1940) >>>>>>> [java] at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2709) >>>>>>> [java] at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:589) >>>>>>> [java] at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233) >>>>>>> [java] at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233) >>>>>>> [java] at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:455) >>>>>>> [java] ... 2 more >>>>>>> >>>>>>> Any help would be greatly appreciated! >>>>>>> >>>>>>> Cheers! >>>>>>> >>>>>>> MG >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From rbauer at informatik.hu-berlin.de Wed Aug 18 03:52:59 2004 From: rbauer at informatik.hu-berlin.de (Raphael A.
Bauer) Date: Wed Aug 18 03:53:54 2004 Subject: [BioSQL-l] Swissprot Problems Message-ID: <41230ADB.60707@informatik.hu-berlin.de> Hi, just an interesting thing from Swiss-Prot: If we want to load the latest Swiss-Prot flatfile with load_seqdatabase.pl we get the following error (normally our load_seqdatabase.pl works fine): -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("",""A multicenter comparison of methods for typing strains of Pseudomonas aeruginosa predominantly from patients with cystic fibrosis."","J. Infect. Dis. 169:134-142(1994).","CRC-237261AF859664D3","","") FKs (613823) ERROR: null value in column "authors" violates not-null constraint --------------------------------------------------- Could not store Q53391: ------------- EXCEPTION ------------- ... And that is what I would expect, because Q53391 has no RA line. The Swiss-Prot manual says: RG Reference group Once or more (Optional if RA line) RA Reference authors Once or more (Optional if RG line) ... ... so I don't know how one can deal with this, because it is a clear violation of the Swiss-Prot manual statements and therefore a violation of the BioSQL schema definition (authors NOT NULL)... We will remove the NOT NULL statements from the authors line in the BioSQL schema to deal with this.. Any better ideas? Raphael.... From matthew_pocock at yahoo.co.uk Wed Aug 18 06:53:41 2004 From: matthew_pocock at yahoo.co.uk (Matthew Pocock) Date: Wed Aug 18 06:54:45 2004 Subject: [BioSQL-l] Re: MORE Oracle BioSQL & BioJava problems In-Reply-To: <874AC3D7-F067-11D8-8FB9-000A959EB4C4@gnf.org> References: <874AC3D7-F067-11D8-8FB9-000A959EB4C4@gnf.org> Message-ID: <41233535.4000701@yahoo.co.uk> Hi, Hilmar Lapp wrote: > I can imagine that the biojava folks will be very appreciative if you > can submit a patch, also given that you successfully solved this very > problem before :-) We would. 
> > I believe the biojava folks don't have a notion of different driver > code for different RDBMSs, so that fix may need to be enclosed in > something that tests for Oracle being the JDBC driver. There is a class for each JDBC driver - a DBHelper implementation. You could add the hooks to that interface for initialising strings. Anybody think this is silly? Matthew > > -hilmar From hlapp at gnf.org Wed Aug 18 13:08:35 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Wed Aug 18 13:08:59 2004 Subject: [BioSQL-l] Swissprot Problems In-Reply-To: <41230ADB.60707@informatik.hu-berlin.de> Message-ID: <3D4AF9AA-F139-11D8-A71D-000A959EB4C4@gnf.org> On Wednesday, August 18, 2004, at 12:52 AM, Raphael A. Bauer wrote: > We will remove the NOT NULL statements from the authors line in the > BioSQL schema to deal with this.. > Yep, I'll do this in the repository too. > Any better ideas? > No. There's not much you can do if people violate their own specs. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From dhoworth at mrc-lmb.cam.ac.uk Thu Aug 19 04:38:59 2004 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Thu Aug 19 04:39:48 2004 Subject: [BioSQL-l] Swissprot Problems In-Reply-To: <41230ADB.60707@informatik.hu-berlin.de> References: <41230ADB.60707@informatik.hu-berlin.de> Message-ID: <41246723.6080708@mrc-lmb.cam.ac.uk> Raphael A.
Bauer wrote: > just an interesting thing from Swiss-Prot: > If we want to load the latest Swiss-Prot flatfile with > load_seqdatabase.pl we get the following error (normally our > load_seqdatabase.pl works fine): > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values > were > ("",""A multicenter comparison of methods for typing strains of Pseudomonas > aeruginosa predominantly from patients with cystic fibrosis."","J. Infect. > Dis. 169:134-142(1994).","CRC-237261AF859664D3","","") FKs (613823) > ERROR: null value in column "authors" violates not-null constraint > --------------------------------------------------- > Could not store Q53391: > ------------- EXCEPTION ------------- > ... > > And that is what I would expect, because Q53391 has no RA line. > The Swiss-Prot manual says: > > RG Reference group Once or more (Optional if RA line) > RA Reference authors Once or more (Optional if RG line) > ... > ... > so I don't know how one can deal with this, because it is a clear > violation of the Swiss-Prot manual statements and therefore a violation > of the BioSQL schema definition (authors NOT NULL)... > > We will remove the NOT NULL statements from the authors line in the > BioSQL schema to deal with this.. Hilmar Lapp replied: >> We will remove the NOT NULL statements from the authors line in the >> BioSQL schema to deal with this.. > Yep, I'll do this in the repository too. I'm a little confused by this. I'm interested in learning a bit about these entries so I went to browse the entry The relevant section seems to be: RN [1] RP SEQUENCE FROM N.A. RC STRAIN=KB7; RX MEDLINE=94103636; PubMed=7903973; RG INTERNATIONAL PSEUDOMONAS AERUGINOSA TYPING STUDY GROUP; RT "A multicenter comparison of methods for typing strains of Pseudomonas RT aeruginosa predominantly from patients with cystic fibrosis."; RL J. Infect. Dis. 169:134-142(1994). 
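Going by the Swiss-Prot manual text quoted in this thread (at least one RG or RA line per reference block), a loader can fall back to the RG value whenever RA is absent, which is what Dave suggests doing. The minimal Java sketch below applies that rule to one reference block. It is not the fix that was actually committed (that went into the BioPerl Swiss-Prot parser). The class name, joining continuation lines with a single space, and stripping the trailing semicolon are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derive an "authors" value for one Swiss-Prot reference block.
class ReferenceAuthors {

    // Returns the RA text if present, otherwise the RG text, otherwise null.
    static String authors(List<String> referenceLines) {
        List<String> ra = new ArrayList<>();
        List<String> rg = new ArrayList<>();
        for (String line : referenceLines) {
            if (line.length() < 2) {
                continue;
            }
            // Line code is in columns 1-2; the data starts in column 6.
            String code = line.substring(0, 2);
            String value = line.length() > 5 ? line.substring(5).trim() : "";
            if (code.equals("RA")) {
                ra.add(value);
            } else if (code.equals("RG")) {
                rg.add(value);
            }
        }
        List<String> chosen = !ra.isEmpty() ? ra : rg;
        if (chosen.isEmpty()) {
            return null; // neither RA nor RG: the block breaks the manual's rule
        }
        String joined = String.join(" ", chosen).trim();
        // RA/RG values end with ';' in the flat file; drop the final one.
        return joined.endsWith(";") ? joined.substring(0, joined.length() - 1) : joined;
    }
}
```

For the Q53391 block shown above, this yields the consortium name as the authors value instead of an empty string, so a NOT NULL constraint on authors would not need to be dropped.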
Then I went to the user manual where the relevant text seems to be: 3.10.5. The RG line The Reference Group (RG) line lists the consortium name associated with a given citation. The RG line is mainly used in submission reference blocks, but can also be used in paper references, if the working group is cited as an author in the paper. RG line and RA line (Reference Author) can be present in the same reference block; at least one RG or RA line is mandatory per reference block. An example of the use of RG lines is shown below: RG The mouse genome sequencing consortium; 3.10.6. The RA line The RA (Reference Author) lines list the authors of the paper (or other work) cited. The RA line is present in most references, but might be missing in references that cite a reference group (see RG line). At least one RG or RA line is mandatory per reference block. -------------------- So it seems to me that the record is valid according to the spec and records do not need to have an RA line if they do have an RG line. It is probably appropriate to use the value of the RG field as the authors field in the database. Or am I missing something? >> Any better ideas? > No. There's not much you can do if people violate their own specs. There is another possible way to deal with errant records that violate the spec. That is to maintain an exception dictionary. That is, for each record that would fail validation, make a curated patch that can be applied to the record before validation. Clearly this can be a lot of work unless the initial record quality is already high. 
Submitting the exceptions back to the originating institution is good to do as well :) Cheers, Dave -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From hlapp at gnf.org Thu Aug 19 15:51:19 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Aug 19 15:52:15 2004 Subject: [BioSQL-l] Swissprot Problems In-Reply-To: <41246723.6080708@mrc-lmb.cam.ac.uk> References: <41230ADB.60707@informatik.hu-berlin.de> <41246723.6080708@mrc-lmb.cam.ac.uk> Message-ID: <239FAD60-F219-11D8-AD45-000A95AE92B0@gnf.org> Thanks, this is helpful. So it appears we need to add recognition of the RG line to the swissprot SeqIO parser. I just committed a fix to the main trunk. There should be a test as well, didn't have the time yet - everybody feel free to step in. I also added code to deal with this on writing out and therefore also to Bio::Annotation::Reference. I was being lazy and didn't check the manual myself, and sure enough paid the price ... -hilmar On Aug 19, 2004, at 1:38 AM, Dave Howorth wrote: > Raphael A. Bauer wrote: >> just an interesting thing from Swiss-Prot: >> If we want to load the latest Swiss-Prot flatfile with >> load_seqdatabase.pl we get the following error (normally our >> load_seqdatabase.pl works fine): >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, >> values >> were >> ("",""A multicenter comparison of methods for typing strains of >> Pseudomonas >> aeruginosa predominantly from patients with cystic fibrosis."","J. >> Infect. >> Dis. 169:134-142(1994).","CRC-237261AF859664D3","","") FKs (613823) >> ERROR: null value in column "authors" violates not-null constraint >> --------------------------------------------------- >> Could not store Q53391: >> ------------- EXCEPTION ------------- >> ... >> And that is what I would expect, because Q53391 has no RA line. 
>> The Swiss-Prot manual says: >> RG Reference group Once or more (Optional if RA line) RA >> Reference authors Once or more (Optional if RG line) >> ... >> ... >> so I don't know how one can deal with this, because it is a clear >> violation of the Swiss-Prot manual statements and therefore a >> violation >> of the BioSQL schema definition (authors NOT NULL)... >> We will remove the NOT NULL statements from the authors line in the >> BioSQL schema to deal with this.. > > Hilmar Lapp replied: > >> We will remove the NOT NULL statements from the authors line in the > >> BioSQL schema to deal with this.. > > Yep, I'll do this in the repository too. > > > I'm a little confused by this. I'm interested in learning a bit about > these entries so I went to browse the entry > proteinId=FMK7_PSEAE&pager.offset=0> > The relevant section seems to be: > > RN [1] > RP SEQUENCE FROM N.A. > RC STRAIN=KB7; > RX MEDLINE=94103636; PubMed=7903973; > RG INTERNATIONAL PSEUDOMONAS AERUGINOSA TYPING STUDY GROUP; > RT "A multicenter comparison of methods for typing strains of > Pseudomonas > RT aeruginosa predominantly from patients with cystic fibrosis."; > RL J. Infect. Dis. 169:134-142(1994). > > Then I went to the user manual > where the relevant > text seems to be: > > 3.10.5. The RG line > > The Reference Group (RG) line lists the consortium name associated > with a given citation. The RG line is mainly used in submission > reference blocks, but can also be used in paper references, if the > working group is cited as an author in the paper. RG line and RA line > (Reference Author) can be present in the same reference block; at > least one RG or RA line is mandatory per reference block. An example > of the use of RG lines is shown below: > > RG The mouse genome sequencing consortium; > > 3.10.6. The RA line > > The RA (Reference Author) lines list the authors of the paper (or > other work) cited. 
The RA line is present in most references, but > might be missing in references that cite a reference group (see RG > line). At least one RG or RA line is mandatory per reference block. > -------------------- > > So it seems to me that the record is valid according to the spec and > records do not need to have an RA line if they do have an RG line. It > is probably appropriate to use the value of the RG field as the > authors field in the database. Or am I missing something? > > > >> Any better ideas? > > No. There's not much you can do if people violate their own specs. > > There is another possible way to deal with errant records that violate > the spec. That is to maintain an exception dictionary. That is, for > each record that would fail validation, make a curated patch that can > be applied to the record before validation. Clearly this can be a lot > of work unless the initial record quality is already high. Submitting > the exceptions back to the originating institution is good to do as > well :) > > Cheers, Dave > -- > Dave Howorth > MRC Centre for Protein Engineering > Hills Road, Cambridge, CB2 2QH > 01223 252960 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From rbauer at informatik.hu-berlin.de Fri Aug 20 05:31:16 2004 From: rbauer at informatik.hu-berlin.de (Raphael A. Bauer) Date: Fri Aug 20 05:32:09 2004 Subject: [BioSQL-l] Swissprot Problems In-Reply-To: <41246723.6080708@mrc-lmb.cam.ac.uk> References: <41230ADB.60707@informatik.hu-berlin.de> <41246723.6080708@mrc-lmb.cam.ac.uk> Message-ID: <4125C4E4.4040603@informatik.hu-berlin.de> Dave Howorth wrote: .. 
> The RA (Reference Author) lines list the authors of the paper (or other > work) cited. The RA line is present in most references, but might be > missing in references that cite a reference group (see RG line). At > least one RG or RA line is mandatory per reference block. > -------------------- > > So it seems to me that the record is valid according to the spec and > records do not need to have an RA line if they do have an RG line. It is > probably appropriate to use the value of the RG field as the authors > field in the database. Or am I missing something? I am a little confused myself, but that's okay... I cited exactly that in my first mail: RG Reference group Once or more (Optional if RA line) RA Reference authors Once or more (Optional if RG line) But I overlooked the "Optional if R* line" qualifiers. Dave is completely right, and these entries are within the Swiss-Prot specs... Sorry about my wrong conclusion... Raphael From mg at base-pair.com Tue Aug 24 10:55:43 2004 From: mg at base-pair.com (Michael Griffith) Date: Tue Aug 24 10:59:08 2004 Subject: [BioSQL-l] FW: [Biojava-dev] Re: MORE Oracle BioSQL & BioJava problems In-Reply-To: Message-ID: Hello all, For some reason this message was rejected by the moderator of the bio-sql mail list. I believe it may be because I had attached a zip file of the bio-sql/oracle scripts. I was asked by the moderator to resend it. I am doing so, and posting to the bioJava group as well. I would like to help shepherd this problem to resolution, but have had no real experience working on open source projects to make fixes, etc. Please let me know what else I can do. Cheers! MG ------ Forwarded Message From: Michael Griffith Date: Thu, 19 Aug 2004 16:00:02 -0500 To: Michael Heuer Cc: , Subject: Re: [Biojava-dev] Re: MORE Oracle BioSQL & BioJava problems Michael, The BioSQL DDL is the current version in the CVS repository for the bio-sql project. It is also attached here as a zip file.
In the create objects section there is a missing table that needs to be created after all the other objects:

    CREATE TABLE term_relationship_term (
        term_relationship_id int NOT NULL,
        term_id int NOT NULL
    );

In the class org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.java, at line 446, the prepared statement reads:

    PreparedStatement create_biosequence = conn.prepareStatement(
        "insert into biosequence " +
        "(bioentry_id, version, length, seq, alphabet) " +
        "values(?, ?, ?, ?, ?)");

I think it should be something like:

    PreparedStatement create_biosequence = conn.prepareStatement(
        "insert into biosequence " +
        "(bioentry_id, version, length, seq, alphabet) " +
        "values(?, ?, ?, empty_clob(), ?)");

Then -- I think you can do this with the standard JDBC Clob classes, but I have always used the Oracle JDBC/CLOB classes. I would think you'd want to keep all the Oracle-specific code segregated...

After the insert into biosequence succeeds, you'll want to update the CLOB column. You have to select the record for update using the newly inserted id.

    // conn, bioentry_id and seqstr come from the surrounding method.
    // conn must have auto-commit off: the row lock and the CLOB locator
    // only last until the end of the transaction.
    String sqlLock = "SELECT seq FROM biosequence WHERE bioentry_id = ? FOR UPDATE";
    String sqlUpdate = "UPDATE biosequence SET seq = ? WHERE bioentry_id = ?";
    PreparedStatement stNew = null;
    PreparedStatement pst = null;
    ResultSet rsNew = null;
    try {
        // First get the locator (necessary for Oracle):
        // fetch the newly created record from the DB
        stNew = conn.prepareStatement(sqlLock);
        stNew.setInt(1, bioentry_id);
        rsNew = stNew.executeQuery();
        if (!rsNew.next()) {
            throw new SQLException("no biosequence row for bioentry_id " + bioentry_id);
        }
        oracle.sql.CLOB dbClob = (oracle.sql.CLOB) rsNew.getClob(1);
        pst = conn.prepareStatement(sqlUpdate);
        dbClob.putString(1, seqstr);       // write the sequence through the locator
        pst.setClob(1, dbClob);
        pst.setInt(2, bioentry_id);
        int result = pst.executeUpdate();
        if (result != 1) {                 // exactly one row should have been updated
            conn.rollback();
        } else {
            conn.commit();
        }
    } catch (Exception e) {
        // Log the error and roll back...
    } finally {
        // Close all resources, logging any errors
        try { if (rsNew != null) rsNew.close(); } catch (SQLException e) { /* log */ }
        try { if (stNew != null) stNew.close(); } catch (SQLException e) { /* log */ }
        try { if (pst != null) pst.close(); } catch (SQLException e) { /* log */ }
    }

On 8/18/04 2:49 PM, "Michael Heuer" wrote: > Hello Michael, > > I found a couple of instances of Oracle on which to test, 8.1.7.4.0 on > Solaris and 9.2.0.1.0 on Solaris.
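Michael thinks the locator approach can also be done with the standard JDBC Clob classes, which would keep Oracle-specific types out of BioSQLSequenceDB. A minimal sketch of that variant follows, using only java.sql types. The class and method names are hypothetical, and the column list mirrors the BioJava insert quoted in this thread. The sketch assumes an Oracle driver whose Clob implementation supports setString(long, String); older drivers may only offer the oracle.sql.CLOB methods. Treat it as untested against a live database.

```java
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Hypothetical sketch: the empty_clob()-then-locator pattern with java.sql
 * types only. Assumes the biosequence columns used by BioJava in this thread
 * and a driver whose Clob supports setString(long, String).
 */
public class ClobSequenceWriter {

    static final String INSERT_SQL =
        "INSERT INTO biosequence (bioentry_id, version, length, seq, alphabet) "
        + "VALUES (?, ?, ?, empty_clob(), ?)";
    static final String LOCK_SQL =
        "SELECT seq FROM biosequence WHERE bioentry_id = ? FOR UPDATE";

    /** Inserts the row with an empty CLOB, then writes the residues through its locator. */
    public static void insertSequence(Connection conn, int bioentryId, int version,
                                      String alphabet, String residues) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        // The locator from SELECT ... FOR UPDATE is only writable inside one transaction.
        conn.setAutoCommit(false);
        try {
            try (PreparedStatement ins = conn.prepareStatement(INSERT_SQL)) {
                ins.setInt(1, bioentryId);
                ins.setInt(2, version);
                ins.setInt(3, residues.length());
                ins.setString(4, alphabet);
                ins.executeUpdate();
            }
            try (PreparedStatement lock = conn.prepareStatement(LOCK_SQL)) {
                lock.setInt(1, bioentryId);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("no biosequence row for bioentry_id " + bioentryId);
                    }
                    Clob seq = rs.getClob(1);
                    // CLOB positions are 1-based.
                    seq.setString(1, residues);
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();   // undo the insert if the CLOB write failed
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

With Oracle, writing through a locator selected FOR UPDATE modifies the stored CLOB itself, so this sketch omits the separate UPDATE. The insert, the lock and the write share one transaction, so a failure at any step rolls back all three.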
> > If you could send me your version of the biosql DDL and a source code > example of what you mean by > >> In all Java/Oracle applications we've developed, we've always inserted >> an empty_clob(), and then updated the clob separately using the record >> locator. > > then I'll give it a go. > > Or alternatively, if you know what changes need to be made to the biojava > codebase, then you just need to do > > $ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava login > (when prompted, the password is 'cvs') > > $ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava co > biojava-live > $ cd biojava-live > > (make changes to source code) > > $ ant > (to make sure it builds properly) > > $ cvs diff -u . > patch.txt > (to generate the patch) > > michael > > ------ End of Forwarded Message From jlegato at helix.nih.gov Tue Aug 24 15:01:10 2004 From: jlegato at helix.nih.gov (John Legato) Date: Tue Aug 24 15:02:16 2004 Subject: [BioSQL-l] BioSQL with Oracle 10g Message-ID: I am having trouble installing the BioSQL schema under Oracle 10g. I am running into permission problems. I have modified BS-defs.sql and created BS-defs-local.sql. I have created the CB_MEMBER, sg_user, sg_loader, sg_admin and cb_user roles. I have not created any users, as I assumed BS-create-all would take care of that. I have given biosql_owner SYSDBA privileges in an attempt to diagnose the problem, but I am still getting permission errors such as: CREATE PUBLIC SYNONYM SGLD_BIOENTRIES FOR SGLD_BIOENTRIES * ERROR at line 1: ORA-01031: insufficient privileges I am using biosql checked out from CVS as of today. I suspect I am missing some permissions I need to set before running BS-create-all, which I am running as sysdba. I've included BS-defs-local and BS-create-all below. What have I overlooked?
John

BS-defs-local:

-- where do the datafiles for the tablespaces go
define datalocation='/u02/oradata/coredb'
-- how do you want to name the table tablespace
define biosql_data=SYMGENE_DATA
-- how do you want to name the index tablespace
define biosql_index=SYMGENE_INDEX
-- how do you want to name the LOB tablespace
define biosql_lob=SYMGENE_LOB
-- what is the name of the role enabling all permissions necessary
-- for schema creation
define schema_creator=CB_MEMBER
-- what shall be the name and (initial) pwd of the schema owner
define biosql_owner=sgowner
define biosql_pwd=sgbio
-- the user role (usually read-only, on views) to be created for the schema
define biosql_user=sg_user
-- the upload-permitted role (INSERT permissions for load API views) to be
-- created for the schema
define biosql_loader=sg_loader
-- the admin-permitted role (INSERT, UPDATE, DELETE on most things) to be
-- created for the schema
define biosql_admin=sg_admin
-- the base role you have for users connecting to the database
define base_user=cb_user

BS-create-all:

-- load definitions
@BS-defs-local
-- 1) login as DBA
connect sysdba/password as sysdba
-- 2) create the tablespaces
@BS-create-tablespaces
-- 3) create the schema user
@BS-create-schema-user
-- 4) Now we're ready to create our own schema. Connect as the schema owner.
connect &biosql_owner/&biosql_pwd
-- 5) create the schema
@BS-DDL
-- 6) create the PL/SQL package API and the load API
@BS-create-API
-- 7) create select-views
@BS-create-views
-- 8) Security: create roles and synonyms, issue grants
@BS-create-roles
@BS-create-synonyms
@BS-grants
-- 9) create additional users
--connect &sysdba/&dbapwd as sysdba
--@BS-create-users
--connect &biosql_owner/&biosql_pwd
-- 10) pre-populate database as necessary
-- Note: there is a high chance that the seed data is not suitable for you
-- or is not exactly what you want. Check out the script and make sure you
-- really want the seed data, possibly after editing it, before you uncomment
-- the following command.
--
--@BS-prepopulate-db
----------------------------

From mark.schreiber at group.novartis.com Tue Aug 24 21:52:43 2004 From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com) Date: Tue Aug 24 21:53:29 2004 Subject: [BioSQL-l] Re: FW: [Biojava-dev] Re: MORE Oracle BioSQL & BioJava problems Message-ID: I would suggest asking Hilmar for a CVS account for the BioSQL repository. Michael Griffith Sent by: biojava-dev-bounces@portal.open-bio.org 08/24/2004 10:55 PM To: , , cc: Michael Heuer Subject: FW: [Biojava-dev] Re: MORE Oracle BioSQL & BioJava problems Hello all, For some reason this message was rejected by the moderator of the bio-sql mail list. I believe it may because I had a zip file attachment of the bio-sql/oracle scripts attached. I was asked by the moderator to resend it. I am doing so, and posting to the bioJava group as well. I would like to help Shepard this problem to resolution, but have had no real experience working on Open source projects to make fixes, etc. Please let me know what else I may do. Cheers! MG ------ Forwarded Message From: Michael Griffith Date: Thu, 19 Aug 2004 16:00:02 -0500 To: Michael Heuer Cc: , Subject: Re: [Biojava-dev] Re: MORE Oracle BioSQL & BioJava problems Michael, The BioSQL DDL is the current version that is in the CVS for bio-sql project. It is also attached here as a zip file.
In the create objects section there is a missing table that needs to get created after all the other objects are created: CREATE TABLE term_relationship_term ( term_relationship_id int NOT NULL, term_id int NOT NULL ) In the class org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.java, at line 446, the Prepare statement reads: PreparedStatement create_biosequence = conn.prepareStatement("insert into biosequence " + "(bioentry_id, version, length, seq, alphabet) " + "values(?, ?, ?, ?, ?)"); I think it should be something like: PreparedStatement create_biosequence = conn.prepareStatement("insert into biosequence " + "(bioentry_id, version, length, seq, alphabet) " + "values(?, ?, ?, empty_clob(), ?)"); Then -- I think you can do this with the JDBC clob classes, but I have always used the Oracle JDBC/CLOB classes. I would think you'd want to keep all the oracle stuff segregated out... After your insert is successful in biosequence, you'll want to update the clob table. You have to select the record for update using the newly inserted id. String sqlLock= "SELECT seq FROM biosequence WHERE bioentry_id = ? FOR UPDATE"; String sqlUpdate="UPDATE biosequence SET seq=? where bioentry_id=?"; try { // First get the locator (necessary for Oracle) // // Get newly created record from DB stNew = conn.prepareStatement(sqlLock); stNew.setInt(1, bioentry_id); rsNew = stNew.executeQuery(); rsNew.next(); oracle.sql.CLOB dbClob = (oracle.sql.CLOB) rsNew.getClob(1); pst = conn.prepareStatement(sqlUpdate); dbClob.putString(1, seqstr); pst.setClob(1, dbClob); pst.setInt(2, bioentry_id); result = pst.executeUpdate(); if (result < 0) { conn.rollback(); } else { conn.commit(); } } catch (Exception e) { // Log the error... } finally { try { // Close all resources... // } catch (Exception e) { // Log any errors } } } On 8/18/04 2:49 PM, "Michael Heuer" wrote: > Hello Michael, > > I found a couple of instances of Oracle on which to test, 8.1.7.4.0 on > Solaris and 9.2.0.1.0 on Solaris. 
> > If you could send me your version of the biosql DDL and a source code > example of what you mean by > >> In all Java/Oracle applications we've developed, we've always inserted >> an empty_clob(), and then updated the clob separately using the record >> locator. > > then I'll give it a go. > > Or alternatively, if you know what changes need to be made to the biojava > codebase, then you just need to do > > $ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava login > (when prompted, the password is 'cvs') > > $ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava co > biojava-live > $ cd biojava-live > > (make changes to source code) > > $ ant > (to make sure it builds properly) > > $ cvs diff -u . > patch.txt > (to generate the patch) > > michael > > ------ End of Forwarded Message _______________________________________________ biojava-dev mailing list biojava-dev@biojava.org http://biojava.org/mailman/listinfo/biojava-dev From hlapp at gnf.org Thu Aug 26 20:00:57 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Aug 26 20:01:49 2004 Subject: [BioSQL-l] BioSQL with Oracle 10g In-Reply-To: References: Message-ID: <2C089C6A-F7BC-11D8-8493-000A95AE92B0@gnf.org> On Aug 24, 2004, at 12:01 PM, John Legato wrote: > > I am having trouble installing the BioSQL schema under Oracle 10g. I am > running into permission problems. > > I have modified BS-defs.sql and created BS-defs-local.sql. I have > created > CB_MEMBER,sg_user,sg_loader,sg_admin and cb_user roles, I have not > created > any users as I assumed BS-create-all would take care of that. I have > given biosql_owner SYSDA privs in an attempt to diagnose the problem > but > I am still getting permission errors such as: > > CREATE PUBLIC SYNONYM SGLD_BIOENTRIES FOR SGLD_BIOENTRIES > * > ERROR at line 1: > ORA-01031: insufficient privileges Creating public synonyms needs a special privilege. Execute SQL> GRANT CREATE PUBLIC SYNONYM TO ; as SYS/SYSDBA. 
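Before rerunning BS-create-synonyms, you can check whether the grant is actually in effect for the schema owner's session by querying Oracle's SESSION_PRIVS view. That view lists the system privileges available to the current session, including those that come from enabled roles. The following is a minimal JDBC sketch. The class name is hypothetical, and it assumes an open Connection made as the schema owner (SGOWNER in John's BS-defs-local). A privilege received through a role typically only appears in a new session, so run the check on a fresh connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Hypothetical diagnostic: does the current Oracle session hold
 * CREATE PUBLIC SYNONYM (directly or through an enabled role)?
 */
public class SynonymPrivilegeCheck {

    static final String PRIV_SQL =
        "SELECT COUNT(*) FROM session_privs WHERE privilege = ?";

    public static boolean canCreatePublicSynonym(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PRIV_SQL)) {
            ps.setString(1, "CREATE PUBLIC SYNONYM");
            try (ResultSet rs = ps.executeQuery()) {
                // COUNT(*) always returns one row; > 0 means the privilege is active
                return rs.next() && rs.getInt(1) > 0;
            }
        }
    }
}
```

If this returns false on the schema owner's connection, ORA-01031 on the CREATE PUBLIC SYNONYM statements is the expected outcome.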
(according to your BS-defs-local, SGOWNER is the schema user; if you have granted this user a specific role reserved for users that create and own schemas [which is what CB_MEMBER stands for here], you can also put the name of that role instead. Note that you need to re-connect for this to take effect.) > > I am using biosql checked out from CVS as of today. I suspect I am > missing > some permissions I need to set before running BS-create-all, which I am > running as sysdba. I've include BS-defs-local and BS-create-all below. You should not be running this as sysdba. Run it as the designated schema owner. If you uncomment the tablespace creation script, it will connect as SYS first. The command you quoted above should actually read CREATE PUBLIC SYNONYM SGLD_BIOENTRIES FOR SGOWNER.SGLD_BIOENTRIES Check the dynamically created file _syns.lst to see whether the schema owner got substituted in there. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Thu Aug 26 20:07:18 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Aug 26 20:07:58 2004 Subject: [BioSQL-l] FW: [Biojava-dev] Re: MORE Oracle BioSQL & BioJava problems In-Reply-To: References: Message-ID: <0F80536E-F7BD-11D8-8493-000A95AE92B0@gnf.org> Michael, sorry for the delay in responding. I'll add the table to biosql. Thanks a lot for collecting this and getting it to work. (I assume it does work now?) As for fixing Biojava, that's Matt's/Thomas'/Mark's domain. -hilmar On Aug 24, 2004, at 7:55 AM, Michael Griffith wrote: > Hello all, > > For some reason this message was rejected by the moderator of the > bio-sql > mail list. I believe it may because I had a zip file attachment of the > bio-sql/oracle scripts attached. I was asked by the moderator to > resend it. > > I am doing so, and posting to the bioJava group as well.
I would like > to > help Shepard this problem to resolution, but have had no real > experience > working on Open source projects to make fixes, etc. Please let me > know what > else I may do. > > Cheers! > > MG > ------ Forwarded Message > From: Michael Griffith > Date: Thu, 19 Aug 2004 16:00:02 -0500 > To: Michael Heuer > Cc: , > Subject: Re: [Biojava-dev] Re: MORE Oracle BioSQL & BioJava problems > > Michael, > > The BioSQL DDL is the current version that is in the CVS for bio-sql > project. It is also attached here as a zip file. In the create objects > section there is a missing table that needs to get created after all > the > other objects are created: > > CREATE TABLE term_relationship_term ( > term_relationship_id int NOT NULL, > term_id int NOT NULL > ) > > In the class org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.java, at > line > 446, the Prepare statement reads: > > PreparedStatement create_biosequence = conn.prepareStatement("insert > into > biosequence " + "(bioentry_id, version, length, seq, alphabet) " + > "values(?, ?, ?, ?, ?)"); > > I think it should be something like: > > PreparedStatement create_biosequence = conn.prepareStatement("insert > into > biosequence " + "(bioentry_id, version, length, seq, alphabet) " + > "values(?, ?, ?, empty_clob(), ?)"); > > Then -- I think you can do this with the JDBC clob classes, but I have > always used the Oracle JDBC/CLOB classes. I would think you'd want to > keep > all the oracle stuff segregated out... > > After your insert is successful in biosequence, you'll want to update > the > clob table. You have to select the record for update using the newly > inserted id. > > String sqlLock= "SELECT seq FROM biosequence WHERE bioentry_id = ? > FOR > UPDATE"; > String sqlUpdate="UPDATE biosequence SET seq=? 
where > bioentry_id=?"; > try { > // First get the locator (necessary for Oracle) > // > // Get newly created record from DB > stNew = conn.prepareStatement(sqlLock); > stNew.setInt(1, bioentry_id); > rsNew = stNew.executeQuery(); > rsNew.next(); > > oracle.sql.CLOB dbClob = (oracle.sql.CLOB) > rsNew.getClob(1); > pst = conn.prepareStatement(sqlUpdate); > dbClob.putString(1, seqstr); > pst.setClob(1, dbClob); > pst.setInt(2, bioentry_id); > result = pst.executeUpdate(); > if (result < 0) { > conn.rollback(); > } else { > conn.commit(); > } > } catch (Exception e) { > // Log the error... > } finally { > try { > // Close all resources... > // > } > catch (Exception e) { > // Log any errors > } > } > > } > > > On 8/18/04 2:49 PM, "Michael Heuer" wrote: > >> Hello Michael, >> >> I found a couple of instances of Oracle on which to test, 8.1.7.4.0 on >> Solaris and 9.2.0.1.0 on Solaris. >> >> If you could send me your version of the biosql DDL and a source code >> example of what you mean by >> >>> In all Java/Oracle applications we've developed, we've always >>> inserted >>> an empty_clob(), and then updated the clob separately using the >>> record >>> locator. >> >> then I'll give it a go. >> >> Or alternatively, if you know what changes need to be made to the >> biojava >> codebase, then you just need to do >> >> $ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava login >> (when prompted, the password is 'cvs') >> >> $ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava co >> biojava-live >> $ cd biojava-live >> >> (make changes to source code) >> >> $ ant >> (to make sure it builds properly) >> >> $ cvs diff -u . 
> patch.txt >> (to generate the patch) >> >> michael >> >> > > ------ End of Forwarded Message > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------