From hlapp at gnf.org Fri Apr 1 15:25:55 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Apr 1 15:22:07 2005
Subject: [BioSQL-l] dates and terms
In-Reply-To:
References:
Message-ID:

I agree but I'm also reluctant to expand the 'official' schema, because 1) it would necessarily introduce redundancy, 2) it is solely needed for query optimization, and 3) for most RDBMSs you can probably come up with a non-invasive optimization on top of the official schema that is easy enough to implement, like I showed for Oracle.

Actually, depending on the level of individual expertise argument 3) may border on arrogance so I'll retract it immediately. So maybe what could efficiently help people solve this and similar problems is supplementary SQL code in the repository, for instance organized by problem or use case. So for the date problem, there would be SQL scripts for each supported platform that implement a solution without altering the core schema, like the one I suggested for Oracle and augmented with a function-based index DDL so that even if you didn't know PL/SQL and your DBA were yourself you could just run the script and be on your way.

BTW doing something like

SQL> ALTER TABLE bioentry_qualifier_value ADD (date_value DATE);

followed by properly populating the column from the VARCHAR-type value column in my opinion still counts as a non-invasive optimization, because unless someone used SELECT * (which is always a bad idea anyway) this won't break anything. If the RDBMS supports triggers, you can write a trigger that automatically creates and maintains the value of the additional column depending on the value of the value column.

And to make the separation tidy and obvious, you could also create a table

CREATE TABLE bioentry_date (
    bioentry_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL,
    rank NUMBER(3,0) NOT NULL,
    date_value DATE
)

and then use the same method as before to populate and maintain the table automatically.
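
[Editor's sketch, not part of the original message.] The "add a date column and populate it from the value column" idea can be tried out roughly as follows. This is only a minimal illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for the real RDBMS, the table is a simplified copy of bioentry_qualifier_value (no rank column, no constraints), and the helper names parse_qualifier_date, add_and_populate_date_column and demo are hypothetical. Strings that do not start with a dd-MON-yyyy date yield NULL.

```python
import sqlite3
from datetime import date

# English month abbreviations as used in UniProt DT lines (avoids locale issues).
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_qualifier_date(value):
    """Parse strings like '01-NOV-1995 (Rel. 32, Created)'; None if not a date."""
    if not value:
        return None
    head = value.split(" ", 1)[0]      # everything before the first space
    parts = head.split("-")
    if len(parts) != 3:
        return None
    day, month, year = parts
    try:
        return date(int(year), MONTHS[month.upper()], int(day))
    except (KeyError, ValueError):
        return None


def add_and_populate_date_column(conn):
    """Add date_value next to value and fill it; the value column is untouched."""
    conn.execute("ALTER TABLE bioentry_qualifier_value ADD COLUMN date_value TEXT")
    rows = conn.execute(
        "SELECT rowid, value FROM bioentry_qualifier_value").fetchall()
    for rowid, value in rows:
        parsed = parse_qualifier_date(value)
        conn.execute(
            "UPDATE bioentry_qualifier_value SET date_value = ? WHERE rowid = ?",
            (parsed.isoformat() if parsed else None, rowid))


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real table (assumption, see lead-in).
    conn.execute("CREATE TABLE bioentry_qualifier_value "
                 "(bioentry_id INTEGER, term_id INTEGER, value TEXT)")
    conn.executemany(
        "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?)",
        [(1, 7, "01-NOV-1995 (Rel. 32, Created)"),
         (2, 7, "28-FEB-2003 (Rel. 41, Last annotation update)"),
         (3, 8, "some non-date qualifier")])
    add_and_populate_date_column(conn)
    # ISO date strings sort chronologically, so a plain comparison works.
    return conn.execute(
        "SELECT bioentry_id FROM bioentry_qualifier_value "
        "WHERE date_value < '2000-01-01' ORDER BY bioentry_id").fetchall()
```

Calling demo() lists the entries dated before 2000. In a real deployment the populate step would more likely live in a trigger or a platform-specific SQL script, as described above.
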
(for brevity I obviously left out UK and FK constraints, but they'd be analogous to bioentry_qualifier_value.) So, bottom line of what I'm saying is that usually if a problem pertains to optimization rather than a model deficiency, there'll be an array of options to solve the problem without altering the model, and so I'll be reluctant to alter the model. If people disagree (or agree) with this view please let me know, it'd be good to know where people generally stand on such questions, and what poses a problem and what doesn't. -hilmar On Mar 31, 2005, at 5:00 PM, mark.schreiber@novartis.com wrote: > Hello - > > I guess this is the nearest approximation to a date field. It might be > something worth considering for a later version of bioSQL as pretty > much > all records have one or more dates attached to them. > > - Mark > > > > > > Hilmar Lapp > 04/01/2005 04:39 AM > > > To: Mark Schreiber/GP/Novartis@PH > cc: biosql-l@open-bio.org > Subject: Re: [BioSQL-l] dates and terms > > > Bioperl-db stores these similarly, but the term is 'date_changed' which > basically comes from Bioperl's Bio::Seq::RichSeq. > > You can compare these dates but it's hard to do so universally for a > search against the database. There is a scriptlet > scripts/biosql/update-on-new-date.pl in the bioperl-db repository that > shows a pretty straightforward approach for comparison. It uses > Date::Parse which does a nice job of detecting most date formats > automatically. > > The formats being used are actually I believe not dramatically > different. In UniProt, they look like the following: > > DT 01-NOV-1995 (Rel. 32, Created) > DT 01-OCT-1996 (Rel. 34, Last sequence update) > DT 28-FEB-2003 (Rel. 41, Last annotation update) > > and these get stored as an array with the following elements > > 01-NOV-1995 (Rel. 32, Created) > 01-OCT-1996 (Rel. 34, Last sequence update) > 28-FEB-2003 (Rel. 41, Last annotation update) > > Date::Parse will just ignore the non-date stuff in parentheses. 
I don't > know whether there's a similarly convenient library in Java. > > In Oracle you can specify the date format when converting. So, the > following would take everything up to the first space character and > convert it assuming the format used above: > > 1 select to_date(decode(instr('01-NOV-1995 (Rel. 32, Created)',' > '), > 2 0, > 3 '01-NOV-1995 (Rel. 32, Created)', > 4 substr('01-NOV-1995 (Rel. 32, Created)', > 5 1, > 6 instr('01-NOV-1995 (Rel. 32, > Created)', > 7 ' ') > 8 ) > 9 ), > 10 'dd-mon-yyyy') > 11* from dual > SQL> / > > TO_DATE(DECODE(IN > ----------------- > 11/01/95 00:00:00 > > 1 row selected. > > The DECODE() protects from cases when there is nothing following the > date. > > If this looks too messy, hide it in a function: > > 1 CREATE OR REPLACE > 2 FUNCTION biosql_to_date(qual_value IN VARCHAR2, > 3 date_format IN VARCHAR2 DEFAULT > 'dd-mon-yyyy') > 4 RETURN DATE > 5 IS > 6 spacepos INTEGER; > 7 BEGIN > 8 spacepos := INSTR(qual_value,' '); > 9 IF spacepos = 0 THEN > 10 RETURN TO_DATE(qual_value,date_format); > 11 END IF; > 12 RETURN TO_DATE(SUBSTR(qual_value,1,spacepos), > 13 date_format); > 14* END; > SQL> / > > Function created. > > Elapsed: 00:00:00.60 > SQL> select biosql_to_date('01-NOV-1995 (Rel. 32, Created)') from dual; > > BIOSQL_TO_DATE('0 > ----------------- > 11/01/95 00:00:00 > > 1 row selected. > > Elapsed: 00:00:00.01 > SQL> select biosql_to_date('01-NOV-1995') from dual; > > BIOSQL_TO_DATE('0 > ----------------- > 11/01/95 00:00:00 > > 1 row selected. > > Elapsed: 00:00:00.01 > > Obviously, if you query using this or a similar function, the query > optimizer will do a full table scan and not use an index on > bioentry_qualifier_value. However, you can create a function index on > bioentry_qualifier_value.value using the above function, and queries > using the same function will then be indexed. 
In that case you would > need to make a small amendment to the function above by catching the > exception that results from parsing strings that aren't dates and then > return NULL instead. (Oracle does not index NULLs. Unlike in > PostgreSQL, you cannot have partial indexes in Oracle AFAIK, i.e., in > Oracle the index creation statement cannot contain a WHERE clause.) > > Does this help? > > -hilmar > > On Mar 31, 2005, at 1:07 AM, mark.schreiber@novartis.com wrote: > >> Hello - >> >> Many records that might be stored in BioSQL have associated date >> fields. >> Biojava stores these as value in bioentry_qualifier_value with the >> term_id >> pointing to the Term for date. >> >> This seems to place a serious limitation on searching by date. I would >> like to be able to search for sequences entered between X and Y or >> before >> X etc. Has anyone come up with a workaround for date operations on >> VarChar2 or Strings? >> >> Thanks >> >> Mark Schreiber >> Principal Scientist (Bioinformatics) >> >> Novartis Institute for Tropical Diseases (NITD) >> 10 Biopolis Road >> #05-01 Chromos >> Singapore 138670 >> www.nitd.novartis.com >> >> phone +65 6722 2973 >> fax +65 6722 2910 >> >> >> ______________________________________________________________________ >> The Novartis email address format has changed to >> firstname.lastname@novartis.com. Please update your address book >> accordingly. >> ______________________________________________________________________ >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l@open-bio.org >> http://open-bio.org/mailman/listinfo/biosql-l >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From caritov at gmail.com Mon Apr 4 15:44:01 2005
From: caritov at gmail.com (carito vargas)
Date: Mon Apr 4 15:37:49 2005
Subject: [BioSQL-l] Trying to get BioSql
Message-ID: <15a9a89705040412443abd1e0@mail.gmail.com>

Hello,

I want to change our database model (of genetic sequences, involving the process of annotation and genome sequence) and want to use BioSql. I have understood that BioSql is a schema. How can I get it? It works with BioPerl and MySql?

Carito

From hollandr at gis.a-star.edu.sg Mon Apr 4 21:10:27 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 4 21:06:43 2005
Subject: [BioSQL-l] Trying to get BioSql
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950A31@BIONIC.biopolis.one-north.com>

You can install the BioSQL schema in Oracle, MySQL, or Postgres, and maybe more I don't know about.

There is a website on the subject here:

http://obda.open-bio.org/

The schema itself is available here:

Currently BioPerl, BioJava and BioPython all have interfaces to BioSQL, although there are a few peculiarities that make them not entirely 100% compatible with each other when doing so. Hopefully these will be sorted soon.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------

> -----Original Message-----
> From: biosql-l-bounces@portal.open-bio.org
> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of
> carito vargas
> Sent: Tuesday, April 05, 2005 3:44 AM
> To: biosql-l@open-bio.org
> Subject: [BioSQL-l] Trying to get BioSql
>
>
> Hello,
>
> I want to change our database model (of genetic sequences, involving
> the process of annotation and genome sequence) and want to use BioSql.
> I have understood that BioSql is a schema. How can I get it? It works
> with BioPerl and MySql?
>
> Carito
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

From caritov at gmail.com Wed Apr 6 11:44:28 2005
From: caritov at gmail.com (carito vargas)
Date: Wed Apr 6 11:38:26 2005
Subject: [BioSQL-l] last version of the schema
Message-ID: <15a9a897050406084412322b5e@mail.gmail.com>

Hi,
I am trying to load a swissprot database and get this error:

DBD::mysql::st execute failed: Unknown column 'display_id' in 'field list'

This means - as I read in other posts - that I didn't get the last version of the schema. I download it from
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/sql/biosqldb-mysql.sql?cvsroot=biosql
Revision 1.40 and it still doesn't work ... anyone can tell me where can I find a version that works??

Carito Vargas

From hlapp at gnf.org Wed Apr 6 14:05:08 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Apr 6 13:58:36 2005
Subject: [BioSQL-l] last version of the schema
In-Reply-To: <15a9a897050406084412322b5e@mail.gmail.com>
References: <15a9a897050406084412322b5e@mail.gmail.com>
Message-ID: <4dc55e7673882455658d8c24dbd3aff9@gnf.org>

You may have the right version of the schema (there should indeed be no column display_id), but the wrong version of bioperl-db. Did you download the CVS head from bioperl-db? If you did, did you run the tests and what were the test results?
All tests should pass; if they don't then that's not a good sign.

-hilmar

On Apr 6, 2005, at 8:44 AM, carito vargas wrote:

> Hi,
> I am trying to load a swissprot database and get this error:
>
> DBD::mysql::st execute failed: Unknown column 'display_id' in 'field list'
>
> This means - as I read in other posts - that I didn't get the last
> version of the schema. I download it from
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/sql/biosqldb-mysql.sql?cvsroot=biosql
> Revision 1.40 and it still doesn't work ... anyone can tell me where
> can I find a version that works??
>
> Carito Vargas
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Wed Apr 6 15:36:48 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Apr 6 15:30:17 2005
Subject: [BioSQL-l] Re: last version of the schema
In-Reply-To: <15a9a897050406120925479812@mail.gmail.com>
References: <15a9a897050406084412322b5e@mail.gmail.com> <4dc55e7673882455658d8c24dbd3aff9@gnf.org> <15a9a897050406120925479812@mail.gmail.com>
Message-ID:

On Apr 6, 2005, at 12:09 PM, carito vargas wrote:

> I had another question, with the last version of BioSql Schema we are
> capable to store different types of Annotations?

Yes. If you look at the ERD you'll see that there are references, dbxrefs, comments, and ontology terms associated with bioentries. In bioperl, these map to Bio::Annotation::Reference, Bio::Annotation::DBLink, Bio::Annotation::Comment, and Bio::Annotation::OntologyTerm (w/o qualifier value) or Bio::Annotation::SimpleValue (w/ qualifier value).

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Fri Apr 8 19:51:07 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Apr 8 19:44:47 2005
Subject: [BioSQL-l] Re: [Bioperl-l] biosql.html
In-Reply-To:
References:
Message-ID:

Thanks a lot Brian. This will help.

-hilmar

On Apr 7, 2005, at 5:09 AM, Brian Osborne wrote:

> Hilmar,
>
> I've updated biosql.html in biosql-schema. Postgres installation in
> Cygwin was no longer the 2 minute exercise it was a while back but it's
> still pretty easy, biosql installation was as easy as ever.
>
> Brian O.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From ankitson at gmail.com Sun Apr 10 14:04:03 2005
From: ankitson at gmail.com (ankit soni)
Date: Sun Apr 10 13:57:42 2005
Subject: [BioSQL-l] getting exon information from genbank files
Message-ID:

Hi all,
I have just started using BioSQL for one of my projects and I have loaded a few genbank files in the MySQL database using BioPerl and the standard schema.
I wanted to ask how I can get the information about the exons and introns from the database.
If I use the following query I get the start and end position but I am not able to find out what the limits (start_pos and end_pos) stand for, i.e. gene or exon or intron.
mysql> select * from location where seqfeature_id='XXXX';
+-------------+---------------+-----------+---------+-----------+---------+--------+------+
| location_id | seqfeature_id | dbxref_id | term_id | start_pos | end_pos | strand | rank |
+-------------+---------------+-----------+---------+-----------+---------+--------+------+
| YYYY        | XXXX          | NULL      | NULL    | ABC       | EFG     |      1 |    1 |
+-------------+---------------+-----------+---------+-----------+---------+--------+------+

It would be very helpful if somebody can guide me.
I am sorry if I am unable to use the correct biological terms as I know very little of biology.

Ankit Soni
Junior Undergraduate
Dept. of Computer Science
IIT kanpur
India

From s0460205 at sms.ed.ac.uk Mon Apr 11 04:28:32 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Apr 11 04:23:45 2005
Subject: [BioSQL-l] Examples of queries - help for beginners!!
Message-ID: <1113208112.425a35308186b@sms.ed.ac.uk>

Hi everyone,

(following on from the post about genbank exons) I would be interested to know if there is somewhere that contains a list of example queries for use with BioSQL, for example, if you want all proteins that were entered before 2000 and have a PDB structure => "do this query".

This would be an extremely helpful tool for people who are new to the SQL language (like me!) and would make BioSQL a far more accessible tool.

Thanks,

Stephen

From hlapp at gnf.org Mon Apr 11 14:55:09 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Apr 11 14:48:21 2005
Subject: [BioSQL-l] getting exon information from genbank files
In-Reply-To:
Message-ID: <3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org>

Ankit, the values you're showing in your sample record, did you make them up entirely or is this an actual query result?

Note that all columns in the location table are numeric, so it only creates confusion if you choose letters as characters to mask the real values.
If they are the real values then you must have changed the schema and not used load_seqdatabase.pl to load records.

Note also that generally what's in biosql will closely resemble the object model that was built by the SeqIO bioperl parser run on your input record(s) - provided you used load_seqdatabase.pl to load the record(s). So, what ends up in biosql as the result of loading a genbank record greatly depends on the genbank record itself. As a rule, what the genbank record had in its feature table you'll also find in biosql as a seqfeature record, and what wasn't in the feature table you also won't find in biosql. Introns are usually not annotated in Genbank explicitly, they are only implicit as the region between exons, so unless the genbank records you loaded were exceptions you won't find them in biosql. How to find exons again depends on the feature table of the original records: some have a single cDNA feature with a composite ('split') location, which will end up in biosql as one seqfeature that has many locations attached. Genomic contigs sometimes have the exons annotated as individual features, and then this is what you'll find in biosql too: one seqfeature per exon, each with a single location.

The bottom line is, if you load through load_seqdatabase.pl the content in biosql will closely match the object tree in bioperl - which oftentimes will be close to the data structure of the original input record. Features that weren't there to begin with you won't find magically added.

So, to come back to your question, there is no good answer because it greatly depends on what your input was. Most likely though you'll have to impute introns by fetching the locations of the cDNA (or mRNA) feature or the locations of the exon features, order them properly, and then infer introns between consecutive exons.
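
[Editor's sketch, not part of the original message.] The "order the exon locations, then infer introns between consecutive exons" step could look roughly like this in Python. It is a minimal illustration only: the function name infer_introns is hypothetical, the input is assumed to be (start_pos, end_pos) pairs already fetched from the location table for one transcript, and coordinates are assumed to be 1-based and inclusive, all on the same sequence and strand.

```python
def infer_introns(exon_locations):
    """Return (start, end) gaps between consecutive exons.

    exon_locations: iterable of (start_pos, end_pos) pairs, assumed to be
    1-based inclusive and on the same sequence and strand.
    """
    ordered = sorted(exon_locations)       # order exons by start position
    introns = []
    if not ordered:
        return introns
    last_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start > last_end + 1:           # a real gap, not adjacent or overlapping
            introns.append((last_end + 1, start - 1))
        last_end = max(last_end, end)
    return introns
```

For exons at 100..200, 300..400 and 500..650, given in any order, this yields the two gaps 201..299 and 401..499. Writing the inferred introns back as seqfeatures would be a separate step and is not shown here.
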
If this is what you need to do all the time I'd write a script that does this in an automated fashion against all newly loaded records and inserts the introns as features back into the database. -hilmar On Sunday, April 10, 2005, at 11:04 AM, ankit soni wrote: > Hi all, > I have just started using BioSQL for one of my projects and I have > loaded few genbank files in the MySQL database using BioPerl and the > standard schema. > I wanted to ask how can I get the information about the exons, introns > from the database. > If I use the following querry I get the start and end position but I > am not able to find out what limits(start_pos and end-pos) stand for > i.e. gene or exon or intron. > mysql> select * from location where seqfeature_id='XXXX'; > +-------------+---------------+-----------+---------+----------- > +---------+--------+------+ > | location_id | seqfeature_id | dbxref_id | term_id | start_pos | > end_pos | strand | rank | > +-------------+---------------+-----------+---------+----------- > +---------+--------+------+ > | YYYY | XXXX | NULL | NULL | ABC | > EFG | 1 | 1 | > +-------------+---------------+-----------+---------+----------- > +---------+--------+------+ > > It would be very helpful if somebody can guide me. > I am sorry if I am unable to use the correct biological terms as I > know very little of biology. > > Ankit Soni > Junior Undergraduate > Dept. of Computer Science > IIT kanpur > India > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Mon Apr 11 14:59:44 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Apr 11 14:52:56 2005 Subject: [BioSQL-l] Examples of queries - help for beginners!! 
In-Reply-To: <1113208112.425a35308186b@sms.ed.ac.uk>
Message-ID:

I agree - but the answers aren't necessarily simple. For instance, to take your example, you'd have to write a bioperl-db adaptor first for the structure modules in bioperl to get them serialized to biosql, adapt load_seqdatabase.pl or create a clone that would load structures instead of Bio::Seq's, establish a method that links structures to their protein entries in biosql, and then you could finally look at how to constrain for the date of entry.

-hilmar

On Monday, April 11, 2005, at 01:28 AM, SG Edwards wrote:

>
> Hi everyone,
>
> (following on from the post about genbank exons) I would be interested
> to know if there is somewhere that contains a list of example queries
> for use with BioSQL, for example, if you want all proteins that were
> entered before 2000 and have a PDB structure => "do this query".
>
> This would be an extremely helpful tool for people who are new to the
> SQL language (like me!) and would make BioSQL a far more accessible
> tool.
>
> Thanks,
>
> Stephen
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From ankitson at gmail.com Tue Apr 12 05:09:38 2005
From: ankitson at gmail.com (ankit soni)
Date: Tue Apr 12 05:06:35 2005
Subject: [BioSQL-l] Re: getting exon information from genbank files
In-Reply-To: <3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org>
References: <3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org>
Message-ID:

Sorry for the confusion. The values were masked; they were not actual values.
Later I was able to figure out how to do what I needed.
I am developing a few example SQL queries which I will post on the list soon.

Thanks for helping.
Ankit Soni On Mon, 11 Apr 2005 11:55:09 -0700, Hilmar Lapp wrote: > Ankit, the values you're showing in your sample record, did you make > them up entirely or is this an actual query result? > > Note that all columns in the location table are numeric, so it only > creates confusion if you choose letters as characters to mask the real > values. If they are the real values that you must have changed the > schema and not used load_seqdatabase.pl to load records. > > Note also that generally what's in biosql will closely resemble the > object model that was built by the SeqIO bioperl parser run on your > input record(s) - provided you used load_seqdatabase.pl to load the > record(s). So, what ends up in biosql as the result of loading a > genbank record greatly depends on the genbank record itself. As a rule, > what the genbank record had in its feature table you'll also find in > biosql as a seqfeature record, and what wasn't in the feature table you > also won't find in biosql. Introns are usually not annotated in Genbank > explicitly, they are only implicit as the region between exons, so > unless the genbank record you loaded were exceptions you . How to find > exons again depends on the feature table of the original records: some > have a single cDNA feature with a composite ('split') location, which > will end up in biosql as one seqfeature that has many locations > attached. Genomic contigs sometimes have the exons annotated as > individual features, and then this is what you'll find in biosql too: > one seqfeature per exon, each with a single location. > > The bottom line is, if you load through load_seqdatabase.pl the content > in biosql will closely match the object tree in bioperl - which often > times will be close to the data structure of the original input record. > Features that weren't there to begin with you won't find magically > added. > > So, to come back to your question, there is no good answer because it > greatly depends on what your input was. 
Most likely though you'll have > to impute introns by fetching the locations of the cDNA (or mRNA) > feature or the locations of the exon features, order them properly, and > then infer introns between consecutive exons. > > If this is what you need to do all the time I'd write a script that > does this in an automated fashion against all newly loaded records and > inserts the introns as features back into the database. > > -hilmar > > On Sunday, April 10, 2005, at 11:04 AM, ankit soni wrote: > > > Hi all, > > I have just started using BioSQL for one of my projects and I have > > loaded few genbank files in the MySQL database using BioPerl and the > > standard schema. > > I wanted to ask how can I get the information about the exons, introns > > from the database. > > If I use the following querry I get the start and end position but I > > am not able to find out what limits(start_pos and end-pos) stand for > > i.e. gene or exon or intron. > > mysql> select * from location where seqfeature_id='XXXX'; > > +-------------+---------------+-----------+---------+----------- > > +---------+--------+------+ > > | location_id | seqfeature_id | dbxref_id | term_id | start_pos | > > end_pos | strand | rank | > > +-------------+---------------+-----------+---------+----------- > > +---------+--------+------+ > > | YYYY | XXXX | NULL | NULL | ABC | > > EFG | 1 | 1 | > > +-------------+---------------+-----------+---------+----------- > > +---------+--------+------+ > > > > It would be very helpful if somebody can guide me. > > I am sorry if I am unable to use the correct biological terms as I > > know very little of biology. > > > > Ankit Soni > > Junior Undergraduate > > Dept. 
of Computer Science > > IIT kanpur > > India > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l@open-bio.org > > http://open-bio.org/mailman/listinfo/biosql-l > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > From caritov at gmail.com Tue Apr 12 10:48:02 2005 From: caritov at gmail.com (carito vargas) Date: Tue Apr 12 10:42:01 2005 Subject: [BioSQL-l] annotation bundle Message-ID: <15a9a897050412074855715f33@mail.gmail.com> Hi, I want to store the important results of the annotation process of a sequence. We already have a Data Base model, but we wanted to study the factibility of migrating it to BioSql Schema. I don't know which tables I should use to store the posible CDS I obtain from a new sequence. Still I don't understand well the use / meaning of the tables of the Annotation Bundle. Carito Vargas From hlapp at gnf.org Tue Apr 12 13:17:55 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Apr 12 13:12:52 2005 Subject: [BioSQL-l] Re: getting exon information from genbank files In-Reply-To: References: <3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org> Message-ID: Thanks. Help is always appreciated and sample queries will surely be helpful to people. -hilmar On Apr 12, 2005, at 2:09 AM, ankit soni wrote: > Sorry for the confusion the values were masked they were not actual > values . > Later I was able to figure out how to do the stuff what I needed. > I am developing few example SQL queries which I will post on the list > soon. > > Thanks for helping. > Ankit Soni > > > > On Mon, 11 Apr 2005 11:55:09 -0700, Hilmar Lapp wrote: >> Ankit, the values you're showing in your sample record, did you make >> them up entirely or is this an actual query result? 
>> >> Note that all columns in the location table are numeric, so it only >> creates confusion if you choose letters as characters to mask the real >> values. If they are the real values that you must have changed the >> schema and not used load_seqdatabase.pl to load records. >> >> Note also that generally what's in biosql will closely resemble the >> object model that was built by the SeqIO bioperl parser run on your >> input record(s) - provided you used load_seqdatabase.pl to load the >> record(s). So, what ends up in biosql as the result of loading a >> genbank record greatly depends on the genbank record itself. As a >> rule, >> what the genbank record had in its feature table you'll also find in >> biosql as a seqfeature record, and what wasn't in the feature table >> you >> also won't find in biosql. Introns are usually not annotated in >> Genbank >> explicitly, they are only implicit as the region between exons, so >> unless the genbank record you loaded were exceptions you . How to find >> exons again depends on the feature table of the original records: some >> have a single cDNA feature with a composite ('split') location, which >> will end up in biosql as one seqfeature that has many locations >> attached. Genomic contigs sometimes have the exons annotated as >> individual features, and then this is what you'll find in biosql too: >> one seqfeature per exon, each with a single location. >> >> The bottom line is, if you load through load_seqdatabase.pl the >> content >> in biosql will closely match the object tree in bioperl - which often >> times will be close to the data structure of the original input >> record. >> Features that weren't there to begin with you won't find magically >> added. >> >> So, to come back to your question, there is no good answer because it >> greatly depends on what your input was. 
Most likely though you'll >> have >> to impute introns by fetching the locations of the cDNA (or mRNA) >> feature or the locations of the exon features, order them properly, >> and >> then infer introns between consecutive exons. >> >> If this is what you need to do all the time I'd write a script that >> does this in an automated fashion against all newly loaded records and >> inserts the introns as features back into the database. >> >> -hilmar >> >> On Sunday, April 10, 2005, at 11:04 AM, ankit soni wrote: >> >>> Hi all, >>> I have just started using BioSQL for one of my projects and I have >>> loaded few genbank files in the MySQL database using BioPerl and the >>> standard schema. >>> I wanted to ask how can I get the information about the exons, >>> introns >>> from the database. >>> If I use the following querry I get the start and end position but I >>> am not able to find out what limits(start_pos and end-pos) stand for >>> i.e. gene or exon or intron. >>> mysql> select * from location where seqfeature_id='XXXX'; >>> +-------------+---------------+-----------+---------+----------- >>> +---------+--------+------+ >>> | location_id | seqfeature_id | dbxref_id | term_id | start_pos | >>> end_pos | strand | rank | >>> +-------------+---------------+-----------+---------+----------- >>> +---------+--------+------+ >>> | YYYY | XXXX | NULL | NULL | ABC | >>> EFG | 1 | 1 | >>> +-------------+---------------+-----------+---------+----------- >>> +---------+--------+------+ >>> >>> It would be very helpful if somebody can guide me. >>> I am sorry if I am unable to use the correct biological terms as I >>> know very little of biology. >>> >>> Ankit Soni >>> Junior Undergraduate >>> Dept. 
of Computer Science >>> IIT kanpur >>> India >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l@open-bio.org >>> http://open-bio.org/mailman/listinfo/biosql-l >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> >> >> -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Tue Apr 12 13:26:35 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Apr 12 13:21:21 2005 Subject: [BioSQL-l] annotation bundle In-Reply-To: <15a9a897050412074855715f33@mail.gmail.com> References: <15a9a897050412074855715f33@mail.gmail.com> Message-ID: From which end are you coming programmatically? That is, do you capture your annotation in the object model of either bioperl, biojava, or biopython, or do you try to insert it directly into the database? I'm asking because if you capture the annotation in one of the supporting object models then updating the database may be done for you if you re-serialize the modified (annotated) object. This is for instance how it will work with bioperl. As for understanding the roles of particular tables in biosql, have you read doc/schema-overview.txt in the biosql download? It was written by Aaron with the 'end-user' in mind, and there is a section on the annotation bundle representation. If this document doesn't help you, could you be specific on what remains unclear and I'll try to answer as best as I can. -hilmar On Apr 12, 2005, at 7:48 AM, carito vargas wrote: > Hi, > > I want to store the important results of the annotation process of a > sequence. We already have a Data Base model, but we wanted to study > the factibility of migrating it to BioSql Schema. 
I don't know which
> tables I should use to store the possible CDS I obtain from a new
> sequence. Still I don't understand well the use / meaning of the
> tables of the Annotation Bundle.
>
> Carito Vargas
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Apr 16 16:02:20 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 15:55:54 2005
Subject: [BioSQL-l] biosql.org website
Message-ID: <711085F1-AEB2-11D9-8911-000A959EB4C4@gmx.net>

[for those on biosql-l or others who weren't aware - after the domain
has been squatted on for years the OBF 2 days ago finally was able to
assume control over the biosql.org domain - thanks Chris for the swift
registration and thanks Andrew for noticing availability in the first
place]

Chris, how can we instate and/or populate the website for
www.biosql.org? I suggest that until we have something separate that
the domain point to (be synonymous with) obda.open-bio.org.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Apr 16 16:31:19 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 16:24:55 2005
Subject: [BioSQL-l] release preparation
Message-ID: <7D619058-AEB6-11D9-8911-000A959EB4C4@gmx.net>

I've issued this call earlier and I believe have implemented all
suggestions. To be sure, please let me know if you have any issues with
the schema or instantiation or if you know of any that should be
addressed before releasing 1.0.

Other than that Brian has updated the PostgreSQL generated ERD HTML
document so that everything should be up to date and ready to go.

So please let me know and otherwise I'll target release for the end of
this month.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Apr 16 16:49:02 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 16:42:27 2005
Subject: [BioSQL-l] term synonym
Message-ID:

There is one small issue with the naming used in the Mysql and
PostgreSQL schemas, namely this definition of the term_synonym table:

-- ontology terms have synonyms, here is how to store them
CREATE TABLE term_synonym (
	synonym		VARCHAR(255) BINARY NOT NULL,
	term_id		INT(10) UNSIGNED NOT NULL,
	PRIMARY KEY (term_id,synonym)
) TYPE=INNODB;

Synonym is a reserved word in many RDBMSs, so the column synonym may
eventually be renamed to name, which is its name already in the HSQL
and Oracle versions.

Is the Mysql or PostgreSQL term_synonym table used with the current
naming anywhere outside of bioperl? Does anybody have an opinion on
whether this should be changed now or in a later release only if found
necessary?

I'm leaning towards leaving it as it is for now but am open if people
feel differently.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hollandr at gis.a-star.edu.sg Sun Apr 17 21:07:40 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Apr 17 21:02:39 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950F8A@BIONIC.biopolis.one-north.com>

The only issues I have are with the Oracle installation, which I came
across whilst writing the Oracle BioSQL howto at
http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm -
the issues are mentioned in that article. If they have been resolved or
are no longer relevant, then I'd consider it ready for release.

However as part of the release I'd really appreciate a document
describing exactly what is supposed to be stored in each column/table
(just supposed to be - doesn't have to be the way any particular Bio*
project actually does it). This would be very helpful in the efforts to
unite the various Bio* projects and make them all use the same tables
for the same things (which is not always the case at present).

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------

> -----Original Message-----
> From: biosql-l-bounces@portal.open-bio.org
> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Sunday, April 17, 2005 4:31 AM
> To: Biosql
> Subject: [BioSQL-l] release preparation
>
>
> I've issued this call earlier and I believe have implemented all
> suggestions. To be sure, please let me know if you have any issues with
> the schema or instantiation or if you know of any that should be
> addressed before releasing 1.0.
>
> Other than that Brian has updated the PostgreSQL generated ERD HTML
> document so that everything should be up to date and ready to go.
>
> So please let me know and otherwise I'll target release for the end of
> this month.
>
> 	-hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

From hlapp at gmx.net Mon Apr 18 01:38:53 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 01:32:41 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601950F8A@BIONIC.biopolis.one-north.com>
Message-ID: <26861DBC-AFCC-11D9-9FB4-000A959EB4C4@gmx.net>

First off, before going through your HowTo document, as for the
description of which content is supposed to go where, have you read the
doc/schema-overview.txt in the biosql repository? Could you list the
questions that that document leaves open? I'd rather expand that
document than writing another one from scratch; I thought Aaron did a
pretty good job towards your request, but certainly this can be improved
or spiked with more details or whatever you find it could do better on.

Now to the HowTo. BTW is there a reason this should not be included in
the distribution?

> /BioJava and BioSQL/Oracle HOWTO
>
> What you'll need
>
> Bio*
>
> You'll need the latest version of BioJava to take advantage of the
> full functionality of BioSQL. This can be downloaded from biojava.org
> . You'll also need the latest Oracle BioSQL schema. Here you have a
> choice of two options:
> Original : by Hilmar Lapp, the original BioSQL schema takes full
> advantage of Oracle's security mechanisms and produces a complex but
> high quality schema.
You'll need sysdba access to your database to
> install it.

I'd appreciate if this could be straightened out a bit, as you really
do not need sysdba access if you're not going to create tablespaces and
users, and not doing these steps is a simple matter of commenting out
the respective lines.

If you are though then having access to sysdba or access to someone who
does (i.e., pair-programming with your DBA for this task) is kind of
unavoidable ...

Also, the distinction of a 'complex schema' coming out of the original
and 'simplified structure' of Len's version sounds a bit too misleading
for me, since the schema is no different between either version; there
is no difference in number of tables or constraints or whatever (or is
there?).

What simplified structure might refer to is that Len's version leaves
out the PL/SQL packages etc? Again, just as a note, this is trivial to
disable in BS-create-all, just comment out the respective steps.

As another note, in most Oracle environments an installer will not have
sysdba access nor will she be supposed to create tablespaces or users;
the DBA will do it for her. In those environments, the scriptlet that
does this step will serve merely as an instructional template for the
DBA for what to create. I.e., in usual Oracle environments tablespace,
user, and role creation will be commented out because the DBA does them
(has done them already).

> Go to cvs.open-bio.org , select the biosql project, and navigate to
> and download the entire biosql-schema/sql/biosql-ora folder.
> Simplified : by Len Trigg, this version is simplified in structure and
> sits entirely inside a single user account, requiring no sysdba access
> to install. You'll have to ask for a copy of the script from the
> biosql-l mailing lists.
> Both options are fully functional and compatible with both BioJava and
> BioPerl.
>
> Oracle
>
> Obviously, you'll need an Oracle database.
For the Original schema,
> you'll also need sysdba access, or get your DBA to help you if you do
> not have this yourself.
> For the Simplified schema you just need your own login to Oracle, and
> the permissions to create tables. You'll also need to know the
> tablespace name to use, ask your DBA.
>
> Bugfixing
>
> NOTE: Some of these fixes may already have been made by the time you
> read this, so be careful and check they have not already been done!
>
> Original schema
>
> Before you do anything else, you'll need to ensure that all the
> scripts in the folder refer to the correct local settings file. This
> is not always the case, so be careful. The best thing to do is a
> global search on all the files you downloaded, and replace all
> references to BS-defs with BS-defs-local .

I've done this a while ago and think there's no instances left where
this hasn't been changed. Please check.

> Of course, don't do this in BS-defs.sql itself.
>
> Now you'll need to find the CREATE TABLE SG_Biosequence statement in
> BS-DDL.sql . You'll notice there is a constraint there called
> Alphabet4 . The values in the constraint ( dna ,protein etc.) are all
> in lower case. BioJava uses upper case values for these fields, but
> BioPerl uses lower case! To make it work with BioJava, you'll have to
> modify the constraint line so that it reads like this:
>         CONSTRAINT Alphabet4
>              CHECK (lower(Alphabet) IN ('dna', 'protein', 'protein-term',
> 'rna')),

I've changed this but by enumerating all allowed terms so case-mixing
within a term isn't allowed. I haven't included 'protein-term' yet;
what is this? Is it necessary? What does it denote?

>
> This of course will make BioJava work, but will stop BioPerl from
> being able to retrieve records correctly as it will not recognise the
> upper-case versions of these values. One day hopefully the two projects
> will come up with a resolution to this issue.
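
The effect of the lower(Alphabet) constraint quoted above can be tried
out without an Oracle instance. The following is only a minimal sketch
and not part of the BioSQL distribution: it assumes SQLite (through
Python's sqlite3 module) as a stand-in for Oracle, and the table name
biosequence and the helpers try_insert and normalized_alphabets are
hypothetical names made up for illustration. It shows that such a
constraint accepts both BioJava-style upper-case and BioPerl-style
lower-case values, and that a client expecting lower case could
normalize the value when reading it back.

```python
import sqlite3

# Hypothetical stand-in for the Oracle SG_Biosequence table; only the
# alphabet column and its CHECK constraint are modelled (assumption).
DDL = """
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY,
    alphabet    TEXT,
    CONSTRAINT alphabet4
        CHECK (lower(alphabet) IN ('dna', 'protein', 'protein-term', 'rna'))
)
"""


def try_insert(conn, bioentry_id, alphabet):
    """Return True if the row passes the CHECK constraint, else False."""
    try:
        conn.execute(
            "INSERT INTO biosequence (bioentry_id, alphabet) VALUES (?, ?)",
            (bioentry_id, alphabet),
        )
        return True
    except sqlite3.IntegrityError:
        # SQLite reports a violated CHECK constraint as an IntegrityError.
        return False


def normalized_alphabets(conn):
    """Read the stored alphabets back in lower case, ordered by id."""
    rows = conn.execute(
        "SELECT lower(alphabet) FROM biosequence ORDER BY bioentry_id"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Lower case, upper case, upper case, and a value outside the list.
results = [
    try_insert(conn, 1, "dna"),
    try_insert(conn, 2, "DNA"),
    try_insert(conn, 3, "PROTEIN"),
    try_insert(conn, 4, "xyz"),
]
print(results)
print(normalized_alphabets(conn))
```

Because lower() is applied before the comparison, a constraint written
this way would also let a mixed-case value such as 'Dna' through;
enumerating the allowed spellings explicitly is the stricter choice.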
I've changed this in bioperl-db so that a retrieved alphabet term is
converted to lower case. (This doesn't make Biojava work with
Bioperl-db-inserted data yet though :-)

>
> In BS-create-Biosql-usersyns.sql you need to add another command under
> the list of set commands at the top. This command should read:
>         set lines 200

Fixed, thanks for reporting.

> What this does is to temporarily increase the maximum length of an
> output line in Oracle, whilst it is creating the usersyns.sql script.
> If you do not do this, the generated script will contain linebreaks
> midway through names of tables, which will cause the script to fail.
>
> Last of all, unless this has already been fixed in the CVS versions of
> BioSQL by the time you read this, there is a section at the end of
> BS-grants.sql which grants permissions to the various BioSQL users to
> see the SG_User table. The statement currently reads like this:
>         --
>         -- Biosql grants for SG_USER: needs select on all views and synonyms
>         -- that don't follow the SG% convention.
>         --
>         SELECT 'GRANT SELECT ON ' || object_name || ' TO &biosql_user;'
>         FROM user_objects
>         WHERE object_name NOT LIKE 'SG_%'
>         AND   object_name NOT LIKE '%$%'
>         AND   object_name NOT LIKE '%_PK_SEQ'
>         AND   object_type IN ('VIEW','SYNONYM')
>         ;
> You need to comment out the line that reads AND object_name NOT LIKE
> '%_PK_SEQ' by putting two dashes ( -- ) before it. This allows the
> users to see the sequence required to allow them to generate new
> records in the database.

Note that the original statement is correct because SG_USER (or
whatever you define biosql_user to be) is supposed to be read-only and
should never generate new records in the database. SG_LOADER, or
whatever you set biosql_loader to be, is for r/w access and should get
proper permissions to the sequences.

Of course you are free to dispose of the distinction between a
read-only and a r/w user for your instance, but I don't think that
should be the default ...
BTW there is nothing that stops you from
defining biosql_user and biosql_loader to the exact same user to
achieve this very effect.

Let me know if I'm missing something here ...

>
> Simplified schema
>
> The only fix to make here is to do with the maximum value allowed in a
> bioentry qualifier. Find the statement that creates the table
> BioEntry_Qualifier_Value and alter the definition for the VALUE column
> so that it has a maximum size of 300.

Note that in the standard schema this is a VARCHAR2(4000) meanwhile.

>
> Installation
>
> Original schema
>
> Make sure you have set the $ORACLE_SID environment variable to the
> correct database before running the scripts, as they
> connect/disconnect several times and if it is not set, you may end up
> running them against the wrong database.

Again, if the roles, user, and tablespace creation steps are commented
out there should be no reconnecting. At least theoretically ...

>
> The installation requires the creation of three tablespaces - one for
> data, one for indexes, one for LOB objects.

Again note that there is nothing that stops you from defining all three
in BS-defs-local to the same tablespace (or two) which already exist.
(If you define them to the same it should exist already as the
tablespace creation script does assume that they are different.)

I kind of tried to write it such that you can do it 'complicated' if
you want and simple if you don't ... maybe I should have pointed that
out better.

> Decide where you will be keeping the database files for these, and
> what you will call the tablespaces. Don't create them yet though, just
> write down the names. As always it is good practice to keep the data
> and index tablespaces on separate disks to prevent IO bottlenecks, but
> you can probably safely put the data and LOB tablespaces on the same
> disk.
>
> You will also need to decide on names for the two basic roles that
> BioSQL uses - the base_user role which contains just enough privileges
> to connect to the database, and the schema_creator role, which
> contains the privileges required to create database objects in a
> schema. Again, don't create them just yet.
>
> Now, copy BS-defs.sql to BS-defs-local.sql and edit it. You should
> check every entry in it carefully, particularly the names and
> locations of the tablespace files to be created, and the names of the
> two roles you just decided on above. You will also choose names for
> the various default BioSQL roles. biosql_owner is not a role but the
> actual owner of the schema that will have the schema_creator role
> granted to it, you'll need to define its password here too.
> biosql_user is a role to be granted to people who need read-only
> access to the BioSQL database, biosql_loader is a role designed for
> batch upload processes, whilst biosql_admin has full read-write
> permission on the schema.

I guess I need to update the comments here. I ended up never using the
biosql_admin role but using the biosql_loader role instead as the r/w
user. This is pretty much how permissions are granted.

So maybe do I need to include a sample BS-defs-local and BS-create-all
with 'simplified' settings?

	-hilmar

>
> Once you have edited the BS-defs-local.sql script appropriately, you
> need to create the two base roles of base_user and schema_creator
> manually.
Create them by running something similar to the following
> script whilst logged in as sysdba, from inside the biosql-ora
> directory:
>         @BS-defs-local
>         create role &base_user;
>         grant
>                 CREATE SESSION,
>                 CREATE SYNONYM,
>                 CREATE VIEW
>         to &base_user;
>         create role &schema_creator;
>         grant
>                 CREATE PROCEDURE,
>                 CREATE ROLE,
>                 CREATE SEQUENCE,
>                 CREATE SESSION,
>                 CREATE SYNONYM,
>                 CREATE TRIGGER,
>                 CREATE TYPE,
>                 CREATE VIEW,
>                 CREATE TABLE
>         to &schema_creator
>         with admin option;
>
> If you want some basic users set up, edit the BS-create-users.sql
> script to look at the sample users it will create for you
> automatically. If you don't want them, or want different names etc.,
> comment them out or edit them.
>
> The final stage before actual installation is to edit the
> BS-create-all.sql script to ensure that only the steps you require are
> carried out. If you already have predefined tablespaces and don't want
> it to create new ones, comment out the line that reads
> @BS-create-tablespaces . Likewise if you don't want any default data
> loaded into the database, comment out the line near the end that reads
> @BS-prepopulate-db .
>
> Under section 8 of BS-create-all.sql you need to make sure the
> following commands appear in the order below. If they appear in any
> other order, you will not be able to create other users to access the
> database later! The commands should read:
>         @BS-create-roles
>         @BS-create-synonyms
>         @BS-create-Biosql-API2
>         @BS-create-Biosql-usersyns
>         @BS-grants
> (NOTE: The BS-create-Biosql-API2 script is an alternative to
> BS-create-Biosql-API which works much better with BioJava. This is
> because BioJava has no flexibility about column names in tables. The
> API2 version of the script ensures that the column names are exactly
> the same as what BioJava expects by using synonyms. But, no matter
> which you run, everything will still work fine with BioPerl).
>
> Now, log in to the database as sysdba from inside the biosql-ora
> directory. Create the BioSQL database by typing:
>         @BS-create-all
> . You might want to spool the output to see what happens, but you'll
> find that half of it doesn't appear in the spool file, because BioSQL
> is using spool itself to generate dynamic scripts on the fly. If
> you've done everything right, the only messages you should get are a
> few Table or view does not exist style messages, referring to the
> attempts by the script to drop old objects before recreating new ones.
>
> During installation you will be prompted for the sysdba username and
> password several times. This is required to create tablespaces and
> users.
>
> If something goes wrong, you can safely rerun the script without
> dropping anything first as it will drop the database objects from the
> previous attempt first. It will however leave behind the tablespaces,
> users, and roles. You can always just drop the users and tablespaces
> that have been created if it really messes up, and start again from
> scratch.
>
> Now, your database has been installed! The only remaining step is to
> log in to each user who will be using BioSQL, and run the usersyns.sql
> script that the installation generated for you in the biosql-ora
> directory. This script creates the synonyms for the BioSQL objects and
> allows the users to see them. This script should not have any errors
> at all. If it does, edit it and check it closely for things like
> misplaced linebreaks etc.
>
> Note that Oracle sometimes has issues with roles and does not
> apparently grant them correctly. If this happens, you will need to
> grant the appropriate roles to the individual users manually (see the
> short create role script above) and rerun the usersyns.sql script.
> Sometimes you will find they don't even have the appropriate
> tablespace quotas on the three BioSQL tablespaces.
You'll need to
> grant these tablespace quotas using the alter user <username> quota
> unlimited on <tablespace> command.
>
> Simplified schema
>
> NOTE: You will have to do a global search-and-replace on this script
> to replace the two tablespace names with the ones you will actually be
> using. Check with your DBA. This version of the schema only has two
> tablespaces - one for data, the other for indexes.
>
> This is much easier to set up than the Original schema. Simply log in
> as the user you wish to install BioSQL as, ensure that your DBA has
> granted that user the same rights as for the schema_creator role
> described in the Original installation instructions above, then
> execute the single script that defines the schema. You should have no
> problems. You can spool the output to a file if you like to be able to
> check the results.
>
> This schema is a one-user-only schema, where all users log in as the
> schema owner and have full read/write access to the entire database.
> This is the most important difference between this schema and the
> Original .
>
> Testing
>
> Any BioJava script should work fine!
>
> THE END!
>
> Richard Holland, hollandr at gis dot a-star dot edu dot sg, December
> 2004

On Sunday, April 17, 2005, at 06:07 PM, Richard HOLLAND wrote:

> The only issues I have are with the Oracle installation, which I came
> across whilst writing the Oracle BioSQL howto at
> http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm -
> the issues are mentioned in that article. If they have been resolved or
> are no longer relevant, then I'd consider it ready for release.
>
> However as part of the release I'd really appreciate a document
> describing exactly what is supposed to be stored in each column/table
> (just supposed to be - doesn't have to be the way any particular Bio*
> project actually does it).
This would be very helpful in the efforts to
> unite the various Bio* projects and make them all use the same tables
> for the same things (which is not always the case at present).
>
> cheers,
> Richard
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
>
>
>> -----Original Message-----
>> From: biosql-l-bounces@portal.open-bio.org
>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
>> Sent: Sunday, April 17, 2005 4:31 AM
>> To: Biosql
>> Subject: [BioSQL-l] release preparation
>>
>>
>> I've issued this call earlier and I believe have implemented all
>> suggestions. To be sure, please let me know if you have any issues with
>> the schema or instantiation or if you know of any that should be
>> addressed before releasing 1.0.
>>
>> Other than that Brian has updated the PostgreSQL generated ERD HTML
>> document so that everything should be up to date and ready to go.
>>
>> So please let me know and otherwise I'll target release for the end of
>> this month.
>>
>> 	-hilmar
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hollandr at gis.a-star.edu.sg Mon Apr 18 01:49:15 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 01:44:02 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950FC2@BIONIC.biopolis.one-north.com>

I will read schema-overview.txt and see what needs changing, if
anything. Do you have a deadline for the release that I should work
towards?

I don't see why the HowTo shouldn't be included. It went on the BioJava
site at the time as that seemed the logical home for it, but it is of
course equally at home on the BioSQL site.

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net]
> Sent: Monday, April 18, 2005 1:39 PM
> To: Richard HOLLAND
> Cc: Biosql
> Subject: Re: [BioSQL-l] release preparation
>
>
> First off, before going through your HowTo document, as for the
> description of which content is supposed to go where, have you read the
> doc/schema-overview.txt in the biosql repository? Could you list the
> questions that that document leaves open? I'd rather expand that
> document than writing another one from scratch; I thought Aaron did a
> pretty good job towards your request, but certainly this can improved
> or spiked with more details or whatever you find it could do better on.
>
> Now to the HowTo. BTW is there a reason this should not be included in
> the distribution?
> > > > Under section 8 of BS-create-all.sql you need to make sure the > > following commands appear in the order below. If they appear in any > > other order, you will not be able to create other users to > access the > > database later! The commands should read: > > @BS-create-roles > > @BS-create-synonyms > > @BS-create-Biosql-API2 > > @BS-create-Biosql-usersyns > > @BS-grants > > (NOTE: The BS-create-Biosql-API2 script is an alternative to > > BS-create-Biosql-API which works much better with BioJava. This is > > because BioJava has no flexibility about column names in > tables. The > > API2 version of the script ensures that the column names > are exactly > > the same as what BioJava expects by using synonyms. But, no matter > > which you run, everything will still work fine with BioPerl). > > > > Now, log in to the database as sysdba from inside the biosql-ora > > directory. Create the BioSQL database by typing: > > @BS-create-all > > . You might want to spool the output to see what happens, > but you'll > > find that half of it doesn't appear in the spool file, > because BioSQL > > is using spool itself to generate dynamic scripts on the fly. If > > you've done everything right, the only messages you should > get are a > > few Table or view does not exist style messages, referring to the > > attempts by the script to drop old objects before > recreating new ones. > > > > During installation you will be prompted for the sysdba > username and > > password several times. This is required to create tablespaces and > > users. > > > > If something goes wrong, you can safely rerun the script without > > dropping anything first as it will drop the database > objects from the > > previous attempt first. It will however leave behind the > tablespaces, > > users, and roles. You can always just drop the users and > tablespaces > > that have been created if it really messes up, and start again from > > scratch. > > > > Now, your database has been installed! 
The only remaining > step is to > > log in to each user who will be using BioSQL, and run the > usersyns.sql > > script that the installation generated for you in the biosql-ora > > directory. This script creates the synonyms for the BioSQL > objects and > > allows the users to see them. This script should not have > any errors > > at all. If it does, edit it and check it closely for things like > > misplaced linebreaks etc. > > > > Note that Oracle sometimes has issues with roles and does not > > apparently grant them correctly. If this happens, you will need to > > grant the appropriate roles to the individual users > manually (see the > > short create role script above) and rerun the usersyns.sql script. > > Sometimes you will find they don't even have the appropriate > > tablespace quotas on the three BioSQL tablespaces. You'll need to > > grant these tablespace quotas using the alter user quota > > unlimited on command. > > > > Simplified schema > > > > NOTE: You will have to do a global search-and-replace on > this script > > to replace the two tablespace names with the ones you will > actually be > > using. Check with your DBA. This version of the schema only has two > > tablespaces - one for data, the other for indexes. > > > > This is much easier to set up than the Original schema. > Simply log in > > as the user you wish to install BioSQL as, ensure that your DBA has > > granted that user the same rights as for the schema_creator role > > described in the Original installation instructions above, then > > execute the single script that defines the schema. You > should have no > > problems. You can spool the output to a file if you like to > be able to > > check the results. > > > > This schema is a one-user-only schema, where all users log > in as the > > schema owner and have full read/write access to the entire > database. > > This is the most important difference between this schema and the > > Original . 
> > > > Testing > > > > Any BioJava script should work fine! > > > > THE END! > > > > Richard Holland, hollandr at gis dot a-star dot edu dot sg, > December > > 2004 > > On Sunday, April 17, 2005, at 06:07 PM, Richard HOLLAND wrote: > > > The only issues I have are with the Oracle installation, > which I came > > across whilst writing the Oracle BioSQL howto at > > > http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm - > > the issues are mentioned in that article. If they have been > resolved or > > are no longer relevant, then I'd consider it ready for release. > > > > However as part of the release I'd really appreciate a document > > describing exactly what is supposed to be stored in each > column/table > > (just supposed to be - doesn't have to be the way any > particular Bio* > > project actually does it). This would be very helpful in > the efforts to > > unite the various Bio* projects and make them all use the > same tables > > for the same things (which is not always the case at present). > > > > cheers, > > Richard > > > > Richard Holland > > Bioinformatics Specialist > > GIS extension 8199 > > --------------------------------------------- > > This email is confidential and may be privileged. If you are not the > > intended recipient, please delete it and notify us > immediately. Please > > do not copy or use it for any purpose, or disclose its > content to any > > other person. Thank you. > > --------------------------------------------- > > > > > >> -----Original Message----- > >> From: biosql-l-bounces@portal.open-bio.org > >> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > Hilmar Lapp > >> Sent: Sunday, April 17, 2005 4:31 AM > >> To: Biosql > >> Subject: [BioSQL-l] release preparation > >> > >> > >> I've issued this call earlier and I believe have implemented all > >> suggestions. 
> >> To be sure, please let me know if you have any issues with
> >> the schema or instantiation or if you know of any that should be
> >> addressed before releasing 1.0.
> >>
> >> Other than that Brian has updated the PostgreSQL generated ERD HTML
> >> document so that everything should be up to date and ready to go.
> >>
> >> So please let me know and otherwise I'll target release for the end
> >> of this month.
> >>
> >> -hilmar
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >> _______________________________________________
> >> BioSQL-l mailing list
> >> BioSQL-l@open-bio.org
> >> http://open-bio.org/mailman/listinfo/biosql-l
> >>
> >
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>

From hlapp at gmx.net Mon Apr 18 02:00:35 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 01:53:58 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601950FC2@BIONIC.biopolis.one-north.com>
Message-ID: <2E2D45D2-AFCF-11D9-9FB4-000A959EB4C4@gmx.net>

On Sunday, April 17, 2005, at 10:49 PM, Richard HOLLAND wrote:

> I will read schema-overview.txt and see what needs changing, if
> anything. Do you have a deadline for the release that I should work
> towards?

I'm aiming for the end of this month. Depending on your conclusions we
will need more or less time to include more details; if it's more than
a few sentences I'll need some time in advance though unless somebody
(maybe Aaron?) can share the work.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From mark.schreiber at novartis.com Mon Apr 18 02:01:53 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Apr 18 01:55:25 2005
Subject: [BioSQL-l] release preparation
Message-ID:

Hello Hilmar,

As this document is on a site I maintain I should have updated it
before now, my bad! Agreed that from a SQL query perspective the
schemas are the same, one just has more complexity (if I can call it
that) under the hood.

I would prefer to keep instructions for the less complex version up for
the time being as we are having difficulties getting biojava to work
seamlessly with the more complex version. This is almost certainly a
failing of biojava, for which the oracle support seems to have been
compiled against the 'simple' schema, not the 'complex' schema.

I expect we will soon have biojava supporting your version and we can
drop the 'simple' schema. After all, there is not much point using
oracle if you don't make use of the features.

- Mark

Hilmar Lapp
Sent by: biosql-l-bounces@portal.open-bio.org
04/18/2005 01:38 PM

To: "Richard HOLLAND"
cc: Biosql , (bcc: Mark Schreiber/GP/Novartis)
Subject: Re: [BioSQL-l] release preparation

First off, before going through your HowTo document, as for the
description of which content is supposed to go where, have you read the
doc/schema-overview.txt in the biosql repository? Could you list the
questions that that document leaves open? I'd rather expand that
document than write another one from scratch; I thought Aaron did a
pretty good job towards your request, but certainly this can be
improved or spiked with more details or whatever you find it could do
better on.

Now to the HowTo. BTW is there a reason this should not be included in
the distribution?

> /BioJava and BioSQL/Oracle HOWTO
>
> What you'll need
>
> Bio*
>
> You'll need the latest version of BioJava to take advantage of the
> full functionality of BioSQL.
> This can be downloaded from biojava.org. You'll also need the latest
> Oracle BioSQL schema. Here you have a choice of two options:
> Original: by Hilmar Lapp, the original BioSQL schema takes full
> advantage of Oracle's security mechanisms and produces a complex but
> high quality schema. You'll need sysdba access to your database to
> install it.

I'd appreciate if this could be straightened out a bit, as you really
do not need sysdba access if you're not going to create tablespaces and
users, and not doing these steps is a simple matter of commenting out
the respective lines. If you are though then having access to sysdba or
access to someone who does (i.e., pair-programming with your DBA for
this task) is kind of unavoidable ...

Also, the distinction of a 'complex schema' coming out of the original
and 'simplified structure' of Len's version sounds a bit too misleading
for me, since the schema is no different between either version; there
is no difference in number of tables or constraints or whatever (or is
there?). What simplified structure might refer to is that Len's version
leaves out the PL/SQL packages etc? Again, just as a note, this is
trivial to disable in BS-create-all, just comment out the respective
steps.

As another note, in most Oracle environments an installer will not have
sysdba access nor will she be supposed to create tablespaces or users;
the DBA will do it for her. In those environments, the scriptlet that
does this step will serve merely as an instructional template for the
DBA for what to create. I.e., in usual Oracle environments tablespace,
user, and role creation will be commented out because the DBA does them
(has done them already).

> Go to cvs.open-bio.org, select the biosql project, and navigate to
> and download the entire biosql-schema/sql/biosql-ora folder.
> Simplified: by Len Trigg, this version is simplified in structure and
> sits entirely inside a single user account, requiring no sysdba access
> to install.
> You'll have to ask for a copy of the script from the
> biosql-l mailing lists.
> Both options are fully functional and compatible with both BioJava and
> BioPerl.
>
> Oracle
>
> Obviously, you'll need an Oracle database. For the Original schema,
> you'll also need sysdba access, or get your DBA to help you if you do
> not have this yourself.
> For the Simplified schema you just need your own login to Oracle, and
> the permissions to create tables. You'll also need to know the
> tablespace name to use, ask your DBA.
>
> Bugfixing
>
> NOTE: Some of these fixes may already have been made by the time you
> read this, so be careful and check they have not already been done!
>
> Original schema
>
> Before you do anything else, you'll need to ensure that all the
> scripts in the folder refer to the correct local settings file. This
> is not always the case, so be careful. The best thing to do is a
> global search on all the files you downloaded, and replace all
> references to BS-defs with BS-defs-local.

I did this a while ago and think there are no instances left where this
hasn't been changed. Please check.

> Of course, don't do this in BS-defs.sql itself.
>
> Now you'll need to find the CREATE TABLE SG_Biosequence statement in
> BS-DDL.sql. You'll notice there is a constraint there called
> Alphabet4. The values in the constraint (dna, protein etc.) are all
> in lower case. BioJava uses upper case values for these fields, but
> BioPerl uses lower case! To make it work with BioJava, you'll have to
> modify the constraint line so that it reads like this:
> CONSTRAINT Alphabet4
> CHECK (lower(Alphabet) IN ('dna', 'protein', 'protein-term',
> 'rna')),

I've changed this but by enumerating all allowed terms so case-mixing
within a term isn't allowed. I haven't included 'protein-term' yet;
what is this? Is it necessary? What does it denote?
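
For comparison, the two constraint styles discussed here could be
sketched roughly as follows. This is only an illustration: the table,
column, and constraint names are taken from the quoted HowTo, while the
exact list of enumerated terms in the second variant is an assumption
and may not match what is actually in BS-DDL.sql.

```sql
-- Sketch only. Names come from the quoted HowTo; the enumerated list
-- in variant 2 is an assumption for illustration.

-- Drop the existing constraint before redefining it.
ALTER TABLE SG_Biosequence DROP CONSTRAINT Alphabet4;

-- Variant 1 (HowTo): compare case-insensitively, so 'dna', 'DNA' and
-- mixed-case spellings such as 'Dna' are all accepted.
ALTER TABLE SG_Biosequence ADD CONSTRAINT Alphabet4
    CHECK (lower(Alphabet) IN ('dna', 'protein', 'protein-term', 'rna'));

-- Variant 2 (enumerated): list every accepted spelling explicitly, so
-- all-lower and all-upper pass but 'Dna' is rejected. Use instead of
-- variant 1, not in addition to it.
-- ALTER TABLE SG_Biosequence ADD CONSTRAINT Alphabet4
--     CHECK (Alphabet IN ('dna', 'DNA', 'protein', 'PROTEIN',
--                         'rna', 'RNA'));
```

Variant 1 is more permissive; variant 2 keeps the stored values limited
to a known set, at the cost of having to extend the list for every new
term.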
>
> This of course will make BioJava work, but will stop BioPerl from
> being able to retrieve records correctly as it will not recognise the
> upper-case versions of these values. One day hopefully the two projects
> will come up with a resolution to this issue.

I've changed this in bioperl-db so that a retrieved alphabet term is
converted to lower case. (This doesn't make Biojava work with
Bioperl-db-inserted data yet though :-)

>
> In BS-create-Biosql-usersyns.sql you need to add another command under
> the list of set commands at the top. This command should read:
> set lines 200

Fixed, thanks for reporting.

> What this does is to temporarily increase the maximum length of an
> output line in Oracle, whilst it is creating the usersyns.sql script.
> If you do not do this, the generated script will contain linebreaks
> midway through names of tables, which will cause the script to fail.
>
> Last of all, unless this has already been fixed in the CVS versions of
> BioSQL by the time you read this, there is a section at the end of
> BS-grants.sql which grants permissions to the various BioSQL users to
> see the SG_User table. The statement currently reads like this:
> --
> -- Biosql grants for SG_USER: needs select on all views and synonyms
> -- that don't follow the SG% convention.
> --
> SELECT 'GRANT SELECT ON ' || object_name || ' TO &biosql_user;'
> FROM user_objects
> WHERE object_name NOT LIKE 'SG_%'
> AND object_name NOT LIKE '%$%'
> AND object_name NOT LIKE '%_PK_SEQ'
> AND object_type IN ('VIEW','SYNONYM')
> ;
> You need to comment out the line that reads AND object_name NOT LIKE
> '%_PK_SEQ' by putting two dashes ( -- ) before it. This allows the
> users to see the sequence required to allow them to generate new
> records in the database.

Note that the original statement is correct because SG_USER (or
whatever you define biosql_user to be) is supposed to be read-only and
should never generate new records in the database.
SG_LOADER, or whatever you set biosql_loader to be, is for r/w access
and should get proper permissions to the sequences.

Of course you are free to dispose of the distinction between a
read-only and a r/w user for your instance, but I don't think that
should be the default ... BTW there is nothing that stops you from
defining biosql_user and biosql_loader to the exact same user to
achieve this very effect.

Let me know if I'm missing something here ...

>
> Simplified schema
>
> The only fix to make here is to do with the maximum value allowed in a
> bioentry qualifier. Find the statement that creates the table
> BioEntry_Qualifier_Value and alter the definition for the VALUE column
> so that it has a maximum size of 300.

Note that in the standard schema this is a VARCHAR2(4000) meanwhile.

>
> Installation
>
> Original schema
>
> Make sure you have set the $ORACLE_SID environment variable to the
> correct database before running the scripts, as they
> connect/disconnect several times and if it is not set, you may end up
> running them against the wrong database.

Again, if the roles, user, and tablespace creation steps are commented
out there should be no reconnecting. At least theoretically ...

>
> The installation requires the creation of three tablespaces - one for
> data, one for indexes, one for LOB objects.

Again note that there is nothing that stops you from defining all three
in BS-defs-local to the same tablespace (or two) which already exist.
(If you define them to the same it should exist already as the
tablespace creation script does assume that they are different.)

I kind of tried to write it such that you can do it 'complicated' if
you want and simple if you don't ... maybe I should have pointed that
out better.

> Decide where you will be keeping the database files for these, and
> what you will call the tablespaces. Don't create them yet though, just
> write down the names.
> As always it is good practice to keep the data
> and index tablespaces on separate disks to prevent IO bottlenecks, but
> you can probably safely put the data and LOB tablespaces on the same
> disk.
>
> You will also need to decide on names for the two basic roles that
> BioSQL uses - the base_user role which contains just enough privileges
> to connect to the database, and the schema_creator role, which
> contains the privileges required to create database objects in a
> schema. Again, don't create them just yet.
>
> Now, copy BS-defs.sql to BS-defs-local.sql and edit it. You should
> check every entry in it carefully, particularly the names and
> locations of the tablespace files to be created, and the names of the
> two roles you just decided on above. You will also choose names for
> the various default BioSQL roles. biosql_owner is not a role but the
> actual owner of the schema that will have the schema_creator role
> granted to it, you'll need to define its password here too.
> biosql_user is a role to be granted to people who need read-only
> access to the BioSQL database, biosql_loader is a role designed for
> batch upload processes, whilst biosql_admin has full read-write
> permission on the schema.

I guess I need to update the comments here. I ended up never using the
biosql_admin role but using the biosql_loader role instead as the r/w
user. This is pretty much how permissions are granted.

So maybe I need to include a sample BS-defs-local and BS-create-all
with 'simplified' settings?

-hilmar

>
> Once you have edited the BS-defs-local.sql script appropriately, you
> need to create the two base roles of base_user and schema_creator
> manually.
> Create them by running something similar to the following
> script whilst logged in as sysdba, from inside the biosql-ora
> directory:
> @BS-defs-local
> create role &base_user;
> grant
>   CREATE SESSION,
>   CREATE SYNONYM,
>   CREATE VIEW
> to &base_user;
> create role &schema_creator;
> grant
>   CREATE PROCEDURE,
>   CREATE ROLE,
>   CREATE SEQUENCE,
>   CREATE SESSION,
>   CREATE SYNONYM,
>   CREATE TRIGGER,
>   CREATE TYPE,
>   CREATE VIEW,
>   CREATE TABLE
> to &schema_creator
> with admin option;
>
> If you want some basic users set up, edit the BS-create-users.sql
> script to look at the sample users it will create for you
> automatically. If you don't want them, or want different names etc.,
> comment them out or edit them.
>
> The final stage before actual installation is to edit the
> BS-create-all.sql script to ensure that only the steps you require are
> carried out. If you already have predefined tablespaces and don't want
> it to create new ones, comment out the line that reads
> @BS-create-tablespaces. Likewise if you don't want any default data
> loaded into the database, comment out the line near the end that reads
> @BS-prepopulate-db.
>
> Under section 8 of BS-create-all.sql you need to make sure the
> following commands appear in the order below. If they appear in any
> other order, you will not be able to create other users to access the
> database later! The commands should read:
> @BS-create-roles
> @BS-create-synonyms
> @BS-create-Biosql-API2
> @BS-create-Biosql-usersyns
> @BS-grants
> (NOTE: The BS-create-Biosql-API2 script is an alternative to
> BS-create-Biosql-API which works much better with BioJava. This is
> because BioJava has no flexibility about column names in tables. The
> API2 version of the script ensures that the column names are exactly
> the same as what BioJava expects by using synonyms. But, no matter
> which you run, everything will still work fine with BioPerl).
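
Put together, the edits to BS-create-all.sql described above might look
something like the fragment below. This is a hypothetical excerpt, not
the actual file: only the script names mentioned in the HowTo are
shown, the other steps of the real script are left out, and the
position of the tablespace step relative to the rest is an assumption.

```sql
-- Hypothetical excerpt of an edited BS-create-all.sql (SQL*Plus).
-- Only the @-includes named in the HowTo appear here; the real file
-- contains further steps that are omitted from this sketch.

-- Disabled: the tablespaces already exist (created by the DBA).
-- @BS-create-tablespaces

-- ... schema creation steps omitted ...

-- Section 8: keep these five includes in exactly this order.
@BS-create-roles
@BS-create-synonyms
@BS-create-Biosql-API2
@BS-create-Biosql-usersyns
@BS-grants

-- Disabled: no default data wanted in this instance.
-- @BS-prepopulate-db
```

Commenting steps out with two dashes rather than deleting them keeps
the file easy to compare against the version in CVS.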
>
> Now, log in to the database as sysdba from inside the biosql-ora
> directory. Create the BioSQL database by typing:
> @BS-create-all
> You might want to spool the output to see what happens, but you'll
> find that half of it doesn't appear in the spool file, because BioSQL
> is using spool itself to generate dynamic scripts on the fly. If
> you've done everything right, the only messages you should get are a
> few Table or view does not exist style messages, referring to the
> attempts by the script to drop old objects before recreating new ones.
>
> During installation you will be prompted for the sysdba username and
> password several times. This is required to create tablespaces and
> users.
>
> If something goes wrong, you can safely rerun the script without
> dropping anything first as it will drop the database objects from the
> previous attempt first. It will however leave behind the tablespaces,
> users, and roles. You can always just drop the users and tablespaces
> that have been created if it really messes up, and start again from
> scratch.
>
> Now, your database has been installed! The only remaining step is to
> log in as each user who will be using BioSQL, and run the usersyns.sql
> script that the installation generated for you in the biosql-ora
> directory. This script creates the synonyms for the BioSQL objects and
> allows the users to see them. This script should not have any errors
> at all. If it does, edit it and check it closely for things like
> misplaced linebreaks etc.
>
> Note that Oracle sometimes has issues with roles and does not
> apparently grant them correctly. If this happens, you will need to
> grant the appropriate roles to the individual users manually (see the
> short create role script above) and rerun the usersyns.sql script.
> Sometimes you will find they don't even have the appropriate
> tablespace quotas on the three BioSQL tablespaces.
> You'll need to
> grant these tablespace quotas using the alter user <user> quota
> unlimited on <tablespace> command.
>
> Simplified schema
>
> NOTE: You will have to do a global search-and-replace on this script
> to replace the two tablespace names with the ones you will actually be
> using. Check with your DBA. This version of the schema only has two
> tablespaces - one for data, the other for indexes.
>
> This is much easier to set up than the Original schema. Simply log in
> as the user you wish to install BioSQL as, ensure that your DBA has
> granted that user the same rights as for the schema_creator role
> described in the Original installation instructions above, then
> execute the single script that defines the schema. You should have no
> problems. You can spool the output to a file if you like to be able to
> check the results.
>
> This schema is a one-user-only schema, where all users log in as the
> schema owner and have full read/write access to the entire database.
> This is the most important difference between this schema and the
> Original.
>
> Testing
>
> Any BioJava script should work fine!
>
> THE END!
>
> Richard Holland, hollandr at gis dot a-star dot edu dot sg, December
> 2004

On Sunday, April 17, 2005, at 06:07 PM, Richard HOLLAND wrote:

> The only issues I have are with the Oracle installation, which I came
> across whilst writing the Oracle BioSQL howto at
> http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm -
> the issues are mentioned in that article. If they have been resolved or
> are no longer relevant, then I'd consider it ready for release.
>
> However as part of the release I'd really appreciate a document
> describing exactly what is supposed to be stored in each column/table
> (just supposed to be - doesn't have to be the way any particular Bio*
> project actually does it).
> This would be very helpful in the efforts to
> unite the various Bio* projects and make them all use the same tables
> for the same things (which is not always the case at present).
>
> cheers,
> Richard
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
>
>
>> -----Original Message-----
>> From: biosql-l-bounces@portal.open-bio.org
>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
>> Sent: Sunday, April 17, 2005 4:31 AM
>> To: Biosql
>> Subject: [BioSQL-l] release preparation
>>
>>
>> I've issued this call earlier and I believe have implemented all
>> suggestions. To be sure, please let me know if you have any issues
>> with the schema or instantiation or if you know of any that should be
>> addressed before releasing 1.0.
>>
>> Other than that Brian has updated the PostgreSQL generated ERD HTML
>> document so that everything should be up to date and ready to go.
>>
>> So please let me know and otherwise I'll target release for the end
>> of this month.
>>
>> -hilmar
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

_______________________________________________
BioSQL-l mailing list
BioSQL-l@open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l

From hlapp at gmx.net Mon Apr 18 02:04:13 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 01:57:33 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601950FC2@BIONIC.biopolis.one-north.com>
Message-ID:

On Sunday, April 17, 2005, at 10:49 PM, Richard HOLLAND wrote:

> I don't see why the HowTo shouldn't be included. It went on the BioJava
> site at the time as that seemed the logical home for it, but it is of
> course equally at home on the BioSQL site.

OK. Would you want to make modifications according to my comments, or
would you rather that I do this myself? I'd rather have you do it
because I haven't used Biojava with Biosql yet and obviously haven't
struggled with any of the things you have, so I'm lacking the
'end-user' viewpoint or experience.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hollandr at gis.a-star.edu.sg Mon Apr 18 02:10:32 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 02:05:54 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950FCA@BIONIC.biopolis.one-north.com>

I'll make the changes and send the updated version to Mark so that he
can update the BioJava in Anger website. From there you can take a copy
in whatever format you need. I'll let you know when it is done.

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately.
Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gmx.net] > Sent: Monday, April 18, 2005 2:04 PM > To: Richard HOLLAND > Cc: Biosql > Subject: Re: [BioSQL-l] release preparation > > > > On Sunday, April 17, 2005, at 10:49 PM, Richard HOLLAND wrote: > > > I don't see why the HowTo shouldn't be included. It went on > the BioJava > > site at the time as that seemed the logical home for it, > but it is of > > course equally at home on the BioSQL site. > > OK. Would you want to make modifications according to my comments, or > would you rather that I do this myself? I'd rather have you do it > because I haven't used Biojava with Biosql yet and obviously haven't > struggled with any of the things you have, so I'm lacking the > 'end-user' viewpoint or experience. > > -hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > From len at reeltwo.com Mon Apr 18 04:58:17 2005 From: len at reeltwo.com (Len Trigg) Date: Mon Apr 18 04:52:37 2005 Subject: [BioSQL-l] release preparation In-Reply-To: References: Message-ID: Mark Schreiber wrote: > now, my bad! Agreed that from a SQL query perspective the schemas are the > same, one just has more complexity (if I can call it that) under the hood. Indeed, the complexity is more to do with the complexity of installing and understanding what's going on in all those files :-) (particularly if you are not an oracle expert and have only been looking at the BioSQL schemas for the other supported databases), and that's why I did the simple version. 
That's partly confirmed by the fact that the bjia description of how to use the original schema is about 8KB, while the description for the simple schema is about 1KB. I'm all for dumping the simple one if the barrier for entry for the original schema is lowered (maybe it already has been).

> I would prefer to keep instructions for the less complex version up for
> the time being as we are having difficulties getting biojava to work
> seamlessly with the more complex version. This is almost certainly a
> failing of biojava for which the oracle support seems to have been
> compiled against the 'simple' schema not the 'complex schema'.

It certainly was only tested against the simple version, because that's the only schema I had working when I wrote the Oracle support. I am a little surprised that you are having major difficulties though, since the original package has a compatibility layer that (supposedly) presents the same schema as the simple version.

> I expect we will soon have biojava supporting your version and we can drop
> the 'simple' schema. After all, there is not much point using oracle if
> you don't make use of the features.

In my case, it was a matter of using Oracle because that was what was already installed :-)

Cheers,
Len.

From mark.schreiber at novartis.com Mon Apr 18 05:08:23 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Apr 18 05:06:59 2005
Subject: [BioSQL-l] release preparation
Message-ID:

We tracked down the error to the fact that the Hilmar Oracle version (no assertion is made about the complexity) uses CLOBs to store sequence while the Len Oracle (no assertion about simplicity etc) version uses LONGs to store sequence. The biojava support code seems to assume LONGs and, strangely, until very recently the Oracle JDBC driver seemed to let you write LONGs to CLOBs, although the data that comes out again is completely munged.
It would be possible to modify the biojava adapters to check for LONG or CLOB and behave appropriately, but this would cause lots of maintenance problems later. Given this situation, unless someone complains very loudly, the biojava oracle adapters will be changed to assume CLOBs.

Note: This means that if you are using biojava and oracle and biosql now, then it will break unless you adopt Hilmar's version. It will not cause any changes to biojava users of MySQL etc.

- Mark

Len Trigg 04/18/2005 04:58 PM To: Mark Schreiber/GP/Novartis@PH cc: Hilmar Lapp , Biosql Subject: Re: [BioSQL-l] release preparation

Mark Schreiber wrote:
> now, my bad! Agreed that from a SQL query perspective the schemas are the
> same, one just has more complexity (if I can call it that) under the hood.

Indeed, the complexity is more to do with the complexity of installing and understanding what's going on in all those files :-) (particularly if you are not an oracle expert and have only been looking at the BioSQL schemas for the other supported databases), and that's why I did the simple version. That's partly confirmed by the fact that the bjia description of how to use the original schema is about 8KB, while the description for the simple schema is about 1KB. I'm all for dumping the simple one if the barrier for entry for the original schema is lowered (maybe it already has been).

> I would prefer to keep instructions for the less complex version up for
> the time being as we are having difficulties getting biojava to work
> seamlessly with the more complex version. This is almost certainly a
> failing of biojava for which the oracle support seems to have been
> compiled against the 'simple' schema not the 'complex schema'.

It certainly was only tested against the simple version, because that's the only schema I had working when I wrote the Oracle support.
I am a little surprised that you are having major difficulties though, since the original package has a compatibility layer that (supposedly) presents the same schema as the simple version.

> I expect we will soon have biojava supporting your version and we can drop
> the 'simple' schema. After all, there is not much point using oracle if
> you don't make use of the features.

In my case, it was a matter of using Oracle because that was what was already installed :-)

Cheers,
Len.
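A minimal sketch of the "check for LONG or CLOB and behave appropriately" idea from Mark's message, not BioJava's actual adapter code. It assumes that the Oracle driver hands back a java.lang.String for a LONG column and a java.sql.Clob for a CLOB column from ResultSet.getObject(); the class SeqColumnReader and its method names are hypothetical. CLOB contents are read through the character stream rather than through getString().

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Clob;
import java.sql.SQLException;

// Hypothetical helper, for illustration only; not part of BioJava or BioSQL.
final class SeqColumnReader {
    private SeqColumnReader() {}

    /**
     * Converts the value fetched for a sequence column into a String,
     * whichever of the two column types the schema happens to use.
     * Assumption: LONG columns arrive as String, CLOB columns as java.sql.Clob.
     */
    static String asString(Object columnValue) throws SQLException, IOException {
        if (columnValue == null) {
            return null;
        }
        if (columnValue instanceof Clob) {
            return readClob((Clob) columnValue);
        }
        if (columnValue instanceof String) {
            return (String) columnValue;
        }
        throw new SQLException("Unexpected type for sequence column: "
                + columnValue.getClass().getName());
    }

    /** Reads the whole CLOB through its character stream, one buffer at a time. */
    static String readClob(Clob clob) throws SQLException, IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader reader = clob.getCharacterStream()) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

With a live connection this would presumably be called as SeqColumnReader.asString(rs.getObject("seq")). The maintenance concern raised in the message still applies: every write path would need the same kind of branching.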
From hollandr at gis.a-star.edu.sg Mon Apr 18 05:08:41 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 05:09:07 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560195100A@BIONIC.biopolis.one-north.com>

I looked into this in a bit more detail earlier today and found that, since some version of Oracle around the 9i point in time, the official Oracle JDBC driver API for accessing LOBs has changed. This means that whereas before the same code could be used in BioJava to access both Hilmar's and Len's versions of the database, since the 9i drivers this has no longer been possible, and BioJava only works with Len's version. The problem is due to the way in which Len's schema uses LONG values for biosequence.seq, but Hilmar's uses CLOBs.

(The nitty gritty - before 9i, Oracle JDBC allowed you to access both LONG and CLOB columns using getString()/setString() methods to manipulate them. Now, these methods only work with LONG columns, and you have to do fancy tricks to get anything useful into/out of CLOBs).

After discussing this with Mark earlier this afternoon, I am planning on changing BioJava to use the new Oracle CLOB API, at which point it will no longer work with schemas set up using Len's version. No change to BioSQL is required. This, from a BioJava point of view, would make the simple schema redundant. I am not sure if there are people in the other Bio* projects who use the simple schema though, so we probably can't just drop it.

Are there any objections? I have crossposted this to the BioJava list to make sure everyone who might be affected gets a say.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person.
Thank you. --------------------------------------------- > -----Original Message----- > From: biosql-l-bounces@portal.open-bio.org > [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Len Trigg > Sent: Monday, April 18, 2005 4:58 PM > To: mark.schreiber@novartis.com > Cc: Hilmar Lapp; Biosql > Subject: Re: [BioSQL-l] release preparation > > > > Mark Schreiber wrote: > > now, my bad! Agreed that from a SQL query perspective the > schemas are the > > same, one just has more complexity (if I can call it that) > under the hood. > > Indeed, the complexity is more to do with the complexity of installing > and understanding what's going on in all those files :-) (particularly > if you are not an oracle expert and have only been looking at the > BioSQL schemas for the other supported databases), and that's why I > did the simple version. That's partly confirmed by the fact that the > bjia description of how to use the original schema is about 8KB, while > the description for the simple schema is about 1KB. I'm all for > dumping the simple one if the barrier for entry for the original > schema is lowered (maybe it already has been). > > > > I would prefer to keep instructions for the less complex > version up for > > the time being as we are having difficulties getting > biojava to work > > seamlessly with the more complex version. This is almost > certainly a > > failing of biojava for which the oracle support seems to have been > > compiled against the 'simple' schema not the 'complex schema'. > > It certainly was only tested against the simple version, because > that's the only schema I had working when I wrote the Oracle support. > I am a little surprised that you are having major difficulties though, > since the original package has a compatibility layer that (supposedly) > presents the same schema as the simple version. > > > > I expect we will soon have biojava supporting your version > and we can drop > > the 'simple' schema. 
After all, there is not much point > using oracle if > > you don't make use of the features. > > In my case, it was a matter of using Oracle because that was what was > already installed :-) > > > Cheers, > Len. > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > From amackey at pcbi.upenn.edu Mon Apr 18 09:24:48 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Apr 18 09:21:11 2005 Subject: [BioSQL-l] release preparation In-Reply-To: <2E2D45D2-AFCF-11D9-9FB4-000A959EB4C4@gmx.net> References: <2E2D45D2-AFCF-11D9-9FB4-000A959EB4C4@gmx.net> Message-ID: I'm happy to field questions/improvements to the overview (seeing as it's my only real contribution to BioSQL). -Aaron On Apr 18, 2005, at 2:00 AM, Hilmar Lapp wrote: > > On Sunday, April 17, 2005, at 10:49 PM, Richard HOLLAND wrote: > >> I will read schema-overview.txt and see what needs changing, if >> anything. Do you have a deadline for the release that I should work >> towards? > > I'm aiming for the end of this month. Depending on your conclusions we > will need more or less time to include more details; if it's more than > a few sentences I'll need some time in advance though unless somebody > (maybe Aaron?) can share the work. > > -hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697

From dag at sonsorol.org Mon Apr 18 08:01:13 2005
From: dag at sonsorol.org (Chris Dagdigian)
Date: Mon Apr 18 09:43:48 2005
Subject: [BioSQL-l] Re: [Root-l] biosql.org website
In-Reply-To: <711085F1-AEB2-11D9-8911-000A959EB4C4@gmx.net>
References: <711085F1-AEB2-11D9-8911-000A959EB4C4@gmx.net>
Message-ID: <4263A189.7070701@sonsorol.org>

Hello

Doing it now but it will take a day or so for the DNS changes I had to make to the biosql.org nameserver entry to propagate. Once the nameservers are pointing to ones I control I'll point biosql.org to the obda site.

chris

Hilmar Lapp wrote:
> [for those on biosql-l or others who weren't aware - after the domain
> has been squatted on for years the OBF 2 days ago finally was able to
> assume control over the biosql.org domain - thanks Chris for the swift
> registration and thanks Andrew for noticing availability in the first
> place]
>
> Chris,
>
> how can we instate and/or populate the website for www.biosql.org? I
> suggest that until we have something separate that the domain point to
> (be synonymous with) obda.open-bio.org.
>
> -hilmar

--
Chris Dagdigian, BioTeam - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: bioteamdag Web: http://bioteam.net

From hlapp at gmx.net Mon Apr 18 12:56:20 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 12:50:05 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560195100A@BIONIC.biopolis.one-north.com>
Message-ID:

CLOB is IMHO actually easier to handle. Also, LONG is really odd to deal with in SQL whereas the Oracle server will nicely on-the-fly convert strings to CLOB and vice versa so long as they are shorter than 4000 chars. Some of the type-generic functions that come with Oracle will not accept LONG but do accept CLOB.
Just as another anecdotal piece, the built-in BLAST searcher available in Oracle 10g expects a cursor returning CLOBs, not LONGs.

With the java.sql.Clob interface, getting at the full value as a string is as simple as

    Clob clob = resultSet.getClob(1);
    // Clob positions are 1-based; getSubString() takes an int length
    String clobValue = clob.getSubString(1, (int) clob.length());

Inserting a new value in reality is a two-step process:

    PreparedStatement pst = conn.prepareStatement(
        "INSERT INTO Biosequence (Bioentry_Id, Seq) VALUES (?, EMPTY_CLOB())");
    // bind the id (assumed numeric here) before executing
    pst.setLong(1, idValue);
    pst.executeUpdate();
    pst.close();
    pst = conn.prepareStatement(
        "SELECT Seq FROM Biosequence WHERE Bioentry_Id = ? FOR UPDATE",
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
    pst.setLong(1, idValue);
    ResultSet rs = pst.executeQuery();
    rs.next();
    Clob clob = rs.getClob(1);
    clob.setString(1, theSeq);
    // not sure this is necessary
    rs.updateClob(1, clob);
    rs.close();
    // don't forget to release lock
    conn.commit();

I vaguely remember that Len or somebody else from the Biojava crowd had this all figured out?

-hilmar

On Monday, April 18, 2005, at 02:08 AM, Richard HOLLAND wrote:

> I looked into this in a bit more detail earlier today and found that,
> since some version of Oracle around the 9i point in time, the official
> Oracle JDBC driver API for accessing LOBs in changed. This means that
> whereas before the same code could be used in BioJava to access both
> Hilmar's and Len's versions of the database, since the 9i drivers this
> has no longer been possible, and BioJava only works with Len's version.
> The problem is due to the way in which Len's schema uses LONG values
> for
> biosequence.seq, but Hilmar's uses CLOBs.
>
> (The nitty gritty - before 9i, Oracle JDBC allowed you to access both
> LONG and CLOB columns using getString()/setString() methods to
> manipulate them. Now, these methods only work with LONG columns, and
> you
> have to do fancy tricks to get anything useful into/out of CLOBs).
> > After discussing this with Mark earlier this afternoon, I am planning > on > changing BioJava to use the new Oracle CLOB API, at which point it will > no longer work with schemas set up using Len's version. No change to > BioSQL is required. This, from a BioJava point of view, would make the > simple schema redundant. I am not sure if there are people in the other > Bio* projects who use the simple schema though so we probably can't > just > drop it. > > Are there any objections? I have crossposted this to the BioJava list > to > make sure everyone who might be affected gets a say. > > cheers, > Richard > > Richard Holland > Bioinformatics Specialist > GIS extension 8199 > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its content to any > other person. Thank you. > --------------------------------------------- > > >> -----Original Message----- >> From: biosql-l-bounces@portal.open-bio.org >> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Len Trigg >> Sent: Monday, April 18, 2005 4:58 PM >> To: mark.schreiber@novartis.com >> Cc: Hilmar Lapp; Biosql >> Subject: Re: [BioSQL-l] release preparation >> >> >> >> Mark Schreiber wrote: >>> now, my bad! Agreed that from a SQL query perspective the >> schemas are the >>> same, one just has more complexity (if I can call it that) >> under the hood. >> >> Indeed, the complexity is more to do with the complexity of installing >> and understanding what's going on in all those files :-) (particularly >> if you are not an oracle expert and have only been looking at the >> BioSQL schemas for the other supported databases), and that's why I >> did the simple version. 
That's partly confirmed by the fact that the >> bjia description of how to use the original schema is about 8KB, while >> the description for the simple schema is about 1KB. I'm all for >> dumping the simple one if the barrier for entry for the original >> schema is lowered (maybe it already has been). >> >> >>> I would prefer to keep instructions for the less complex >> version up for >>> the time being as we are having difficulties getting >> biojava to work >>> seamlessly with the more complex version. This is almost >> certainly a >>> failing of biojava for which the oracle support seems to have been >>> compiled against the 'simple' schema not the 'complex schema'. >> >> It certainly was only tested against the simple version, because >> that's the only schema I had working when I wrote the Oracle support. >> I am a little surprised that you are having major difficulties though, >> since the original package has a compatibility layer that (supposedly) >> presents the same schema as the simple version. >> >> >>> I expect we will soon have biojava supporting your version >> and we can drop >>> the 'simple' schema. After all, there is not much point >> using oracle if >>> you don't make use of the features. >> >> In my case, it was a matter of using Oracle because that was what was >> already installed :-) >> >> >> Cheers, >> Len. >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l@open-bio.org >> http://open-bio.org/mailman/listinfo/biosql-l >> > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Mon Apr 18 13:42:16 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon Apr 18 13:36:01 2005 Subject: [BioSQL-l] Re: [Root-l] biosql.org website In-Reply-To: <4263A189.7070701@sonsorol.org> Message-ID: <3483A5B8-B031-11D9-83EB-000A959EB4C4@gmx.net> Thanks Chris. -hilmar On Monday, April 18, 2005, at 05:01 AM, Chris Dagdigian wrote: > Hello > > Doing it now but it will take a day or so for the DNS changes I had to > make to the biosql.org nameserver entry to propagate. Once the > nameservers are pointing to ones I control I'll point biosql.org to > the odba site. > > chris > > > Hilmar Lapp wrote: > >> [for those on biosql-l or others who weren't aware - after the domain >> has been squatted on for years the OBF 2 days ago finally was able to >> assume control over the biosql.org domain - thanks Chris for the >> swift registration and thanks Andrew for noticing availability in the >> first place] >> Chris, >> how can we instate and/or populate the website for www.biosql.org? I >> suggest that until we have something separate that the domain point >> to (be synonymous with) obda.open-bio.org. >> -hilmar > > -- > Chris Dagdigian, > BioTeam - Independent life science IT & informatics consulting > Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193 > PGP KeyID: 83D4310E iChat/AIM: bioteamdag Web: http://bioteam.net > _______________________________________________ > Root-l mailing list > Root-l@open-bio.org > http://open-bio.org/mailman/listinfo/root-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Mon Apr 18 13:58:02 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon Apr 18 13:51:30 2005 Subject: [BioSQL-l] release preparation In-Reply-To: Message-ID: <6825ADDA-B033-11D9-83EB-000A959EB4C4@gmx.net> Thanks for offering help, it'll certainly help. -hilmar On Monday, April 18, 2005, at 06:24 AM, Aaron J. Mackey wrote: > > I'm happy to field questions/improvements to the overview (seeing as > it's my only real contribution to BioSQL). > > -Aaron > > On Apr 18, 2005, at 2:00 AM, Hilmar Lapp wrote: > >> >> On Sunday, April 17, 2005, at 10:49 PM, Richard HOLLAND wrote: >> >>> I will read schema-overview.txt and see what needs changing, if >>> anything. Do you have a deadline for the release that I should work >>> towards? >> >> I'm aiming for the end of this month. Depending on your conclusions >> we will need more or less time to include more details; if it's more >> than a few sentences I'll need some time in advance though unless >> somebody (maybe Aaron?) can share the work. >> >> -hilmar >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l@open-bio.org >> http://open-bio.org/mailman/listinfo/biosql-l >> >> > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 -------------------------------------------------------------

From hollandr at gis.a-star.edu.sg Mon Apr 18 21:45:33 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 21:41:18 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601951039@BIONIC.biopolis.one-north.com>

Don't worry, I do know how to do it, it's just that in the existing BioJava-live code it hasn't been done, and I'll need to be careful to add the usual checks to see if we're using Oracle or not before choosing the appropriate SQL to update biosequence with.

CLOBs under 4000 chars are certainly easier, but over 4000 you have to be careful, and there is a bug which prevents clob.getSubString() working for any position greater than can be described in 16 bits (although I know I experienced this one before, I can't find a reference to it now....). You then have to use the clob's Stream accessors instead, but it's not a problem really. Yes, I know 16 bits (65k bases or so) is huge, but in our current BioSQL all our sequences are around the 10000 base length so the 4000-char limited accessor methods are not an option.

Len's suggestion of having table helpers in BioJava to check which version to use and therefore maintain backwards compatibility is a good one. It's slightly more work, but not so much as to warrant a major panic attack. I'll let you know when biojava-live has been updated.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
--------------------------------------------- > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gmx.net] > Sent: Tuesday, April 19, 2005 12:56 AM > To: Richard HOLLAND > Cc: Len Trigg; mark.schreiber@novartis.com; Biosql; biojava-list List > Subject: Re: [BioSQL-l] release preparation > > > CLOB is IMHO actually easier to handle. Also, LONG is really odd to > deal with in SQL whereas the Oracle server will nicely on-the-fly > convert strings to CLOB and vice versa so long as they are > shorter than > 4000 chars. Some of the type-generic functions that come with Oracle > will not accept LONG but do accept CLOB. Just as another anecdotal > piece, the built-in BLAST searcher available in Oracle 10g expects a > cursor returning CLOBs, not LONGs. > > With the java.sql.Clob interface to get at the full value as a string > is as simple as > > Clob clob = resultSet.getClob(); > String clobValue = clob.getSubString(0, clob.length()); > > Inserting a new value in reality is a two-step process: > > PreparedStatement pst = conn.prepareStatement("INSERT > INTO Biosequence > (Bioentry_Id, Seq) VALUES (?, EMPTY_CLOB())"); > pst.executeUpdate(idValue); > pst = conn.prepareStatement("SELECT Seq FROM Biosequence WHERE > Bioentry_Id = ? FOR UPDATE", ResultSet.TYPE_FORWARD_ONLY, > ResultSet.CONCUR_UPDATABLE); > > ResultSet rs = pst.executeQuery(idValue); > Clob clob = rs.getClob(1); > clob.setString(0, theSeq); > // not sure this is necessary > rs.updateClob(1, clob); > rs.close(); > // don't forget to release lock > conn.commit(); > > I vaguely remember that Len or somebody else from the Biojava > crowd had > this all figured out? > > -hilmar > > On Monday, April 18, 2005, at 02:08 AM, Richard HOLLAND wrote: > > > I looked into this in a bit more detail earlier today and > found that, > > since some version of Oracle around the 9i point in time, > the official > > Oracle JDBC driver API for accessing LOBs in changed. 
This > means that > > whereas before the same code could be used in BioJava to access both > > Hilmar's and Len's versions of the database, since the 9i > drivers this > > has no longer been possible, and BioJava only works with > Len's version. > > The problem is due to the way in which Len's schema uses > LONG values > > for > > biosequence.seq, but Hilmar's uses CLOBs. > > > > (The nitty gritty - before 9i, Oracle JDBC allowed you to > access both > > LONG and CLOB columns using getString()/setString() methods to > > manipulate them. Now, these methods only work with LONG > columns, and > > you > > have to do fancy tricks to get anything useful into/out of CLOBs). > > > > After discussing this with Mark earlier this afternoon, I > am planning > > on > > changing BioJava to use the new Oracle CLOB API, at which > point it will > > no longer work with schemas set up using Len's version. No change to > > BioSQL is required. This, from a BioJava point of view, > would make the > > simple schema redundant. I am not sure if there are people > in the other > > Bio* projects who use the simple schema though so we probably can't > > just > > drop it. > > > > Are there any objections? I have crossposted this to the > BioJava list > > to > > make sure everyone who might be affected gets a say. > > > > cheers, > > Richard > > > > Richard Holland > > Bioinformatics Specialist > > GIS extension 8199 > > --------------------------------------------- > > This email is confidential and may be privileged. If you are not the > > intended recipient, please delete it and notify us > immediately. Please > > do not copy or use it for any purpose, or disclose its > content to any > > other person. Thank you. 
> > --------------------------------------------- > > > > > >> -----Original Message----- > >> From: biosql-l-bounces@portal.open-bio.org > >> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > Len Trigg > >> Sent: Monday, April 18, 2005 4:58 PM > >> To: mark.schreiber@novartis.com > >> Cc: Hilmar Lapp; Biosql > >> Subject: Re: [BioSQL-l] release preparation > >> > >> > >> > >> Mark Schreiber wrote: > >>> now, my bad! Agreed that from a SQL query perspective the > >> schemas are the > >>> same, one just has more complexity (if I can call it that) > >> under the hood. > >> > >> Indeed, the complexity is more to do with the complexity > of installing > >> and understanding what's going on in all those files :-) > (particularly > >> if you are not an oracle expert and have only been looking at the > >> BioSQL schemas for the other supported databases), and that's why I > >> did the simple version. That's partly confirmed by the > fact that the > >> bjia description of how to use the original schema is > about 8KB, while > >> the description for the simple schema is about 1KB. I'm all for > >> dumping the simple one if the barrier for entry for the original > >> schema is lowered (maybe it already has been). > >> > >> > >>> I would prefer to keep instructions for the less complex > >> version up for > >>> the time being as we are having difficulties getting > >> biojava to work > >>> seamlessly with the more complex version. This is almost > >> certainly a > >>> failing of biojava for which the oracle support seems to have been > >>> compiled against the 'simple' schema not the 'complex schema'. > >> > >> It certainly was only tested against the simple version, because > >> that's the only schema I had working when I wrote the > Oracle support. > >> I am a little surprised that you are having major > difficulties though, > >> since the original package has a compatibility layer that > (supposedly) > >> presents the same schema as the simple version. 
> >> > >> > >>> I expect we will soon have biojava supporting your version > >> and we can drop > >>> the 'simple' schema. After all, there is not much point > >> using oracle if > >>> you don't make use of the features. > >> > >> In my case, it was a matter of using Oracle because that > was what was > >> already installed :-) > >> > >> > >> Cheers, > >> Len. > >> > >> _______________________________________________ > >> BioSQL-l mailing list > >> BioSQL-l@open-bio.org > >> http://open-bio.org/mailman/listinfo/biosql-l > >> > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > From hlapp at gnf.org Mon Apr 18 22:19:30 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Apr 18 22:13:13 2005 Subject: [BioSQL-l] release preparation In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601951039@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601951039@BIONIC.biopolis.one-north.com> Message-ID: Sounds good. BTW note that 65k is not large at all even on the transcript level; the infamous titin and some similar genes have longer transcripts. Maybe Dengue doesn't have titin but still ... ;) -hilmar On Apr 18, 2005, at 6:45 PM, Richard HOLLAND wrote: > Don't worry, I do know how to do it, it's just that in the existing > BioJava-live code it hasn't been done, and I'll need to be careful to > add the usual checks to see if we're using Oracle or not before > choosing > the appropriate SQL to update biosequence with. > > CLOBs under 4000 chars are certainly easier, but over 4000 you have to > be careful, and there is a bug which prevents clob.getSubstring() > working for any position greater than can be described in 16 bits > (although I know I experienced this one before, I can't find a > reference > to it now....). 
You then have to use the clob's Stream accessors > instead, but it's not a problem really. Yes, I know 16-bits (65k bases > or so) is huge, but in our current BioSQL all our sequences are around > the 10000 base length so the 4000-char limited accessor methods are not > an option. > > Len's suggestion of having table helpers in BioJava to check which > version to use and therefore maintain backwards compatibility is a good > one. It's slightly more work, but not too much to warrant a major panic > attack. I'll let you know when biojava-live has been updated. > > cheers, > Richard > > Richard Holland > Bioinformatics Specialist > GIS extension 8199 > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its content to any > other person. Thank you. > --------------------------------------------- > > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp@gmx.net] >> Sent: Tuesday, April 19, 2005 12:56 AM >> To: Richard HOLLAND >> Cc: Len Trigg; mark.schreiber@novartis.com; Biosql; biojava-list List >> Subject: Re: [BioSQL-l] release preparation >> >> >> CLOB is IMHO actually easier to handle. Also, LONG is really odd to >> deal with in SQL whereas the Oracle server will nicely on-the-fly >> convert strings to CLOB and vice versa so long as they are >> shorter than >> 4000 chars. Some of the type-generic functions that come with Oracle >> will not accept LONG but do accept CLOB. Just as another anecdotal >> piece, the built-in BLAST searcher available in Oracle 10g expects a >> cursor returning CLOBs, not LONGs. 
>>
>> With the java.sql.Clob interface, getting at the full value as a string
>> is as simple as
>>
>>     Clob clob = resultSet.getClob(1);
>>     // Clob positions are 1-based and the length argument is an int
>>     String clobValue = clob.getSubString(1, (int) clob.length());
>>
>> Inserting a new value is in reality a two-step process:
>>
>>     PreparedStatement pst = conn.prepareStatement(
>>         "INSERT INTO Biosequence (Bioentry_Id, Seq) VALUES (?, EMPTY_CLOB())");
>>     pst.setInt(1, idValue);   // assumes idValue is an int
>>     pst.executeUpdate();
>>     pst.close();
>>     pst = conn.prepareStatement(
>>         "SELECT Seq FROM Biosequence WHERE Bioentry_Id = ? FOR UPDATE",
>>         ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
>>     pst.setInt(1, idValue);
>>
>>     ResultSet rs = pst.executeQuery();
>>     rs.next();                // move to the row that was just inserted
>>     Clob clob = rs.getClob(1);
>>     clob.setString(1, theSeq);
>>     // not sure this is necessary
>>     rs.updateClob(1, clob);
>>     rs.close();
>>     // don't forget to release lock
>>     conn.commit();
>>
>> I vaguely remember that Len or somebody else from the Biojava crowd had
>> this all figured out?
>>
>> -hilmar
>>
>> On Monday, April 18, 2005, at 02:08 AM, Richard HOLLAND wrote:
>>
>>> I looked into this in a bit more detail earlier today and found that,
>>> since some version of Oracle around the 9i point in time, the official
>>> Oracle JDBC driver API for accessing LOBs has changed. This means that
>>> whereas before the same code could be used in BioJava to access both
>>> Hilmar's and Len's versions of the database, since the 9i drivers this
>>> has no longer been possible, and BioJava only works with Len's version.
>>> The problem is due to the way in which Len's schema uses LONG values
>>> for biosequence.seq, but Hilmar's uses CLOBs.
>>>
>>> (The nitty gritty - before 9i, Oracle JDBC allowed you to access both
>>> LONG and CLOB columns using getString()/setString() methods to
>>> manipulate them. Now, these methods only work with LONG columns, and
>>> you have to do fancy tricks to get anything useful into/out of CLOBs).
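
The two ways of reading a CLOB mentioned in this thread can be tried without a database. What follows is only a minimal sketch, assuming an in-memory javax.sql.rowset.serial.SerialClob stands in for the Oracle CLOB locator. The class and method names (ClobReadSketch, readWithSubString, readWithStream, roundTrips) are made up for illustration and are not the BioJava OracleDBHelper code. It shows getSubString(), whose position argument is 1-based in JDBC, and the character-stream accessor suggested for long values.

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;

// Hypothetical helper class, for illustration only (not BioJava code).
final class ClobReadSketch {

    /** Reads a whole CLOB in one call; JDBC Clob positions start at 1. */
    static String readWithSubString(Clob clob) throws SQLException {
        long length = clob.length();
        if (length == 0) {
            return "";
        }
        if (length > Integer.MAX_VALUE) {
            throw new SQLException("CLOB too long for a single String: " + length);
        }
        return clob.getSubString(1, (int) length);
    }

    /** Reads a whole CLOB through its character stream, in fixed-size chunks. */
    static String readWithStream(Clob clob) throws SQLException, IOException {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[4096];
        try (Reader reader = clob.getCharacterStream()) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                out.append(chunk, 0, n);
            }
        }
        return out.toString();
    }

    /** Stores seq in an in-memory Clob and checks that both read paths return it. */
    static boolean roundTrips(String seq) {
        try {
            Clob clob = new SerialClob(seq.toCharArray());
            return seq.equals(readWithSubString(clob))
                && seq.equals(readWithStream(clob));
        } catch (SQLException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("ACGTACGTAC"));
    }
}
```

Against a real Oracle connection the Clob would come from ResultSet.getClob() rather than from SerialClob; whether the driver in use still needs the stream path for long sequences is something to verify locally.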
>>> >>> After discussing this with Mark earlier this afternoon, I >> am planning >>> on >>> changing BioJava to use the new Oracle CLOB API, at which >> point it will >>> no longer work with schemas set up using Len's version. No change to >>> BioSQL is required. This, from a BioJava point of view, >> would make the >>> simple schema redundant. I am not sure if there are people >> in the other >>> Bio* projects who use the simple schema though so we probably can't >>> just >>> drop it. >>> >>> Are there any objections? I have crossposted this to the >> BioJava list >>> to >>> make sure everyone who might be affected gets a say. >>> >>> cheers, >>> Richard >>> >>> Richard Holland >>> Bioinformatics Specialist >>> GIS extension 8199 >>> --------------------------------------------- >>> This email is confidential and may be privileged. If you are not the >>> intended recipient, please delete it and notify us >> immediately. Please >>> do not copy or use it for any purpose, or disclose its >> content to any >>> other person. Thank you. >>> --------------------------------------------- >>> >>> >>>> -----Original Message----- >>>> From: biosql-l-bounces@portal.open-bio.org >>>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >> Len Trigg >>>> Sent: Monday, April 18, 2005 4:58 PM >>>> To: mark.schreiber@novartis.com >>>> Cc: Hilmar Lapp; Biosql >>>> Subject: Re: [BioSQL-l] release preparation >>>> >>>> >>>> >>>> Mark Schreiber wrote: >>>>> now, my bad! Agreed that from a SQL query perspective the >>>> schemas are the >>>>> same, one just has more complexity (if I can call it that) >>>> under the hood. >>>> >>>> Indeed, the complexity is more to do with the complexity >> of installing >>>> and understanding what's going on in all those files :-) >> (particularly >>>> if you are not an oracle expert and have only been looking at the >>>> BioSQL schemas for the other supported databases), and that's why I >>>> did the simple version. 
That's partly confirmed by the >> fact that the >>>> bjia description of how to use the original schema is >> about 8KB, while >>>> the description for the simple schema is about 1KB. I'm all for >>>> dumping the simple one if the barrier for entry for the original >>>> schema is lowered (maybe it already has been). >>>> >>>> >>>>> I would prefer to keep instructions for the less complex >>>> version up for >>>>> the time being as we are having difficulties getting >>>> biojava to work >>>>> seamlessly with the more complex version. This is almost >>>> certainly a >>>>> failing of biojava for which the oracle support seems to have been >>>>> compiled against the 'simple' schema not the 'complex schema'. >>>> >>>> It certainly was only tested against the simple version, because >>>> that's the only schema I had working when I wrote the >> Oracle support. >>>> I am a little surprised that you are having major >> difficulties though, >>>> since the original package has a compatibility layer that >> (supposedly) >>>> presents the same schema as the simple version. >>>> >>>> >>>>> I expect we will soon have biojava supporting your version >>>> and we can drop >>>>> the 'simple' schema. After all, there is not much point >>>> using oracle if >>>>> you don't make use of the features. >>>> >>>> In my case, it was a matter of using Oracle because that >> was what was >>>> already installed :-) >>>> >>>> >>>> Cheers, >>>> Len. >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l@open-bio.org >>>> http://open-bio.org/mailman/listinfo/biosql-l >>>> >>> >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> >> > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From dlondon at ebi.ac.uk Tue Apr 19 05:16:30 2005 From: dlondon at ebi.ac.uk (Darin London) Date: Tue Apr 19 07:12:13 2005 Subject: [BioSQL-l] Re: BOSC 2005 In-Reply-To: <20050120175859.GA7254@parrot.ebi.ac.uk> References: <20050120175859.GA7254@parrot.ebi.ac.uk> Message-ID: <20050419091628.GN17377@parrot.ebi.ac.uk> {Please pass the word!} SECOND CALL FOR SPEAKERS The 6th annual Bioinformatics Open Source Conference (BOSC'2005) is organized by the not-for-profit Open Bioinformatics Foundation. The meeting will take place June 23-24, 2005 in Detroit, Michigan, USA, and is one of several Special Interest Group (SIG) meetings occurring in conjunction with the 13th International Conference on Intelligent Systems for Molecular Biology. see http://www.iscb.org/ismb2005 for more information. Because of the power of many Open Source bioinformatics packages in use by the Research Community today, it is not too presumptuous to say that the work of the Open Source Bioinformatics Community represents the cutting edge of Bioinformatics in general. This has been repeatedly demonstrated by the quality of presentations at previous BOSC conferences. This year, at BOSC 2005, we want to continue this tradition of excellence, while presenting this message to a wider part of the Research Community. Please, pass this message on to anyone you know that is interested in Bioinformatics software. 

BOSC PROGRAM & CONTACT INFO

  * Web: http://www.open-bio.org/bosc2005/
  * Online Registration: https://www.cteusa.com/iscb4/
  * Email: bosc@open-bio.org

FEES

  * Corporate : $195 ($245 after May 16th)
  * Academic  : $170 ($220 after May 16th)
  * Student   : $145 ($195 after May 16th)

SPEAKERS & ABSTRACTS WANTED

The program committee is currently seeking abstracts for talks at BOSC
2005. BOSC is a great opportunity for you to tell the community about your
use, development, or philosophy of open source software development in
bioinformatics. The committee will select several submitted abstracts for
25-minute talks and others for shorter "lightning" talks. Accepted
abstracts will be published on the BOSC web site.

If you are interested in speaking at BOSC 2005, please send us before
April 26, 2005:

  * an abstract (no more than a few paragraphs)
  * a URL for the project page, if applicable
  * information about the open source license used for your software
    or your release plans.

Abstracts will be accepted for submission until April 26, 2005.
Abstracts chosen for presentation will be announced May 12, 2005 (before
the ISMB Early Registration Deadline).

LIGHTNING-TALK SPEAKERS WANTED!

The program committee is currently seeking speakers for the lightning
talks at BOSC 2005. Lightning talks are quick - only five minutes long -
and a great opportunity for you to give people a quick summary of your
open source project, code, idea, or vision of the future.

If you are interested in giving a lightning talk at BOSC 2005, please
send us:

  * a brief title and summary (one or two lines)
  * a URL for the project page, if applicable
  * information about the open source license used for your software
    or your release plans.

We will accept entries on-line until BOSC starts, but space for demos and
lightning talks is limited.
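
The release-preparation thread earlier in this archive turns on whether biosequence.seq is stored as a LONG or as a CLOB, and on BioJava checking which of the two it is talking to before choosing how to read the sequence. One way such a check could look is sketched below. It assumes the column's JDBC type code has already been obtained, for example from ResultSetMetaData.getColumnType(), and that the driver reports an Oracle LONG column as Types.LONGVARCHAR. The class and enum names are hypothetical and this is not BioJava's actual detection code.

```java
import java.sql.Types;

// Hypothetical sketch: pick an access strategy from a java.sql.Types code.
final class SeqColumnAccess {

    enum Access { CLOB_API, STRING_API, UNSUPPORTED }

    /** Maps the JDBC type code of the sequence column to a way of reading it. */
    static Access accessFor(int sqlType) {
        switch (sqlType) {
            case Types.CLOB:
                // locator-based access: getClob() and the Clob methods
                return Access.CLOB_API;
            case Types.LONGVARCHAR:
            case Types.VARCHAR:
                // plain getString()/setString() access
                return Access.STRING_API;
            default:
                return Access.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(accessFor(Types.CLOB));
        System.out.println(accessFor(Types.LONGVARCHAR));
    }
}
```

Keeping the decision in one small function means the rest of the persistence code can branch on the returned value once per connection instead of repeating driver checks at every read and write.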

I've committed some changes to biojava-live which make BioJava compatible
with BioSQL when the latter is running on Oracle 9i or greater and using
the official schema as per the biosql-schema CVS. This involved adding an
autodetect function to detect whether Clobs were used in biosequence or
not, and then creating code to work with Clobs where necessary. Two
extensions to OracleDBHelper might be useful in other tasks -
stringToClob() and clobToString().

I also made some changes to the Ontology part of the BioJava/BioSQL
interface, which was not persisting Triples correctly. It would attempt to
reference the triple by its unique ID before actually giving it one, which
of course fails. This should now be fixed.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000 DID: (65) 6478 8199
Email: hollandr@gis.a-star.edu.sg
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------

From hollandr at gis.a-star.edu.sg Wed Apr 27 02:57:02 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Apr 27 02:51:39 2005
Subject: [BioSQL-l] BioSQL documentation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56019513F2@BIONIC.biopolis.one-north.com>

Hilmar,

I read through the doc/schema_overview.txt and it looks fine, nothing has
changed much since it was written. It's fine to leave it as it is.

Now that BioJava will play nicely with the Clobs in the official BioSQL
schema for Oracle, I will rewrite the BioJava/BioSQL/Oracle HowTo and
remove references to Len's schema as it is no longer relevant.
The official schema will now function perfectly well with BioJava out-of-the-box (but only if you are using biojava-live, for now, until the change gets into the main release branch). I will post the URL to this list when it is complete and updated. Mark Schreiber and I have asked if we might attend the Open Bio Hackathon this year. If we are accepted, one of our projects is to get all the Bio* projects to play nicely with BioSQL and store various bits of information in the same columns of the same tables as each other. If this does not happen, we still intend to do it, but it might take longer. If you or anyone else working with BioSQL interfaces in the Bio* projects will also be attending then we'd love to work with you on this. There are three stages: (1) identify where things should be going for all the common data formats (Genbank, Swissprot, plain fasta etc.), then (2) identify where they are actually going at the moment when loaded into BioSQL by the various Bio* projects, and finally (3) modify the various Bio* projects to use the correct locations (and hopefully retain checks for backwards compatibility so that if they can't find that information in its correct location, they'll check the old one just in case). Hopefully that's not too much work for a small group of people to finish together in a couple of days. I was wondering if it would be a good idea to delay the official BioSQL 1.0 release until after the above standardisations have taken place. Then we can include in the distribution a document detailing exactly what goes where when loading various data formats, both for reference and for the guidance of future projects not yet written. cheers, Richard Richard Holland Bioinformatics Specialist Genome Institute of Singapore 60 Biopolis Street, #02-01 Genome, Singapore 138672 Tel: (65) 6478 8000 DID: (65) 6478 8199 Email: hollandr@gis.a-star.edu.sg --------------------------------------------- This email is confidential and may be privileged. 
If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- From mark.schreiber at novartis.com Wed Apr 27 05:32:35 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Apr 27 05:26:53 2005 Subject: [BioSQL-l] BioSQL documentation Message-ID: I don't know if this means there cannot be a 1.0 release. The BioSQL 1.0 will be a standard. It's up to the bio* projects to play well with it. - Mark "Richard HOLLAND" Sent by: biosql-l-bounces@portal.open-bio.org 04/27/2005 02:57 PM To: "Hilmar Lapp" cc: biosql-l@open-bio.org, (bcc: Mark Schreiber/GP/Novartis) Subject: [BioSQL-l] BioSQL documentation Hilmar, I read through the doc/schema_overview.txt and it looks fine, nothing has changed much since it was written. It's fine to leave it as it is. Now that BioJava will play nicely with the Clobs in the official BioSQL schema for Oracle, I will rewrite the BioJava/BioSQL/Oracle HowTo and remove references to Len's schema as it is no longer relevant. The official schema will now function perfectly well with BioJava out-of-the-box (but only if you are using biojava-live, for now, until the change gets into the main release branch). I will post the URL to this list when it is complete and updated. Mark Schreiber and I have asked if we might attend the Open Bio Hackathon this year. If we are accepted, one of our projects is to get all the Bio* projects to play nicely with BioSQL and store various bits of information in the same columns of the same tables as each other. If this does not happen, we still intend to do it, but it might take longer. If you or anyone else working with BioSQL interfaces in the Bio* projects will also be attending then we'd love to work with you on this. 
There are three stages: (1) identify where things should be going for all
the common data formats (Genbank, Swissprot, plain fasta etc.), then (2)
identify where they are actually going at the moment when loaded into
BioSQL by the various Bio* projects, and finally (3) modify the various
Bio* projects to use the correct locations (and hopefully retain checks
for backwards compatibility so that if they can't find that information in
its correct location, they'll check the old one just in case). Hopefully
that's not too much work for a small group of people to finish together in
a couple of days.

I was wondering if it would be a good idea to delay the official BioSQL
1.0 release until after the above standardisations have taken place. Then
we can include in the distribution a document detailing exactly what goes
where when loading various data formats, both for reference and for the
guidance of future projects not yet written.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000 DID: (65) 6478 8199
Email: hollandr@gis.a-star.edu.sg
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
--------------------------------------------- _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l From hollandr at gis.a-star.edu.sg Wed Apr 27 05:34:27 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Wed Apr 27 05:30:48 2005 Subject: [BioSQL-l] BioSQL documentation Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601951412@BIONIC.biopolis.one-north.com> I just think the BioSQL 1.0 standard should include a reference as to the 'official' way to store the different bits of various file formats within the schema, which all apps talking to BioSQL can be expected to comply with (and hence behave well with each other's data). Richard. Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: mark.schreiber@novartis.com > [mailto:mark.schreiber@novartis.com] > Sent: Wednesday, April 27, 2005 5:33 PM > To: Richard HOLLAND > Cc: biosql-l@open-bio.org; > biosql-l-bounces@portal.open-bio.org; Hilmar Lapp > Subject: Re: [BioSQL-l] BioSQL documentation > > > I don't know if this means there cannot be a 1.0 release. The > BioSQL 1.0 > will be a standard. It's up to the bio* projects to play well with it. > > - Mark > > > > > > "Richard HOLLAND" > Sent by: biosql-l-bounces@portal.open-bio.org > 04/27/2005 02:57 PM > > > To: "Hilmar Lapp" > cc: biosql-l@open-bio.org, (bcc: Mark > Schreiber/GP/Novartis) > Subject: [BioSQL-l] BioSQL documentation > > > Hilmar, > > I read through the doc/schema_overview.txt and it looks fine, > nothing has > changed much since it was written. It's fine to leave it as it is. 
>
> Now that BioJava will play nicely with the Clobs in the official BioSQL
> schema for Oracle, I will rewrite the BioJava/BioSQL/Oracle HowTo and
> remove references to Len's schema as it is no longer relevant. The
> official schema will now function perfectly well with BioJava
> out-of-the-box (but only if you are using biojava-live, for now, until
> the change gets into the main release branch). I will post the URL to
> this list when it is complete and updated.
>
> Mark Schreiber and I have asked if we might attend the Open Bio
> Hackathon this year. If we are accepted, one of our projects is to get
> all the Bio* projects to play nicely with BioSQL and store various bits
> of information in the same columns of the same tables as each other. If
> this does not happen, we still intend to do it, but it might take
> longer. If you or anyone else working with BioSQL interfaces in the Bio*
> projects will also be attending then we'd love to work with you on this.
> There are three stages: (1) identify where things should be going for
> all the common data formats (Genbank, Swissprot, plain fasta etc.), then
> (2) identify where they are actually going at the moment when loaded
> into BioSQL by the various Bio* projects, and finally (3) modify the
> various Bio* projects to use the correct locations (and hopefully retain
> checks for backwards compatibility so that if they can't find that
> information in its correct location, they'll check the old one just in
> case). Hopefully that's not too much work for a small group of people to
> finish together in a couple of days.
>
> I was wondering if it would be a good idea to delay the official BioSQL
> 1.0 release until after the above standardisations have taken place.
> Then we can include in the distribution a document detailing exactly
> what goes where when loading various data formats, both for reference
> and for the guidance of future projects not yet written.
> > cheers, > Richard > > Richard Holland > Bioinformatics Specialist > Genome Institute of Singapore > 60 Biopolis Street, #02-01 Genome, Singapore 138672 > Tel: (65) 6478 8000 DID: (65) 6478 8199 > Email: hollandr@gis.a-star.edu.sg > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us > immediately. Please do > not copy or use it for any purpose, or disclose its content > to any other > person. Thank you. > --------------------------------------------- > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > > > > From boehme at mpiib-berlin.mpg.de Wed Apr 27 11:27:09 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Wed Apr 27 11:20:34 2005 Subject: [BioSQL-l] location in features Message-ID: <426FAF4D.3000700@mpiib-berlin.mpg.de> Hi, I can't get rid of this exception: org.biojava.bio.BioRuntimeException: BioSQL SeqFeature doesn't have any associated location spans. seqfeature_id=148 Can anybody help me? put the sequence in: Sequence seq = DNATools.createDNASequence(sequence, "AF100928"); Feature.Template templSeq = new Feature.Template(); templSeq.source = "ncbi"; templSeq.type = "gen"; templSeq.location = Location.empty; seq.createFeature(templSeq); db.addSequence(seq); get it out: seq = db.getSequence("AF100928"); System.out.println(seq.getName() + " contains " + seq.countFeatures() + " features"); seq.getName() works fine, but the seq doesn't have any features, but I can see them in the db. What am I missing here? 
Martina From hlapp at gnf.org Wed Apr 27 14:51:20 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Wed Apr 27 14:44:06 2005 Subject: [BioSQL-l] location in features In-Reply-To: <426FAF4D.3000700@mpiib-berlin.mpg.de> References: <426FAF4D.3000700@mpiib-berlin.mpg.de> Message-ID: <3d10e113e0e7e6e9e28d31242cc83c40@gnf.org> Hi Martina, people on the biojava mailing list will probably be better able to help you out. Also, Richard and Mark have been working on getting Biojava interoperate better with the standard biosql schema. They may know better where your issue is coming from. -hilmar On Apr 27, 2005, at 8:27 AM, Martina wrote: > Hi, > I can't get rid of this exception: > org.biojava.bio.BioRuntimeException: BioSQL SeqFeature doesn't have > any associated location spans. seqfeature_id=148 > > Can anybody help me? > > put the sequence in: > > Sequence seq = DNATools.createDNASequence(sequence, "AF100928"); > Feature.Template templSeq = new Feature.Template(); > templSeq.source = "ncbi"; > templSeq.type = "gen"; > templSeq.location = Location.empty; > seq.createFeature(templSeq); > db.addSequence(seq); > > get it out: > seq = db.getSequence("AF100928"); > System.out.println(seq.getName() + " contains " + seq.countFeatures() > + " features"); > > seq.getName() works fine, but the seq doesn't have any features, but I > can see them in the db. > > What am I missing here? > > Martina > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hollandr at gis.a-star.edu.sg Wed Apr 27 23:41:42 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Wed Apr 27 23:36:21 2005 Subject: [BioSQL-l] location in features Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601951467@BIONIC.biopolis.one-north.com> Hullo Martina. I must admit I am confused. I have been using BioJava+BioSQL to load Genbank records with features with no trouble, they always come out again with no exceptions raised and none missing. I am using Oracle, but this shouldn't make a difference as the SQL code that looks for features is the same for all database types at present. Can I ask what database type you are using (MySQL, Oracle etc.), and the versions of BioJava and BioSQL you have? I'd also suggest downloading biojava-live from CVS, if you haven't done so already, and trying that to see if someone has already fixed your problem. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gnf.org] > Sent: Thursday, April 28, 2005 2:51 AM > To: Martina > Cc: OBDA BioSQL; BioJava; Richard HOLLAND; Mark Schreiber > Subject: Re: [BioSQL-l] location in features > > > Hi Martina, people on the biojava mailing list will probably > be better > able to help you out. Also, Richard and Mark have been working on > getting Biojava interoperate better with the standard biosql schema. > They may know better where your issue is coming from. 
> > -hilmar > > On Apr 27, 2005, at 8:27 AM, Martina wrote: > > > Hi, > > I can't get rid of this exception: > > org.biojava.bio.BioRuntimeException: BioSQL SeqFeature doesn't have > > any associated location spans. seqfeature_id=148 > > > > Can anybody help me? > > > > put the sequence in: > > > > Sequence seq = DNATools.createDNASequence(sequence, "AF100928"); > > Feature.Template templSeq = new Feature.Template(); > > templSeq.source = "ncbi"; > > templSeq.type = "gen"; > > templSeq.location = Location.empty; > > seq.createFeature(templSeq); > > db.addSequence(seq); > > > > get it out: > > seq = db.getSequence("AF100928"); > > System.out.println(seq.getName() + " contains " + > seq.countFeatures() > > + " features"); > > > > seq.getName() works fine, but the seq doesn't have any > features, but I > > can see them in the db. > > > > What am I missing here? > > > > Martina > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l@open-bio.org > > http://open-bio.org/mailman/listinfo/biosql-l > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > From suzi at fruitfly.org Wed Apr 20 18:42:29 2005 From: suzi at fruitfly.org (Suzanna Lewis) Date: Mon May 2 09:10:24 2005 Subject: [BioSQL-l] Hackathon 2005 Message-ID: (sorry for multiple postings, but please do forward to anyone else who you think might be interested) ------------------------------------------------------------------------ ----------- Dear everyone, It has been a long time and we Bioinformatics devotees are overdue for another total-immersion coding-fest (the last hackathon was held in Singapore February 2003, more than two years ago). Apple has offered to host us this year, and as an added bonus include free admission to the World-Wide Developers Conference in San Francisco the prior week. 
They are also looking for some people to present interesting new
developments at the WWDC, so if you have something noteworthy please let
us know.

Apple is not attaching any strings, so our work need not address
Apple-specific software or hardware areas. Apple will provide space and
hardware (and access to their engineers if we'd like). Week 1 (June 6-10)
would be spent at the WWDC. Week 2 (June 12-16) would be in Cupertino, at
Apple's headquarters.

We're free to focus on what interests us; our tentative plans include:

1. Bio-ontologies software
2. High-performance computing (e.g. large scale computations,
   optimization)
3. Image analysis
4. Documentation
5. Anything else that may interest you

Our plan is to organize this much as the Aspen Center for Physics
computational biology workshops were organized (for those old enough to
remember): a couple of presentations to start the day; collaboration and
coding afterwards; time for a bit of fun (does anyone else cycle?); and
discussions in the late afternoons and evenings.

Would everyone who is interested in attending please send us a short
description of what you would like to do, and perhaps other people who you
would like to work with. There is somewhat limited space, so we will try
to prioritize groups that have a clear focus and a need to interact.

We know this is very short notice, but we hope that there will be enough
interest to make it possible. We are looking into additional funding
support, to pay for travel expenses, but this is still to be decided.

Looking forward to hearing from everyone.

George, Cyrus, Steve, and Suzanna
(the Bay Area locals)