[BioSQL-l] release preparation

mark.schreiber at novartis.com mark.schreiber at novartis.com
Mon Apr 18 02:01:53 EDT 2005


Hello Hilmar,

As this document is on a site I maintain I should have updated it before 
now, my bad! Agreed that from a SQL query perspective the schemas are the 
same, one just has more complexity (if I can call it that) under the hood.

I would prefer to keep instructions for the less complex version up for 
the time being as we are having difficulties getting biojava to work 
seamlessly with the more complex version. This is almost certainly a 
failing of biojava for which the oracle support seems to have been 
compiled against the 'simple' schema not the 'complex schema'.

I expect we will soon have biojava supporting your version and we can drop 
the 'simple' schema. After all, there is not much point using oracle if 
you don't make use of the features.

- Mark





Hilmar Lapp <hlapp at gmx.net>
Sent by: biosql-l-bounces at portal.open-bio.org
04/18/2005 01:38 PM

 
        To:     "Richard HOLLAND" <hollandr at gis.a-star.edu.sg>
        cc:     Biosql <biosql-l at open-bio.org>, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [BioSQL-l] release preparation


First off, before going through your HowTo document, as for the 
description of which content is supposed to go where, have you read the 
doc/schema-overview.txt in the biosql repository? Could you list the 
questions that that document leaves open? I'd rather expand that 
document than writing another one from scratch; I thought Aaron did a 
pretty good job towards your request, but certainly this can improved 
or spiked with more details or whatever you find it could do better on.

Now to the HowTo. BTW is there a reason this should not be included in 
the distribution?

> /BioJava and BioSQL/Oracle HOWTO
>
> What you'll need
>
> Bio*
>
> You'll need the latest version of BioJava to take advantage of the 
> full functionality of BioSQL. This can be downloaded from biojava.org 
> . You'll also need the latest Oracle BioSQL schema. Here you have a 
> choice of two options:
> Original : by Hilmar Lapp, the original BioSQL schema takes full 
> advantage of Oracle's security mechanisms and produces a complex but 
> high quality schema. You'll need sysdba access to your database to 
> install it.

I'd appreciate if this could be straightened out a bit, as you really 
do not need sysdba access if you're not going to create tablespaces and 
users, and not doing these steps is a simple matter of commenting out 
the respective lines.

If you are though then having access to sysdba or access to someone who 
does (i.e., pair-programming with your DBA for this task) is kind of 
unavoidable ...

Also, the distinction of a 'complex schema' coming out of the original 
and 'simplified structure' of Len's version sounds a bit too misleading 
for me, since the schema is no different between either version; there 
is no difference in number of tables or constraints or whatever (or is 
there?).

What simplified structure might refer to is that Len's version leaves 
out the PL/SQL packages etc? Again, just as a note, this is trivial to 
disable in BS-create-all, just comment out the respective steps.

As another note, in most Oracle environments an installer will not have 
sysdba access nor will she be supposed to create tablespaces or users; 
the DBA will do it for her. In those environments, the scriptlet that 
does this step will serve merely as an instructional template for the 
DBA for what to create. I.e., in usual Oracle environments tablespace, 
user, and role creation will be commented out because the DBA does them 
(has done them already).

>  Go to cvs.open-bio.org , select the biosql project, and navigate to 
> and download the entire biosql-schema/sql/biosql-ora folder.
> Simplifed : by Len Trigg, this version is simplified in structure and 
> sits entirely inside a single user account, requiring no sysdba access 
> to install. You'll have to ask for a copy of the script from the 
> biosql-l mailing lists.
> Both options are fully functional and compatible with both BioJava and 
> BioPerl.
>
> Oracle
>
> Obviously, you'll need an Oracle database. For the Original schema, 
> you'll also need sysdba access, or get your DBA to help you if you do 
> not have this yourself.
> For the Simplified schema you just need your own login to Oracle, and 
> the permissions to create tables. You'll also need to know the 
> tablespace name to use, ask your DBA.
>
> Bugfixing
>
> NOTE: Some of these fixes may already have been made by the time you 
> read this, so be careful and check they have not already been done!
>
> Original schema
>
> Before you do anything else, you'll need to ensure that all the 
> scripts in the folder refer to the correct local settings file. This 
> is not always the case, so be careful. The best thing to do is a 
> global search on all the files you downloaded, and replace all 
> references to BS-defs with BS-defs-local .

I've done this a while ago and think there's no instances left where 
this hasn't been changed. Please check.

>  Of course, don't do this in BS-defs.sql itself.
>
> Now you'll need to find the CREATE TABLE SG_Biosequence statement in 
> BS-DDL.sql . You'll notice there is a constraint there called 
> Alphabet4 . The values in the constraint ( dna ,protein etc.) are all 
> in lower case. BioJava uses upper case values for these fields, but 
> BioPerl uses lower case! To make it work with BioJava, you'll have to 
> modify the constraint line so that it reads like this:
> CONSTRAINT Alphabet4
>       CHECK (lower(Alphabet) IN ('dna', 'protein', 'protein-term', 
> 'rna')),

I've changed this but by enumerating all allowed terms so case-mixing 
within a term isn't allowed. I haven't included 'protein-term' yet; 
what is this? Is it necessary? What does it denote?

>
> This of course will make BioJava work, but will stop BioPerl from 
> being able to retrieve records correctly as it will not recognise the 
> upper-case versions of these values. One day hopefuly the two projects 
> will come up with a resolution to this issue.

I've changed this in bioperl-db so that a retrieved alphabet term is 
converted to lower case. (This doesn't make Biojava work with 
Bioperl-db-inserted data yet though :-)

>
> In BS-create-Biosql-usersyns.sql you need to add another command under 
> the list of set commands at the top. This command should read:
> set lines 200

Fixed, thanks for reporting.

> What this does is to temporarily increase the maximum length of am 
> output line in Oracle, whilst it is creating the usersyns.sql script. 
> If you do not do this, the generated script will contain linebreaks 
> midway through names of tables, which will cause the script to fail.
>
> Last of all, unless this has already been fixed in the CVS versions of 
> BioSQL by the time you read this, there is a section at the end of 
> BS-grants.sql which grants permissions to the various BioSQL users to 
> see the SG_User table. The statement currently reads like this:
> --
>    -- Biosql grants for SG_USER: needs select on all views and synonyms
>    -- that don't follow the SG% convention.
>    --
>    SELECT 'GRANT SELECT ON ' || object_name || ' TO &biosql_user;'
>    FROM user_objects
>    WHERE object_name NOT LIKE 'SG_%'
>    AND   object_name NOT LIKE '%$%'
>    AND   object_name NOT LIKE '%_PK_SEQ'
>    AND   object_type IN ('VIEW','SYNONYM')
>    ;
> You need to comment out the line that reads AND   object_name NOT LIKE 
> '%_PK_SEQ' by putting two dashes ( -- ) before it. This allows the 
> users to see the sequence required to allow them to generate new 
> records in the database.

Note that the original statement is correct because SG_USER (or 
whatever you define biosql_user to be) is supposed to be read-only and 
should never generate new records in the database. SG_LOADER, or 
whatever you set biosql_loader to be, is for r/w access and should get 
proper permissions to the sequences.

Of course you are free to dispose of the distinction between a 
read-only and a r/w user for your instance, but I don't think that 
should be the default ... BTW there is nothing that stops you from 
defining biosql_user and biosql_loader to the exact same user to 
achieve this very effect.

Let me know if I'm missing something here ...

>
> Simplified schema
>
> The only fix to make here is to do with the maximum value allowed in a 
> bioentry qualifier. Find the statement that creates the table 
> BioEntry_Qualifier_Value and alter the definition for the VALUE column 
> so that it has a maximum size of 300.

Note that in the standard schema this is a VARCHAR2(4000) meanwhile.

>
> Installation
>
> Original schema
>
> Make sure you have set the $ORACLE_SID environment variable to the 
> correct database before running the scripts, as they 
> connect/disconnect several times and if it is not set, you may end up 
> running them against the wrong database.

Again, if the roles, user, and tablespace creation steps are commented 
out there should be no reconnecting. At least theoretically ...

>
> The installation requires the creation of three tablespaces - one for 
> data, one for indexes, one for LOB objects.

Again note that there is nothing that stops you from defining all three 
in BS-defs-local to the same tablespace (or two) which already exist. 
(If you define them to the same it should exist already as the 
tablespace creation script does assume that they are different.)

I kind of tried to write it such that you can do it 'complicated' if 
you want and simple if you don't ... maybe I should have pointed that 
out better.

> Decide where you will be keeping the database files for these, and 
> what you will call the tablespaces. Don't create them yet though, just 
> write down the names. As always it is good practice to keep the data 
> and index tablespaces on separate disks to prevent IO bottlenecks, but 
> you can probably safely put the data and LOB tablespaces on the same 
> disk.
>
> You will also need to decide on names for the two basic roles that 
> BioSQL uses - the base_user role which contains just enough privileges 
> to connect to the database, and the schema_creator role, which 
> contains the privileges required to create database objects in a 
> schema. Again, don't create them just yet.
>
> Now, copy BS-defs.sql to BS-defs-local.sql and edit it. You should 
> check every entry in it carefully, particularly the names and 
> locations of the tablespace files to be created, and the names of the 
> two roles you just decided on above. You will also choose names for 
> the various default BioSQL roles. biosql_owner is not a role but the 
> actual owner of the schema that will have the schema_creator role 
> granted to it, you'll need to define its password here too. 
> biosql_user is a role to be granted to people who need read-only 
> access to the BioSQL database, biosql_loader is a role designed for 
> batch upload processes, whilst biosql_admin has full read-write 
> permission on the schema.

I guess I need to update the comments here. I ended up never using the 
biosql_admin role but using the biosql_loader role instead as the r/w 
user. This is pretty much how permissions are granted.

So maybe do I need to include a sample BS-defs-local and BS-create-all 
with 'simplified' settings?

                 -hilmar

>
> Once you have edited the BS-defs-local.sql script appropriately, you 
> need to create the two base roles of base_user and schema_creator 
> manually. Create them by running something similar to the following 
> script whilst logged in as sysdba, from inside the biosql-ora 
> directory:
> @BS-defs-local
>    create role &base_user;
>    grant
>    CREATE SESSION,
>    CREATE SYNONYM,
>    CREATE VIEW
>    to &base_user;
>    create role &schema_creator;
>    grant
>    CREATE PROCEDURE,
>    CREATE ROLE,
>    CREATE SEQUENCE,
>    CREATE SESSION,
>    CREATE SYNONYM,
>    CREATE TRIGGER,
>    CREATE TYPE,
>    CREATE VIEW,
>    CREATE TABLE
>    to &schema_creator
>    with admin option;
>
> If you want some basic users set up, edit the BS-create-users.sql 
> script to look at the sample users it will create for you 
> automatically. If you don't want them, or want different names etc., 
> comment them out or edit them.
>
> The final stage before actual installation is to edit the 
> BS-create-all.sql script to ensure that only the steps you require are 
> carried out. If you already have predefined tablespaces and don't want 
> it to create new ones, comment out the line that reads 
> @BS-create-tablespaces . Likewise if you don't want any default data 
> loaded into the database, comment out the line near the end that reads 
> @BS-prepopulate-db .
>
> Under section 8 of BS-create-all.sql you need to make sure the 
> following commands appear in the order below. If they appear in any 
> other order, you will not be able to create other users to access the 
> database later! The commands should read:
> @BS-create-roles
>    @BS-create-synonyms
>    @BS-create-Biosql-API2
>    @BS-create-Biosql-usersyns
>    @BS-grants
> (NOTE: The BS-create-Biosql-API2 script is an alternative to 
> BS-create-Biosql-API which works much better with BioJava. This is 
> because BioJava has no flexibility about column names in tables. The 
> API2 version of the script ensures that the column names are exactly 
> the same as what BioJava expects by using synonyms. But, no matter 
> which you run, everything will still work fine with BioPerl).
>
> Now, log in to the database as sysdba from inside the biosql-ora 
> directory. Create the BioSQL database by typing:
> @BS-create-all
> . You might want to spool the output to see what happens, but you'll 
> find that half of it doesn't appear in the spool file, because BioSQL 
> is using spool itself to generate dynamic scripts on the fly. If 
> you've done everything right, the only messages you should get are a 
> few Table or view does not exist style messages, referring to the 
> attempts by the script to drop old objects before recreating new ones.
>
> During installation you will be prompted for the sysdba username and 
> password several times. This is required to create tablespaces and 
> users.
>
> If something goes wrong, you can safely rerun the script without 
> dropping anything first as it will drop the database objects from the 
> previous attempt first. It will however leave behind the tablespaces, 
> users, and roles. You can always just drop the users and tablespaces 
> that have been created if it really messes up, and start again from 
> scratch.
>
> Now,  your database has been installed! The only remaining step is to 
> log in to each user who will be using BioSQL, and run the usersyns.sql 
> script that the installation generated for you in the biosql-ora 
> directory. This script creates the synonyms for the BioSQL objects and 
> allows the users to see them. This script should not have any errors 
> at all. If it does, edit it and check it closely for things like 
> misplaced linebreaks etc.
>
> Note that Oracle sometimes has issues with roles and does not 
> apparently grant them correctly. If this happens, you will need to 
> grant the appropriate roles to the individual users manually (see the 
> short create role script above) and rerun the usersyns.sql script. 
> Sometimes you will find they don't even have the appropriate 
> tablespace quotas on the three BioSQL tablespaces. You'll need to 
> grant these tablespace quotas using the alter user <bloggs> quota 
> unlimited on <tablespace> command.
>
> Simplified schema
>
> NOTE: You will have to do a global search-and-replace on this script 
> to replace the two tablespace names with the ones you will actually be 
> using. Check with your DBA. This version of the schema only has two 
> tablespaces - one for data, the other for indexes.
>
> This is much easier to set up than the Original schema. Simply log in 
> as the user you wish to install BioSQL as, ensure that your DBA has 
> granted that user the same rights as for the schema_creator role 
> described in the Original installation instructions above, then 
> execute the single script that defines the schema. You should have no 
> problems. You can spool the output to a file if you like to be able to 
> check the results.
>
> This schema is a one-user-only schema, where all users log in as the 
> schema owner and have full read/write access to the entire database. 
> This is the most important difference between this schema and the 
> Original .
>
> Testing
>
> Any BioJava script should work fine!
>
> THE END!
>
> Richard Holland, hollandr at gis dot a-star dot edu dot sg, December 
> 2004

On Sunday, April 17, 2005, at 06:07  PM, Richard HOLLAND wrote:

> The only issues I have are with the Oracle installation, which I came
> across whilst writing the Oracle BioSQL howto at
> http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm -
> the issues are mentioned in that article. If they have been resolved or
> are no longer relevant, then I'd consider it ready for release.
>
> However as part of the release I'd really appreciate a document
> describing exactly what is supposed to be stored in each column/table
> (just supposed to be - doesn't have to be the way any particular Bio*
> project actually does it). This would be very helpful in the efforts to
> unite the various Bio* projects and make them all use the same tables
> for the same things (which is not always the case at present).
>
> cheers,
> Richard
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
>
>
>> -----Original Message-----
>> From: biosql-l-bounces at portal.open-bio.org
>> [mailto:biosql-l-bounces at portal.open-bio.org] On Behalf Of Hilmar Lapp
>> Sent: Sunday, April 17, 2005 4:31 AM
>> To: Biosql
>> Subject: [BioSQL-l] release preparation
>>
>>
>> I've issued this call earlier and I believe have implemented all
>> suggestions. To be sure, please let me know if you have any
>> issues with
>> the schema or instantiation or if you know of any that should be
>> addressed before releasing 1.0.
>>
>> Other than that Brian has updated the PostgreSQL generated ERD HTML
>> document so that everything should be up to date and ready to go.
>>
>> So please let me know and otherwise I'll target release for
>> the end of
>> this month.
>>
>>               -hilmar
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


_______________________________________________
BioSQL-l mailing list
BioSQL-l at open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l





More information about the BioSQL-l mailing list