From hlapp at gnf.org  Fri Apr  1 15:25:55 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Apr  1 15:22:07 2005
Subject: [BioSQL-l] dates and terms
In-Reply-To: <OF2DFE695F.6E530410-ON48256FD6.00055D95-48256FD6.000582D8@EU.novartis.net>
References: <OF2DFE695F.6E530410-ON48256FD6.00055D95-48256FD6.000582D8@EU.novartis.net>
Message-ID: <dd83318618c659d13847b3e3b62bdcdc@gnf.org>

I agree, but I'm also reluctant to expand the 'official' schema, 
because 1) it would necessarily introduce redundancy, 2) it is solely 
needed for query optimization, and 3) for most RDBMSs you can probably 
come up with a non-invasive optimization on top of the official schema 
that is easy enough to implement, like I showed for Oracle.

Actually, depending on the level of individual expertise argument 3) 
may border on arrogance so I'll retract it immediately. So maybe what 
could efficiently help people solve this and similar problems is 
supplementary SQL code in the repository, for instance organized by 
problem or use case. So for the date problem, there would be SQL 
scripts for each supported platform that implement a solution without 
altering the core schema, like the one I suggested for Oracle and 
augmented with a function-based index DDL so that even if you didn't 
know PL/SQL and your DBA were yourself you could just run the script 
and be on your way.
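
To make the idea concrete, here is a minimal sketch of what such a script could do. It is only an illustration under stated assumptions: Python's sqlite3 module stands in for a real platform, SQLite's index on an expression plays the role of Oracle's function-based index, the table is cut down to two columns, and the index name bqv_date_idx is made up. The helper mirrors the biosql_to_date function quoted further down and returns NULL for anything that isn't a date. It also assumes an English locale for month abbreviations.

```python
import sqlite3
from datetime import datetime


def biosql_to_date(value):
    """Parse the leading 'dd-MON-yyyy' token; return None for non-dates."""
    if not isinstance(value, str):
        return None
    token = value.split(" ", 1)[0]
    try:
        # ISO strings sort chronologically, so they can be compared as text.
        return datetime.strptime(token, "%d-%b-%Y").date().isoformat()
    except ValueError:
        return None


conn = sqlite3.connect(":memory:")
# SQLite only accepts a user function in an index if it is deterministic.
conn.create_function("biosql_to_date", 1, biosql_to_date, deterministic=True)
conn.execute(
    "CREATE TABLE bioentry_qualifier_value ("
    " bioentry_id INTEGER NOT NULL, value TEXT)"
)
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, ?)",
    [
        (1, "01-NOV-1995 (Rel. 32, Created)"),
        (2, "28-FEB-2003 (Rel. 41, Last annotation update)"),
        (3, "not a date"),
    ],
)
# The index is built on the function result; the core table is untouched.
conn.execute(
    "CREATE INDEX bqv_date_idx"
    " ON bioentry_qualifier_value (biosql_to_date(value))"
)
rows = conn.execute(
    "SELECT bioentry_id FROM bioentry_qualifier_value"
    " WHERE biosql_to_date(value) < '2000-01-01'"
    " ORDER BY bioentry_id"
).fetchall()
early_ids = [r[0] for r in rows]
print(early_ids)
```

The query has to use the same expression as the index definition for the optimizer to consider the index at all; whether it actually does depends on the platform and its statistics.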

BTW doing something like

	SQL> ALTER TABLE bioentry_qualifier_value ADD (date_value DATE);

followed by properly populating the column from the VARCHAR-type value 
column in my opinion still counts as a non-invasive optimization, 
because unless someone used SELECT * (which is always a bad idea 
anyway) this won't break anything. If the RDBMS supports triggers, you 
can write a trigger that automatically creates and maintains the value 
of the additional column depending on the value of the value column.
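
A rough sketch of the trigger approach, again using Python's sqlite3 as a stand-in rather than Oracle. The trigger names, the to_iso_date helper, and the choice of storing the date as ISO text (SQLite has no DATE type) are assumptions for illustration, and the table is reduced to the columns that matter here.

```python
import sqlite3
from datetime import datetime


def to_iso_date(value):
    """Return 'yyyy-mm-dd' for a leading 'dd-MON-yyyy' token, else None."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.strptime(value.split(" ", 1)[0], "%d-%b-%Y").date().isoformat()
    except ValueError:
        return None


conn = sqlite3.connect(":memory:")
conn.create_function("to_iso_date", 1, to_iso_date)
conn.executescript("""
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    value       TEXT,
    date_value  TEXT          -- the additional, derived column
);
-- Fill date_value whenever a row is inserted ...
CREATE TRIGGER bqv_date_ins AFTER INSERT ON bioentry_qualifier_value
BEGIN
    UPDATE bioentry_qualifier_value
       SET date_value = to_iso_date(NEW.value)
     WHERE rowid = NEW.rowid;
END;
-- ... and keep it in sync whenever value changes.
CREATE TRIGGER bqv_date_upd AFTER UPDATE OF value ON bioentry_qualifier_value
BEGIN
    UPDATE bioentry_qualifier_value
       SET date_value = to_iso_date(NEW.value)
     WHERE rowid = NEW.rowid;
END;
""")

# Writers only ever touch the original columns.
conn.execute(
    "INSERT INTO bioentry_qualifier_value (bioentry_id, value)"
    " VALUES (1, '01-OCT-1996 (Rel. 34, Last sequence update)')"
)
conn.execute(
    "INSERT INTO bioentry_qualifier_value (bioentry_id, value)"
    " VALUES (2, 'free text')"
)
conn.execute(
    "UPDATE bioentry_qualifier_value SET value = '28-FEB-2003'"
    " WHERE bioentry_id = 2"
)
result = conn.execute(
    "SELECT bioentry_id, date_value FROM bioentry_qualifier_value"
    " ORDER BY bioentry_id"
).fetchall()
print(result)
```

Existing code that inserts with an explicit column list keeps working unchanged, which is the sense in which the extra column is non-invasive.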

And to make the separation tidy and obvious, you could also create a 
table

CREATE TABLE bioentry_date (
	bioentry_id INTEGER NOT NULL,
	term_id	INTEGER NOT NULL,
	rank NUMBER(3,0) NOT NULL,
	date_value DATE
);

and then use the same method as before to populate and maintain the 
table automatically. (For brevity I obviously left out UK and FK 
constraints, but they'd be analogous to bioentry_qualifier_value.)
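
As an illustration of the initial population step and of the kind of range query this enables, here is a small sketch. It assumes Python's sqlite3 as a stand-in database, ISO text in place of a real DATE column, made-up term_id values, and a hypothetical parse_leading_date helper; constraints are left out as above.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id     INTEGER NOT NULL,
    rank        INTEGER NOT NULL,
    value       TEXT
);
CREATE TABLE bioentry_date (
    bioentry_id INTEGER NOT NULL,
    term_id     INTEGER NOT NULL,
    rank        INTEGER NOT NULL,
    date_value  TEXT
);
""")
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
    [
        (1, 7, 1, "01-NOV-1995 (Rel. 32, Created)"),
        (1, 7, 2, "28-FEB-2003 (Rel. 41, Last annotation update)"),
        (2, 7, 1, "01-OCT-1996 (Rel. 34, Last sequence update)"),
        (2, 9, 1, "some non-date qualifier"),
    ],
)


def parse_leading_date(value):
    """Return 'yyyy-mm-dd' for a leading 'dd-MON-yyyy' token, else None."""
    try:
        return datetime.strptime(value.split(" ", 1)[0], "%d-%b-%Y").date().isoformat()
    except (AttributeError, ValueError):
        return None


# One-off backfill: copy every qualifier value that parses as a date.
backfill = []
for bioentry_id, term_id, rank, value in conn.execute(
    "SELECT bioentry_id, term_id, rank, value FROM bioentry_qualifier_value"
).fetchall():
    iso = parse_leading_date(value)
    if iso is not None:
        backfill.append((bioentry_id, term_id, rank, iso))
conn.executemany("INSERT INTO bioentry_date VALUES (?, ?, ?, ?)", backfill)

# Entries with any date between two bounds.
between = conn.execute(
    "SELECT DISTINCT bioentry_id FROM bioentry_date"
    " WHERE date_value BETWEEN ? AND ? ORDER BY bioentry_id",
    ("1996-01-01", "2000-12-31"),
).fetchall()
print(between)
```

In practice you would also filter on term_id so that only the date term of interest is compared.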

So, bottom line of what I'm saying is that usually if a problem 
pertains to optimization rather than a model deficiency, there'll be an 
array of options to solve the problem without altering the model, and 
so I'll be reluctant to alter the model.

If people disagree (or agree) with this view please let me know; it'd 
be good to know where people generally stand on such questions, and 
what poses a problem and what doesn't.

	-hilmar

On Mar 31, 2005, at 5:00 PM, mark.schreiber@novartis.com wrote:

> Hello -
>
> I guess this is the nearest approximation to a date field. It might be
> something worth considering for a later version of bioSQL as pretty much
> all records have one or more dates attached to them.
>
> - Mark
>
>
>
>
>
> Hilmar Lapp <hlapp@gnf.org>
> 04/01/2005 04:39 AM
>
>
>         To:     Mark Schreiber/GP/Novartis@PH
>         cc:     biosql-l@open-bio.org
>         Subject:        Re: [BioSQL-l] dates and terms
>
>
> Bioperl-db stores these similarly, but the term is 'date_changed' which
> basically comes from Bioperl's Bio::Seq::RichSeq.
>
> You can compare these dates but it's hard to do so universally for a
> search against the database. There is a scriptlet
> scripts/biosql/update-on-new-date.pl in the bioperl-db repository that
> shows a pretty straightforward approach for comparison. It uses
> Date::Parse which does a nice job of detecting most date formats
> automatically.
>
> The formats being used are actually I believe not dramatically
> different. In UniProt, they look like the following:
>
> DT   01-NOV-1995 (Rel. 32, Created)
> DT   01-OCT-1996 (Rel. 34, Last sequence update)
> DT   28-FEB-2003 (Rel. 41, Last annotation update)
>
> and these get stored as an array with the following elements
>
> 01-NOV-1995 (Rel. 32, Created)
> 01-OCT-1996 (Rel. 34, Last sequence update)
> 28-FEB-2003 (Rel. 41, Last annotation update)
>
> Date::Parse will just ignore the non-date stuff in parentheses. I don't
> know whether there's a similarly convenient library in Java.
>
> In Oracle you can specify the date format when converting. So, the
> following would take everything up to the first space character and
> convert it assuming the format used above:
>
> select to_date(decode(instr('01-NOV-1995 (Rel. 32, Created)', ' '),
>                       0,
>                       '01-NOV-1995 (Rel. 32, Created)',
>                       substr('01-NOV-1995 (Rel. 32, Created)',
>                              1,
>                              instr('01-NOV-1995 (Rel. 32, Created)', ' ')
>                             )
>                      ),
>                'dd-mon-yyyy')
> from dual
> SQL> /
>
> TO_DATE(DECODE(IN
> -----------------
> 11/01/95 00:00:00
>
> 1 row selected.
>
> The DECODE() protects from cases when there is nothing following the
> date.
>
> If this looks too messy, hide it in a function:
>
> CREATE OR REPLACE
> FUNCTION biosql_to_date(qual_value IN VARCHAR2,
>                         date_format IN VARCHAR2 DEFAULT 'dd-mon-yyyy')
> RETURN DATE
> IS
>    spacepos INTEGER;
> BEGIN
>    -- position of the first space, 0 if there is none
>    spacepos := INSTR(qual_value,' ');
>    IF spacepos = 0 THEN
>            RETURN TO_DATE(qual_value,date_format);
>    END IF;
>    -- convert only the part up to the first space
>    RETURN TO_DATE(SUBSTR(qual_value,1,spacepos),
>                   date_format);
> END;
> SQL> /
>
> Function created.
>
> Elapsed: 00:00:00.60
> SQL> select biosql_to_date('01-NOV-1995 (Rel. 32, Created)') from dual;
>
> BIOSQL_TO_DATE('0
> -----------------
> 11/01/95 00:00:00
>
> 1 row selected.
>
> Elapsed: 00:00:00.01
> SQL> select biosql_to_date('01-NOV-1995') from dual;
>
> BIOSQL_TO_DATE('0
> -----------------
> 11/01/95 00:00:00
>
> 1 row selected.
>
> Elapsed: 00:00:00.01
>
> Obviously, if you query using this or a similar function, the query
> optimizer will do a full table scan and not use an index on
> bioentry_qualifier_value. However, you can create a function index on
> bioentry_qualifier_value.value using the above function, and queries
> using the same function will then be indexed. In that case you would
> need to make a small amendment to the function above by catching the
> exception that results from parsing strings that aren't dates and then
> return NULL instead. (Oracle does not index NULLs. Unlike in
> PostgreSQL, you cannot have partial indexes in Oracle AFAIK, i.e., in
> Oracle the index creation statement cannot contain a WHERE clause.)
>
> Does this help?
>
>                  -hilmar
>
> On Mar 31, 2005, at 1:07 AM, mark.schreiber@novartis.com wrote:
>
>> Hello -
>>
>> Many records that might be stored in BioSQL have associated date
>> fields.
>> Biojava stores these as value in bioentry_qualifier_value with the
>> term_id
>> pointing to the Term for date.
>>
>> This seems to place a serious limitation on searching by date. I would
>> like to be able to search for sequences entered between X and Y or
>> before
>> X etc. Has anyone come up with a workaround for date operations on
>> VarChar2 or Strings?
>>
>> Thanks
>>
>> Mark Schreiber
>> Principal Scientist (Bioinformatics)
>>
>> Novartis Institute for Tropical Diseases (NITD)
>> 10 Biopolis Road
>> #05-01 Chromos
>> Singapore 138670
>> www.nitd.novartis.com
>>
>> phone +65 6722 2973
>> fax  +65 6722 2910
>>
>>
>> ______________________________________________________________________
>> The Novartis email address format has changed to
>> firstname.lastname@novartis.com.  Please update your address book
>> accordingly.
>> ______________________________________________________________________
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From caritov at gmail.com  Mon Apr  4 15:44:01 2005
From: caritov at gmail.com (carito vargas)
Date: Mon Apr  4 15:37:49 2005
Subject: [BioSQL-l] Trying to get BioSql
Message-ID: <15a9a89705040412443abd1e0@mail.gmail.com>

Hello,

I want to change our database model (of genetic sequences, involving
the process of annotation and genome sequencing) and want to use BioSQL.
I understand that BioSQL is a schema.  How can I get it? Does it work
with BioPerl and MySQL?

Carito
From hollandr at gis.a-star.edu.sg  Mon Apr  4 21:10:27 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr  4 21:06:43 2005
Subject: [BioSQL-l] Trying to get BioSql
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950A31@BIONIC.biopolis.one-north.com>

You can install the BioSQL schema in Oracle, MySQL, or Postgres, and
maybe more I don't know about. There is a website on the subject here:

http://obda.open-bio.org/

The schema itself is available here:


Currently BioPerl, BioJava and BioPython all have interfaces to BioSQL,
although there are a few peculiarities that make them not entirely
compatible with each other when doing so. Hopefully these will be sorted
out soon.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199

From caritov at gmail.com  Wed Apr  6 11:44:28 2005
From: caritov at gmail.com (carito vargas)
Date: Wed Apr  6 11:38:26 2005
Subject: [BioSQL-l] last version of the schema
Message-ID: <15a9a897050406084412322b5e@mail.gmail.com>

Hi, 
I am trying to load a swissprot database and get this error:

DBD::mysql::st execute failed: Unknown column 'display_id' in 'field list'

As I read in other posts, this means that I didn't get the latest
version of the schema.  I downloaded it from
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/sql/biosqldb-mysql.sql?cvsroot=biosql
(revision 1.40) and it still doesn't work. Can anyone tell me where
I can find a version that works?

Carito Vargas
From hlapp at gnf.org  Wed Apr  6 14:05:08 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Apr  6 13:58:36 2005
Subject: [BioSQL-l] last version of the schema
In-Reply-To: <15a9a897050406084412322b5e@mail.gmail.com>
References: <15a9a897050406084412322b5e@mail.gmail.com>
Message-ID: <4dc55e7673882455658d8c24dbd3aff9@gnf.org>

You may have the right version of the schema (there should indeed be no  
column display_id), but the wrong version of bioperl-db. Did you  
download the CVS head of bioperl-db? If you did, did you run the  
tests and what were the test results? All tests should pass; if they  
don't then that's not a good sign.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Wed Apr  6 15:36:48 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Apr  6 15:30:17 2005
Subject: [BioSQL-l] Re: last version of the schema
In-Reply-To: <15a9a897050406120925479812@mail.gmail.com>
References: <15a9a897050406084412322b5e@mail.gmail.com>
	<4dc55e7673882455658d8c24dbd3aff9@gnf.org>
	<15a9a897050406120925479812@mail.gmail.com>
Message-ID: <bfbc1343f77050c87743e31f09436650@gnf.org>


On Apr 6, 2005, at 12:09 PM, carito vargas wrote:

> I had another question: with the latest version of the BioSQL schema, are
> we able to store different types of annotations?

Yes. If you look at the ERD you'll see that there are references, 
dbxrefs, comments, and ontology terms associated with bioentries.

In bioperl, these map to Bio::Annotation::Reference, 
Bio::Annotation::DBLink, Bio::Annotation::Comment, and 
Bio::Annotation::OntologyTerm (w/o qualifier value) or 
Bio::Annotation::SimpleValue (w/ qualifier value).

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Fri Apr  8 19:51:07 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Apr  8 19:44:47 2005
Subject: [BioSQL-l] Re: [Bioperl-l] biosql.html
In-Reply-To: <GPENLDEIJJHJLHOAJBBPGEDPCEAA.brian_osborne@cognia.com>
References: <GPENLDEIJJHJLHOAJBBPGEDPCEAA.brian_osborne@cognia.com>
Message-ID: <a7aae2331d5efa98c7e2942f9f288a4b@gnf.org>

Thanks a lot Brian. This will help. -hilmar

On Apr 7, 2005, at 5:09 AM, Brian Osborne wrote:

> Hilmar,
>
> I've updated biosql.html in biosql-schema. Postgres installation in Cygwin
> was no longer the 2 minute exercise it was a while back but it's still
> pretty easy; biosql installation was as easy as ever.
>
> Brian O.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From ankitson at gmail.com  Sun Apr 10 14:04:03 2005
From: ankitson at gmail.com (ankit soni)
Date: Sun Apr 10 13:57:42 2005
Subject: [BioSQL-l] getting exon information from genbank files
Message-ID: <ce17dc1a0504101104bcb747b@mail.gmail.com>

Hi all,
I have just started using BioSQL for one of my projects and I have
loaded a few genbank files into the MySQL database using BioPerl and the
standard schema.
I wanted to ask how I can get the information about the exons and introns
from the database.
If I use the following query I get the start and end positions, but I
am not able to find out what the limits (start_pos and end_pos) stand for,
i.e. gene or exon or intron.
mysql> select * from location where seqfeature_id='XXXX';
+-------------+---------------+-----------+---------+-----------+---------+--------+------+
| location_id | seqfeature_id | dbxref_id | term_id | start_pos | end_pos | strand | rank |
+-------------+---------------+-----------+---------+-----------+---------+--------+------+
|        YYYY |          XXXX |      NULL |    NULL |       ABC |     EFG |      1 |    1 |
+-------------+---------------+-----------+---------+-----------+---------+--------+------+

It would be very helpful if somebody can guide me.
I am sorry if I am unable to use the correct biological terms as I
know very little of biology.

Ankit Soni
Junior Undergraduate
Dept. of Computer Science
IIT kanpur
India
From s0460205 at sms.ed.ac.uk  Mon Apr 11 04:28:32 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Apr 11 04:23:45 2005
Subject: [BioSQL-l] Examples of queries - help for beginners!!
Message-ID: <1113208112.425a35308186b@sms.ed.ac.uk>


Hi everyone,

(following on from the post about genbank exons) I would be interested to know
if there is somewhere that contains a list of example queries for use with
BioSQL, for example, if you want all proteins that were entered before 2000 and
have a PDB structure => "do this query".

This would be an extremely helpful tool for people who are new to the SQL
language (like me!) and would make BioSQL a far more accessible tool.

Thanks,

Stephen
From hlapp at gnf.org  Mon Apr 11 14:55:09 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Apr 11 14:48:21 2005
Subject: [BioSQL-l] getting exon information from genbank files
In-Reply-To: <ce17dc1a0504101104bcb747b@mail.gmail.com>
Message-ID: <3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org>

Ankit, the values you're showing in your sample record, did you make  
them up entirely or is this an actual query result?

Note that all columns in the location table are numeric, so it only  
creates confusion if you choose letters as characters to mask the real  
values. If they are the real values then you must have changed the  
schema and not used load_seqdatabase.pl to load records.

Note also that generally what's in biosql will closely resemble the  
object model that was built by the SeqIO bioperl parser run on your  
input record(s) - provided you used load_seqdatabase.pl to load the  
record(s). So, what ends up in biosql as the result of loading a  
genbank record greatly depends on the genbank record itself. As a rule,  
what the genbank record had in its feature table you'll also find in  
biosql as a seqfeature record, and what wasn't in the feature table you  
also won't find in biosql. Introns are usually not annotated in Genbank  
explicitly; they are only implicit as the region between exons, so  
unless the genbank records you loaded were exceptions you won't find  
them in biosql either. How to find  
exons again depends on the feature table of the original records: some  
have a single cDNA feature with a composite ('split') location, which  
will end up in biosql as one seqfeature that has many locations  
attached. Genomic contigs sometimes have the exons annotated as  
individual features, and then this is what you'll find in biosql too:  
one seqfeature per exon, each with a single location.
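
To see which kind of feature a location belongs to, you can join location to seqfeature and from there to the term that names the feature type. The sketch below assumes the seqfeature table carries a type_term_id column pointing at term (check this against your schema version), uses heavily trimmed-down tables with made-up rows, and runs against Python's sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id   INTEGER NOT NULL,
    type_term_id  INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE location (
    location_id   INTEGER PRIMARY KEY,
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    start_pos     INTEGER,
    end_pos       INTEGER,
    strand        INTEGER,
    rank          INTEGER
);
INSERT INTO term VALUES (1, 'mRNA'), (2, 'exon');
-- Feature 10: one mRNA with a split location (two location rows).
-- Features 11 and 12: exons annotated individually, one location each.
INSERT INTO seqfeature VALUES (10, 100, 1), (11, 100, 2), (12, 100, 2);
INSERT INTO location VALUES
    (1, 10, 101, 200, 1, 1),
    (2, 10, 301, 450, 1, 2),
    (3, 11, 101, 200, 1, 1),
    (4, 12, 301, 450, 1, 1);
""")

rows = conn.execute(
    """
    SELECT f.seqfeature_id, t.name, l.rank, l.start_pos, l.end_pos
      FROM seqfeature f
      JOIN term t     ON t.term_id = f.type_term_id
      JOIN location l ON l.seqfeature_id = f.seqfeature_id
     WHERE f.bioentry_id = ?
     ORDER BY f.seqfeature_id, l.rank
    """,
    (100,),
).fetchall()
for row in rows:
    print(row)
```

The two cases described above show up directly in the result: the mRNA feature has two rows (ranks 1 and 2), while each exon feature has exactly one.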

The bottom line is, if you load through load_seqdatabase.pl the content  
in biosql will closely match the object tree in bioperl - which often  
times will be close to the data structure of the original input record.  
Features that weren't there to begin with you won't find magically  
added.

So, to come back to your question, there is no good answer because it  
greatly depends  on what your input was. Most likely though you'll have  
to impute introns by fetching the locations of the cDNA (or mRNA)  
feature or the locations of the exon features, order them properly, and  
then infer introns between consecutive exons.

If this is what you need to do all the time I'd write a script that  
does this in an automated fashion against all newly loaded records and  
inserts the introns as features back into the database.
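
The inference step itself is small. A minimal sketch in Python, assuming the exon locations have already been fetched as (start_pos, end_pos) pairs with 1-based inclusive coordinates, all belong to the same transcript, and do not overlap; the function name infer_introns is made up for this example.

```python
def infer_introns(exons):
    """Return the gaps between consecutive exons as (start, end) pairs.

    exons: iterable of (start_pos, end_pos), 1-based and inclusive.
    """
    ordered = sorted(exons)  # order by start position
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only emit an intron if there is at least one base between exons.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


exons = [(301, 450), (101, 200), (600, 720)]
introns = infer_introns(exons)
print(introns)  # [(201, 300), (451, 599)]
```

Strand is deliberately ignored here; for features on the reverse strand the coordinates are the same but the biological order of the introns is reversed.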

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gnf.org  Mon Apr 11 14:59:44 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Apr 11 14:52:56 2005
Subject: [BioSQL-l] Examples of queries - help for beginners!!
In-Reply-To: <1113208112.425a35308186b@sms.ed.ac.uk>
Message-ID: <DDCA9846-AABB-11D9-81AB-000A959EB4C4@gnf.org>

I agree - but the answers aren't necessarily simple.

For instance, to take your example, you'd have to write a bioperl-db 
adaptor first for the structure modules in bioperl to get them 
serialized to biosql, adapt load_seqdatabase.pl or create a clone that 
would load structures instead of Bio::Seq's, establish a method that 
links structures to their protein entries in biosql, and then you could 
finally look at how to constrain for the date of entry.

	-hilmar


-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From ankitson at gmail.com  Tue Apr 12 05:09:38 2005
From: ankitson at gmail.com (ankit soni)
Date: Tue Apr 12 05:06:35 2005
Subject: [BioSQL-l] Re: getting exon information from genbank files
In-Reply-To: <3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org>
References: <ce17dc1a0504101104bcb747b@mail.gmail.com>
	<3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org>
Message-ID: <ce17dc1a0504120209524f535b@mail.gmail.com>

Sorry for the confusion; the values were masked, they were not actual values.
Later I was able to figure out how to do what I needed.
I am developing a few example SQL queries, which I will post on the list soon.

Thanks for helping.
Ankit Soni


From caritov at gmail.com  Tue Apr 12 10:48:02 2005
From: caritov at gmail.com (carito vargas)
Date: Tue Apr 12 10:42:01 2005
Subject: [BioSQL-l] annotation bundle
Message-ID: <15a9a897050412074855715f33@mail.gmail.com>

Hi,

I want to store the important results of the annotation process of a
sequence.  We already have a database model, but we wanted to study
the feasibility of migrating it to the BioSQL schema.  I don't know which
tables I should use to store the possible CDS I obtain from a new
sequence.  I also still don't fully understand the use / meaning of the
tables of the Annotation Bundle.

Carito Vargas
From hlapp at gnf.org  Tue Apr 12 13:17:55 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Apr 12 13:12:52 2005
Subject: [BioSQL-l] Re: getting exon information from genbank files
In-Reply-To: <ce17dc1a0504120209524f535b@mail.gmail.com>
References: <ce17dc1a0504101104bcb747b@mail.gmail.com>
	<3A25F44B-AABB-11D9-81AB-000A959EB4C4@gnf.org>
	<ce17dc1a0504120209524f535b@mail.gmail.com>
Message-ID: <f1c9b9b38d5fd964b0d5147ee4c4b9f5@gnf.org>

Thanks. Help is always appreciated and sample queries will surely be 
helpful to people.

	-hilmar

On Apr 12, 2005, at 2:09 AM, ankit soni wrote:

> Sorry for the confusion the values were masked  they were not actual 
> values .
> Later I was able to figure out how to do the stuff what  I needed.
> I am developing few example SQL queries which I will post on the list 
> soon.
>
> Thanks for helping.
> Ankit Soni
>
>
>
> On Mon, 11 Apr 2005 11:55:09 -0700, Hilmar Lapp <hlapp@gnf.org> wrote:
>> Ankit, the values you're showing in your sample record, did you make
>> them up entirely or is this an actual query result?
>>
>> Note that all columns in the location table are numeric, so it only
>> creates confusion if you choose letters as characters to mask the real
>> values. If they are the real values that you must have changed the
>> schema and not used load_seqdatabase.pl to load records.
>>
>> Note also that generally what's in biosql will closely resemble the
>> object model that was built by the SeqIO bioperl parser run on your
>> input record(s) - provided you used load_seqdatabase.pl to load the
>> record(s). So, what ends up in biosql as the result of loading a
>> genbank record greatly depends on the genbank record itself. As a 
>> rule,
>> what the genbank record had in its feature table you'll also find in
>> biosql as a seqfeature record, and what wasn't in the feature table 
>> you
>> also won't find in biosql. Introns are usually not annotated in 
>> Genbank
>> explicitly, they are only implicit as the region between exons, so
>> unless the genbank record you loaded were exceptions you . How to find
>> exons again depends on the feature table of the original records: some
>> have a single cDNA feature with a composite ('split') location, which
>> will end up in biosql as one seqfeature that has many locations
>> attached. Genomic contigs sometimes have the exons annotated as
>> individual features, and then this is what you'll find in biosql too:
>> one seqfeature per exon, each with a single location.
>>
>> The bottom line is, if you load through load_seqdatabase.pl the
>> content in biosql will closely match the object tree in bioperl -
>> which often times will be close to the data structure of the original
>> input record. Features that weren't there to begin with you won't
>> find magically added.
>>
>> So, to come back to your question, there is no good answer because it
>> greatly depends on what your input was. Most likely though you'll
>> have to impute introns by fetching the locations of the cDNA (or
>> mRNA) feature or the locations of the exon features, ordering them
>> properly, and then inferring introns between consecutive exons.
>>
>> If this is what you need to do all the time I'd write a script that
>> does this in an automated fashion against all newly loaded records and
>> inserts the introns as features back into the database.
>>
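
The inference step described above (order the exon locations, then take
the gaps between consecutive exons) can be sketched as follows. This is
only an illustration in Python, assuming the locations have already
been fetched as (start_pos, end_pos) pairs with 1-based inclusive
coordinates, all on the same strand of the same sequence; the function
name is made up.

```python
def infer_introns(exon_locations):
    """Return (start, end) pairs for the gaps between consecutive exons.

    exon_locations: iterable of (start_pos, end_pos) pairs, 1-based and
    inclusive, all belonging to one feature (or one transcript).
    """
    introns = []
    covered_end = None  # rightmost position covered by exons seen so far
    for start, end in sorted(exon_locations):
        # A gap exists only if at least one base lies between the exons.
        if covered_end is not None and start > covered_end + 1:
            introns.append((covered_end + 1, start - 1))
        covered_end = end if covered_end is None else max(covered_end, end)
    return introns


if __name__ == "__main__":
    # Two exons given out of order; the intron is the region between them.
    print(infer_introns([(400, 900), (101, 250)]))
```

Coordinates are returned in ascending order along the sequence; for a
feature on the minus strand the biological order of the introns is the
reverse of the returned list.
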
>> 	-hilmar
>>
>> On Sunday, April 10, 2005, at 11:04  AM, ankit soni wrote:
>>
>>> Hi all,
>>> I have just started using BioSQL for one of my projects and I have
>>> loaded a few genbank files into the MySQL database using BioPerl and
>>> the standard schema.
>>> I wanted to ask how I can get the information about the exons and
>>> introns from the database.
>>> If I use the following query I get the start and end position but I
>>> am not able to find out what the limits (start_pos and end_pos) stand
>>> for, i.e. gene or exon or intron.
>>> mysql> select * from location where seqfeature_id='XXXX';
>>> +-------------+---------------+-----------+---------+-----------+---------+--------+------+
>>> | location_id | seqfeature_id | dbxref_id | term_id | start_pos | end_pos | strand | rank |
>>> +-------------+---------------+-----------+---------+-----------+---------+--------+------+
>>> |        YYYY |          XXXX |      NULL |    NULL |       ABC |     EFG |      1 |    1 |
>>> +-------------+---------------+-----------+---------+-----------+---------+--------+------+
>>>
>>> It would be very helpful if somebody can guide me.
>>> I am sorry if I am unable to use the correct biological terms as I
>>> know very little of biology.
>>>
>>> Ankit Soni
>>> Junior Undergraduate
>>> Dept. of Computer Science
>>> IIT kanpur
>>> India
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Tue Apr 12 13:26:35 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Apr 12 13:21:21 2005
Subject: [BioSQL-l] annotation bundle
In-Reply-To: <15a9a897050412074855715f33@mail.gmail.com>
References: <15a9a897050412074855715f33@mail.gmail.com>
Message-ID: <a0dc1d5f4ae502beed066d4d878b1d3c@gnf.org>

  From which end are you coming programmatically? That is, do you 
capture your annotation in the object model of either bioperl, biojava, 
or biopython, or do you try to insert it directly into the database?

I'm asking because if you capture the annotation in one of the 
supporting object models then updating the database may be done for you 
if you re-serialize the modified (annotated) object. This is for 
instance how it will work with bioperl.

As for understanding the roles of particular tables in biosql, have you 
read doc/schema-overview.txt in the biosql download? It was written by 
Aaron with the 'end-user' in mind, and there is a section on the 
annotation bundle representation. If this document doesn't help you,
could you be specific about what remains unclear? I'll try to answer as
best I can.

	-hilmar

On Apr 12, 2005, at 7:48 AM, carito vargas wrote:

> Hi,
>
> I want to store the important results of the annotation process of a
> sequence.  We already have a database model, but we wanted to study
> the feasibility of migrating it to the BioSQL schema.  I don't know
> which tables I should use to store the possible CDSs I obtain from a
> new sequence.  I still don't quite understand the use / meaning of
> the tables of the Annotation Bundle.
>
> Carito Vargas
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net  Sat Apr 16 16:02:20 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 15:55:54 2005
Subject: [BioSQL-l] biosql.org website
Message-ID: <711085F1-AEB2-11D9-8911-000A959EB4C4@gmx.net>

[for those on biosql-l or others who weren't aware - after the domain 
has been squatted on for years the OBF 2 days ago finally was able to 
assume control over the biosql.org domain - thanks Chris for the swift 
registration and thanks Andrew for noticing availability in the first 
place]

Chris,

how can we instate and/or populate the website for www.biosql.org? I
suggest that, until we have something separate, the domain point to
(be synonymous with) obda.open-bio.org.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net  Sat Apr 16 16:31:19 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 16:24:55 2005
Subject: [BioSQL-l] release preparation
Message-ID: <7D619058-AEB6-11D9-8911-000A959EB4C4@gmx.net>

I've issued this call earlier and I believe I have implemented all
suggestions. To be sure, please let me know if you have any issues with
the schema or instantiation or if you know of any that should be
addressed before releasing 1.0.

Other than that, Brian has updated the PostgreSQL-generated ERD HTML
document so that everything should be up to date and ready to go.

So please let me know and otherwise I'll target release for the end of 
this month.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net  Sat Apr 16 16:49:02 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 16:42:27 2005
Subject: [BioSQL-l] term synonym
Message-ID: <F70873AC-AEB8-11D9-8911-000A959EB4C4@gmx.net>

There is one small issue with the naming used in the MySQL and
PostgreSQL schemas, namely this definition of the term_synonym table:

-- ontology terms have synonyms, here is how to store them
CREATE TABLE term_synonym (
        synonym            VARCHAR(255) BINARY NOT NULL,
        term_id            INT(10) UNSIGNED NOT NULL,
        PRIMARY KEY (term_id,synonym)
) TYPE=INNODB;

Synonym is a reserved word in many RDBMSs, so the column synonym may 
eventually be renamed to name, which is its name already in the HSQL 
and Oracle versions.

Is the MySQL or PostgreSQL term_synonym table used with the current
naming anywhere outside of bioperl? Does anybody have an opinion on
whether this should be changed now or in a later release only if found
necessary?

I'm leaning towards leaving it as it is for now but am open if people 
feel differently.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hollandr at gis.a-star.edu.sg  Sun Apr 17 21:07:40 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Apr 17 21:02:39 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950F8A@BIONIC.biopolis.one-north.com>

The only issues I have are with the Oracle installation, which I came
across whilst writing the Oracle BioSQL howto at
http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm -
the issues are mentioned in that article. If they have been resolved or
are no longer relevant, then I'd consider it ready for release.

However as part of the release I'd really appreciate a document
describing exactly what is supposed to be stored in each column/table
(just supposed to be - doesn't have to be the way any particular Bio*
project actually does it). This would be very helpful in the efforts to
unite the various Bio* projects and make them all use the same tables
for the same things (which is not always the case at present).

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------



From hlapp at gmx.net  Mon Apr 18 01:38:53 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 01:32:41 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601950F8A@BIONIC.biopolis.one-north.com>
Message-ID: <26861DBC-AFCC-11D9-9FB4-000A959EB4C4@gmx.net>

First off, before going through your HowTo document, as for the
description of which content is supposed to go where, have you read the
doc/schema-overview.txt in the biosql repository? Could you list the
questions that that document leaves open? I'd rather expand that
document than write another one from scratch; I thought Aaron did a
pretty good job towards your request, but certainly this can be improved
or spiked with more details or whatever you find it could do better on.

Now to the HowTo. BTW is there a reason this should not be included in 
the distribution?

> /BioJava and BioSQL/Oracle HOWTO
>
> What you'll need
>
> Bio*
>
> You'll need the latest version of BioJava to take advantage of the
> full functionality of BioSQL. This can be downloaded from biojava.org.
> You'll also need the latest Oracle BioSQL schema. Here you have a
> choice of two options:
> Original : by Hilmar Lapp, the original BioSQL schema takes full 
> advantage of Oracle's security mechanisms and produces a complex but 
> high quality schema. You'll need sysdba access to your database to 
> install it.

I'd appreciate if this could be straightened out a bit, as you really 
do not need sysdba access if you're not going to create tablespaces and 
users, and not doing these steps is a simple matter of commenting out 
the respective lines.

If you are though then having access to sysdba or access to someone who 
does (i.e., pair-programming with your DBA for this task) is kind of 
unavoidable ...

Also, the distinction of a 'complex schema' coming out of the original 
and 'simplified structure' of Len's version sounds a bit too misleading 
for me, since the schema is no different between either version; there 
is no difference in number of tables or constraints or whatever (or is 
there?).

What simplified structure might refer to is that Len's version leaves 
out the PL/SQL packages etc? Again, just as a note, this is trivial to 
disable in BS-create-all, just comment out the respective steps.

As another note, in most Oracle environments an installer will not have 
sysdba access nor will she be supposed to create tablespaces or users; 
the DBA will do it for her. In those environments, the scriptlet that 
does this step will serve merely as an instructional template for the 
DBA for what to create. I.e., in usual Oracle environments tablespace, 
user, and role creation will be commented out because the DBA does them 
(has done them already).

>  Go to cvs.open-bio.org, select the biosql project, and navigate to
> and download the entire biosql-schema/sql/biosql-ora folder.
> Simplified : by Len Trigg, this version is simplified in structure and
> sits entirely inside a single user account, requiring no sysdba access
> to install. You'll have to ask for a copy of the script from the
> biosql-l mailing list.
> Both options are fully functional and compatible with both BioJava and 
> BioPerl.
>
> Oracle
>
> Obviously, you'll need an Oracle database. For the Original schema, 
> you'll also need sysdba access, or get your DBA to help you if you do 
> not have this yourself.
> For the Simplified schema you just need your own login to Oracle, and 
> the permissions to create tables. You'll also need to know the 
> tablespace name to use, ask your DBA.
>
> Bugfixing
>
> NOTE: Some of these fixes may already have been made by the time you 
> read this, so be careful and check they have not already been done!
>
> Original schema
>
> Before you do anything else, you'll need to ensure that all the 
> scripts in the folder refer to the correct local settings file. This 
> is not always the case, so be careful. The best thing to do is a 
> global search on all the files you downloaded, and replace all 
> references to BS-defs with BS-defs-local .

I did this a while ago and think there are no instances left where
this hasn't been changed. Please check.

>  Of course, don't do this in BS-defs.sql itself.
>
> Now you'll need to find the CREATE TABLE SG_Biosequence statement in
> BS-DDL.sql. You'll notice there is a constraint there called
> Alphabet4. The values in the constraint (dna, protein etc.) are all
> in lower case. BioJava uses upper case values for these fields, but
> BioPerl uses lower case! To make it work with BioJava, you'll have to
> modify the constraint line so that it reads like this:
> CONSTRAINT Alphabet4
>       CHECK (lower(Alphabet) IN ('dna', 'protein', 'protein-term', 
> 'rna')),

I've changed this but by enumerating all allowed terms so case-mixing 
within a term isn't allowed. I haven't included 'protein-term' yet; 
what is this? Is it necessary? What does it denote?

>
> This of course will make BioJava work, but will stop BioPerl from 
> being able to retrieve records correctly as it will not recognise the 
> upper-case versions of these values. One day hopefully the two projects
> will come up with a resolution to this issue.

I've changed this in bioperl-db so that a retrieved alphabet term is 
converted to lower case. (This doesn't make Biojava work with 
Bioperl-db-inserted data yet though :-)
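
To make the two pieces concrete (a constraint that enumerates the
allowed spellings, and lower-casing on retrieval), here is a minimal
sketch in Python with an in-memory SQLite table standing in for
SG_Biosequence. The table is reduced to two columns, and the list of
allowed spellings (all-lower and all-upper case) is only an assumed
illustration of 'enumerating all allowed terms', not a copy of the
constraint in BS-DDL.sql; the helper names are made up.

```python
import sqlite3

# Assumed list for illustration: each term in all-lower and all-upper
# case, so mixed-case spellings such as 'Dna' are rejected.
ALLOWED = ("dna", "DNA", "rna", "RNA", "protein", "PROTEIN")


def create_table(conn):
    """Create a two-column stand-in for the biosequence table."""
    allowed = ", ".join("'%s'" % a for a in ALLOWED)
    conn.execute(
        "CREATE TABLE biosequence ("
        " bioentry_id INTEGER PRIMARY KEY,"
        " alphabet TEXT,"
        " CONSTRAINT alphabet4 CHECK (alphabet IN (%s)))" % allowed
    )


def store_alphabet(conn, bioentry_id, alphabet):
    """Insert a row; raises sqlite3.IntegrityError for a disallowed term."""
    conn.execute("INSERT INTO biosequence VALUES (?, ?)",
                 (bioentry_id, alphabet))


def fetch_alphabet(conn, bioentry_id):
    """Return the alphabet in lower case, whichever client stored it."""
    row = conn.execute(
        "SELECT alphabet FROM biosequence WHERE bioentry_id = ?",
        (bioentry_id,),
    ).fetchone()
    if row is None or row[0] is None:
        return None
    return row[0].lower()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    store_alphabet(conn, 1, "DNA")      # upper case, as BioJava writes it
    store_alphabet(conn, 2, "protein")  # lower case, as BioPerl writes it
    print(fetch_alphabet(conn, 1), fetch_alphabet(conn, 2))
```

Normalizing on the reading side keeps the stored values untouched, so
whichever toolkit wrote the row can still read it back as written.
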

>
> In BS-create-Biosql-usersyns.sql you need to add another command under 
> the list of set commands at the top. This command should read:
> set lines 200

Fixed, thanks for reporting.

> What this does is to temporarily increase the maximum length of an
> output line in Oracle, whilst it is creating the usersyns.sql script. 
> If you do not do this, the generated script will contain linebreaks 
> midway through names of tables, which will cause the script to fail.
>
> Last of all, unless this has already been fixed in the CVS versions of 
> BioSQL by the time you read this, there is a section at the end of 
> BS-grants.sql which grants permissions to the various BioSQL users to 
> see the SG_User table. The statement currently reads like this:
> --
>    -- Biosql grants for SG_USER: needs select on all views and synonyms
>    -- that don't follow the SG% convention.
>    --
>    SELECT 'GRANT SELECT ON ' || object_name || ' TO &biosql_user;'
>    FROM user_objects
>    WHERE object_name NOT LIKE 'SG_%'
>    AND   object_name NOT LIKE '%$%'
>    AND   object_name NOT LIKE '%_PK_SEQ'
>    AND   object_type IN ('VIEW','SYNONYM')
>    ;
> You need to comment out the line that reads AND   object_name NOT LIKE 
> '%_PK_SEQ' by putting two dashes ( -- ) before it. This allows the 
> users to see the sequence required to allow them to generate new 
> records in the database.

Note that the original statement is correct because SG_USER (or 
whatever you define biosql_user to be) is supposed to be read-only and 
should never generate new records in the database. SG_LOADER, or 
whatever you set biosql_loader to be, is for r/w access and should get 
proper permissions to the sequences.

Of course you are free to dispose of the distinction between a 
read-only and a r/w user for your instance, but I don't think that 
should be the default ... BTW there is nothing that stops you from 
defining biosql_user and biosql_loader to the exact same user to 
achieve this very effect.

Let me know if I'm missing something here ...
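
To illustrate what the quoted SELECT in BS-grants.sql generates for the
read-only user, here is a small sketch in Python that applies the same
filter to a list of (object_name, object_type) pairs. The object names
in the example are invented, and the function names are hypothetical;
only the three LIKE patterns and the two object types come from the
statement quoted above.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards, no escape
    character) into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def read_only_grants(objects, grantee):
    """Emit GRANT SELECT statements for views and synonyms that match
    none of the NOT LIKE patterns of the quoted statement."""
    excluded = [like_to_regex(p) for p in ("SG_%", "%$%", "%_PK_SEQ")]
    statements = []
    for name, object_type in objects:
        if object_type not in ("VIEW", "SYNONYM"):
            continue
        if any(rx.fullmatch(name) for rx in excluded):
            continue
        statements.append("GRANT SELECT ON %s TO %s;" % (name, grantee))
    return statements


if __name__ == "__main__":
    sample = [
        ("BIOENTRY", "VIEW"),
        ("SG_BIOENTRY", "TABLE"),
        ("BIOENTRY_PK_SEQ", "SYNONYM"),  # sequence synonym: skipped
        ("TERM", "SYNONYM"),
    ]
    for stmt in read_only_grants(sample, "SG_USER"):
        print(stmt)
```

With the '%_PK_SEQ' exclusion in place the read-only user never gets
access to the sequence synonyms, which is the intended behaviour
described above.
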

>
> Simplified schema
>
> The only fix to make here is to do with the maximum value allowed in a 
> bioentry qualifier. Find the statement that creates the table 
> BioEntry_Qualifier_Value and alter the definition for the VALUE column 
> so that it has a maximum size of 300.

Note that in the standard schema this is a VARCHAR2(4000) by now.

>
> Installation
>
> Original schema
>
> Make sure you have set the $ORACLE_SID environment variable to the 
> correct database before running the scripts, as they 
> connect/disconnect several times and if it is not set, you may end up 
> running them against the wrong database.

Again, if the roles, user, and tablespace creation steps are commented 
out there should be no reconnecting. At least theoretically ...

>
> The installation requires the creation of three tablespaces - one for 
> data, one for indexes, one for LOB objects.

Again note that there is nothing that stops you from defining all three
in BS-defs-local to be the same tablespace (or two) which already exist.
(If you define them to be the same tablespace, it should exist already,
as the tablespace creation script assumes that they are different.)

I kind of tried to write it such that you can do it 'complicated' if 
you want and simple if you don't ... maybe I should have pointed that 
out better.

> Decide where you will be keeping the database files for these, and 
> what you will call the tablespaces. Don't create them yet though, just 
> write down the names. As always it is good practice to keep the data 
> and index tablespaces on separate disks to prevent IO bottlenecks, but 
> you can probably safely put the data and LOB tablespaces on the same 
> disk.
>
> You will also need to decide on names for the two basic roles that 
> BioSQL uses - the base_user role which contains just enough privileges 
> to connect to the database, and the schema_creator role, which 
> contains the privileges required to create database objects in a 
> schema. Again, don't create them just yet.
>
> Now, copy BS-defs.sql to BS-defs-local.sql and edit it. You should 
> check every entry in it carefully, particularly the names and 
> locations of the tablespace files to be created, and the names of the 
> two roles you just decided on above. You will also choose names for 
> the various default BioSQL roles. biosql_owner is not a role but the 
> actual owner of the schema that will have the schema_creator role 
> granted to it, you'll need to define its password here too. 
> biosql_user is a role to be granted to people who need read-only 
> access to the BioSQL database, biosql_loader is a role designed for 
> batch upload processes, whilst biosql_admin has full read-write 
> permission on the schema.

I guess I need to update the comments here. I ended up never using the 
biosql_admin role but using the biosql_loader role instead as the r/w 
user. This is pretty much how permissions are granted.

So maybe I need to include a sample BS-defs-local and BS-create-all
with 'simplified' settings?

	-hilmar

>
> Once you have edited the BS-defs-local.sql script appropriately, you 
> need to create the two base roles of base_user and schema_creator 
> manually. Create them by running something similar to the following 
> script whilst logged in as sysdba, from inside the biosql-ora 
> directory:
> @BS-defs-local
>    create role &base_user;
>    grant
>    CREATE SESSION,
>    CREATE SYNONYM,
>    CREATE VIEW
>    to &base_user;
>    create role &schema_creator;
>    grant
>    CREATE PROCEDURE,
>    CREATE ROLE,
>    CREATE SEQUENCE,
>    CREATE SESSION,
>    CREATE SYNONYM,
>    CREATE TRIGGER,
>    CREATE TYPE,
>    CREATE VIEW,
>    CREATE TABLE
>    to &schema_creator
>    with admin option;
>
> If you want some basic users set up, edit the BS-create-users.sql 
> script to look at the sample users it will create for you 
> automatically. If you don't want them, or want different names etc., 
> comment them out or edit them.
>
> The final stage before actual installation is to edit the 
> BS-create-all.sql script to ensure that only the steps you require are 
> carried out. If you already have predefined tablespaces and don't want 
> it to create new ones, comment out the line that reads 
> @BS-create-tablespaces . Likewise if you don't want any default data 
> loaded into the database, comment out the line near the end that reads 
> @BS-prepopulate-db .
>
> Under section 8 of BS-create-all.sql you need to make sure the 
> following commands appear in the order below. If they appear in any 
> other order, you will not be able to create other users to access the 
> database later! The commands should read:
> @BS-create-roles
>    @BS-create-synonyms
>    @BS-create-Biosql-API2
>    @BS-create-Biosql-usersyns
>    @BS-grants
> (NOTE: The BS-create-Biosql-API2 script is an alternative to 
> BS-create-Biosql-API which works much better with BioJava. This is 
> because BioJava has no flexibility about column names in tables. The 
> API2 version of the script ensures that the column names are exactly 
> the same as what BioJava expects by using synonyms. But, no matter 
> which you run, everything will still work fine with BioPerl).
>
> Now, log in to the database as sysdba from inside the biosql-ora 
> directory. Create the BioSQL database by typing:
> @BS-create-all
> You might want to spool the output to see what happens, but you'll
> find that half of it doesn't appear in the spool file, because BioSQL 
> is using spool itself to generate dynamic scripts on the fly. If 
> you've done everything right, the only messages you should get are a 
> few Table or view does not exist style messages, referring to the 
> attempts by the script to drop old objects before recreating new ones.
>
> During installation you will be prompted for the sysdba username and 
> password several times. This is required to create tablespaces and 
> users.
>
> If something goes wrong, you can safely rerun the script without 
> dropping anything first as it will drop the database objects from the 
> previous attempt first. It will however leave behind the tablespaces, 
> users, and roles. You can always just drop the users and tablespaces 
> that have been created if it really messes up, and start again from 
> scratch.
>
> Now,  your database has been installed! The only remaining step is to 
> log in to each user who will be using BioSQL, and run the usersyns.sql 
> script that the installation generated for you in the biosql-ora 
> directory. This script creates the synonyms for the BioSQL objects and 
> allows the users to see them. This script should not have any errors 
> at all. If it does, edit it and check it closely for things like 
> misplaced linebreaks etc.
>
> Note that Oracle sometimes has issues with roles and does not 
> apparently grant them correctly. If this happens, you will need to 
> grant the appropriate roles to the individual users manually (see the 
> short create role script above) and rerun the usersyns.sql script. 
> Sometimes you will find they don't even have the appropriate 
> tablespace quotas on the three BioSQL tablespaces. You'll need to 
> grant these tablespace quotas using the alter user <bloggs> quota 
> unlimited on <tablespace> command.
>
> Simplified schema
>
> NOTE: You will have to do a global search-and-replace on this script 
> to replace the two tablespace names with the ones you will actually be 
> using. Check with your DBA. This version of the schema only has two 
> tablespaces - one for data, the other for indexes.
>
> This is much easier to set up than the Original schema. Simply log in 
> as the user you wish to install BioSQL as, ensure that your DBA has 
> granted that user the same rights as for the schema_creator role 
> described in the Original installation instructions above, then 
> execute the single script that defines the schema. You should have no 
> problems. You can spool the output to a file if you like to be able to 
> check the results.
>
> This schema is a one-user-only schema, where all users log in as the 
> schema owner and have full read/write access to the entire database. 
> This is the most important difference between this schema and the 
> Original .
>
> Testing
>
> Any BioJava script should work fine!
>
> THE END!
>
> Richard Holland, hollandr at gis dot a-star dot edu dot sg, December 
> 2004

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hollandr at gis.a-star.edu.sg  Mon Apr 18 01:49:15 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 01:44:02 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950FC2@BIONIC.biopolis.one-north.com>

I will read schema-overview.txt and see what needs changing, if
anything. Do you have a deadline for the release that I should work
towards?

I don't see why the HowTo shouldn't be included. It went on the BioJava
site at the time as that seemed the logical home for it, but it is of
course equally at home on the BioSQL site.

Richard Holland
Bioinformatics Specialist
GIS extension 8199



From hlapp at gmx.net  Mon Apr 18 02:00:35 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 01:53:58 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601950FC2@BIONIC.biopolis.one-north.com>
Message-ID: <2E2D45D2-AFCF-11D9-9FB4-000A959EB4C4@gmx.net>


On Sunday, April 17, 2005, at 10:49  PM, Richard HOLLAND wrote:

> I will read schema-overview.txt and see what needs changing, if
> anything. Do you have a deadline for the release that I should work
> towards?

I'm aiming for the end of this month. Depending on your conclusions we 
will need more or less time to include more details; if it's more than 
a few sentences, though, I'll need some time in advance unless somebody 
(maybe Aaron?) can share the work.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From mark.schreiber at novartis.com  Mon Apr 18 02:01:53 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Apr 18 01:55:25 2005
Subject: [BioSQL-l] release preparation
Message-ID: <OF01B2C049.D8883630-ON48256FE7.00202786-48256FE7.002121DB@EU.novartis.net>

Hello Hilmar,

As this document is on a site I maintain I should have updated it before 
now, my bad! Agreed that from a SQL query perspective the schemas are the 
same, one just has more complexity (if I can call it that) under the hood.

I would prefer to keep instructions for the less complex version up for 
the time being as we are having difficulties getting BioJava to work 
seamlessly with the more complex version. This is almost certainly a 
failing of BioJava, whose Oracle support seems to have been built 
against the 'simple' schema, not the 'complex' one.

I expect we will soon have BioJava supporting your version and we can drop 
the 'simple' schema. After all, there is not much point using Oracle if 
you don't make use of the features.

- Mark


Hilmar Lapp <hlapp@gmx.net>
Sent by: biosql-l-bounces@portal.open-bio.org
04/18/2005 01:38 PM

 
        To:     "Richard HOLLAND" <hollandr@gis.a-star.edu.sg>
        cc:     Biosql <biosql-l@open-bio.org>, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [BioSQL-l] release preparation


First off, before going through your HowTo document: as for the 
description of which content is supposed to go where, have you read 
doc/schema-overview.txt in the biosql repository? Could you list the 
questions that document leaves open? I'd rather expand that 
document than write another one from scratch; I thought Aaron did a 
pretty good job towards your request, but certainly it can be improved 
or spiked with more details wherever you find it falls short.

Now to the HowTo. BTW is there a reason this should not be included in 
the distribution?

> /BioJava and BioSQL/Oracle HOWTO
>
> What you'll need
>
> Bio*
>
> You'll need the latest version of BioJava to take advantage of the 
> full functionality of BioSQL. This can be downloaded from biojava.org 
> . You'll also need the latest Oracle BioSQL schema. Here you have a 
> choice of two options:
> Original : by Hilmar Lapp, the original BioSQL schema takes full 
> advantage of Oracle's security mechanisms and produces a complex but 
> high quality schema. You'll need sysdba access to your database to 
> install it.

I'd appreciate it if this could be straightened out a bit, as you really 
do not need sysdba access if you're not going to create tablespaces and 
users, and skipping these steps is a simple matter of commenting out 
the respective lines.

If you are going to create them, though, then having sysdba access or 
access to someone who does (i.e., pair-programming with your DBA for 
this task) is kind of unavoidable ...

Also, the distinction of a 'complex schema' coming out of the original 
and 'simplified structure' of Len's version sounds a bit too misleading 
for me, since the schema is no different between either version; there 
is no difference in number of tables or constraints or whatever (or is 
there?).

What simplified structure might refer to is that Len's version leaves 
out the PL/SQL packages etc? Again, just as a note, this is trivial to 
disable in BS-create-all, just comment out the respective steps.

As another note, in most Oracle environments an installer will not have 
sysdba access nor will she be supposed to create tablespaces or users; 
the DBA will do it for her. In those environments, the scriptlet that 
does this step will serve merely as an instructional template for the 
DBA for what to create. I.e., in usual Oracle environments tablespace, 
user, and role creation will be commented out because the DBA does them 
(has done them already).
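
As a rough sketch of what "commented out" means in BS-create-all.sql: 
the script names below are the ones the HowTo mentions, but the 
surrounding lines of the real file are not reproduced here, so treat 
the layout as an assumption.

	-- tablespaces and users were already created by the DBA,
	-- so these steps are skipped
	-- @BS-create-tablespaces
	-- @BS-create-users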

>  Go to cvs.open-bio.org , select the biosql project, and navigate to 
> and download the entire biosql-schema/sql/biosql-ora folder.
> Simplified : by Len Trigg, this version is simplified in structure and 
> sits entirely inside a single user account, requiring no sysdba access 
> to install. You'll have to ask for a copy of the script from the 
> biosql-l mailing lists.
> Both options are fully functional and compatible with both BioJava and 
> BioPerl.
>
> Oracle
>
> Obviously, you'll need an Oracle database. For the Original schema, 
> you'll also need sysdba access, or get your DBA to help you if you do 
> not have this yourself.
> For the Simplified schema you just need your own login to Oracle, and 
> the permissions to create tables. You'll also need to know the 
> tablespace name to use, ask your DBA.
>
> Bugfixing
>
> NOTE: Some of these fixes may already have been made by the time you 
> read this, so be careful and check they have not already been done!
>
> Original schema
>
> Before you do anything else, you'll need to ensure that all the 
> scripts in the folder refer to the correct local settings file. This 
> is not always the case, so be careful. The best thing to do is a 
> global search on all the files you downloaded, and replace all 
> references to BS-defs with BS-defs-local .

I did this a while ago and think there are no instances left where 
this hasn't been changed. Please check.

>  Of course, don't do this in BS-defs.sql itself.
>
> Now you'll need to find the CREATE TABLE SG_Biosequence statement in 
> BS-DDL.sql . You'll notice there is a constraint there called 
> Alphabet4 . The values in the constraint ( dna ,protein etc.) are all 
> in lower case. BioJava uses upper case values for these fields, but 
> BioPerl uses lower case! To make it work with BioJava, you'll have to 
> modify the constraint line so that it reads like this:
> CONSTRAINT Alphabet4
>       CHECK (lower(Alphabet) IN ('dna', 'protein', 'protein-term', 
> 'rna')),

I've changed this but by enumerating all allowed terms so case-mixing 
within a term isn't allowed. I haven't included 'protein-term' yet; 
what is this? Is it necessary? What does it denote?
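
As a sketch only, a constraint that enumerates the allowed terms could 
look like the following. The demo table and constraint names are made 
up, and the term list is an assumption rather than the one actually in 
BS-DDL.sql.

	-- hypothetical demo table; not part of the BioSQL schema
	CREATE TABLE alphabet_demo (
		Alphabet VARCHAR2(12),
		CONSTRAINT Alphabet_Demo_Chk
			CHECK (Alphabet IN ('dna', 'DNA', 'rna', 'RNA',
			                    'protein', 'PROTEIN'))
	);

	-- accepted: all upper case is one of the listed spellings
	INSERT INTO alphabet_demo (Alphabet) VALUES ('DNA');
	-- rejected with ORA-02290: mixed case within a term is not listed
	INSERT INTO alphabet_demo (Alphabet) VALUES ('Dna');

Unlike lower(Alphabet) IN (...), this accepts the all-lower-case and 
all-upper-case spellings but nothing in between.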

>
> This of course will make BioJava work, but will stop BioPerl from 
> being able to retrieve records correctly as it will not recognise the 
> upper-case versions of these values. One day hopefully the two projects 
> will come up with a resolution to this issue.

I've changed this in bioperl-db so that a retrieved alphabet term is 
converted to lower case. (This doesn't make Biojava work with 
Bioperl-db-inserted data yet though :-)

>
> In BS-create-Biosql-usersyns.sql you need to add another command under 
> the list of set commands at the top. This command should read:
> set lines 200

Fixed, thanks for reporting.

> What this does is to temporarily increase the maximum length of an 
> output line in Oracle, whilst it is creating the usersyns.sql script. 
> If you do not do this, the generated script will contain linebreaks 
> midway through names of tables, which will cause the script to fail.
>
> Last of all, unless this has already been fixed in the CVS versions of 
> BioSQL by the time you read this, there is a section at the end of 
> BS-grants.sql which grants permissions to the various BioSQL users to 
> see the SG_User table. The statement currently reads like this:
> --
>    -- Biosql grants for SG_USER: needs select on all views and synonyms
>    -- that don't follow the SG% convention.
>    --
>    SELECT 'GRANT SELECT ON ' || object_name || ' TO &biosql_user;'
>    FROM user_objects
>    WHERE object_name NOT LIKE 'SG_%'
>    AND   object_name NOT LIKE '%$%'
>    AND   object_name NOT LIKE '%_PK_SEQ'
>    AND   object_type IN ('VIEW','SYNONYM')
>    ;
> You need to comment out the line that reads AND   object_name NOT LIKE 
> '%_PK_SEQ' by putting two dashes ( -- ) before it. This allows the 
> users to see the sequence required to allow them to generate new 
> records in the database.

Note that the original statement is correct because SG_USER (or 
whatever you define biosql_user to be) is supposed to be read-only and 
should never generate new records in the database. SG_LOADER, or 
whatever you set biosql_loader to be, is for r/w access and should get 
proper permissions to the sequences.

Of course you are free to dispose of the distinction between a 
read-only and a r/w user for your instance, but I don't think that 
should be the default ... BTW there is nothing that stops you from 
defining biosql_user and biosql_loader to the exact same user to 
achieve this very effect.
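
A minimal sketch of that in BS-defs-local.sql, assuming the settings 
are plain SQL*Plus define statements; SG_BIOSQL is a made-up name used 
only for illustration:

	-- one and the same name for read-only and read/write access
	define biosql_user=SG_BIOSQL
	define biosql_loader=SG_BIOSQL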

Let me know if I'm missing something here ...

>
> Simplified schema
>
> The only fix to make here is to do with the maximum value allowed in a 
> bioentry qualifier. Find the statement that creates the table 
> BioEntry_Qualifier_Value and alter the definition for the VALUE column 
> so that it has a maximum size of 300.

Note that in the standard schema this column has meanwhile become a 
VARCHAR2(4000).
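
For an instance that already exists, widening the column afterwards 
could be sketched like this, assuming the table and column are named 
as in the HowTo:

	-- enlarge the qualifier value column to the size used in the
	-- standard schema
	ALTER TABLE BioEntry_Qualifier_Value MODIFY (value VARCHAR2(4000));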

>
> Installation
>
> Original schema
>
> Make sure you have set the $ORACLE_SID environment variable to the 
> correct database before running the scripts, as they 
> connect/disconnect several times and if it is not set, you may end up 
> running them against the wrong database.

Again, if the roles, user, and tablespace creation steps are commented 
out there should be no reconnecting. At least theoretically ...

>
> The installation requires the creation of three tablespaces - one for 
> data, one for indexes, one for LOB objects.

Again note that there is nothing that stops you from defining all three 
in BS-defs-local to be the same tablespace (or two), provided it 
already exists. (If you define them to be the same, it has to exist 
already, because the tablespace creation script assumes that the three 
are different.)
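
For instance, something along these lines in BS-defs-local.sql. The 
three variable names and the tablespace name USERS are hypothetical; 
check BS-defs.sql for the names actually used.

	-- point data, index, and LOB storage at one existing tablespace
	define biosql_data=USERS
	define biosql_index=USERS
	define biosql_lob=USERS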

I kind of tried to write it such that you can do it 'complicated' if 
you want and simple if you don't ... maybe I should have pointed that 
out better.

> Decide where you will be keeping the database files for these, and 
> what you will call the tablespaces. Don't create them yet though, just 
> write down the names. As always it is good practice to keep the data 
> and index tablespaces on separate disks to prevent IO bottlenecks, but 
> you can probably safely put the data and LOB tablespaces on the same 
> disk.
>
> You will also need to decide on names for the two basic roles that 
> BioSQL uses - the base_user role which contains just enough privileges 
> to connect to the database, and the schema_creator role, which 
> contains the privileges required to create database objects in a 
> schema. Again, don't create them just yet.
>
> Now, copy BS-defs.sql to BS-defs-local.sql and edit it. You should 
> check every entry in it carefully, particularly the names and 
> locations of the tablespace files to be created, and the names of the 
> two roles you just decided on above. You will also choose names for 
> the various default BioSQL roles. biosql_owner is not a role but the 
> actual owner of the schema that will have the schema_creator role 
> granted to it, you'll need to define its password here too. 
> biosql_user is a role to be granted to people who need read-only 
> access to the BioSQL database, biosql_loader is a role designed for 
> batch upload processes, whilst biosql_admin has full read-write 
> permission on the schema.

I guess I need to update the comments here. I ended up never using the 
biosql_admin role but using the biosql_loader role instead as the r/w 
user. This is pretty much how permissions are granted.

So maybe I need to include a sample BS-defs-local and BS-create-all 
with 'simplified' settings?

                 -hilmar

>
> Once you have edited the BS-defs-local.sql script appropriately, you 
> need to create the two base roles of base_user and schema_creator 
> manually. Create them by running something similar to the following 
> script whilst logged in as sysdba, from inside the biosql-ora 
> directory:
> @BS-defs-local
>    create role &base_user;
>    grant
>    CREATE SESSION,
>    CREATE SYNONYM,
>    CREATE VIEW
>    to &base_user;
>    create role &schema_creator;
>    grant
>    CREATE PROCEDURE,
>    CREATE ROLE,
>    CREATE SEQUENCE,
>    CREATE SESSION,
>    CREATE SYNONYM,
>    CREATE TRIGGER,
>    CREATE TYPE,
>    CREATE VIEW,
>    CREATE TABLE
>    to &schema_creator
>    with admin option;
>
> If you want some basic users set up, edit the BS-create-users.sql 
> script to look at the sample users it will create for you 
> automatically. If you don't want them, or want different names etc., 
> comment them out or edit them.
>
> The final stage before actual installation is to edit the 
> BS-create-all.sql script to ensure that only the steps you require are 
> carried out. If you already have predefined tablespaces and don't want 
> it to create new ones, comment out the line that reads 
> @BS-create-tablespaces . Likewise if you don't want any default data 
> loaded into the database, comment out the line near the end that reads 
> @BS-prepopulate-db .
>
> Under section 8 of BS-create-all.sql you need to make sure the 
> following commands appear in the order below. If they appear in any 
> other order, you will not be able to create other users to access the 
> database later! The commands should read:
> @BS-create-roles
>    @BS-create-synonyms
>    @BS-create-Biosql-API2
>    @BS-create-Biosql-usersyns
>    @BS-grants
> (NOTE: The BS-create-Biosql-API2 script is an alternative to 
> BS-create-Biosql-API which works much better with BioJava. This is 
> because BioJava has no flexibility about column names in tables. The 
> API2 version of the script ensures that the column names are exactly 
> the same as what BioJava expects by using synonyms. But, no matter 
> which you run, everything will still work fine with BioPerl).
>
> Now, log in to the database as sysdba from inside the biosql-ora 
> directory. Create the BioSQL database by typing:
> @BS-create-all
> You might want to spool the output to see what happens, but you'll 
> find that half of it doesn't appear in the spool file, because BioSQL 
> is using spool itself to generate dynamic scripts on the fly. If 
> you've done everything right, the only messages you should get are a 
> few "Table or view does not exist" style messages, referring to the 
> attempts by the script to drop old objects before recreating new ones.
>
> During installation you will be prompted for the sysdba username and 
> password several times. This is required to create tablespaces and 
> users.
>
> If something goes wrong, you can safely rerun the script without 
> dropping anything first as it will drop the database objects from the 
> previous attempt first. It will however leave behind the tablespaces, 
> users, and roles. You can always just drop the users and tablespaces 
> that have been created if it really messes up, and start again from 
> scratch.
>
> Now, your database has been installed! The only remaining step is to 
> log in as each user who will be using BioSQL, and run the usersyns.sql 
> script that the installation generated for you in the biosql-ora 
> directory. This script creates the synonyms for the BioSQL objects and 
> allows the users to see them. This script should not have any errors 
> at all. If it does, edit it and check it closely for things like 
> misplaced linebreaks etc.
>
> Note that Oracle sometimes has issues with roles and does not 
> apparently grant them correctly. If this happens, you will need to 
> grant the appropriate roles to the individual users manually (see the 
> short create role script above) and rerun the usersyns.sql script. 
> Sometimes you will find they don't even have the appropriate 
> tablespace quotas on the three BioSQL tablespaces. You'll need to 
> grant these tablespace quotas using the alter user <bloggs> quota 
> unlimited on <tablespace> command.
>
> Simplified schema
>
> NOTE: You will have to do a global search-and-replace on this script 
> to replace the two tablespace names with the ones you will actually be 
> using. Check with your DBA. This version of the schema only has two 
> tablespaces - one for data, the other for indexes.
>
> This is much easier to set up than the Original schema. Simply log in 
> as the user you wish to install BioSQL as, ensure that your DBA has 
> granted that user the same rights as for the schema_creator role 
> described in the Original installation instructions above, then 
> execute the single script that defines the schema. You should have no 
> problems. You can spool the output to a file if you like to be able to 
> check the results.
>
> This schema is a one-user-only schema, where all users log in as the 
> schema owner and have full read/write access to the entire database. 
> This is the most important difference between this schema and the 
> Original .
>
> Testing
>
> Any BioJava script should work fine!
>
> THE END!
>
> Richard Holland, hollandr at gis dot a-star dot edu dot sg, December 
> 2004

On Sunday, April 17, 2005, at 06:07  PM, Richard HOLLAND wrote:

> The only issues I have are with the Oracle installation, which I came
> across whilst writing the Oracle BioSQL howto at
> http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm -
> the issues are mentioned in that article. If they have been resolved or
> are no longer relevant, then I'd consider it ready for release.
>
> However as part of the release I'd really appreciate a document
> describing exactly what is supposed to be stored in each column/table
> (just supposed to be - doesn't have to be the way any particular Bio*
> project actually does it). This would be very helpful in the efforts to
> unite the various Bio* projects and make them all use the same tables
> for the same things (which is not always the case at present).
>
> cheers,
> Richard
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
>
>
>> -----Original Message-----
>> From: biosql-l-bounces@portal.open-bio.org
>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
>> Sent: Sunday, April 17, 2005 4:31 AM
>> To: Biosql
>> Subject: [BioSQL-l] release preparation
>>
>>
>> I've issued this call earlier and I believe have implemented all
>> suggestions. To be sure, please let me know if you have any
>> issues with
>> the schema or instantiation or if you know of any that should be
>> addressed before releasing 1.0.
>>
>> Other than that Brian has updated the PostgreSQL generated ERD HTML
>> document so that everything should be up to date and ready to go.
>>
>> So please let me know and otherwise I'll target release for
>> the end of
>> this month.
>>
>>               -hilmar
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



From hlapp at gmx.net  Mon Apr 18 02:04:13 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 01:57:33 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601950FC2@BIONIC.biopolis.one-north.com>
Message-ID: <B0035DF8-AFCF-11D9-9FB4-000A959EB4C4@gmx.net>


On Sunday, April 17, 2005, at 10:49  PM, Richard HOLLAND wrote:

> I don't see why the HowTo shouldn't be included. It went on the BioJava
> site at the time as that seemed the logical home for it, but it is of
> course equally at home on the BioSQL site.

OK. Would you want to make modifications according to my comments, or 
would you rather that I do this myself? I'd rather have you do it 
because I haven't used Biojava with Biosql yet and obviously haven't 
struggled with any of the things you have, so I'm lacking the 
'end-user' viewpoint or experience.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hollandr at gis.a-star.edu.sg  Mon Apr 18 02:10:32 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 02:05:54 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601950FCA@BIONIC.biopolis.one-north.com>

I'll make the changes and send the updated version to Mark so that he
can update the BioJava in Anger website. From there you can take a copy
in whatever format you need. I'll let you know when it is done.


Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net] 
> Sent: Monday, April 18, 2005 2:04 PM
> To: Richard HOLLAND
> Cc: Biosql
> Subject: Re: [BioSQL-l] release preparation
> 
> 
> 
> On Sunday, April 17, 2005, at 10:49  PM, Richard HOLLAND wrote:
> 
> > I don't see why the HowTo shouldn't be included. It went on the BioJava
> > site at the time as that seemed the logical home for it, but it is of
> > course equally at home on the BioSQL site.
> 
> OK. Would you want to make modifications according to my comments, or 
> would you rather that I do this myself? I'd rather have you do it 
> because I haven't used Biojava with Biosql yet and obviously haven't 
> struggled with any of the things you have, so I'm lacking the 
> 'end-user' viewpoint or experience.
> 
> 	-hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 

From len at reeltwo.com  Mon Apr 18 04:58:17 2005
From: len at reeltwo.com (Len Trigg)
Date: Mon Apr 18 04:52:37 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <OF01B2C049.D8883630-ON48256FE7.00202786-48256FE7.002121DB@EU.novartis.net>
References: <OF01B2C049.D8883630-ON48256FE7.00202786-48256FE7.002121DB@EU.novartis.net>
Message-ID: <hbsm1ofoly.wl%len@reeltwo.com>


Mark Schreiber wrote:
> now, my bad! Agreed that from a SQL query perspective the schemas are the 
> same, one just has more complexity (if I can call it that) under the hood.

Indeed, the complexity is more to do with the complexity of installing
and understanding what's going on in all those files :-) (particularly
if you are not an oracle expert and have only been looking at the
BioSQL schemas for the other supported databases), and that's why I
did the simple version.  That's partly confirmed by the fact that the
bjia description of how to use the original schema is about 8KB, while
the description for the simple schema is about 1KB.  I'm all for
dumping the simple one if the barrier for entry for the original
schema is lowered (maybe it already has been).


> I would prefer to keep instructions for the less complex version up for 
> the time being as we are having difficulties getting biojava to work 
> seamlessly with the more complex version. This is almost certainly a 
> failing of biojava for which the oracle support seems to have been 
> compiled against the 'simple' schema not the 'complex schema'.

It certainly was only tested against the simple version, because
that's the only schema I had working when I wrote the Oracle support.
I am a little surprised that you are having major difficulties though,
since the original package has a compatibility layer that (supposedly)
presents the same schema as the simple version.


> I expect we will soon have biojava supporting your version and we can drop 
> the 'simple' schema. After all, there is not much point using oracle if 
> you don't make use of the features.

In my case, it was a matter of using Oracle because that was what was
already installed :-)


Cheers,
Len.

From mark.schreiber at novartis.com  Mon Apr 18 05:08:23 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Apr 18 05:06:59 2005
Subject: [BioSQL-l] release preparation
Message-ID: <OF441E5E88.F69F4458-ON48256FE7.00317673-48256FE7.003234E0@EU.novartis.net>

We tracked down the error to the fact that the Hilmar Oracle version (no 
assertion is made about the complexity) uses CLOBs to store sequence while 
the Len Oracle (no assertion about simplicity etc) version uses LONGs to 
store sequence. The biojava support code seems to assume LONGs and, 
strangely, until very recently the JDBC Oracle driver seems to have let you 
write LONGs to CLOBs, although the data that comes out again is completely 
munged.

It would be possible to modify the biojava adapters to check for LONG or 
CLOB and behave appropriately but this would cause lots of maintenance 
problems later.
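
For illustration only, a minimal sketch of what such a check could look 
like. SeqColumnKind is a hypothetical helper, not part of biojava, and it 
assumes the driver reports an Oracle LONG column as 
java.sql.Types.LONGVARCHAR and a CLOB column as java.sql.Types.CLOB (the 
type code would come from ResultSetMetaData.getColumnType):

```java
import java.sql.Types;

/** Hypothetical helper (not part of biojava): classifies the seq column type. */
final class SeqColumnKind {
    enum Kind { LONG, CLOB, OTHER }

    private SeqColumnKind() {}

    /**
     * Maps a JDBC type code, e.g. from ResultSetMetaData.getColumnType(int),
     * to a storage kind. Assumption: the driver reports Oracle LONG as
     * Types.LONGVARCHAR and CLOB as Types.CLOB.
     */
    static Kind fromSqlType(int sqlType) {
        switch (sqlType) {
            case Types.CLOB:
                return Kind.CLOB;
            case Types.LONGVARCHAR:
                return Kind.LONG;
            default:
                return Kind.OTHER;
        }
    }
}
```

Every read and write path in the adapter would then have to branch on the 
returned kind, which is where the maintenance burden would come from.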

Given this situation, unless someone complains very loudly, the biojava 
oracle adapters will be changed to assume CLOBs. Note: This means if you 
are using biojava and oracle and biosql now then it will break unless you 
adopt Hilmar's version. It will not cause any changes to biojava users of 
MySQL etc.

- Mark


Len Trigg <len@reeltwo.com>
04/18/2005 04:58 PM

 
        To:     Mark Schreiber/GP/Novartis@PH
        cc:     Hilmar Lapp <hlapp@gmx.net>, Biosql <biosql-l@open-bio.org>
        Subject:        Re: [BioSQL-l] release preparation


Mark Schreiber wrote:
> now, my bad! Agreed that from a SQL query perspective the schemas are the 
> same, one just has more complexity (if I can call it that) under the hood.

Indeed, the complexity is more to do with the complexity of installing
and understanding what's going on in all those files :-) (particularly
if you are not an oracle expert and have only been looking at the
BioSQL schemas for the other supported databases), and that's why I
did the simple version.  That's partly confirmed by the fact that the
bjia description of how to use the original schema is about 8KB, while
the description for the simple schema is about 1KB.  I'm all for
dumping the simple one if the barrier for entry for the original
schema is lowered (maybe it already has been).


> I would prefer to keep instructions for the less complex version up for 
> the time being as we are having difficulties getting biojava to work 
> seamlessly with the more complex version. This is almost certainly a 
> failing of biojava for which the oracle support seems to have been 
> compiled against the 'simple' schema not the 'complex schema'.

It certainly was only tested against the simple version, because
that's the only schema I had working when I wrote the Oracle support.
I am a little surprised that you are having major difficulties though,
since the original package has a compatibility layer that (supposedly)
presents the same schema as the simple version.


> I expect we will soon have biojava supporting your version and we can drop 
> the 'simple' schema. After all, there is not much point using oracle if 
> you don't make use of the features.

In my case, it was a matter of using Oracle because that was what was
already installed :-)


Cheers,
Len.


From hollandr at gis.a-star.edu.sg  Mon Apr 18 05:08:41 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 05:09:07 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560195100A@BIONIC.biopolis.one-north.com>

I looked into this in a bit more detail earlier today and found that,
since some version of Oracle around the 9i point in time, the official
Oracle JDBC driver API for accessing LOBs has changed. This means that
whereas before the same code could be used in BioJava to access both
Hilmar's and Len's versions of the database, since the 9i drivers this
has no longer been possible, and BioJava only works with Len's version.
The problem is due to the way in which Len's schema uses LONG values for
biosequence.seq, but Hilmar's uses CLOBs. 

(The nitty gritty - before 9i, Oracle JDBC allowed you to access both
LONG and CLOB columns using getString()/setString() methods to
manipulate them. Now, these methods only work with LONG columns, and you
have to do fancy tricks to get anything useful into/out of CLOBs).
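
As a rough sketch of one driver-independent way to get text out of a 
CLOB, the standard java.sql.Clob interface can be read through its 
character stream. ClobReader and readFully are hypothetical names used 
only for this example; this is not the BioJava code:

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Clob;
import java.sql.SQLException;

/** Hypothetical helper: reads a whole CLOB through its character stream. */
final class ClobReader {
    private ClobReader() {}

    static String readFully(Clob clob) throws SQLException, IOException {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[8192];
        // The Reader is closed automatically, even if reading fails.
        try (Reader reader = clob.getCharacterStream()) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        }
        return out.toString();
    }
}
```

Reading in chunks avoids any assumption about how long the sequence is, 
at the cost of building the whole string in memory at the end.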

After discussing this with Mark earlier this afternoon, I am planning on
changing BioJava to use the new Oracle CLOB API, at which point it will
no longer work with schemas set up using Len's version. No change to
BioSQL is required. This, from a BioJava point of view, would make the
simple schema redundant. I am not sure if there are people in the other
Bio* projects who use the simple schema though so we probably can't just
drop it.

Are there any objections? I have crossposted this to the BioJava list to
make sure everyone who might be affected gets a say.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: biosql-l-bounces@portal.open-bio.org 
> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Len Trigg
> Sent: Monday, April 18, 2005 4:58 PM
> To: mark.schreiber@novartis.com
> Cc: Hilmar Lapp; Biosql
> Subject: Re: [BioSQL-l] release preparation
> 
> 
> 
> Mark Schreiber wrote:
> > now, my bad! Agreed that from a SQL query perspective the schemas are the 
> > same, one just has more complexity (if I can call it that) under the hood.
> 
> Indeed, the complexity is more to do with the complexity of installing
> and understanding what's going on in all those files :-) (particularly
> if you are not an oracle expert and have only been looking at the
> BioSQL schemas for the other supported databases), and that's why I
> did the simple version.  That's partly confirmed by the fact that the
> bjia description of how to use the original schema is about 8KB, while
> the description for the simple schema is about 1KB.  I'm all for
> dumping the simple one if the barrier for entry for the original
> schema is lowered (maybe it already has been).
> 
> 
> > I would prefer to keep instructions for the less complex version up for 
> > the time being as we are having difficulties getting biojava to work 
> > seamlessly with the more complex version. This is almost certainly a 
> > failing of biojava for which the oracle support seems to have been 
> > compiled against the 'simple' schema not the 'complex schema'.
> 
> It certainly was only tested against the simple version, because
> that's the only schema I had working when I wrote the Oracle support.
> I am a little surprised that you are having major difficulties though,
> since the original package has a compatibility layer that (supposedly)
> presents the same schema as the simple version.
> 
> 
> > I expect we will soon have biojava supporting your version and we can drop 
> > the 'simple' schema. After all, there is not much point using oracle if 
> > you don't make use of the features.
> 
> In my case, it was a matter of using Oracle because that was what was
> already installed :-)
> 
> 
> Cheers,
> Len.
> 

From amackey at pcbi.upenn.edu  Mon Apr 18 09:24:48 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Apr 18 09:21:11 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <2E2D45D2-AFCF-11D9-9FB4-000A959EB4C4@gmx.net>
References: <2E2D45D2-AFCF-11D9-9FB4-000A959EB4C4@gmx.net>
Message-ID: <c9aeacc4c672b6710a4f4d74b8bf3ec0@pcbi.upenn.edu>


I'm happy to field questions/improvements to the overview (seeing as 
it's my only real contribution to BioSQL).

-Aaron

On Apr 18, 2005, at 2:00 AM, Hilmar Lapp wrote:

>
> On Sunday, April 17, 2005, at 10:49  PM, Richard HOLLAND wrote:
>
>> I will read schema-overview.txt and see what needs changing, if
>> anything. Do you have a deadline for the release that I should work
>> towards?
>
> I'm aiming for the end of this month. Depending on your conclusions we 
> will need more or less time to include more details; if it's more than 
> a few sentences I'll need some time in advance though unless somebody 
> (maybe Aaron?) can share the work.
>
> 	-hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From dag at sonsorol.org  Mon Apr 18 08:01:13 2005
From: dag at sonsorol.org (Chris Dagdigian)
Date: Mon Apr 18 09:43:48 2005
Subject: [BioSQL-l] Re: [Root-l] biosql.org website
In-Reply-To: <711085F1-AEB2-11D9-8911-000A959EB4C4@gmx.net>
References: <711085F1-AEB2-11D9-8911-000A959EB4C4@gmx.net>
Message-ID: <4263A189.7070701@sonsorol.org>

Hello

Doing it now but it will take a day or so for the DNS changes I had to 
make to the biosql.org nameserver entry to propagate. Once the 
nameservers are pointing to ones I control I'll point biosql.org to the 
obda site.

chris


Hilmar Lapp wrote:

> [for those on biosql-l or others who weren't aware - after the domain 
> has been squatted on for years the OBF 2 days ago finally was able to 
> assume control over the biosql.org domain - thanks Chris for the swift 
> registration and thanks Andrew for noticing availability in the first 
> place]
> 
> Chris,
> 
> how can we instate and/or populate the website for www.biosql.org? I 
> suggest that until we have something separate that the domain point to 
> (be synonymous with) obda.open-bio.org.
> 
>     -hilmar

-- 
Chris Dagdigian, <dag@sonsorol.org>
BioTeam  - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net
From hlapp at gmx.net  Mon Apr 18 12:56:20 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 12:50:05 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560195100A@BIONIC.biopolis.one-north.com>
Message-ID: <C98C96C9-B02A-11D9-A2D2-000A959EB4C4@gmx.net>

CLOB is IMHO actually easier to handle. Also, LONG is really odd to 
deal with in SQL whereas the Oracle server will nicely on-the-fly 
convert strings to CLOB and vice versa so long as they are shorter than 
4000 chars. Some of the type-generic functions that come with Oracle 
will not accept LONG but do accept CLOB. Just as another anecdotal 
piece, the built-in BLAST searcher available in Oracle 10g expects a 
cursor returning CLOBs, not LONGs.

With the java.sql.Clob interface, getting the full value as a string 
is as simple as

	Clob clob = resultSet.getClob(<your column index here>);
	// Clob positions are 1-based; getSubString takes an int length
	String clobValue = clob.getSubString(1, (int) clob.length());

Inserting a new value is in reality a two-step process:

	// assumes autocommit is off so the row lock is held until commit
	PreparedStatement pst = conn.prepareStatement(
		"INSERT INTO Biosequence (Bioentry_Id, Seq) "
		+ "VALUES (?, EMPTY_CLOB())");
	pst.setInt(1, idValue);
	pst.executeUpdate();
	pst.close();
	pst = conn.prepareStatement(
		"SELECT Seq FROM Biosequence WHERE Bioentry_Id = ? FOR UPDATE",
		ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
	pst.setInt(1, idValue);

	ResultSet rs = pst.executeQuery();
	rs.next(); // move to the row just inserted
	Clob clob = rs.getClob(1);
	clob.setString(1, theSeq); // positions are 1-based
	// not sure this is necessary
	rs.updateClob(1, clob);
	rs.close();
	// don't forget to release lock
	conn.commit();
	
I vaguely remember that Len or somebody else from the Biojava crowd had 
this all figured out?
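
One detail that is easy to get wrong with java.sql.Clob is that 
positions are 1-based and that getSubString takes an int length. A small 
sketch of a guarded wrapper, where ClobText, fullValue and prefix are 
hypothetical names for this example only:

```java
import java.sql.Clob;
import java.sql.SQLException;

/** Hypothetical helper: substring access to a CLOB with 1-based positions. */
final class ClobText {
    private ClobText() {}

    /** Returns the whole CLOB as a String. */
    static String fullValue(Clob clob) throws SQLException {
        long length = clob.length();
        if (length == 0) {
            return ""; // some implementations reject position 1 on an empty CLOB
        }
        if (length > Integer.MAX_VALUE) {
            throw new SQLException("CLOB too large for one String: " + length);
        }
        return clob.getSubString(1, (int) length); // first character is position 1
    }

    /** Returns at most maxChars characters from the start of the CLOB. */
    static String prefix(Clob clob, int maxChars) throws SQLException {
        long length = clob.length();
        if (length == 0 || maxChars <= 0) {
            return "";
        }
        return clob.getSubString(1, (int) Math.min(length, (long) maxChars));
    }
}
```

The prefix variant can be handy for a quick look at a sequence without 
pulling the whole value across the connection.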

	-hilmar

On Monday, April 18, 2005, at 02:08  AM, Richard HOLLAND wrote:

> I looked into this in a bit more detail earlier today and found that,
> since some version of Oracle around the 9i point in time, the official
> Oracle JDBC driver API for accessing LOBs has changed. This means that
> whereas before the same code could be used in BioJava to access both
> Hilmar's and Len's versions of the database, since the 9i drivers this
> has no longer been possible, and BioJava only works with Len's version.
> The problem is due to the way in which Len's schema uses LONG values 
> for
> biosequence.seq, but Hilmar's uses CLOBs.
>
> (The nitty gritty - before 9i, Oracle JDBC allowed you to access both
> LONG and CLOB columns using getString()/setString() methods to
> manipulate them. Now, these methods only work with LONG columns, and 
> you
> have to do fancy tricks to get anything useful into/out of CLOBs).
>
> After discussing this with Mark earlier this afternoon, I am planning 
> on
> changing BioJava to use the new Oracle CLOB API, at which point it will
> no longer work with schemas set up using Len's version. No change to
> BioSQL is required. This, from a BioJava point of view, would make the
> simple schema redundant. I am not sure if there are people in the other
> Bio* projects who use the simple schema though so we probably can't 
> just
> drop it.
>
> Are there any objections? I have crossposted this to the BioJava list 
> to
> make sure everyone who might be affected gets a say.
>
> cheers,
> Richard
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
>
>
>> -----Original Message-----
>> From: biosql-l-bounces@portal.open-bio.org
>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of Len Trigg
>> Sent: Monday, April 18, 2005 4:58 PM
>> To: mark.schreiber@novartis.com
>> Cc: Hilmar Lapp; Biosql
>> Subject: Re: [BioSQL-l] release preparation
>>
>>
>>
>> Mark Schreiber wrote:
>>> now, my bad! Agreed that from a SQL query perspective the
>> schemas are the
>>> same, one just has more complexity (if I can call it that)
>> under the hood.
>>
>> Indeed, the complexity is more to do with the complexity of installing
>> and understanding what's going on in all those files :-) (particularly
>> if you are not an oracle expert and have only been looking at the
>> BioSQL schemas for the other supported databases), and that's why I
>> did the simple version.  That's partly confirmed by the fact that the
>> bjia description of how to use the original schema is about 8KB, while
>> the description for the simple schema is about 1KB.  I'm all for
>> dumping the simple one if the barrier for entry for the original
>> schema is lowered (maybe it already has been).
>>
>>
>>> I would prefer to keep instructions for the less complex
>> version up for
>>> the time being as we are having difficulties getting
>> biojava to work
>>> seamlessly with the more complex version. This is almost
>> certainly a
>>> failing of biojava for which the oracle support seems to have been
>>> compiled against the 'simple' schema not the 'complex schema'.
>>
>> It certainly was only tested against the simple version, because
>> that's the only schema I had working when I wrote the Oracle support.
>> I am a little surprised that you are having major difficulties though,
>> since the original package has a compatibility layer that (supposedly)
>> presents the same schema as the simple version.
>>
>>
>>> I expect we will soon have biojava supporting your version
>> and we can drop
>>> the 'simple' schema. After all, there is not much point
>> using oracle if
>>> you don't make use of the features.
>>
>> In my case, it was a matter of using Oracle because that was what was
>> already installed :-)
>>
>>
>> Cheers,
>> Len.
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Mon Apr 18 13:42:16 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 13:36:01 2005
Subject: [BioSQL-l] Re: [Root-l] biosql.org website
In-Reply-To: <4263A189.7070701@sonsorol.org>
Message-ID: <3483A5B8-B031-11D9-83EB-000A959EB4C4@gmx.net>

Thanks Chris. -hilmar

On Monday, April 18, 2005, at 05:01  AM, Chris Dagdigian wrote:

> Hello
>
> Doing it now but it will take a day or so for the DNS changes I had to 
> make to the biosql.org nameserver entry to propagate. Once the 
> nameservers are pointing to ones I control I'll point biosql.org to 
> the obda site.
>
> chris
>
>
> Hilmar Lapp wrote:
>
>> [for those on biosql-l or others who weren't aware - after the domain 
>> had been squatted on for years, the OBF was finally able to assume 
>> control over the biosql.org domain 2 days ago - thanks Chris for the 
>> swift registration and thanks Andrew for noticing availability in the 
>> first place]
>> Chris,
>> how can we instate and/or populate the website for www.biosql.org? I 
>> suggest that, until we have something separate, the domain point to 
>> (be synonymous with) obda.open-bio.org.
>>     -hilmar
>
> -- 
> Chris Dagdigian, <dag@sonsorol.org>
> BioTeam  - Independent life science IT & informatics consulting
> Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
> PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net
> _______________________________________________
> Root-l mailing list
> Root-l@open-bio.org
> http://open-bio.org/mailman/listinfo/root-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Mon Apr 18 13:58:02 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Apr 18 13:51:30 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <c9aeacc4c672b6710a4f4d74b8bf3ec0@pcbi.upenn.edu>
Message-ID: <6825ADDA-B033-11D9-83EB-000A959EB4C4@gmx.net>

Thanks for offering to help; it will certainly be welcome. -hilmar

On Monday, April 18, 2005, at 06:24  AM, Aaron J. Mackey wrote:

>
> I'm happy to field questions/improvements to the overview (seeing as 
> it's my only real contribution to BioSQL).
>
> -Aaron
>
> On Apr 18, 2005, at 2:00 AM, Hilmar Lapp wrote:
>
>>
>> On Sunday, April 17, 2005, at 10:49  PM, Richard HOLLAND wrote:
>>
>>> I will read schema-overview.txt and see what needs changing, if
>>> anything. Do you have a deadline for the release that I should work
>>> towards?
>>
>> I'm aiming for the end of this month. Depending on your conclusions 
>> we will need more or less time to include more details; if it's more 
>> than a few sentences I'll need some time in advance though unless 
>> somebody (maybe Aaron?) can share the work.
>>
>> 	-hilmar
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey@pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hollandr at gis.a-star.edu.sg  Mon Apr 18 21:45:33 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 18 21:41:18 2005
Subject: [BioSQL-l] release preparation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601951039@BIONIC.biopolis.one-north.com>

Don't worry, I do know how to do it, it's just that in the existing
BioJava-live code it hasn't been done, and I'll need to be careful to
add the usual checks to see if we're using Oracle or not before choosing
the appropriate SQL to update biosequence with.

CLOBs under 4000 chars are certainly easier, but over 4000 you have to
be careful, and there is a bug which prevents clob.getSubString()
working for any position greater than can be described in 16 bits
(although I know I experienced this one before, I can't find a reference
to it now....). You then have to use the clob's stream accessors
instead, but it's not a problem really. Yes, I know 16 bits (65k bases
or so) is huge, but in our current BioSQL all our sequences are around
10000 bases long, so the accessor methods limited to 4000 chars are not
an option.
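
As a rough sketch of the stream-based workaround described above (an
illustration only, not the code in biojava-live): the hypothetical
helper readClobFully() below reads a java.sql.Clob through
getCharacterStream(), so no position or length argument is passed to
the driver. SerialClob from the JDK stands in for a database CLOB
purely so that the example runs without Oracle.

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;

// Hypothetical helper for illustration; not part of BioJava or BioSQL.
final class ClobStreamSketch {

    // Reads the whole CLOB through its character stream in fixed-size
    // chunks, avoiding getSubString() and its position argument.
    static String readClobFully(Clob clob) throws SQLException, IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader reader = clob.getCharacterStream()) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Build a 100,000-character sequence, i.e. longer than 16 bits
        // can address.
        StringBuilder seq = new StringBuilder();
        for (int i = 0; i < 25000; i++) {
            seq.append("ACGT");
        }
        // SerialClob is an in-memory java.sql.Clob shipped with the JDK.
        Clob clob = new SerialClob(seq.toString().toCharArray());
        System.out.println(readClobFully(clob).length()); // prints 100000
    }
}
```

With a real connection the Clob would come from ResultSet.getClob()
instead of SerialClob; whether a given driver version needs this
workaround at all is something to verify against that driver.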

Len's suggestion of having table helpers in BioJava to check which
version to use and therefore maintain backwards compatibility is a good
one. It's slightly more work, but not too much to warrant a major panic
attack. I'll let you know when biojava-live has been updated.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net] 
> Sent: Tuesday, April 19, 2005 12:56 AM
> To: Richard HOLLAND
> Cc: Len Trigg; mark.schreiber@novartis.com; Biosql; biojava-list List
> Subject: Re: [BioSQL-l] release preparation
> 
> 
> CLOB is IMHO actually easier to handle. Also, LONG is really odd to 
> deal with in SQL whereas the Oracle server will nicely on-the-fly 
> convert strings to CLOB and vice versa so long as they are 
> shorter than 
> 4000 chars. Some of the type-generic functions that come with Oracle 
> will not accept LONG but do accept CLOB. Just as another anecdotal 
> piece, the built-in BLAST searcher available in Oracle 10g expects a 
> cursor returning CLOBs, not LONGs.
> 
> With the java.sql.Clob interface, getting at the full value as a 
> string is as simple as
> 
> 	Clob clob = resultSet.getClob(<your column index here>);
> 	// Clob positions are 1-based, and the length argument is an int
> 	String clobValue = clob.getSubString(1, (int) clob.length());
> 
> Inserting a new value in reality is a two-step process:
> 
> 	PreparedStatement pst = conn.prepareStatement(
> 		"INSERT INTO Biosequence (Bioentry_Id, Seq) VALUES (?, EMPTY_CLOB())");
> 	pst.setLong(1, idValue);
> 	pst.executeUpdate();
> 	pst.close();
> 	pst = conn.prepareStatement(
> 		"SELECT Seq FROM Biosequence WHERE Bioentry_Id = ? FOR UPDATE",
> 		ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
> 	pst.setLong(1, idValue);
> 
> 	ResultSet rs = pst.executeQuery();
> 	rs.next();
> 	Clob clob = rs.getClob(1);
> 	clob.setString(1, theSeq);
> 	// not sure this is necessary
> 	rs.updateClob(1, clob);
> 	rs.close();
> 	// don't forget to release lock (assumes auto-commit is off)
> 	conn.commit();
> 	
> I vaguely remember that Len or somebody else from the Biojava 
> crowd had 
> this all figured out?
> 
> 	-hilmar
> 
> On Monday, April 18, 2005, at 02:08  AM, Richard HOLLAND wrote:
> 
> > I looked into this in a bit more detail earlier today and 
> found that,
> > since some version of Oracle around the 9i point in time, 
> the official
> > Oracle JDBC driver API for accessing LOBs has changed. This 
> means that
> > whereas before the same code could be used in BioJava to access both
> > Hilmar's and Len's versions of the database, since the 9i 
> drivers this
> > has no longer been possible, and BioJava only works with 
> Len's version.
> > The problem is due to the way in which Len's schema uses 
> LONG values 
> > for
> > biosequence.seq, but Hilmar's uses CLOBs.
> >
> > (The nitty gritty - before 9i, Oracle JDBC allowed you to 
> access both
> > LONG and CLOB columns using getString()/setString() methods to
> > manipulate them. Now, these methods only work with LONG 
> columns, and 
> > you
> > have to do fancy tricks to get anything useful into/out of CLOBs).
> >
> > After discussing this with Mark earlier this afternoon, I 
> am planning 
> > on
> > changing BioJava to use the new Oracle CLOB API, at which 
> point it will
> > no longer work with schemas set up using Len's version. No change to
> > BioSQL is required. This, from a BioJava point of view, 
> would make the
> > simple schema redundant. I am not sure if there are people 
> in the other
> > Bio* projects who use the simple schema though so we probably can't 
> > just
> > drop it.
> >
> > Are there any objections? I have crossposted this to the 
> BioJava list 
> > to
> > make sure everyone who might be affected gets a say.
> >
> > cheers,
> > Richard
> >
> > Richard Holland
> > Bioinformatics Specialist
> > GIS extension 8199
> > ---------------------------------------------
> > This email is confidential and may be privileged. If you are not the
> > intended recipient, please delete it and notify us 
> immediately. Please
> > do not copy or use it for any purpose, or disclose its 
> content to any
> > other person. Thank you.
> > ---------------------------------------------
> >
> >
> >> -----Original Message-----
> >> From: biosql-l-bounces@portal.open-bio.org
> >> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of 
> Len Trigg
> >> Sent: Monday, April 18, 2005 4:58 PM
> >> To: mark.schreiber@novartis.com
> >> Cc: Hilmar Lapp; Biosql
> >> Subject: Re: [BioSQL-l] release preparation
> >>
> >>
> >>
> >> Mark Schreiber wrote:
> >>> now, my bad! Agreed that from a SQL query perspective the
> >> schemas are the
> >>> same, one just has more complexity (if I can call it that)
> >> under the hood.
> >>
> >> Indeed, the complexity is more to do with the complexity 
> of installing
> >> and understanding what's going on in all those files :-) 
> (particularly
> >> if you are not an oracle expert and have only been looking at the
> >> BioSQL schemas for the other supported databases), and that's why I
> >> did the simple version.  That's partly confirmed by the 
> fact that the
> >> bjia description of how to use the original schema is 
> about 8KB, while
> >> the description for the simple schema is about 1KB.  I'm all for
> >> dumping the simple one if the barrier for entry for the original
> >> schema is lowered (maybe it already has been).
> >>
> >>
> >>> I would prefer to keep instructions for the less complex
> >> version up for
> >>> the time being as we are having difficulties getting
> >> biojava to work
> >>> seamlessly with the more complex version. This is almost
> >> certainly a
> >>> failing of biojava for which the oracle support seems to have been
> >>> compiled against the 'simple' schema not the 'complex schema'.
> >>
> >> It certainly was only tested against the simple version, because
> >> that's the only schema I had working when I wrote the 
> Oracle support.
> >> I am a little surprised that you are having major 
> difficulties though,
> >> since the original package has a compatibility layer that 
> (supposedly)
> >> presents the same schema as the simple version.
> >>
> >>
> >>> I expect we will soon have biojava supporting your version
> >> and we can drop
> >>> the 'simple' schema. After all, there is not much point
> >> using oracle if
> >>> you don't make use of the features.
> >>
> >> In my case, it was a matter of using Oracle because that 
> was what was
> >> already installed :-)
> >>
> >>
> >> Cheers,
> >> Len.
> >>
> >> _______________________________________________
> >> BioSQL-l mailing list
> >> BioSQL-l@open-bio.org
> >> http://open-bio.org/mailman/listinfo/biosql-l
> >>
> >
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 

From hlapp at gnf.org  Mon Apr 18 22:19:30 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Apr 18 22:13:13 2005
Subject: [BioSQL-l] release preparation
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601951039@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601951039@BIONIC.biopolis.one-north.com>
Message-ID: <afbdfa9b5fe32acd3ebb494bbe3ebe0b@gnf.org>

Sounds good. BTW note that 65k is not large at all even on the 
transcript level; the infamous titin and some similar genes have longer 
transcripts. Maybe Dengue doesn't have titin but still ... ;)

	-hilmar

On Apr 18, 2005, at 6:45 PM, Richard HOLLAND wrote:

> Don't worry, I do know how to do it, it's just that in the existing
> BioJava-live code it hasn't been done, and I'll need to be careful to
> add the usual checks to see if we're using Oracle or not before 
> choosing
> the appropriate SQL to update biosequence with.
>
> CLOBs under 4000 chars are certainly easier, but over 4000 you have to
> be careful, and there is a bug which prevents clob.getSubstring()
> working for any position greater than can be described in 16 bits
> (although I know I experienced this one before, I can't find a 
> reference
> to it now....). You then have to use the clob's Stream accessors
> instead, but it's not a problem really. Yes, I know 16-bits (65k bases
> or so) is huge, but in our current BioSQL all our sequences are around
> the 10000 base length so the 4000-char limited accessor methods are not
> an option.
>
> Len's suggestion of having table helpers in BioJava to check which
> version to use and therefore maintain backwards compatibility is a good
> one. It's slightly more work, but not too much to warrant a major panic
> attack. I'll let you know when biojava-live has been updated.
>
> cheers,
> Richard
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From dlondon at ebi.ac.uk  Tue Apr 19 05:16:30 2005
From: dlondon at ebi.ac.uk (Darin London)
Date: Tue Apr 19 07:12:13 2005
Subject: [BioSQL-l] Re: BOSC 2005
In-Reply-To: <20050120175859.GA7254@parrot.ebi.ac.uk>
References: <20050120175859.GA7254@parrot.ebi.ac.uk>
Message-ID: <20050419091628.GN17377@parrot.ebi.ac.uk>

{Please pass the word!}

SECOND CALL FOR SPEAKERS
 
The 6th annual Bioinformatics Open Source Conference (BOSC'2005) is organized by the
not-for-profit Open Bioinformatics Foundation. The meeting will take place
June 23-24, 2005 in Detroit, Michigan, USA, and is one of several Special Interest
Group (SIG) meetings occurring in conjunction with the 13th International Conference
on Intelligent Systems for Molecular Biology.
 
see http://www.iscb.org/ismb2005 for more information.
 
Because of the power of many Open Source bioinformatics packages in
use by the Research Community today, it is not too presumptuous to say 
that the work of the Open Source Bioinformatics Community represents 
the cutting edge of Bioinformatics in general. This has been repeatedly 
demonstrated by the quality of presentations at previous BOSC conferences.
This year, at BOSC 2005, we want to continue this tradition of excellence, 
while presenting this message to a wider part of the Research Community.  
Please, pass this message on to anyone you know that is interested in
Bioinformatics software. 
 
 
BOSC PROGRAM & CONTACT INFO

* Web: http://www.open-bio.org/bosc2005/
* Online Registration: https://www.cteusa.com/iscb4/
* Email: bosc@open-bio.org

FEES


* Corporate : $195 ($245 after May 16th)
* Academic : $170 ($220 after May 16th)
* Student : $145 ($195 after May 16th) 

SPEAKERS & ABSTRACTS WANTED

The program committee is currently seeking abstracts for talks at BOSC 
2005. BOSC is a great opportunity for you to tell the community about 
your use, development, or philosophy of open source software development 
in bioinformatics. The committee will select several submitted abstracts 
for 25-minute talks and others for shorter "lightning" talks. Accepted 
abstracts will be published on the BOSC web site.

If you are interested in speaking at BOSC 2005, 
please send us before April 26, 2005:

* an abstract (no more than a few paragraphs)
* a URL for the project page, if applicable
* information about the open source license used for your software or 
  your release plans.
 
Abstracts will be accepted for submission until April 26, 2005.
Abstracts chosen for presentation will be announced May 12, 2005 
(before the ISMB Early Registration Deadline).
 
LIGHTNING-TALK SPEAKERS WANTED!

The program committee is currently seeking speakers for the lightning 
talks at BOSC 2005. Lightning talks are quick - only five minutes 
long - and a great opportunity for you to give people a quick 
summary of your open source project, code, idea, or vision of the future.
 
If you are interested in giving a lightning talk at BOSC 2005, 
please send us:
 
* a brief title and summary (one or two lines)
* a URL for the project page, if applicable
* information about the open source license used for your software or 
  your release plans.
 
We will accept entries on-line until BOSC starts, but
space for demos and lightning talks is limited.
   
SOFTWARE DEMONSTRATIONS WANTED!
If you are involved in the development of Open Source Bioinformatics Software, 
you are invited to provide a short demonstration to attendees of BOSC 2005.
 
If you are interested in giving a software demonstration at BOSC 2005,
please send us:
 
* a brief title and summary (one or two lines)
* a URL for the project page, if applicable
* Internet connectivity requirements (e.g. a website application served on the 
  World Wide Web, or a web-based client application).
 
  We will accept entries on-line until BOSC starts, but
  space for demos and lightning talks is limited. 
 
 ** Because the mission of the OBF is to promote Open Source software, we will favor submissions for
  projects that apply a recognized Open Source License, or adhere to the general Open Source Philosophy.
  See the following websites for further details:
  http://www.opensource.org/licenses/
  http://www.opensource.org/docs/definition.php
 
 
 SESSION CHAIRS WANTED
 If you would like to be involved in BOSC 2005, we invite you to chair a session.  This will 
 not require much of your time.  You will be given a schedule of presenters during your session. 
 You simply introduce each speaker, and manage the time of their presentation (25 minutes for full 
 presentations, 5-10 minutes for lightning talks/demos, depending on the number of entries).
 
 If you are interested in chairing a session, please send us your name and affiliation (if applicable).
 
-- 
cheers,

Bosc Organizing Committee
From hollandr at gis.a-star.edu.sg  Mon Apr 25 02:41:44 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Apr 25 02:36:52 2005
Subject: [BioSQL-l] BioJava/BioSQL on Oracle
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56019512B1@BIONIC.biopolis.one-north.com>

I've committed some changes to biojava-live which make BioJava compatible with BioSQL when the latter is running on Oracle 9i or greater and using the official schema as per the biosql-schema CVS. This involved adding an autodetect function to detect whether Clobs were used in biosequence or not, and then creating code to work with Clobs where necessary. Two extensions to OracleDBHelper might be useful in other tasks - stringToClob() and clobToString().
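
For readers curious what such an autodetect step might look like, here is a minimal sketch. It is an assumption about one possible approach, not the code committed to biojava-live: the class SeqColumnDetector and its methods are hypothetical names, and it relies on the JDBC driver reporting a CLOB column as java.sql.Types.CLOB (a LONG column would typically be reported as a different type code such as LONGVARCHAR, which should be confirmed for the driver in use).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Types;

// Hypothetical sketch; the actual autodetect code lives in biojava-live.
final class SeqColumnDetector {

    // True when a java.sql.Types code denotes a CLOB column.
    static boolean isClobType(int sqlType) {
        return sqlType == Types.CLOB;
    }

    // Asks the driver for the declared type of biosequence.seq.
    // The WHERE clause matches no rows, so only metadata is returned.
    static boolean seqIsClob(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT seq FROM biosequence WHERE 1 = 0")) {
            ResultSetMetaData md = rs.getMetaData();
            return isClobType(md.getColumnType(1));
        }
    }

    public static void main(String[] args) {
        System.out.println(isClobType(Types.CLOB));        // prints true
        System.out.println(isClobType(Types.LONGVARCHAR)); // prints false
    }
}
```

The result of seqIsClob() could be cached once per connection and used to choose between the Clob-based and the string-based accessors.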

I also made some changes to the Ontology part of the BioJava/BioSQL interface, which was not persisting Triples correctly. It would attempt to reference the triple by its unique ID before actually giving it one, which of course fails. This should now be fixed.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000   DID: (65) 6478 8199
Email: hollandr@gis.a-star.edu.sg
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------


From hollandr at gis.a-star.edu.sg  Wed Apr 27 02:57:02 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Apr 27 02:51:39 2005
Subject: [BioSQL-l] BioSQL documentation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56019513F2@BIONIC.biopolis.one-north.com>

Hilmar,

I read through the doc/schema_overview.txt and it looks fine, nothing has changed much since it was written. It's fine to leave it as it is.

Now that BioJava will play nicely with the Clobs in the official BioSQL schema for Oracle, I will rewrite the BioJava/BioSQL/Oracle HowTo and remove references to Len's schema as it is no longer relevant. The official schema will now function perfectly well with BioJava out-of-the-box (but only if you are using biojava-live, for now, until the change gets into the main release branch). I will post the URL to this list when it is complete and updated.

Mark Schreiber and I have asked if we might attend the Open Bio Hackathon this year. If we are accepted, one of our projects is to get all the Bio* projects to play nicely with BioSQL and store various bits of information in the same columns of the same tables as each other. If this does not happen, we still intend to do it, but it might take longer. If you or anyone else working with BioSQL interfaces in the Bio* projects will also be attending then we'd love to work with you on this. There are three stages: (1) identify where things should be going for all the common data formats (Genbank, Swissprot, plain fasta etc.), then (2) identify where they are actually going at the moment when loaded into BioSQL by the various Bio* projects, and finally (3) modify the various Bio* projects to use the correct locations (and hopefully retain checks for backwards compatibility so that if they can't find that information in its correct location, they'll check the old one just in case). Hopefully that's not too much work for a small group of people to finish together in a couple of days.
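
The backwards-compatibility check in stage (3) could be as simple as the following sketch. Everything in it is hypothetical: the location keys are invented for illustration and do not describe where any Bio* project actually stores data, and a plain Map stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: the keys used below are made up and a Map
// stands in for the BioSQL tables.
final class LocationFallbackSketch {

    // Returns the value stored under the agreed location if present,
    // otherwise the value stored under the legacy location, if any.
    static Optional<String> find(Map<String, String> stored,
                                 String agreedLocation,
                                 String legacyLocation) {
        String value = stored.get(agreedLocation);
        if (value == null) {
            value = stored.get(legacyLocation);
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        // Simulate data written by an older loader to the old location.
        stored.put("old.place", "written by an older loader");
        System.out.println(
            find(stored, "new.place", "old.place").orElse("not found"));
        // prints "written by an older loader"
    }
}
```

Checking the agreed location first means that data written after the standardisation always wins over anything left behind in the old location.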

I was wondering if it would be a good idea to delay the official BioSQL 1.0 release until after the above standardisations have taken place. Then we can include in the distribution a document detailing exactly what goes where when loading various data formats, both for reference and for the guidance of future projects not yet written.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000   DID: (65) 6478 8199
Email: hollandr@gis.a-star.edu.sg
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------


From mark.schreiber at novartis.com  Wed Apr 27 05:32:35 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Apr 27 05:26:53 2005
Subject: [BioSQL-l] BioSQL documentation
Message-ID: <OF82FCEDE3.58AAC652-ON48256FF0.003452F1-48256FF0.00346C1B@EU.novartis.net>

I don't know if this means there cannot be a 1.0 release. The BioSQL 1.0 
will be a standard. It's up to the bio* projects to play well with it.

- Mark


"Richard HOLLAND" <hollandr@gis.a-star.edu.sg>
Sent by: biosql-l-bounces@portal.open-bio.org
04/27/2005 02:57 PM

 
        To:     "Hilmar Lapp" <hlapp@gnf.org>
        cc:     biosql-l@open-bio.org, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [BioSQL-l] BioSQL documentation


Hilmar,

I read through the doc/schema_overview.txt and it looks fine, nothing has 
changed much since it was written. It's fine to leave it as it is.

Now that BioJava will play nicely with the Clobs in the official BioSQL 
schema for Oracle, I will rewrite the BioJava/BioSQL/Oracle HowTo and 
remove references to Len's schema as it is no longer relevant. The 
official schema will now function perfectly well with BioJava 
out-of-the-box (but only if you are using biojava-live, for now, until the 
change gets into the main release branch). I will post the URL to this 
list when it is complete and updated.

Mark Schreiber and I have asked if we might attend the Open Bio Hackathon 
this year. If we are accepted, one of our projects is to get all the Bio* 
projects to play nicely with BioSQL and store various bits of information 
in the same columns of the same tables as each other. If this does not 
happen, we still intend to do it, but it might take longer. If you or 
anyone else working with BioSQL interfaces in the Bio* projects will also 
be attending then we'd love to work with you on this. There are three 
stages: (1) identify where things should be going for all the common data 
formats (Genbank, Swissprot, plain fasta etc.), then (2) identify where 
they are actually going at the moment when loaded into BioSQL by the 
various Bio* projects, and finally (3) modify the various Bio* projects to 
use the correct locations (and hopefully retain checks for backwards 
compatibility so that if they can't find that information in its correct 
location, they'll check the old one just in case).
Hopefully that's not too much work for a small group of people to finish 
together in a couple of days.
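
The backwards-compatibility check in stage (3) could be sketched roughly as follows. This is only an illustration in plain Java, not code from any Bio* project: the class name, the method name and the key strings ("agreed.date", "legacy.date") are hypothetical, and the map merely stands in for values already fetched from the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch: prefer the agreed storage location, fall back to the old one. */
public class FallbackLookup {

    /**
     * Returns the value stored under the agreed key if present,
     * otherwise the value stored under the legacy key (if any).
     * The map is a stand-in for rows read from the database.
     */
    public static Optional<String> find(Map<String, String> stored,
                                        String agreedKey,
                                        String legacyKey) {
        String value = stored.get(agreedKey);
        if (value == null) {
            // Data written by an older loader may still live in the old place.
            value = stored.get(legacyKey);
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        // Hypothetical record written before the locations were standardised.
        Map<String, String> stored = new HashMap<>();
        stored.put("legacy.date", "27-APR-2005");

        // prints 27-APR-2005
        System.out.println(find(stored, "agreed.date", "legacy.date").orElse("<missing>"));
    }
}
```

Checking the agreed location first means that once data has been reloaded in the standard layout, the legacy branch is simply never taken and can be dropped later.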

I was wondering if it would be a good idea to delay the official BioSQL 
1.0 release until after the above standardisations have taken place. Then 
we can include in the distribution a document detailing exactly what goes 
where when loading various data formats, both for reference and for the 
guidance of future projects not yet written.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000   DID: (65) 6478 8199
Email: hollandr@gis.a-star.edu.sg


_______________________________________________
BioSQL-l mailing list
BioSQL-l@open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l


From hollandr at gis.a-star.edu.sg  Wed Apr 27 05:34:27 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Apr 27 05:30:48 2005
Subject: [BioSQL-l] BioSQL documentation
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601951412@BIONIC.biopolis.one-north.com>

I just think the BioSQL 1.0 standard should include a reference as to
the 'official' way to store the different bits of various file formats
within the schema, which all apps talking to BioSQL can be expected to
comply with (and hence behave well with each other's data).

Richard.

Richard Holland
Bioinformatics Specialist
GIS extension 8199



From boehme at mpiib-berlin.mpg.de  Wed Apr 27 11:27:09 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Wed Apr 27 11:20:34 2005
Subject: [BioSQL-l] location in features
Message-ID: <426FAF4D.3000700@mpiib-berlin.mpg.de>

Hi,
I can't get rid of this exception:
org.biojava.bio.BioRuntimeException: BioSQL SeqFeature doesn't have 
any associated location spans. seqfeature_id=148

Can anybody help me?

put the sequence in:

Sequence seq = DNATools.createDNASequence(sequence, "AF100928");
Feature.Template templSeq = new Feature.Template();
templSeq.source = "ncbi";
templSeq.type = "gen";
templSeq.location = Location.empty;
seq.createFeature(templSeq);
db.addSequence(seq);

get it out:
seq = db.getSequence("AF100928");
System.out.println(seq.getName() + " contains " + seq.countFeatures()
        + " features");

seq.getName() works fine, but the retrieved seq doesn't have any features, 
although I can see them in the db.

What am I missing here?

Martina
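
One thing that may be worth checking, offered as an unconfirmed guess rather than a diagnosis: the template above sets Location.empty, so the feature may have been stored without any location rows, which would match the wording of the exception. The sketch below shows a query for features that have no location rows, plus an in-memory equivalent that runs without a database. The table and column names (seqfeature, location, seqfeature_id) are assumed to match the BioSQL schema in use, and the ids 147 and 149 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: find features that have no associated location rows. */
public class MissingSpans {

    // Query that could be run through JDBC or a SQL prompt.
    // Table and column names are assumptions; check them against your schema.
    public static final String ORPHAN_FEATURES_SQL =
        "SELECT sf.seqfeature_id FROM seqfeature sf "
        + "LEFT JOIN location l ON l.seqfeature_id = sf.seqfeature_id "
        + "WHERE l.seqfeature_id IS NULL";

    /** In-memory equivalent of the query: ids that never appear in a location row. */
    public static List<Integer> featuresWithoutSpans(List<Integer> featureIds,
                                                     Set<Integer> idsWithLocation) {
        List<Integer> orphans = new ArrayList<>();
        for (Integer id : featureIds) {
            if (!idsWithLocation.contains(id)) {
                orphans.add(id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // 148 is the id from the exception; the other ids are illustrative.
        List<Integer> features = Arrays.asList(147, 148, 149);
        Set<Integer> withLocation = new HashSet<>(Arrays.asList(147, 149));

        // prints [148]
        System.out.println(featuresWithoutSpans(features, withLocation));
    }
}
```

If seqfeature_id 148 shows up in the result of such a query, creating the feature with a non-empty location would be the next thing to try.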

From hlapp at gnf.org  Wed Apr 27 14:51:20 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Apr 27 14:44:06 2005
Subject: [BioSQL-l] location in features
In-Reply-To: <426FAF4D.3000700@mpiib-berlin.mpg.de>
References: <426FAF4D.3000700@mpiib-berlin.mpg.de>
Message-ID: <3d10e113e0e7e6e9e28d31242cc83c40@gnf.org>

Hi Martina, people on the biojava mailing list will probably be better 
able to help you out. Also, Richard and Mark have been working on 
getting BioJava to interoperate better with the standard biosql schema. 
They may know better where your issue is coming from.

	-hilmar

On Apr 27, 2005, at 8:27 AM, Martina wrote:

> Hi,
> I can't get rid of this exception:
> org.biojava.bio.BioRuntimeException: BioSQL SeqFeature doesn't have 
> any associated location spans. seqfeature_id=148
>
> Can anybody help me?
>
> put the sequence in:
>
> Sequence seq = DNATools.createDNASequence(sequence, "AF100928");
> Feature.Template templSeq = new Feature.Template();
> 		templSeq.source = "ncbi";
> 		templSeq.type = "gen";
> 		templSeq.location = Location.empty;
> 		seq.createFeature(templSeq);
> db.addSequence(seq);
>
> get it out:
> seq = db.getSequence("AF100928");
> System.out.println(seq.getName() + " contains " + seq.countFeatures()
> 					+ " features");	
>
> seq.getName() works fine, but the seq doesn't have any features, but I 
> can see them in the db.
>
> What am I missing here?
>
> Martina
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hollandr at gis.a-star.edu.sg  Wed Apr 27 23:41:42 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Apr 27 23:36:21 2005
Subject: [BioSQL-l] location in features
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601951467@BIONIC.biopolis.one-north.com>

Hullo Martina.

I must admit I am confused. I have been using BioJava+BioSQL to load
Genbank records with features with no trouble; they always come out
again with no exceptions raised and none missing. I am using Oracle, but
this shouldn't make a difference as the SQL code that looks for features
is the same for all database types at present.

Can I ask what database type you are using (MySQL, Oracle etc.), and the
versions of BioJava and BioSQL you have?

I'd also suggest downloading biojava-live from CVS, if you haven't done
so already, and trying that to see if someone has already fixed your
problem.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199



From suzi at fruitfly.org  Wed Apr 20 18:42:29 2005
From: suzi at fruitfly.org (Suzanna Lewis)
Date: Mon May  2 09:10:24 2005
Subject: [BioSQL-l] Hackathon 2005
Message-ID: <a461b8d8e0fdc21c10105bd3207e45ce@fruitfly.org>

(sorry for multiple postings, but please do forward to
anyone else who you think might be interested)

------------------------------------------------------------------------

Dear everyone,

It has been a long time and we Bioinformatics devotees are overdue for
another total-immersion coding-fest (the last hackathon was held in
Singapore in February 2003, more than two years ago).  Apple has offered
to host us this year, and as an added bonus include free admission to
the World-Wide Developers Conference in San Francisco the prior week.
They are also looking for some people to present interesting new
developments at the WWDC, so if you have something noteworthy please
let us know. Apple is not attaching any strings, so our work need not
address Apple-specific software or hardware areas. Apple will provide
space and hardware (and access to their engineers if we'd like).

Week 1 (June 6-10) would be spent at the WWDC. Week 2 (June 12-16)
would be in Cupertino, at Apple's headquarters.  We're free to focus
on what interests us; our tentative plans include:

   1. Bio-ontologies software
   2. High-performance computing (e.g. large scale computations, optimization)
   3. Image analysis
   4. Documentation
   5. Anything else that may interest you

Our plan is to organize this much as the Aspen Center for Physics
computational biology workshops were organized (for those old enough to
remember): A couple of presentations to start the day; collaboration
and coding afterwards; time for a bit of fun (does anyone else
cycle?), and discussions in the late afternoons and evenings.

Would everyone who is interested in attending please send us a short
description of what you would like to do, and perhaps other people who
you would like to work with. There is somewhat limited space, so we
will try to prioritize groups that have a clear focus and a need to
interact. We know this is very short notice, but we hope that there
will be enough interest to make it possible.

We are looking into additional funding support, to pay for travel
expenses, but this is still to be decided.

Looking forward to hearing from everyone.
George, Cyrus, Steve, and Suzanna (the Bay Area locals)