[BioSQL-l] Re: Postgres version support

Hilmar Lapp hlapp@gnf.org
Sat, 19 Oct 2002 17:17:03 -0700


On Monday, October 14, 2002, at 01:49 PM, Chris Mungall wrote:

>
> I think the default is not to do cascading deletes. we should probably
> switch them on. it's useful not having them temporarily as it helps 
> test
> all that pointless adapter code that exists purely for the benefit of
> mysql.

In the new version there is no such code anymore since I switched 
the MySql version to InnoDB ... (which supports enforced FKs as well 
cascading deletes)

The way I fixed the PostgreSQL generator is that ALTER TABLE ADD 
CONSTRAINTs come out unchanged - i.e., with ON CASCADE DELETE where 
it is present in the MySql version.

>
> i'm not sure about the performance overhead. i guess you could just 
> remove
> all FKs during the bulkload.

So far it has been acceptable for me with all constraints and 
indexes active in MySql and Oracle: entire swissprot takes ~5-6 hrs. 
This may take longer if you don't load into an empty database 
though. OTOH performance hasn't degraded noticeably towards the last 
10k of the 100k entries. However, load wasn't even maxed out on 
neither client nor db host -- data came from a NFS share. So, with 
data source sitting on a local disk and FK constraints disabled, 
this can probably be sped up considerably if you need faster 
turn-around.

I have no live experience with Pg yet.

>  i wouldn't bother trying to hand-combine
> these into your own trigger, there are more fruitful ways of 
> spending your
> time getting better performance from pg. though you could try asking on
> the pg-admin mail list, they are all very helpful there, and know 
> the guts
> of pg inside out.

Good tip.

>
> i'm really happy with the performance i get from pg (this is with the
> whole human assembly and refseq loaded). a default pg installation is
> usually pretty slow, but once you tinker with the parameters, run your
> vacuum analyzes etc it gets super fast.
>

Interesting. Can you tell what changes you had to make and which had 
the most effect?

> there is only one unsatisfactory aspect - the generic property pattern
> used in biosql means all property values are stored as TEXT. this turns
> out to be pretty slow for indexing. i got round this by materialising a
> view that casts the text to a varchar

It seems to me Pg has infinite length VARCHARs. I have yet to 
understand the precise difference between TEXT and huge VARCHARs in 
Pg other than VARCHARs having a pre-defined maximum length. Haven't 
read enough documentation yet. One of the main differences appears 
to be that the DBD Pg driver doesn't support TEXT very well (in that 
it doesn't honor LongReadLen etc), but I may again be mistaken due 
to too little doc reading, or maybe the doc I read is dated ...

What I did in the Pg version of the biosql schema is to replace some 
TEXT cols that won't have infinite size with VARCHARs with 
pre-defined (but somewhat big) size. This doesn't withstand 
re-generation of course, but then there is no way to tell whether 
TEXT in mysql is there because VARCHAR(1000) is not supported, or 
because it's really a CLOB.

	-hilmar

>  - i used this view when doing stuff
> like searching by gene name, but use the base table for getting the 
> longer
> property values (eg /note fields and whatnot)
>
> On Mon, 14 Oct 2002, Hilmar Lapp wrote:
>
>> I updated and fixed the transform_sql.pl script for generating the
>> Postgres version from the InnoDB version of the schema with FK
>> constraints. The result seems to run fine through psql (yeah, I
>> finally got this up and running on my laptop this weekend). I
>> committed it too.
>>
>> ChrisM, Thomas, you are regular Postgres users, right? I noticed
>> that Postgres creates triggers for enforcing the FK constraints and
>> I guess these triggers will also execute the ON DELETE CASCADE
>> actions. If I do '\d bioentry' I see a whole bunch of triggers
>> showing up supposedly enforcing constraints (bioentry does have a
>> number of FK constraints to it):
>>
>> Triggers: RI_ConstraintTrigger_16829,
>>            RI_ConstraintTrigger_16835,
>>            RI_ConstraintTrigger_16843,
>>            RI_ConstraintTrigger_16845,
>>            RI_ConstraintTrigger_16849,
>>            RI_ConstraintTrigger_16851,
>>            RI_ConstraintTrigger_16855,
>>            RI_ConstraintTrigger_16857,
>>            RI_ConstraintTrigger_16879,
>>            RI_ConstraintTrigger_16881,
>>            RI_ConstraintTrigger_16891,
>>            RI_ConstraintTrigger_16893,
>>            RI_ConstraintTrigger_16915,
>>            RI_ConstraintTrigger_16917
>>
>> Is this known to create a performance problem in Postgres? I was
>> wondering whether combining them into one trigger by hand will or
>> will not make a significant difference in performance. BTW does
>> anyone know off hand how I can view the code of a trigger in psql?
>>
>> There's no data yet, and the bioperl-db adaptors don't support Pg yet.
>>
>> 	-h
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>
>
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------