[Biopython-dev] RULES in BioSQL PostgreSQL schema

Peter biopython at maubp.freeserve.co.uk
Fri May 22 12:27:06 UTC 2009


Hi all,

This is a continuation of a thread / bug report from Biopython (Bug 2833)
where attempting to import duplicate entries into BioSQL did not raise an
error on PostgreSQL (but does on MySQL). Cymon traced this to the
RULES present in the schema to help bioperl-db.

On Fri, May 22, 2009 at 3:05 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On May 21, 2009, at 6:52 PM, Cymon Cox wrote:
>
>> [...]
>>
>> Hi Andrea,
>>
>> The problem appears to be related to the BioSQL schema/PostGreSQL.
>>
>> As you indicated, adding a duplicate entry to bioentry returns a "INSERT 0
>> 0" and doesnt throw an IntegrityError which is what the code is looking
>> from and presumably what MySQL throws.
>>
>> The reason it doesnt throw an error is because of one (or both) of the
>> RULES in the schema:
>
> Indeed, I'd almost forgotten. The rules are there mostly as a remnant from
> earlier versions of PostgreSQL to support transactional loading the way
> bioperl-db (the object-relational mapping for BioPerl) is optimized. You
> probably don't need them anywhere else.
>
>        -hilmar
>
> <gory-details>
> Bioperl-db is optimized such that entities that very likely don't exist yet
> in the database are attempted for insert right away. If the insert fails due
> to a unique key violation, the record is looked up (and then expected to be
> found). In Oracle and MySQL you can do this and the transaction remains
> healthy; i.e., you can commit the transaction later and all statements
> except those that failed will be committed. In PostgreSQL any failed
> statement dooms the entire transaction, and the only way out is a rollback.
> In this case, if you want the loading of one sequence record as one
> transaction, failing to insert a single feature record will doom the entire
> sequence load and you would need to start over with the sequence. To fix
> this, I wrote the rules, which in essence do do the lookups for PostgreSQL
> that the bioperl-db code would otherwise avoid, and on insert do nothing if
> the record is found, which results in zero rows affected when you would
> expect one (which is what bioperl-db cues off of and then triggers a
> lookup).
> The right way to do this meanwhile is to use nested transactions, which
> PostgreSQL supports since v8.0.x, but I haven't gotten around to implement
> support for that in Bioperl-db.
> </gory-details>

Hilmar,

It seems for Biopython to work properly with BioSQL on PostgreSQL
these bioentry rules should be removed from the schema (as the
comments in the schema do suggest). Obviously doing this would
break any installation also using the current version of bioperl-db.

Do the RULES affect BioJava or BioRuby using BioSQL on
PostgreSQL?

Are you happy to remove these RULES in BioSQL v1.0.x (after
making the outlined transactional changes in bioperl-db)?

Thanks,

Peter




More information about the Biopython-dev mailing list