[BioSQL-l] Help with load_seqdatabase.pl

Jansen E Lim Jansen.Lim at bms.com
Mon Jan 27 14:03:10 EST 2003


Hilmar,

Thanks for the clarification.  I agree that performing a true update is not
the most efficient method.

One question regarding the BioSQL schema: is there any interest in having it
support history?  That is, when revised entries are loaded, the previous
records would be flagged as such (e.g., via a status attribute with a value
of [current|history]).  Historical records are very important with regard to
IP issues, or just for figuring out what was known about a particular
sequence when we last ran blast, fasta, genscan and other analyses.
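
To make the idea concrete, here is a minimal sketch of such flagging. It uses
Python's sqlite3 module purely so that it runs standalone. The "entry" table,
its "status" column and the load_revision() helper are hypothetical and are
not part of the BioSQL schema or of bioperl-db.

```python
import sqlite3


def load_revision(conn, accession, version, seq):
    """Store a new revision and flag any older revision as history."""
    with conn:  # one transaction: commit on success, roll back on error
        # Demote whatever is currently flagged as the live record.
        conn.execute(
            "UPDATE entry SET status = 'history' "
            "WHERE accession = ? AND status = 'current'",
            (accession,),
        )
        # Insert the revised record as the new live record.
        conn.execute(
            "INSERT INTO entry (accession, version, seq, status) "
            "VALUES (?, ?, ?, 'current')",
            (accession, version, seq),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    " accession TEXT NOT NULL,"
    " version INTEGER NOT NULL,"
    " seq TEXT NOT NULL,"
    " status TEXT NOT NULL CHECK (status IN ('current', 'history')),"
    " UNIQUE (accession, version))"
)

load_revision(conn, "NM_012500", 1, "ACGT")
load_revision(conn, "NM_012500", 2, "ACGTT")

rows = conn.execute(
    "SELECT version, status FROM entry ORDER BY version"
).fetchall()
print(rows)
```

Nothing is ever deleted in this scheme, so analyses run against version 1 can
still be traced back to the record as it was at the time.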

Thanks for any info.

Regards,
-Jansen

Hilmar Lapp wrote:

> Jansen, sorry for the late response. The problem is due to PostgreSQL
> handling failures within a transaction differently from MySQL/InnoDB
> and Oracle. The way the adaptor layer works is that those entities
> which are practically infinite in number are not looked up before
> insert; instead, their presence is detected by an insert failing the
> unique key (UK) constraint. Comment is such an entity. PostgreSQL,
> however, aborts the entire transaction upon such a failure, whether
> it is handled or not. I have yet to write certain functions in
> PL/PgSQL that will get around that problem.
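
The insert-first pattern described above can be sketched as follows. This is
only an illustration, not the adaptor layer's code: the simplified "comment"
table and the insert_or_find() helper are hypothetical. The savepoint around
the insert is one general way to confine a failed statement so that the
enclosing transaction stays usable; whether it is available depends on the
database and its version. SQLite is used here only so the sketch runs
standalone, and SQLite itself does not abort a transaction on a constraint
failure.

```python
import sqlite3


def insert_or_find(conn, bioentry_id, rank, text):
    """Insert a comment without looking it up first; on a unique-key
    violation, fall back to fetching the id of the existing row."""
    conn.execute("SAVEPOINT try_insert")
    try:
        cur = conn.execute(
            "INSERT INTO comment (bioentry_id, rank, comment_text) "
            "VALUES (?, ?, ?)",
            (bioentry_id, rank, text),
        )
        comment_id = cur.lastrowid
    except sqlite3.IntegrityError:
        # Undo only the failed insert; the outer transaction continues.
        conn.execute("ROLLBACK TO SAVEPOINT try_insert")
        comment_id = conn.execute(
            "SELECT comment_id FROM comment "
            "WHERE bioentry_id = ? AND rank = ?",
            (bioentry_id, rank),
        ).fetchone()[0]
    conn.execute("RELEASE SAVEPOINT try_insert")
    return comment_id


# isolation_level=None: transactions are controlled explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE comment ("
    " comment_id INTEGER PRIMARY KEY,"
    " bioentry_id INTEGER NOT NULL,"
    " rank INTEGER NOT NULL,"
    " comment_text TEXT NOT NULL,"
    " UNIQUE (bioentry_id, rank))"
)

conn.execute("BEGIN")
first = insert_or_find(conn, 3, 1, "PROVISIONAL REFSEQ")
second = insert_or_find(conn, 3, 1, "PROVISIONAL REFSEQ")  # duplicate key
third = insert_or_find(conn, 3, 2, "a second comment")     # still succeeds
conn.execute("COMMIT")
```

The second call hits the unique constraint and returns the id of the row
stored by the first call, and the third call shows that the transaction is
still usable afterwards.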
>
> Generally speaking though, updating bioentries through --update is not
> very robust, because 1-n and n-n connected relations require more than
> a simple update (e.g., the new version of a sequence may have fewer
> features or features with a different key than the old version; a
> simple update would leave you with stale features attached to the
> bioentry).
>
> I have found it much more robust to simply delete associations and
> FK-connected relations, and re-insert the new set. So, all that is
> really UPDATEd in this case is the bioentry (and biosequence) table.
> For an example of how to do this, have a look at
> scripts/update-on-new-version.pl, which is a closure you can pass to
> the --mergeobjs option of load_seqdatabase.pl. I wrote this to update
> RefSeq, and it works well for me.
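
The delete-and-re-insert strategy described above can be sketched as follows.
This is not the code in scripts/update-on-new-version.pl. The two simplified
tables and the replace_entry() helper are hypothetical stand-ins, and Python's
sqlite3 module is used only so that the sketch runs standalone.

```python
import sqlite3


def replace_entry(conn, accession, version, feature_keys):
    """UPDATE only the parent row; delete and re-insert its children."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT bioentry_id FROM bioentry WHERE accession = ?",
            (accession,),
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO bioentry (accession, version) VALUES (?, ?)",
                (accession, version),
            )
            bioentry_id = cur.lastrowid
        else:
            bioentry_id = row[0]
            conn.execute(
                "UPDATE bioentry SET version = ? WHERE bioentry_id = ?",
                (version, bioentry_id),
            )
            # Drop all old children so that no stale feature survives.
            conn.execute(
                "DELETE FROM seqfeature WHERE bioentry_id = ?",
                (bioentry_id,),
            )
        conn.executemany(
            "INSERT INTO seqfeature (bioentry_id, feature_key) VALUES (?, ?)",
            [(bioentry_id, key) for key in feature_keys],
        )
    return bioentry_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry ("
    " bioentry_id INTEGER PRIMARY KEY,"
    " accession TEXT NOT NULL UNIQUE,"
    " version INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE seqfeature ("
    " seqfeature_id INTEGER PRIMARY KEY,"
    " bioentry_id INTEGER NOT NULL REFERENCES bioentry (bioentry_id),"
    " feature_key TEXT NOT NULL)"
)

old_id = replace_entry(conn, "NM_012500", 1, ["gene", "CDS", "misc_feature"])
new_id = replace_entry(conn, "NM_012500", 2, ["gene", "CDS"])

keys = [
    r[0]
    for r in conn.execute(
        "SELECT feature_key FROM seqfeature ORDER BY feature_key"
    )
]
print(keys)
```

The new version has one feature fewer than the old one. An in-place update of
the feature rows would have left the extra feature attached to the bioentry,
while deleting and re-inserting leaves exactly the new set.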
>
>         -hilmar
>
> On Thursday, January 23, 2003, at 11:24  AM, Jansen E Lim wrote:
>
> > Hello,
> >
> > I seem to be having trouble using the -lookup option of the
> > load_seqdatabase.pl script.  In particular, I wanted to see what the
> > option would do, documented as follows:
> >             --lookup
> >             flag to look-up by unique key first, converting the insert
> >             into an update if the object is found
> >
> > I also tried using --lookup 1 without success.  I have no trouble
> > using the -noupdate and -remove options with -lookup.
> >
> > Here's how I invoke the script:
> >
> >     load_seqdatabase.pl -dbname refseq -driver Pg -lookup -format genbank dup.dat
> >
> > Here's the error message I get:
> > Here's the error message I get:
> >
> > DBD::Pg::st execute failed: ERROR:  Cannot insert a duplicate key
> > into unique index comment_bioentry_id_key at
> > /libpath/Bio/DB/BioSQL/BaseDriver.pm line 564, <GEN0> line 116.
> >
> > -------------------- WARNING ---------------------
> > MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,
> > values were ("PROVISIONAL REFSEQ: This record
> > has not yet been subject to final NCBI review. The reference
> > sequence was derived from J04733.1. ","1") FKs (3)
> > ERROR:  Cannot insert a duplicate key into unique index
> > comment_bioentry_id_key
> > ---------------------------------------------------
> > NOTICE:  current transaction is aborted, queries ignored until
> > end of transaction block
> > DBD::Pg::st fetchall_arrayref failed: no statement executing at
> > /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line
> > 801, <GEN0> line 116.
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Could not store NM_012500:
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: create: object (Bio::Annotation::Comment) failed to insert
> > or to be found by unique key
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> > /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:197
> > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> > /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> > STACK: Bio::DB::Persistent::PersistentObject::store
> > /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> > STACK:
> > Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children
> > /stf/biocgi/limje/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:220
> >
> > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> > /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> > /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> > STACK: Bio::DB::Persistent::PersistentObject::store
> > /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> > STACK: Bio::DB::BioSQL::SeqAdaptor::store_children
> > /stf/biocgi/limje/Bio/DB/BioSQL/SeqAdaptor.pm:179
> > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> > /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:260
> > STACK: Bio::DB::Persistent::PersistentObject::store
> > /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> > STACK: ../load_seqdatabase.pl:400
> > -----------------------------------------------------------
> >
> >
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> > STACK: ../load_seqdatabase.pl:409
> > -----------------------------------------------------------
> >
> > Thanks for helping out.
> >
> > -Jansen
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------


