[BioSQL-l] Help with load_seqdatabase.pl

Mon Jan 27 13:16:36 EST 2003

This is what versioning is for I believe. Biosql supports version for 
bioentry as well as separately for biosequence (even though I'm not 
aware that bioperl supports that, but I may be mistaken). I.e., you are 
not required to update a sequence upon version change; you may also 
leave the old one in place and add the new one.

The problem with this is that distinguishing between current and 
superseded versions is not as easy a matter as checking for a flag. So, 
queries across many bioentries get kind of involved in order to 
disregard say superseded versions (but it's still possible). Biosql, 
however, is infinitely extensible through its qualifier/value system. 
Using that system, you can assign any flag you choose. E.g.,

	$annval = Bio::Annotation::SimpleValue->new(-tagname => "is_current",
                                                 -value => "TRUE");
     $seq->annotation->add_Annotation($annval);
     # then store in the database

To later retrieve:

	SELECT e.accession, e.entry_version -- or whatever
	FROM   bioentry e, bioentry_qualifier_value boa, ontology_term o
	WHERE  boa.bioentry_id = e.bioentry_id
	AND    boa.ontology_term_id = o.ontology_term_id
	AND    o.term_name = 'is_current'
	AND    boa.qualifier_value = 'TRUE';

You can also contrain for this in object-level queries. Let me know if 
you'd like to see an example for how to set up such a query (generally 
speaking, look at t/query.t to get an idea what the procedure looks 
like).

	-hilmar

On Monday, January 27, 2003, at 11:03  AM, Jansen E Lim wrote:

> Hilmar,
>
> Thanks for the clarification.  I agree that performing a true update 
> is not
> the most efficient method.
> One question regarding the BioSQL schema: is the interest in having it
> support history?  Thus, when loading
> revised entries, the previous records are flagged as such (e.g., a 
> status
> attribute having a value of [current|history].)
> Historical records are very important with regard to IP issues or just
> trying to figure out what was known about a
> particular sequence when we last ran blast, fasta, genscan and other
> analyses.
>
> Thanks for any info.
>
> Regards,
> -Jansen
>
> Hilmar Lapp wrote:
>
>> Jansen, sorry for the late response. The problem is due to PostgreSQL
>> handling failures within a transaction differently (than MySQL/InnoDB
>> and Oracle). The way the adaptor layer works is that those entities
>> which are practically infinite in number are not looked up before
>> insert, but instead their presence is detected by an insert failing 
>> the
>> UK constraint. Comment is such an entity. PostgreSQL, however, aborts
>> the entire transaction upon such a (handled or not) failure. I have 
>> yet
>> to write certain functions in PL/PgSQL that will get around that
>> problem.
>>
>> Generally speaking though, updating bioentries through --update is not
>> very robust, because 1-n and n-n connected relations require more than
>> a simple update (e.g., the new version of a sequence may have less
>> features or features with a different key than the old version; a
>> simple update would leave you with stale features attached to the
>> bioentry).
>>
>> I have found it much more robust to simply delete associations and
>> FK-connected relations, and re-inserting the new set. So, all that is
>> really UPDATEd in this case is the bioentry (and biosequence) table.
>> For an example of how to do this, have a look at
>> scripts/update-on-new-version.pl, which is a closure you can pass to
>> the --mergeobjs option of load_seqdatabase.pl. I wrote this to update
>> RefSeq, and it works well for me.
>>
>>         -hilmar
>>
>> On Thursday, January 23, 2003, at 11:24  AM, Jansen E Lim wrote:
>>
>>> Hello,
>>>
>>> I seem to be having trouble using the -lookup option of
>>> load_seqdatabase.pl script.  In particular, I wanted to see what
>>> the option
>>> would
>>> do as documented as follows:
>>>             --lookup
>>>             flag to look-up by unique key first, converting the
>>> insert
>>>             into an update if the object is found
>>>
>>> I also tried using --lookup 1 without success.  I have no trouble
>>> using  -noupdate and -remove option with -lookup.
>>>
>>> Here's how I invoke the script:  load_seqdatabase.pl  -dbname
>>> refseq -driver Pg -lookup -format genbank dup.dat
>>> Here's the error message I get:
>>>
>>> DBD::Pg::st execute failed: ERROR:  Cannot insert a duplicate key
>>> into unique index comment_bioentry_id_key at
>>> /libpath/Bio/DB/BioSQL/BaseDriver.pm line 564, <GEN0> line 116.
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,
>>> values were ("PROVISIONAL REFSEQ: This record
>>> has not yet been subject to final NCBI review. The reference
>>> sequence was derived from J04733.1. ","1") FKs (3)
>>> ERROR:  Cannot insert a duplicate key into unique index
>>> comment_bioentry_id_key
>>> ---------------------------------------------------
>>> NOTICE:  current transaction is aborted, queries ignored until
>>> end of transaction block
>>> DBD::Pg::st fetchall_arrayref failed: no statement executing at
>>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line
>>> 801, <GEN0> line 116.
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Could not store NM_012500:
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: create: object (Bio::Annotation::Comment) failed to insert
>>> or to be found by unique key
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
>>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:197
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
>>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
>>> STACK: Bio::DB::Persistent::PersistentObject::store
>>> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
>>> STACK:
>>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children
>>> /stf/biocgi/limje/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:220
>>>
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
>>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
>>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
>>> STACK: Bio::DB::Persistent::PersistentObject::store
>>> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
>>> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children
>>> /stf/biocgi/limje/Bio/DB/BioSQL/SeqAdaptor.pm:179
>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
>>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:260
>>> STACK: Bio::DB::Persistent::PersistentObject::store
>>> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
>>> STACK: ../load_seqdatabase.pl:400
>>> -----------------------------------------------------------
>>>
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
>>> STACK: ../load_seqdatabase.pl:409
>>> -----------------------------------------------------------
>>>
>>> Thanks for helping out.
>>>
>>> -Jansen
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------