[BioSQL-l] load_seqdatabase.pl warnings and errors

Hilmar Lapp hlapp at gmx.net
Wed May 20 11:10:20 EDT 2009


Indeed, changing the lookup will have no effect, since deletion of
bioentries doesn't cascade to references (only to
bioentry-to-reference associations).
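
To make the cascade behaviour concrete, here is a minimal sketch using Python's sqlite3 module. The three-table schema is a simplified stand-in written for this illustration, not the actual BioSQL DDL, and the function name is hypothetical; it only mirrors the relationship described above, where the association table cascades on bioentry deletion but the reference table itself is untouched.

```python
import sqlite3


def rows_left_after_bioentry_delete():
    """Delete a bioentry and report how many rows survive in each table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    # ASSUMPTION: simplified stand-in schema, not the real BioSQL tables.
    conn.executescript("""
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            accession   TEXT
        );
        CREATE TABLE reference (
            reference_id INTEGER PRIMARY KEY,
            crc          TEXT UNIQUE
        );
        CREATE TABLE bioentry_reference (
            bioentry_id  INTEGER REFERENCES bioentry(bioentry_id)
                         ON DELETE CASCADE,
            reference_id INTEGER REFERENCES reference(reference_id)
                         ON DELETE CASCADE
        );
        INSERT INTO bioentry VALUES (1, 'NC_003992');
        INSERT INTO reference VALUES (1, 'CRC-E8D3CBBD80002FA1');
        INSERT INTO bioentry_reference VALUES (1, 1);
    """)
    # Roughly what removing an entry amounts to in this toy model.
    conn.execute("DELETE FROM bioentry WHERE accession = 'NC_003992'")
    counts = {}
    for table in ("bioentry", "bioentry_reference", "reference"):
        counts[table] = conn.execute(
            f"SELECT COUNT(*) FROM {table}"
        ).fetchone()[0]
    conn.close()
    return counts


if __name__ == "__main__":
    print(rows_left_after_bioentry_delete())
```

In this toy model the bioentry and its association row are gone after the delete, while the reference row (and its unique CRC) is still there, so a later insert of the same reference would still collide.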

What I don't understand yet is how you get the CRC clash. Normally
this kind of situation arises when the first occurrence of a reference
comes without a PMID and the second one has a PMID. The second
occurrence is then looked up by PMID, the lookup fails (b/c the first
occurrence didn't come with a PMID), and the erroneously deemed "new"
reference is inserted, which then fails with a CRC clash.

However, there is neither a PMID nor any other identifier here, so
I'll have to look into the code to find out either why the second
occurrence is not looked up before an insert is attempted or, if it is
looked up, why the lookup fails to find the record stored earlier.
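
The "usual" failure mode described above (first occurrence without PMID, second with PMID) can be sketched as follows. This is a toy model in Python with sqlite3, not the actual bioperl-db adaptor code: the single-table schema with a pmid column, the store_reference helper, and the PMID and CRC values are all made up for illustration.

```python
import sqlite3


def store_reference(conn, crc, pmid=None):
    """Look the reference up first; insert it only if the lookup fails.

    ASSUMPTION: hypothetical helper modelling the behaviour described
    in the text, not the real ReferenceAdaptor logic.
    """
    if pmid is not None:
        # An identifier is present, so it is used as the lookup key.
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE pmid = ?", (pmid,)
        ).fetchone()
    else:
        # No identifier: fall back to the CRC.
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
        ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO reference (pmid, crc) VALUES (?, ?)", (pmid, crc)
    )
    return cur.lastrowid


def demo():
    conn = sqlite3.connect(":memory:")
    # ASSUMPTION: simplified table; the unique key on crc is the point.
    conn.execute(
        "CREATE TABLE reference ("
        "reference_id INTEGER PRIMARY KEY, pmid TEXT, crc TEXT UNIQUE)"
    )
    first = store_reference(conn, "CRC-EXAMPLE")   # stored without PMID
    again = store_reference(conn, "CRC-EXAMPLE")   # found again via CRC
    try:
        # Same reference, now with a (made-up) PMID: the PMID lookup
        # misses, so an insert is attempted and hits the unique CRC.
        store_reference(conn, "CRC-EXAMPLE", pmid="12345")
        clash = False
    except sqlite3.IntegrityError:
        clash = True
    conn.close()
    return first == again, clash


if __name__ == "__main__":
    print(demo())
```

The case in this thread is different precisely because neither occurrence carries an identifier, so in this model the second store would simply find the first one via the CRC.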

	-hilmar

On May 20, 2009, at 7:25 AM, michael watson (IAH-C) wrote:

> We have a winner :)
>
> NC_003992, NC_011452, NC_011451, NC_011450 all share at least one  
> reference.
>
> Would changing --flatlookup to --lookup change the behaviour so it  
> checks for an existing reference before trying to insert the  
> duplicate?
>
> The answer is no :( (see below).
>
> I guess this may need some coding then!
>
> Thanks!
> Mick
>
> perl load_seqdatabase.pl --host localhost --dbname fmd_biosql \
>   --format genbank --dbuser removed --dbpass removed --lookup --remove \
>   NC_003992.gbk
> Loading NC_003992.gbk ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("","Direct Submission","Submitted (12-AUG-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA","CRC-E8D3CBBD80002FA1","1","8203","") FKs (<NULL>)
> Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
> ---------------------------------------------------
> Could not store NC_003992:
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
> STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
> STACK Bio::DB::BioSQL::SeqAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SeqAdaptor.pm:224
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
> STACK (eval) load_seqdatabase.pl:622
> STACK toplevel load_seqdatabase.pl:604
>
> --------------------------------------
>
> at load_seqdatabase.pl line 635
>
> -----Original Message-----
> From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com]  
> On Behalf Of Peter
> Sent: 20 May 2009 11:59
> To: michael watson (IAH-C)
> Cc: Hilmar Lapp; biosql-l at lists.open-bio.org
> Subject: Re: [BioSQL-l] load_seqdatabase.pl warnings and errors
>
> On Wed, May 20, 2009 at 10:52 AM, michael watson (IAH-C)
> <michael.watson at bbsrc.ac.uk> wrote:
>>
>> Hi Guys
>>
>> Ok, the warnings were due to duplicate sequences - I had downloaded a
>> stream using Bio::DB::GenBank and I guess I assumed that would mean
>> only unique entries were sent back.  Using "--flatlookup --remove"
>> gets rid of the warnings.
>
> Great - easy :)
>
>> Now for NC_003992.gbk...
>>
>> To answer Hilmar's question:
>> ...
>> And when I run load_seqdatabase.pl on NC_003992.gbk alone I still
>> get:
>>
>> perl load_seqdatabase.pl --host localhost --dbname fmd_biosql \
>>   --format genbank --dbuser removed --dbpass removed --flatlookup --remove \
>>   NC_003992.gbk
>>
>> Loading NC_003992.gbk ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("","Direct Submission","Submitted (12-AUG-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA","CRC-E8D3CBBD80002FA1","1","8203","") FKs (<NULL>)
>> Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
>> ---------------------------------------------------
>> Could not store NC_003992:
>> ------------- EXCEPTION  -------------
>> MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key
>> ...
>
> I would guess that the problem is that this rather generic reference in
> NC_003992 may be repeated exactly in another genome (causing the CRC
> collision):
>
> CONSRTM   NCBI Genome Project
> TITLE     Direct Submission
> JOURNAL   Submitted (12-AUG-2004) National Center for Biotechnology
> Information, NIH, Bethesda, MD 20894, USA
>
> See http://www.ncbi.nlm.nih.gov/nuccore/NC_011452
>
> i.e. Could there be another direct submission by the NCBI on that date
> in your collection?  You could search the database looking for that
> CRC and trace it back to a bioentry, or just try grep for "JOURNAL
> Submitted (12-AUG-2004) National Center for Biotechnology" on your
> GenBank files. e.g. Something like this SQL statement might be
> interesting:
>
> SELECT bioentry.accession, reference.title FROM bioentry,
> bioentry_reference, reference WHERE
> bioentry.bioentry_id=bioentry_reference.bioentry_id AND
> bioentry_reference.reference_id=reference.reference_id AND
> reference.crc='CRC-E8D3CBBD80002FA1';
>
> Peter
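
As a rough alternative to the grep suggested above, the following Python sketch lists the GenBank files in a directory whose JOURNAL line contains the "Direct Submission" text in question. The function name, the "*.gbk" file pattern, and the directory layout are assumptions made for this example; it only inspects the first line of each JOURNAL field, since GenBank wraps long values onto continuation lines.

```python
from pathlib import Path

# Text taken from the reference quoted above; only the part that sits
# on the JOURNAL line itself (the rest wraps onto the next line).
NEEDLE = "Submitted (12-AUG-2004) National Center for Biotechnology"


def files_sharing_reference(directory, needle=NEEDLE, pattern="*.gbk"):
    """Return the names of files with a JOURNAL line containing needle.

    ASSUMPTION: hypothetical helper; files are flat-file GenBank records
    matching `pattern` directly inside `directory`.
    """
    hits = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(errors="replace") as handle:
            if any(
                line.lstrip().startswith("JOURNAL") and needle in line
                for line in handle
            ):
                hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Scan the current directory; adjust the path as needed.
    for name in files_sharing_reference("."):
        print(name)
```

Any file listed besides NC_003992.gbk is a candidate for carrying the same reference, and therefore the same CRC.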

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================




