[BioSQL-l] load_seqdatabase.pl warnings and errors

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Wed May 27 08:50:45 EDT 2009


Hi Hilmar

I tried to dig around in the code, but quite frankly I quickly got lost.
What is clear is that the existing reference is found in neither the
cache nor the database, and therefore a unique key violation occurs when
the code tries to insert the object.

I'm pretty stuffed on this project until I can get this sorted out.

If someone tells me where to look I can try and sort out why this
happens, but at the moment (for me) it's like looking for a needle in a
haystack.

Thanks in advance

Mick

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net] 
Sent: 20 May 2009 16:10
To: michael watson (IAH-C)
Cc: Peter; biosql-l at lists.open-bio.org
Subject: Re: [BioSQL-l] load_seqdatabase.pl warnings and errors

Indeed changing the lookup will have no effect since deletion of
bioentries doesn't cascade to references (only to
bioentry-to-reference associations).

What I don't understand yet is how you get the CRC clash. Normally
this kind of situation arises when the first occurrence of a reference
lacks a PMID and the second has one. The second occurrence is then
looked up by its PMID, the lookup fails (b/c the first occurrence
didn't come with a PMID), and the reference, erroneously deemed "new",
is inserted, which then fails with a CRC clash.

However, there is no PMID nor any other identifier here, so I'll have  
to look into the code to find out why the second occurrence is either  
not looked up before an insert is attempted, or if it is looked up,  
why the lookup fails to find the record stored earlier.
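
For illustration only, the usual failure mode described above can be
reproduced with a minimal sketch. This is not the bioperl-db code: the
cut-down table layout, the function name store_reference, and the CRC
and PMID values are all made up for the example, and it assumes that
both occurrences of the reference yield the same CRC.

```python
import sqlite3

# Hypothetical, stripped-down stand-in for a reference table.
# Only the columns needed to show the clash; NOT the real BioSQL schema.
SCHEMA = """
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    pmid         TEXT UNIQUE,
    title        TEXT,
    crc          TEXT NOT NULL UNIQUE
)
"""


def store_reference(conn, title, crc, pmid=None):
    """Look up by PMID if one is given, else by CRC; insert if not found."""
    if pmid is not None:
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE pmid = ?", (pmid,)
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
        ).fetchone()
    if row is not None:
        return row[0]
    # Nothing found, so the reference is deemed new and inserted.
    cur = conn.execute(
        "INSERT INTO reference (pmid, title, crc) VALUES (?, ?, ?)",
        (pmid, title, crc),
    )
    return cur.lastrowid


def demo():
    """Store the same reference twice: first without, then with a PMID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # First occurrence has no PMID, so it is stored keyed by CRC only.
    first_id = store_reference(conn, "Some title", "CRC-0000000000000001")
    try:
        # Second occurrence has a PMID: the PMID lookup misses the stored
        # row, and the insert then violates the unique key on crc.
        store_reference(
            conn, "Some title", "CRC-0000000000000001", pmid="12345"
        )
    except sqlite3.IntegrityError as exc:
        return first_id, str(exc)
    return first_id, None


if __name__ == "__main__":
    print(demo())
```

In this sketch the second call raises an integrity error on the crc
column. That is the expected path when a PMID is involved; it does not
explain the case at hand, where there is no identifier at all.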

	-hilmar

On May 20, 2009, at 7:25 AM, michael watson (IAH-C) wrote:

> We have a winner :)
>
> NC_003992, NC_011452, NC_011451, NC_011450 all share at least one  
> reference.
>
> Would changing --flatlookup to --lookup change the behaviour so it  
> checks for an existing reference before trying to insert the  
> duplicate?
>
> The answer is no :( (see below).
>
> I guess this may need some coding then!
>
> Thanks!
> Mick
>
> perl load_seqdatabase.pl --host localhost --dbname fmd_biosql \
>   --format genbank --dbuser removed --dbpass removed --lookup --remove \
>   NC_003992.gbk
> Loading NC_003992.gbk ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,  
> values were ("","Direct Submission","Submitted (12-AUG-2004)  
> National Center for Biotechnology Information, NIH, Bethesda, MD  
> 20894, USA","CRC-E8D3CBBD80002FA1","1","8203","") FKs (<NULL>)
> Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
> ---------------------------------------------------
> Could not store NC_003992:
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
> STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
> STACK Bio::DB::BioSQL::SeqAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SeqAdaptor.pm:224
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
> STACK (eval) load_seqdatabase.pl:622
> STACK toplevel load_seqdatabase.pl:604
>
> --------------------------------------
>
> at load_seqdatabase.pl line 635
>
> -----Original Message-----
> From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com]  
> On Behalf Of Peter
> Sent: 20 May 2009 11:59
> To: michael watson (IAH-C)
> Cc: Hilmar Lapp; biosql-l at lists.open-bio.org
> Subject: Re: [BioSQL-l] load_seqdatabase.pl warnings and errors
>
> On Wed, May 20, 2009 at 10:52 AM, michael watson (IAH-C)
> <michael.watson at bbsrc.ac.uk> wrote:
>>
>> Hi Guys
>>
>> Ok, the warnings were due to duplicate sequences - I had downloaded a
>> stream using Bio::DB::GenBank and I guess I assumed that would mean
>> only unique entries were sent back.  Using "--flatlookup --remove"
>> gets rid of the warnings.
>
> Great - easy :)
>
>> Now for NC_003992.gbk...
>>
>> To answer Hilmar's question:
>> ...
>> And when I run load_seqdatabase.pl on NC_003992.gbk alone I still get:
>>
>> perl load_seqdatabase.pl --host localhost --dbname fmd_biosql \
>>   --format genbank --dbuser removed --dbpass removed --flatlookup \
>>   --remove NC_003992.gbk
>>
>> Loading NC_003992.gbk ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
>> values were ("","Direct Submission","Submitted (12-AUG-2004) National
>> Center for Biotechnology Information, NIH, Bethesda, MD 20894,
>> USA","CRC-E8D3CBBD80002FA1","1","8203","") FKs (<NULL>)
>> Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
>> ---------------------------------------------------
>> Could not store NC_003992:
>> ------------- EXCEPTION  -------------
>> MSG: create: object (Bio::Annotation::Reference) failed to insert or
>> to be found by unique key
>> ...
>
> I would guess that the problem is this rather generic reference in
> NC_003992 may be repeated exactly in another genome (causing the CRC
> collision):
>
> CONSRTM   NCBI Genome Project
> TITLE     Direct Submission
> JOURNAL   Submitted (12-AUG-2004) National Center for Biotechnology
> Information, NIH, Bethesda, MD 20894, USA
>
> See http://www.ncbi.nlm.nih.gov/nuccore/NC_011452
>
> i.e. Could there be another direct submission by the NCBI on that date
> in your collection?  You could search the database looking for that
> CRC and trace it back to a bioentry, or just try grep for "JOURNAL
> Submitted (12-AUG-2004) National Center for Biotechnology" on your
> GenBank files. e.g. Something like this SQL statement might be
> interesting:
>
> SELECT bioentry.accession, reference.title
> FROM bioentry, bioentry_reference, reference
> WHERE bioentry.bioentry_id = bioentry_reference.bioentry_id
>   AND bioentry_reference.reference_id = reference.reference_id
>   AND reference.crc = 'CRC-E8D3CBBD80002FA1';
>
> Peter

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================





