[BioSQL-l] load_seqdatabase.pl warnings and errors
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Wed May 20 05:52:13 EDT 2009
Hi Guys
OK, the warnings were due to duplicate sequences: I had downloaded a
stream using Bio::DB::GenBank and assumed that only unique entries
would be sent back. Passing "--flatlookup --remove" gets rid of the
warnings.
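For anyone wanting to spot this kind of duplication before loading, here is a minimal sketch in Python. It is not part of load_seqdatabase.pl; it only assumes that each record in the flat file has a VERSION line whose first field is "accession.version". The function name and the sample records are made up for illustration.

```python
import io
from collections import Counter


def duplicate_accessions(handle):
    """Return the accession.version strings seen more than once in a GenBank flat file."""
    counts = Counter()
    for line in handle:
        # Assumption: the VERSION line carries "accession.version" as its first field.
        if line.startswith("VERSION"):
            fields = line.split()
            if len(fields) > 1:
                counts[fields[1]] += 1
    return sorted(acc for acc, n in counts.items() if n > 1)


# Made-up three-record stream in which the first record is repeated.
sample = io.StringIO(
    "LOCUS       TEST0001\nVERSION     TEST0001.1\n//\n"
    "LOCUS       TEST0002\nVERSION     TEST0002.1\n//\n"
    "LOCUS       TEST0001\nVERSION     TEST0001.1\n//\n"
)
print(duplicate_accessions(sample))
```

To use it on a real file, pass an open file handle for the downloaded flat file in place of the sample stream.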
Now for NC_003992.gbk...
To answer Hilmar's question:
mysql> select * from reference where crc = "CRC-E8D3CBBD80002FA1";
+--------------+-----------+-----------------------------------------------------------------------------------------------------+-------------------+---------+----------------------+
| reference_id | dbxref_id | location                                                                                            | title             | authors | crc                  |
+--------------+-----------+-----------------------------------------------------------------------------------------------------+-------------------+---------+----------------------+
|          152 |      NULL | Submitted (12-AUG-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA | Direct Submission | NULL    | CRC-E8D3CBBD80002FA1 |
+--------------+-----------+-----------------------------------------------------------------------------------------------------+-------------------+---------+----------------------+
And when I run load_seqdatabase.pl on NC_003992.gbk alone I still get:
perl load_seqdatabase.pl --host localhost --dbname fmd_biosql --format genbank --dbuser removed --dbpass removed --flatlookup --remove NC_003992.gbk
Loading NC_003992.gbk ...
-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("","Direct Submission","Submitted (12-AUG-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA","CRC-E8D3CBBD80002FA1","1","8203","") FKs (<NULL>)
Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
---------------------------------------------------
Could not store NC_003992:
------------- EXCEPTION -------------
MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::SeqAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) load_seqdatabase.pl:622
STACK toplevel load_seqdatabase.pl:604
--------------------------------------
at load_seqdatabase.pl line 635
And I still have:
mysql> select * from reference where crc = "CRC-E8D3CBBD80002FA1";
+--------------+-----------+-----------------------------------------------------------------------------------------------------+-------------------+---------+----------------------+
| reference_id | dbxref_id | location                                                                                            | title             | authors | crc                  |
+--------------+-----------+-----------------------------------------------------------------------------------------------------+-------------------+---------+----------------------+
|          152 |      NULL | Submitted (12-AUG-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA | Direct Submission | NULL    | CRC-E8D3CBBD80002FA1 |
+--------------+-----------+-----------------------------------------------------------------------------------------------------+-------------------+---------+----------------------+
1 row in set (0.01 sec)
Could this be because the record has three references that each cover
bases 1 to 8203, so that the crc created for the first is duplicated
for the second, thus causing the problem?
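To make the scenario concrete, here is a toy illustration in Python of the kind of collision I am asking about. It is not what bioperl-db actually does. It assumes, without my having checked the code, that the key is derived only from authors, title and location, so that start and end play no part. zlib.crc32 stands in for whatever checksum is really used, and the table is a cut-down, hypothetical version of reference in an in-memory SQLite database.

```python
import sqlite3
import zlib


def toy_key(authors, title, location):
    # Stand-in checksum built only from the three text fields (assumption).
    text = "".join(v if v is not None else "" for v in (authors, title, location))
    return "CRC-%08X" % zlib.crc32(text.encode("utf-8"))


conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down reference table with a unique constraint on crc.
conn.execute(
    "CREATE TABLE reference ("
    " reference_id INTEGER PRIMARY KEY,"
    " location TEXT, title TEXT, authors TEXT,"
    " crc TEXT UNIQUE)"
)


def insert_reference(authors, title, location):
    """Plain INSERT with no lookup first; returns True on success, False on a key clash."""
    try:
        conn.execute(
            "INSERT INTO reference (location, title, authors, crc) VALUES (?, ?, ?, ?)",
            (location, title, authors, toy_key(authors, title, location)),
        )
        return True
    except sqlite3.IntegrityError:
        return False


location = (
    "Submitted (12-AUG-2004) National Center for Biotechnology "
    "Information, NIH, Bethesda, MD 20894, USA"
)
first = insert_reference(None, "Direct Submission", location)
second = insert_reference(None, "Direct Submission", location)
print(first, second)
```

In this toy, the second insert is rejected because its key matches the first, which is the behaviour I suspect; whether the real adaptor ends up in the same situation is exactly the question.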
Cheers
Mick
-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net]
Sent: 19 May 2009 13:25
To: michael watson (IAH-C)
Cc: biosql-l at lists.open-bio.org
Subject: Re: [BioSQL-l] load_seqdatabase.pl warnings and errors
On May 19, 2009, at 4:17 AM, michael watson (IAH-C) wrote:
> [...]
> -------------------- WARNING ---------------------
>
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values
> were
> ("AY312586S1","32307407","AY312586","Foot-and-mouth disease virus O
> isolate O/SKR/2000 S fragment, complete
>
> 1,9762)
>
> Duplicate entry 'AY312586-1-1' for key 2
>
> ---------------------------------------------------
This suggests that a sequence with the above accession or GI number
was already in the database, or occurs in the file twice.
If this situation is possible, you will have to pass the --lookup (or
--flatlookup) flag to the script, and specify how you want updates to
take place when they are necessary (options --noupdate, --remove, and
--mergeobjs).
> -------------------- WARNING ---------------------
>
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed,
> values were ("","1") FKs (324,3,4)
>
> Duplicate entry '324-3-4-1' for key 2
> ---------------------------------------------------
I suspect that 324 is the primary key of the sequence record that
raised the duplicate entry warning above. Can you check that?
If the insert is turned into an update, these warnings should go away
too.
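One way to run that check, as a minimal sketch: the query below looks up the accession and version for a list of bioentry_id values. The bioentry table and its bioentry_id, accession and version columns are from the BioSQL schema; the helper name, the in-memory SQLite database and the rows in it are made-up stand-ins so that the example runs on its own. Against the real database the same SELECT would be issued through the mysql client.

```python
import sqlite3


def accessions_for(conn, bioentry_ids):
    """Map each bioentry_id in the list to its (accession, version)."""
    ids = list(bioentry_ids)
    if not ids:
        return {}
    marks = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT bioentry_id, accession, version FROM bioentry"
        " WHERE bioentry_id IN (%s) ORDER BY bioentry_id" % marks,
        ids,
    )
    return {row[0]: (row[1], row[2]) for row in rows}


# Stand-in table holding only the three columns the query needs; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry ("
    " bioentry_id INTEGER PRIMARY KEY, accession TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO bioentry VALUES (?, ?, ?)",
    [(323, "EXAMPLE_A", 1), (324, "EXAMPLE_B", 1), (325, "EXAMPLE_C", 1)],
)
print(accessions_for(conn, [323, 324, 325]))
```

If the accession returned for 324 matches the one in the SeqAdaptor warning, the feature warnings are a consequence of the same duplicate record.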
> [...]
> -------------------- WARNING ---------------------
>
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed,
> values were ("","1") FKs (323,3,4)
>
> Duplicate entry '323-3-4-1' for key 2
>
> ---------------------------------------------------
Similar to before, except 323 is probably the primary key for AY312587.
> [...]
> -------------------- WARNING ---------------------
>
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed,
> values were ("","1") FKs (325,3,4)
>
> Duplicate entry '325-3-4-1' for key 2
>
> ---------------------------------------------------
And if the order of messages is preserved correctly, 325 would be the
primary key of AY312589.
> [...]
> -------------------- WARNING ---------------------
>
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
> values
> were ("","Direct Submission","Submitted (12-AUG-2004) National Center
> for Biotechnology Information, NIH,
>
> C-E8D3CBBD80002FA1","1","8170","") FKs (<NULL>)
>
> Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
>
> ---------------------------------------------------
This one is odd. Can you check which existing entry you have with
reference.crc = 'CRC-E8D3CBBD80002FA1'?
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================