[BioSQL-l] load_seqdatabase fails when loading refseq plant files
Mike Muratet
muratem at eng.uah.edu
Fri Aug 11 12:10:30 EDT 2006
Hello all
I am using biosql-schema/bioperl-db to load Refseq entries into a biosql
database. I don't see any version info in the files, but I downloaded
everything in the last month or so and everything passed all the tests
when installed. I am using perl 5.8.5, mysql 5.0.22, DBI-1.5.1,
DBD-mysql-3.006. I was loading the plant files from RefSeq rel 18 with:
load_seqdatabase.pl --dbname biosql
--lookup --u --namespace plant --format genbank --safe plant*.rna.gbff.gz
and it crashed after about 30K of 60K records:
at /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/scripts/biosql/load_seqdatabase.pl
line 633
-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values
were ("","Direct Submission","Submitted (01-JUL-2004) National Center for
Biotechnology Information, National Institutes of Health, Bethesda 20894,
United States of America","CRC-6F1453182E2BAC3F","1","786","") FKs
(<NULL>)
Duplicate entry 'CRC-6F1453182E2BAC3F' for key 3
---------------------------------------------------
Could not store XM_472403:
------------- EXCEPTION -------------
MSG: create: object (Bio::Annotation::Reference) failed to insert or to be
found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:208
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:272
STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:219
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
I traced the error back through the source and database and found that
XM_472403 has the same CRC value as XM_473880. I actually got many errors of this type,
but only the last one crashed the script (in spite of --safe).
Should there be more info included in the CRC field? I am weak when
it comes to RDBMSs, but looking at the schema, I would guess that the CRC field
was added to make an otherwise degenerate key unique. Would it help to add
more fields to the CRC, or another key? The former might be done without
having to change a lot of code.
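
To make the question concrete, here is a minimal sketch in Python (not the
bioperl-db code) of how a checksum used as a unique key behaves. Everything
in it is an assumption for illustration: the function names are hypothetical,
zlib.crc32 stands in for whatever CRC the adaptor really computes, the choice
of authors, title and location as checksum inputs is a guess based on the
values in the warning above, and the table is a cut-down imitation of the
reference table held in an in-memory SQLite database.

```python
import sqlite3
import zlib


def reference_checksum(authors, title, location):
    """Hypothetical stand-in for the CRC stored with each reference."""
    payload = "\x00".join([authors, title, location]).encode("utf-8")
    return "CRC-%08X" % zlib.crc32(payload)


def store_reference(conn, authors, title, location):
    """Look the reference up by checksum first; insert only if it is absent."""
    crc = reference_checksum(authors, title, location)
    row = conn.execute(
        "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
    ).fetchone()
    if row is not None:
        return row[0]  # reuse the row an earlier record already created
    cur = conn.execute(
        "INSERT INTO reference (authors, title, location, crc) "
        "VALUES (?, ?, ?, ?)",
        (authors, title, location, crc),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference ("
    " reference_id INTEGER PRIMARY KEY,"
    " authors TEXT, title TEXT, location TEXT,"
    " crc TEXT UNIQUE)"
)

# The same "Direct Submission" reference, as it might appear on two records.
submission = (
    "",
    "Direct Submission",
    "Submitted (01-JUL-2004) National Center for Biotechnology Information, "
    "National Institutes of Health, Bethesda 20894, United States of America",
)

first_id = store_reference(conn, *submission)
second_id = store_reference(conn, *submission)  # found by checksum, no insert

# A blind insert of the same values trips the unique constraint instead.
try:
    conn.execute(
        "INSERT INTO reference (authors, title, location, crc) "
        "VALUES (?, ?, ?, ?)",
        submission + (reference_checksum(*submission),),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM reference").fetchone()[0]
```

In this sketch, two records carrying an identical reference always produce
the same checksum, so the second store only succeeds if the lookup finds the
existing row. If that is also what happens in the real loader, adding more
fields to the CRC would only help when the added fields actually differ
between the two records.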
Thanks
Mike