[BioSQL-l] problem loading NCBI_taxonomy database into BioSQL bioseqdb

Nick Matzke matzke at berkeley.edu
Thu Sep 4 01:11:47 UTC 2008



Hilmar Lapp wrote:
> 
> On Sep 3, 2008, at 6:44 PM, Nick Matzke wrote:
> 
>> Well, I'm not sure what I did, but some combination of these things 
>> seems to have worked.
> 
> Great we got you going!
> 
>> [...]
>> 3. Make sure you have an empty version of the db. At least for me, I 
>> got errors if I had already loaded sequences etc. into it. I got 
>> errors like this:
>>
>> ==========================================
>> note: node (28;331111;27;species;;) is retired; failed to delete: 
>> Cannot delete or update a parent row: a foreign key constraint fails 
>> (`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY 
>> (`taxon_id`) REFERENCES `taxon` (`taxon_id`))
>> note: node (70;300268;69;species;;) is retired; failed to delete: 
>> Cannot delete or update a parent row: a foreign key constraint fails 
>> (`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY 
>> (`taxon_id`) REFERENCES `taxon` (`taxon_id`))
>> note: node (77;3002
>> ==========================================
>>
> 
> These aren't fatal, right? What it basically means is that your 
> sequences referenced taxa that are not yet or not anymore in the NCBI 
> taxonomy download.


Those weren't fatal, but eventually I hit this and it crashed:

==========================================
note: node (4484;312017;4483;species;;) is retired; failed to delete: 
Cannot delete or update a parent row: a foreign key constraint fails 
(`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY 
(`taxon_id`) REFERENCES `taxon` (`taxon_id`))
note: node (4490;324602;4489;species;;) is retired; failed to delete: 
Cannot delete or update a parent row: a foreign key constraint fails 
(`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY 
(`taxon_id`) REFERENCES `taxon` (`taxon_id`))
failed to insert node (4577;4577;4575;species;1;1): Duplicate entry 
'4577' for key 2 at 
/bioinformatics/pythonstuff/biosql-1.0.0/scripts/load_ncbi_taxonomy.pl 
line 581.
==========================================

...but as I said, it worked fine on an empty database, which was fine 
for my purposes.
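
For anyone who wants to confirm the database really is empty before running load_ncbi_taxonomy.pl, here is a rough sketch of a pre-flight check. It is only an illustration: it uses Python's sqlite3 with a stripped-down stand-in for the schema so that it runs anywhere, the choice of tables (taxon, taxon_name, bioentry) is my assumption about which ones matter, and the helper names count_rows and is_empty are made up. Against the real MySQL bioseqdb you would swap in a MySQL connection.

```python
import sqlite3

# Assumption: these are the tables whose contents matter for a clean load.
TABLES = ("taxon", "taxon_name", "bioentry")


def count_rows(conn, tables=TABLES):
    """Return {table_name: row_count} for each listed table."""
    counts = {}
    for table in tables:
        # Table names come from the fixed tuple above, never from user input.
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return counts


def is_empty(conn, tables=TABLES):
    """True if none of the listed tables contains any rows."""
    return all(n == 0 for n in count_rows(conn, tables).values())


# Stand-in schema: a tiny subset of columns, not the full BioSQL DDL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon (
        taxon_id      INTEGER PRIMARY KEY,
        ncbi_taxon_id INTEGER UNIQUE
    );
    CREATE TABLE taxon_name (
        taxon_id INTEGER REFERENCES taxon (taxon_id),
        name     TEXT
    );
    CREATE TABLE bioentry (
        bioentry_id INTEGER PRIMARY KEY,
        taxon_id    INTEGER REFERENCES taxon (taxon_id)
    );
    """
)

if is_empty(conn):
    print("database is empty, taxonomy load should start clean")
else:
    print("existing rows found:", count_rows(conn))
```

If is_empty returns False, count_rows shows which tables already hold data, i.e. the situation in which I saw the errors above.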

Thanks!




> 
> The script doesn't yet process the change log that NCBI also produces. 
> So if two nodes get merged into one, or one gets split into two new ones, the 
> script can't migrate the data that you already have. Nodes that are in 
> the database but not in the taxonomy dump from NCBI will be considered 
> retired, and the script tries to delete them if there aren't any 
> sequences yet pointing to them.
> 
>     -hilmar
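
To make the retirement rule Hilmar describes concrete, here is a small sketch that splits "retired" nodes into ones that can be deleted and ones that are blocked because a sequence still points at them. This is not the logic of load_ncbi_taxonomy.pl itself, just my reading of the description above. The function name split_retired, the cut-down two-table schema, and all of the IDs are invented for illustration, and it runs on an in-memory sqlite3 database rather than MySQL.

```python
import sqlite3


def split_retired(conn, dump_ncbi_ids):
    """Classify taxon rows that are absent from the NCBI dump.

    Returns (deletable, blocked): lists of taxon_id values. A retired node
    is 'blocked' if at least one bioentry row still references it.
    """
    deletable, blocked = [], []
    rows = conn.execute(
        "SELECT taxon_id, ncbi_taxon_id FROM taxon ORDER BY taxon_id"
    ).fetchall()
    for taxon_id, ncbi_id in rows:
        if ncbi_id in dump_ncbi_ids:
            continue  # still present in the dump, so not retired
        in_use = conn.execute(
            "SELECT 1 FROM bioentry WHERE taxon_id = ? LIMIT 1", (taxon_id,)
        ).fetchone()
        (blocked if in_use else deletable).append(taxon_id)
    return deletable, blocked


# Toy data only: these rows are made up, not taken from a real database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon (
        taxon_id      INTEGER PRIMARY KEY,
        ncbi_taxon_id INTEGER UNIQUE
    );
    CREATE TABLE bioentry (
        bioentry_id INTEGER PRIMARY KEY,
        taxon_id    INTEGER REFERENCES taxon (taxon_id)
    );
    INSERT INTO taxon VALUES (1, 1001), (2, 1002), (3, 1003), (4, 1004);
    INSERT INTO bioentry VALUES (10, 1), (11, 3);
    """
)

# Pretend the fresh dump only contains NCBI IDs 1003 and 1004.
deletable, blocked = split_retired(conn, dump_ncbi_ids={1003, 1004})
print("deletable:", deletable)
print("blocked by sequences:", blocked)
```

In the toy data, taxon 2 is retired and unreferenced, so it could be deleted, while taxon 1 is retired but still referenced by a bioentry. That second case is the one that produces the "failed to delete" notes on a database that already has sequences loaded.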

-- 
====================================================
Nicholas J. Matzke
Ph.D. student, Graduate Student Researcher
Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley
====================================================


