Bug in loading duplicate but non-identical swissprot references
elia at tll.org.sg
Thu Apr 17 16:00:29 EDT 2003
We are finally finding the time to tackle BioSQL properly, and we are
loading in entire databases, hoping to help fix issues on the BioSQL
front and/or the SeqIO front. As always, these are most likely to arise
not from bugs in the code but from dirt in the databases....
The first problem we have identified is that the same reference is
sometimes cited without its MEDLINE identifier and other times with it.
The first time such a reference is encountered, it is given a CRC-64
value and a NULL dbxref foreign key, so the unique key (UK) check is
done on the CRC. The next time, it has a MEDLINE id, so the UK check is
done on the dbxref_id, no match is found, and it gets stored as if it
were a new record... at which point the insert fails because the CRC is
duplicated.
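
To make the failure concrete, here is a minimal sketch in Python with
sqlite3 rather than the actual ReferenceAdaptor code. The `reference`
table is a simplified stand-in for the BioSQL one (an assumption, keeping
only the two unique keys), and `find_by_unique_key`, `store` and
`demo_failure` are hypothetical names that only mimic the lookup order
described above.

```python
import sqlite3


def make_db():
    """Create an in-memory table with the two unique keys discussed above.

    Simplified, assumed schema: not the real BioSQL DDL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reference (
            reference_id INTEGER PRIMARY KEY,
            dbxref_id    INTEGER UNIQUE,  -- NULL when no MEDLINE id is cited
            crc          TEXT UNIQUE,
            title        TEXT
        )
        """
    )
    return conn


def find_by_unique_key(conn, crc, dbxref_id):
    """Look up by dbxref_id when one is present, otherwise by CRC."""
    if dbxref_id is not None:
        sql, arg = "SELECT reference_id FROM reference WHERE dbxref_id = ?", dbxref_id
    else:
        sql, arg = "SELECT reference_id FROM reference WHERE crc = ?", crc
    row = conn.execute(sql, (arg,)).fetchone()
    return row[0] if row else None


def store(conn, crc, dbxref_id, title):
    """Insert the reference unless the unique-key lookup finds it."""
    found = find_by_unique_key(conn, crc, dbxref_id)
    if found is not None:
        return found
    cur = conn.execute(
        "INSERT INTO reference (dbxref_id, crc, title) VALUES (?, ?, ?)",
        (dbxref_id, crc, title),
    )
    return cur.lastrowid


def demo_failure():
    """Store the same reference twice: first without, then with a MEDLINE id."""
    conn = make_db()
    store(conn, "CRC_A", None, "Some paper")    # orphan: looked up by CRC
    try:
        store(conn, "CRC_A", 42, "Some paper")  # looked up by dbxref_id, not found
    except sqlite3.IntegrityError as exc:
        return str(exc)                         # the CRC unique key rejects the insert
    return None
```

Calling `demo_failure()` returns the constraint error raised by the
second insert, which is the same situation as the failed load.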
Checking only for the CRC in the get_unique_key_query method of
ReferenceAdaptor solves the duplication problem and lets the MEDLINE
dbxref be stored when it is encountered; however, it does not trigger
the update of the dbxref column in the reference table...
...I am still venturing into this wonderful world of UKs, FKs and
persistence, so I got stuck at this point. Suggestions? The main problem
seems to be that we want to convert an orphan (with no FK) into a child...
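
One possible shape for the orphan-to-child step, again only as a Python
and sqlite3 sketch and not the ReferenceAdaptor API: look up by CRC
first, and if the stored row has a NULL dbxref_id while the incoming
reference carries one, update the row in place instead of inserting.
The simplified `reference` table and the name `store_reference` are
assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Simplified, assumed stand-in for the reference table (not real BioSQL DDL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reference (
            reference_id INTEGER PRIMARY KEY,
            dbxref_id    INTEGER UNIQUE,  -- NULL when no MEDLINE id is cited
            crc          TEXT UNIQUE,
            title        TEXT
        )
        """
    )
    return conn


def store_reference(conn, crc, dbxref_id, title):
    """Find by CRC, then by dbxref_id; fill in a missing dbxref_id on a match."""
    row = conn.execute(
        "SELECT reference_id, dbxref_id FROM reference WHERE crc = ?", (crc,)
    ).fetchone()
    if row is None and dbxref_id is not None:
        row = conn.execute(
            "SELECT reference_id, dbxref_id FROM reference WHERE dbxref_id = ?",
            (dbxref_id,),
        ).fetchone()
    if row is None:
        # Genuinely new reference: insert it as before.
        cur = conn.execute(
            "INSERT INTO reference (dbxref_id, crc, title) VALUES (?, ?, ?)",
            (dbxref_id, crc, title),
        )
        return cur.lastrowid
    reference_id, stored_dbxref = row
    if stored_dbxref is None and dbxref_id is not None:
        # Orphan becomes a child: attach the foreign key to the existing row.
        conn.execute(
            "UPDATE reference SET dbxref_id = ? WHERE reference_id = ?",
            (dbxref_id, reference_id),
        )
    return reference_id
```

With this, storing the same CRC first without and then with a MEDLINE
id leaves a single row whose dbxref_id is filled in. The sketch does not
handle the case where that MEDLINE id is already attached to a different
row; the UPDATE would then hit the dbxref_id unique key.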
Bioinformatics Program Manager
Temasek Life Sciences Laboratory
1, Research Link
Tel. +65 6874 4945
Fax. +65 6872 7007