[BioSQL-l] Bug in loading duplicate but non-identical swissprot references

Hilmar Lapp hlapp at gnf.org
Thu Apr 17 02:01:26 EDT 2003


This has been seen and debated before. See

	http://open-bio.org/pipermail/biosql-l/2003-March/000277.html

for the start of the thread (it might be useful to read the whole 
thread if you didn't follow it originally). There are some 
non-obvious implications.

I haven't gotten around to implementing the solution yet. It involves 
special-case code, and I wanted to come up with an implementation that 
limits the damage.

	-hilmar

On Thursday, April 17, 2003, at 12:00  AM, Elia Stupka wrote:

> Hello there,
>
> we are finally finding the time to tackle BioSQL properly and we are 
> loading in entire databases, hoping to help fix issues on the BioSQL 
> front and/or on the SeqIO front, which are most likely to arise, as 
> always, not from bugs in the code but from dirt in the databases....
>
> The first problem we have identified is that the same reference is 
> sometimes cited without its MEDLINE identifier and other times with 
> it. This means the first time it is encountered it is given a CRC-64 
> value and a NULL dbxref foreign key, and thus the UK check is done on 
> the CRC. The next time it has a MEDLINE id, so the UK check gets done 
> on the dbxref_id and it gets stored as if it were a new record... at 
> which point the insert fails because the CRC is duplicated.
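
The failure described above can be reproduced in miniature. This is only 
a sketch under stated assumptions: it uses SQLite in place of the real 
database, a much simplified `reference` table (just `dbxref_id`, `title` 
and `crc`, each with its own unique key), and hypothetical helper names 
(`make_db`, `find_by_unique_key`, `store`, `demo`) that are not part of 
bioperl-db. The CRC string and the dbxref id 42 are made-up values.

```python
import sqlite3


def make_db():
    # Simplified stand-in for the BioSQL reference table (assumption):
    # both dbxref_id and crc carry a unique key, and dbxref_id may be NULL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reference ("
        " reference_id INTEGER PRIMARY KEY,"
        " dbxref_id INTEGER UNIQUE,"
        " title TEXT,"
        " crc TEXT UNIQUE)"
    )
    return conn


def find_by_unique_key(conn, dbxref_id, crc):
    # Mimics the lookup as described in the message: when a MEDLINE dbxref
    # is present the check uses dbxref_id, otherwise it falls back to crc.
    if dbxref_id is not None:
        sql, arg = "SELECT reference_id FROM reference WHERE dbxref_id = ?", dbxref_id
    else:
        sql, arg = "SELECT reference_id FROM reference WHERE crc = ?", crc
    row = conn.execute(sql, (arg,)).fetchone()
    return row[0] if row else None


def store(conn, title, crc, dbxref_id=None):
    # Look up first; insert only if the unique-key lookup finds nothing.
    found = find_by_unique_key(conn, dbxref_id, crc)
    if found is not None:
        return found
    cur = conn.execute(
        "INSERT INTO reference (dbxref_id, title, crc) VALUES (?, ?, ?)",
        (dbxref_id, title, crc),
    )
    return cur.lastrowid


def demo():
    conn = make_db()
    # First occurrence: no MEDLINE id, so the row is stored with NULL dbxref_id.
    store(conn, "Some paper", "CRC-ABC")
    try:
        # Second occurrence: same CRC, now with a MEDLINE dbxref. The lookup
        # by dbxref_id finds nothing, so an insert is attempted and fails.
        store(conn, "Some paper", "CRC-ABC", dbxref_id=42)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

The second `store` call never looks at the CRC, which is why the 
duplicate is only noticed by the database constraint.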
>
> Checking only for the CRC in the get_unique_key_query method of 
> ReferenceAdaptor solves the duplication problem and lets the MEDLINE 
> dbxref be stored when it is encountered; however, it does not trigger 
> the update of the dbxref column in the reference table...
>
> ...I am still venturing into this wonderful world of UKs and FKs and 
> persistence, so I got stuck at this point. Suggestions? The main 
> problem seems to be that we want to convert an orphan (with no FK) to 
> a child...
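
One way to picture the "orphan to child" conversion is a lookup by CRC 
followed by an update that fills in the missing foreign key. The sketch 
below is an illustration only, not the solution discussed in the earlier 
thread and not bioperl-db code: it assumes SQLite, a simplified 
`reference` table, and a hypothetical function `store_reference`. It 
also assumes the CRC is identical whether or not the MEDLINE id is 
present, as the failing insert suggests.

```python
import sqlite3


def make_db():
    # Simplified stand-in for the BioSQL reference table (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reference ("
        " reference_id INTEGER PRIMARY KEY,"
        " dbxref_id INTEGER UNIQUE,"
        " title TEXT,"
        " crc TEXT UNIQUE)"
    )
    return conn


def store_reference(conn, title, crc, dbxref_id=None):
    # Always look the reference up by CRC, regardless of the dbxref.
    row = conn.execute(
        "SELECT reference_id, dbxref_id FROM reference WHERE crc = ?",
        (crc,),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO reference (dbxref_id, title, crc) VALUES (?, ?, ?)",
            (dbxref_id, title, crc),
        )
        return cur.lastrowid
    reference_id, stored_dbxref = row
    # Orphan found (no FK yet) and the new occurrence brings a dbxref:
    # fill in the foreign key instead of inserting a second row.
    if stored_dbxref is None and dbxref_id is not None:
        conn.execute(
            "UPDATE reference SET dbxref_id = ? WHERE reference_id = ?",
            (dbxref_id, reference_id),
        )
    return reference_id


if __name__ == "__main__":
    conn = make_db()
    first = store_reference(conn, "Some paper", "CRC-ABC")
    second = store_reference(conn, "Some paper", "CRC-ABC", dbxref_id=42)
    print(first == second)
```

This leaves open what should happen when the stored row already has a 
different dbxref, or when the dbxref-bearing occurrence is loaded first; 
the sketch simply keeps the existing row in those cases.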
>
> Elia
>
> ---
> Bioinformatics Program Manager
> Temasek Life Sciences Laboratory
> 1, Research Link
> Singapore 117604
> Tel. +65 6874 4945
> Fax. +65 6872 7007
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


