[BioSQL-l] null title and CRC

Angel Pizarro angel at mail.med.upenn.edu
Thu Aug 10 20:05:08 UTC 2006


Here is a set of records that makes a new install of biosql fail because 
of the CRC constraint when loaded with the script:
 bioperl-db/scripts/biosql/load_seqdatabase.pl

My test setup was the latest CVS tarball (as of last week ;) ) of bioperl 
with mysql 5.0. I also recreated the error on fresh postgres 7.4.8 and 8.1 
installs.  I ran the script like so:

perl ~/bin/load_seqdatabase.pl --dsn "dbi:mysql:bstest" --format genbank \
    --dbuser xxxx --dbpass xxxx --namespace gb --lookup \
    test_load_seqdatabase_crc.gbff

Here is the debug error message from one of the runs I did:
> --------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, 
> values were ("","Danio rerio small nuclear ribonucleoprotein 
> polypeptide C, mRNA (cDNA clone MGC:109792 
> IMAGE:7292940)","Unpublished 
> (2005)","CRC-0E44E80E2C988097","1","159","") FKs (<NULL>)
> ERROR:  duplicate key violates unique constraint "reference_crc_key"
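
The collision can be illustrated without a database. The sketch below is 
only an illustration, not bioperl-db code: it follows Hilmar's description 
(quoted further down) that the reference CRC is built from authors, title, 
and location, with empty or null values replaced by "<undef>". The helper 
names, the tab separator, and the use of zlib.crc32 as a stand-in checksum 
are all assumptions; the checksum bioperl-db actually uses may well differ.

```python
import zlib

UNDEF = "<undef>"  # placeholder for empty or null fields, per the quoted description


def reference_key(authors, title, location):
    """Hypothetical stand-in for the reference CRC key.

    Assumption: fields are joined with a tab and hashed with CRC-32.
    This is NOT the checksum bioperl-db computes; it only mimics the idea.
    """
    fields = [value if value else UNDEF for value in (authors, title, location)]
    checksum = zlib.crc32("\t".join(fields).encode("utf-8")) & 0xFFFFFFFF
    return "CRC-%08X" % checksum


def find_collisions(references):
    """Return pairs of references that map to the same key."""
    seen = {}
    collisions = []
    for ref in references:
        key = reference_key(*ref)
        if key in seen:
            collisions.append((seen[key], ref))
        else:
            seen[key] = ref
    return collisions


# Made-up references: one with an empty author string, one with no authors at all.
refs = [
    ("", "Some title", "Unpublished (2005)"),
    (None, "Some title", "Unpublished (2005)"),
]

for first, second in find_collisions(refs):
    print("collision:", first, second)
```

Under these assumptions, any two references that agree on all three fields 
after the "<undef>" substitution get the same key, so a unique constraint 
on that key would reject the second insert.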

Cheers,
-angel


Angel Pizarro wrote:
> Hilmar Lapp wrote:
>   
>> I think I need to debug this. If bioperl-db stumbles over this, then 
>> it sounds like that's what needs to be fixed.
>>
>> Can you or somebody else provide me with two sample records that 
>> exemplify (i.e., replicate) the problem and which I can turn into a 
>> test case?
>>
>>     
> Since these were bulk loads, I am not sure which records conflicted, 
> but I'll have a poke around and see if I can grab a test set for you.
> -angel
>
>   
>>     -hilmar
>>
>> On Aug 3, 2006, at 2:12 PM, Angel Pizarro wrote:
>>
>>     
>>>  From hilmar:
>>>       
>>>> The CRC for references uses the authors, title, and location
>>>> attributes in Bioperl-db, and empty (or null) strings default to the
>>>> string "<undef>".
>>>>
>>>> If title is empty and authors and location do not distinguish two
>>>> references, then why do you want to have two rows for those
>>>> references? Basically, they are identical for all intents and
>>>> purposes, or are they not?
>>>>
>>>>     -hilmar
>>>>         
>>> Sorry for not replying to the original thread, but I just joined this 
>>> list.
>>> This was an issue for me with bioperl loading as well, since I was using
>>> the same biosql instance to load two different biodatabases with the
>>> same entry. Specifically, I loaded IPI, which has no feature table in
>>> the entries, and the genbank equivalents to get the feature tables.
>>> Namely, the constraint caused an error when the genbank record was
>>> loaded.
>>>
>>> I think that this is primarily an issue with bioperl, but I raise it
>>> here to make the java folks aware of the potential pitfall and maybe ask
>>> whether the CRC should be calculated with the biodatabase in mind?
>>> Probably not, since as hilmar states, it's still the same reference.
>>>
>>> BTW - I solved the issue by dropping the constraint, since I really
>>> don't care about references. Not optimal, but certainly the easiest thing to
>>> do ;)
>>>
>>> -angel
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>>       
>> --===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================

-- 
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160

P: 215-573-3736
F: 215-573-9004
E: angel at mail.med.upenn.edu

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_load_seqdatabase_crc.gbff
URL: <http://lists.open-bio.org/pipermail/biosql-l/attachments/20060810/1e7d3285/attachment.ksh>

