[BioSQL-l] Re: memory error while loading SwissProt into Oracle using bioperl-db

Hilmar Lapp hlapp at gnf.org
Tue Jun 21 14:32:47 EDT 2005


Good to know that the memory leak is not confined to MacOSX. BTW,
besides switching to a perl built with multi-threading disabled, I could
also get rid of the memory leak by using the Instant Client from Oracle
(which is 10g, but will connect fine to a 9i database). Again, that's on
MacOSX, but chances are it will have the same effect for you.
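
To check whether a given perl is thread-enabled before rebuilding anything, here is a minimal sketch. It relies on `perl -V:usethreads`, which prints `usethreads='define';` for a threaded build and `usethreads='undef';` otherwise. The helper names `is_threaded` and `perl_is_threaded` are made up for illustration.

```python
import shutil
import subprocess


def is_threaded(config_line):
    """Interpret the output of `perl -V:usethreads`.

    A thread-enabled perl prints "usethreads='define';".
    """
    return "usethreads='define'" in config_line


def perl_is_threaded(perl="perl"):
    """Ask the perl found on PATH; returns None if no perl is found."""
    exe = shutil.which(perl)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-V:usethreads"], capture_output=True, text=True, check=True
    )
    return is_threaded(result.stdout)


if __name__ == "__main__":
    print(perl_is_threaded())
```

If this reports a threaded perl and the loader's memory keeps growing, a non-threaded build or the Instant Client are the two workarounds mentioned in this thread.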

	-hilmar

On Jun 21, 2005, at 5:15 AM, Jana Bauckmann wrote:

> Hi,
>
> I solved my problems importing SwissProt. It turned out to be a mixture
> of causes, so I thought it might be interesting for you:
>
> 1) An upgrade to BioPerl 1.5 solved my problems with integrity
> constraint errors.
>
> 2) I got a memory leak with DBD::Oracle, Oracle 9.2 and multi-thread
> enabled perl 5.8.1 -- as you assumed. (Memory use grew to 2GB while
> inserting 30000 records.) I installed a perl built with multi-threading
> disabled and everything worked fine.
>
> Thank you very much,
> Jana
>
>
> On Tue, 14 Jun 2005, Hilmar Lapp wrote:
>
>>
>> On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote:
>>
>>> Hi,
>>>
>>> I would like to load SwissProt data into my Oracle 9.2 database with
>>> BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got
>>> two problems:
>>>
>>> 1) I get many (about 1300) warnings stating integrity constraint
>>> errors:
>>>
>>> ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated -
>>> parent key not found (DBD ERROR: OCIStmtExecute)
>>>
>>> ORA-01400: cannot insert NULL into
>>> ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS") (DBD ERROR: OCIStmtExecute)
>>
>> If there are indeed no authors for the respective reference in the
>> respective SwissProt entries, then this is expected because
>> Reference.Authors may not be NULL.
>>
>> You should, however, see more than just the error message above;
>> there should be a warning message following or preceding it which
>> reports that not all foreign keys could be inserted, and that message
>> should give the primary key. This should be the primary key of the
>> bioentry that should have gotten the reference attached. Using SQL you
>> should then be able to identify which record it is, and then you can
>> look it up on the Swissprot site or in your Swissprot source file.
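
The lookup by primary key can be sketched as below. This is only an illustration under assumptions: the table and column names (`bioentry`, `bioentry_id`, `accession`, `name`) follow the generic BioSQL schema, and the Oracle flavor may expose them under different names, as the SG_REFERENCE in the error above suggests. An in-memory SQLite database with one made-up row stands in for the real database so that the sketch runs on its own; the helper name `find_bioentry` is hypothetical.

```python
import sqlite3


def find_bioentry(conn, bioentry_id):
    """Return (accession, name) for a bioentry primary key, or None."""
    cur = conn.execute(
        "SELECT accession, name FROM bioentry WHERE bioentry_id = ?",
        (bioentry_id,),
    )
    return cur.fetchone()


# Stand-in database; the row below is invented, not a real SwissProt entry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry ("
    " bioentry_id INTEGER PRIMARY KEY,"
    " accession TEXT NOT NULL,"
    " name TEXT NOT NULL)"
)
conn.execute("INSERT INTO bioentry VALUES (4711, 'X00000', 'EXAMPLE_HUMAN')")

# 4711 plays the role of the primary key printed in the warning message.
print(find_bioentry(conn, 4711))
```

The accession returned this way is what you would search for on the Swissprot site or in the source file.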
>>
>> If the bioentry itself fails to load because of this problem then you
>> should see an error message to this effect, with full stack trace.
>> Otherwise the bioentry did load, just the reference didn't, and if you
>> don't really need this particular reference, you don't need to worry
>> about it.
>>
>> You may also want to consider trying to upgrade to a CVS snapshot from
>> either the 1.4 branch or the main trunk. There have been a few fixes
>> to modules that I believe include the swissprot parser.
>>
>>>
>>> 2) The script stops after 2 hours (34500 tuples in table BioEntry)
>>> with message: Out of memory!
>>>
>>> I guess problem 1 causes problem 2. Is this reasonable or do I have
>>> two separate problems?
>>
>> The first one may not even be a real problem, see above. It is
>> extremely unlikely that it causes the memory problem.
>>
>> Swissprot is a large, very diverse, and richly annotated data
>> source, and because bioperl-db caches a lot of stuff like ontology
>> terms, references, and dbxrefs, the loader process will eventually use
>> up anywhere between 500MB and 1.3GB of RAM.
>>
>> Given the amount of memory you have, this shouldn't be a limitation
>> at all, unless maybe you gave all the memory to Oracle running on the
>> same machine.
>>
>> I've had a memory leak issue with DBD::Oracle, the Oracle 9iR2 client
>> library, and multi-thread enabled perl 5.8.1 on MacOSX. You may be
>> seeing a similar problem. Try watching the loader process in top and
>> see how fast the memory consumption grows. It will grow due to the
>> object cache filling up, but if you see it eating up more than 1GB
>> before 100,000 records are loaded you're likely to have hit a memory
>> leak.
>>
>> If that's the case you'll have to rebuild your own perl from source
>> with multi-threading disabled.
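
The rule of thumb above (more than 1GB used before 100,000 records are loaded) can be written down as a small check. This is a rough sketch, not a diagnostic tool: the function name and the way the two numbers are obtained (memory use read off top, record count from the bioentry table) are assumptions for illustration.

```python
GB = 1024 ** 3


def looks_like_leak(memory_bytes, records_loaded,
                    limit_bytes=1 * GB, limit_records=100_000):
    """Apply the rule of thumb from this thread.

    Growth from the object cache is expected, but exceeding about 1 GB
    before 100,000 records are loaded points to a leak.
    """
    return records_loaded < limit_records and memory_bytes > limit_bytes


# 2GB after 30000 records, as reported at the top of this thread.
print(looks_like_leak(2 * GB, 30_000))

# 1.3GB after a much longer run is within the expected cache size.
print(looks_like_leak(int(1.3 * GB), 200_000))
```

The thresholds are deliberately coarse; they only separate normal cache growth from the kind of runaway growth reported here.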
>>
>> 	-hilmar
>>
>>>
>>> I run Oracle and the load script on the same machine with:
>>> Suse Linux 9.0 (kernel 2.4.21-291-smp) with  12 GB RAM
>>> perl 5.8.1, built for i586-linux-thread-multi
>>> bioperl 1.4
>>> bioperl-db 0.1
>>
>> BTW I'm assuming this is not correct; otherwise the latest BioSQL
>> schema wouldn't be supported, let alone the Oracle version of it. You
>> probably obtained a snapshot from CVS?
>>
>>> DBI 1.48
>>> DBD::Oracle 1.16
>>> Oracle 9.2
>>> BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on
>>> 6th June 2005)
>>>
>>> Thanks for any suggestions,
>>> Jana
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


