[BioSQL-l] Re: memory error while loading SwissProt into Oracle using bioperl-db

Jana Bauckmann jana.bauckmann at informatik.hu-berlin.de
Tue Jun 21 08:15:01 EDT 2005


Hi,

I solved my problems importing SwissProt. It turned out to be a mixture of
causes, so I thought it might be interesting for you:

1) An upgrade to BioPerl 1.5 solved my problems with integrity constraint
errors.

2) I had a memory leak with DBD::Oracle, Oracle 9.2 and multi-thread
enabled perl 5.8.1, as you suspected. (Memory use grew to 2GB while
inserting 30000 records.) I installed a perl built with multi-threading
disabled and everything worked fine.
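
For anyone who wants to check their own setup, here is a minimal sketch of
the two checks discussed in this thread. The function names are made up for
illustration. The threshold is only Hilmar's rule of thumb (more than 1GB
before 100,000 records are loaded), not a hard limit.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, thresholds follow the
# rule of thumb quoted in this thread.

# Succeeds if the perl on PATH was built with thread support.
# `perl -V:usethreads` prints usethreads='define'; for a threaded build.
perl_is_threaded() {
    case "$(perl -V:usethreads 2>/dev/null)" in
        *"'define'"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if memory use looks like a leak rather than normal cache growth:
# more than 1GB (1048576 KB) resident before 100,000 records are loaded.
# Arguments: resident memory in KB, number of records loaded so far.
looks_like_leak() {
    rss_kb=$1
    records=$2
    [ "$records" -lt 100000 ] && [ "$rss_kb" -gt 1048576 ]
}
```

The resident size of the loader can be read with `ps -o rss= -p <pid>`, and
the record count with a row count on the bioentry table. With my numbers
(2GB at 30000 records) `looks_like_leak 2097152 30000` succeeds. When
building perl from source, threading stays off unless Configure is given
-Dusethreads.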

Thank you very much,
Jana


On Tue, 14 Jun 2005, Hilmar Lapp wrote:

>
> On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote:
>
> > Hi,
> >
> > I would like to load SwissProt data into my Oracle 9.2 database with
> > BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got
> > two problems:
> >
> > 1) I get many (about 1300) warnings stating integrity constraint
> > errors:
> >
> > ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent
> > key not found (DBD ERROR: OCIStmtExecute)
> >
> > ORA-01400: cannot insert NULL into ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS")
> > (DBD ERROR: OCIStmtExecute)
>
> If there are indeed no authors for the respective reference in the
> respective SwissProt entries then this is expected, because
> Reference.Authors may not be NULL.
>
> You should, however, see more than just the error message above;
> presumably there is a warning message following or preceding it which
> reports that not all foreign keys could be inserted, and which gives
> the primary key. This should be the primary key of the bioentry that
> should have had the reference attached. Using SQL you should then be
> able to identify which record it is, and then you can look it up on
> the Swissprot site or in your Swissprot source file.
>
> If the bioentry itself fails to load because of this problem then you
> should see an error message to this effect, with full stack trace.
> Otherwise the bioentry did load, just the reference didn't, and if you
> don't really need this particular reference, you don't need to worry
> about it.
>
> You may also want to consider trying to upgrade to a CVS snapshot from
> either the 1.4 branch or the main trunk. There have been a few fixes to
> modules that I believe include the swissprot parser.
>
> >
> > 2) The script stops after 2 hours (34500 tuples in table BioEntry) with
> > message: Out of memory!
> >
> > I guess problem 1 causes problem 2. Is this reasonable or do I have two
> > separate problems?
>
> The first one may not even be a real problem, see above. It is
> extremely unlikely that it causes the memory problem.
>
> Swissprot is a large, very diverse, and richly annotated data
> source, and because bioperl-db caches a lot of stuff like ontology
> terms, references, and dbxrefs the loader process will eventually use
> up anywhere between 500MB and 1.3GB of RAM.
>
> Given the amount of memory you have, this shouldn't be a limitation at
> all, unless perhaps you gave all the memory to Oracle running on the
> same machine.
>
> I've had a memory leak issue with DBD::Oracle, the Oracle 9iR2 client
> library, and multi-thread enabled perl 5.8.1 on MacOSX. You may be
> seeing a similar problem. Try watching the loader process in top and
> see how fast the memory consumption grows. It will grow due to the
> object cache filling up, but if you see it eating up more than 1GB
> before 100,000 records are loaded you're likely to have hit a memory leak.
>
> If that's the case you'll have to rebuild your own perl from source
> with multi-threading disabled.
>
> 	-hilmar
>
> >
> > I run Oracle and the load script on the same machine with:
> > Suse Linux 9.0 (kernel 2.4.21-291-smp) with 12 GB RAM
> > perl 5.8.1, built for i586-linux-thread-multi
> > bioperl 1.4
> > bioperl-db 0.1
>
> BTW I'm assuming this is not correct; otherwise the latest BioSQL
> schema wouldn't be supported, let alone the Oracle version of it. You
> probably obtained a snapshot from CVS?
>
> > DBI 1.48
> > DBD::Oracle 1.16
> > Oracle 9.2
> > BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on
> > 6th June 2005)
> >
> > Thanks for any suggestions,
> > Jana
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/biosql-l
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
