[Biojava-l] A Simple Genbank Parser Runs Out of Memory

Fri, 1 Mar 2002 15:26:05 -0500

I created a simple parser based on the Genbank demo to test a proof of concept.
I ran it on a large Genbank source file (containing over 160,000 "sequences").
The program processed 34,170 sequences and then crashed with a
java.lang.OutOfMemoryError. I ran this on an NT machine with 768 MB of RAM, using
the 1.2.2 JVM.

The interesting thing about this is that I was able to watch the size of the JVM.
For the first 34,169 sequences it reached a steady state at about 14 MB, with
normal expansion and contraction. Then it processed one more sequence, the JVM
size jumped to 85 MB, and it crashed. This scenario was exactly reproducible. To
make sure it wasn't a data issue, I took the 5 sequences before and the 5 after
the "crashing" sequence and put them in a separate file. I was able to process
this file with no problems.
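
For anyone who wants to repeat the isolation step, here is a minimal sketch of how a window of records could be copied out of a large file. It assumes only that each record ends with a line starting with "//" (the GenBank flat-file terminator). The class and method names are made up for illustration and are not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

public class RecordWindow {
    /**
     * Copies records first..last (1-based, inclusive) from in to out.
     * A record is assumed to end at a line starting with "//".
     * Returns the number of complete records copied.
     */
    static int copyRecords(BufferedReader in, Writer out, int first, int last)
            throws IOException {
        int current = 1; // number of the record the current line belongs to
        int copied = 0;
        String line;
        while (current <= last && (line = in.readLine()) != null) {
            if (current >= first) {
                out.write(line);
                out.write('\n');
            }
            if (line.startsWith("//")) {
                // terminator line: the current record is complete
                if (current >= first) {
                    copied++;
                }
                current++;
            }
        }
        return copied;
    }
}
```

With the failure at sequence 34,170, a call such as copyRecords(in, out, 34165, 34175) would produce the small test file described above. Only one line is held in memory at a time, so the extraction itself should not be affected by the size of the source file.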

The net result was that I was able to "solve" this problem by pre-allocating a
larger JVM. However, I am concerned when I see the JVM expand by 70 MB while it
parses a single sequence.
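
To put numbers on the expansion rather than watching the process size from outside, heap usage could be logged from inside the parse loop. The sketch below uses only Runtime.totalMemory() and Runtime.freeMemory(). The class name and the idea of calling report() once per sequence are my own assumptions, not something taken from the Genbank demo. The larger heap itself would be requested with the standard -Xms and -Xmx launcher options.

```java
public class HeapReport {
    /** Heap bytes currently in use: allocated heap minus free space within it. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line summary, intended to be printed after each parsed sequence. */
    static String report(int sequencesParsed) {
        long usedMb = usedBytes() >> 20;                      // bytes to MB
        long totalMb = Runtime.getRuntime().totalMemory() >> 20;
        return sequencesParsed + " sequences: " + usedMb
                + " MB used of " + totalMb + " MB allocated";
    }
}
```

Printing report(count) to System.err after each sequence should show whether the jump happens inside a single parse call or builds up over several sequences.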

Are any of you aware of garbage collection problems in JDK 1.2.2?

Any other ideas?

Thanks and Best Regards,
Larry Cantey