[Biojava-l] A Simple Genbank Parser Runs Out of Memory
cantey.lg@pg.com
Fri, 1 Mar 2002 15:26:05 -0500
I created a simple parser based on the Genbank demo to test a proof of concept.
I ran it on a large Genbank source file (containing over 160,000 "sequences").
The program processed 34,170 sequences and then crashed with a
java.lang.OutOfMemoryError. I ran this on an NT machine with 768 MB of RAM,
using the 1.2.2 JVM.
The interesting thing is that I was able to watch the size of the JVM. For the
first 34,169 sequences it held a steady state at about 14 MB, with normal
expansion and contraction. Then it processed one more sequence, the JVM size
jumped to 85 MB, and it crashed. This scenario was exactly reproducible. To
make sure it wasn't a data issue, I took 5 sequences before and after the
"crashing" sequence and put them in a separate file. I was able to process this
file without any problems.
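For anyone who wants to repeat the isolation step, here is a minimal sketch of
how a window of records could be pulled out of a large flat file. It is not the
code I used. It assumes that each record ends with a line containing only "//"
(the usual GenBank flat-file terminator), that records are counted from 1, and
that the suspect record itself is included in the window. The class name
RecordWindow and its method are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: copies out the records numbered
// (target - radius) .. (target + radius) from a flat file in which
// every record is terminated by a line containing only "//".
class RecordWindow {

    static List<String> window(BufferedReader in, int target, int radius) {
        List<String> kept = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int index = 1;                       // 1-based number of the record being read
        try {
            String line;
            while (index <= target + radius && (line = in.readLine()) != null) {
                boolean wanted = index >= target - radius;
                if (wanted) {
                    current.append(line).append('\n');
                }
                if (line.trim().equals("//")) {   // end of the current record
                    if (wanted) {
                        kept.add(current.toString());
                        current.setLength(0);
                    }
                    index++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A trailing record with no "//" terminator is deliberately dropped.
        return kept;
    }

    // Usage: java RecordWindow <file> <recordNumber> <radius>
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            int target = Integer.parseInt(args[1]);
            int radius = Integer.parseInt(args[2]);
            for (String record : window(in, target, radius)) {
                System.out.print(record);
            }
        }
    }
}
```

Only the records inside the window are buffered, so the helper itself should not
need much memory even on a very large input file.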
The net result was that I was able to "solve" this problem by pre-allocating a
larger JVM heap. However, I am concerned to see the JVM expand by about 70 MB
during a simple parse of a single sequence.
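To pin down exactly which record triggers the expansion, heap usage can also be
logged from inside the program rather than watched from outside. The following
is only a sketch: the class name HeapLog and the loop in main are hypothetical
stand-ins for the real parsing loop. It uses nothing beyond
Runtime.totalMemory() and Runtime.freeMemory(). On Sun JVMs the initial and
maximum heap sizes are normally set with the -Xms and -Xmx options; I am
assuming those are the settings that matter here.

```java
// Hypothetical helper for logging heap usage once per parsed record.
class HeapLog {

    private static final long MB = 1024L * 1024L;

    // Heap bytes currently in use: the heap the JVM has claimed so far
    // minus the free space inside it.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One log line, in whole megabytes, for the given record number.
    static String report(int recordNumber) {
        long total = Runtime.getRuntime().totalMemory();
        return "record " + recordNumber
                + ": used=" + (usedBytes() / MB) + " MB"
                + ", heap=" + (total / MB) + " MB";
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 3; i++) {
            // Stand-in for parsing one record; a real loop would call
            // the parser here instead of allocating a dummy buffer.
            byte[] dummy = new byte[1024 * 1024];
            dummy[0] = 1;
            System.out.println(report(i));
        }
    }
}
```

Calling report() after every record (or every few hundred) shows whether the
jump happens inside a single record or builds up gradually beforehand.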
Are any of you aware of garbage collection problems in JDK 1.2.2?
Any other ideas?
Thanks and Best Regards,
Larry Cantey