[Biojava-dev] SequenceDB way too big!!!

Matthew Pocock matthew_pocock at yahoo.co.uk
Fri Jan 31 12:09:27 EST 2003


Hi Murat,

I think (though I may be wrong) that sequences of this sort of size get 
internally represented using binary bit-packing. I will investigate for 
you. On Unix-like platforms (Tru64 is a bad offender here), the VM often 
reserves much more memory than it actually uses. It tends to grab blocks 
of memory in case it needs them in the future. If the memory is not used 
by the VM, then it is available to other applications, even though it 
shows up in top. Relatively trivial Java apps on Tru64 can claim to be 
using 120M, but you can run loads of these apps, far exceeding the 
physical memory of the box, without causing it to swap. Having said this, 
you should not be getting memory-related crashes with this size of file. 
I'll take a peek.
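
To give a rough idea of what bit-packing means here, the sketch below 
stores each DNA symbol in 2 bits, so 32 bases fit in one 64-bit word. 
This is only an illustration under my own assumptions: the class name 
PackedDna and its methods are hypothetical and are not BioJava API, it 
handles only a, c, g and t, and it is not necessarily how BioJava packs 
symbols internally.

```java
/**
 * Illustrative 2-bit packing of a DNA string.
 * Hypothetical class, NOT part of BioJava; supports only a, c, g, t.
 */
public class PackedDna {
    private static final int BASES_PER_WORD = 32; // 64 bits / 2 bits per base

    private final long[] words;
    private final int length;

    public PackedDna(String seq) {
        length = seq.length();
        // Round up so a partially filled last word is still allocated.
        words = new long[(length + BASES_PER_WORD - 1) / BASES_PER_WORD];
        for (int i = 0; i < length; i++) {
            long code = encode(seq.charAt(i));
            words[i / BASES_PER_WORD] |= code << ((i % BASES_PER_WORD) * 2);
        }
    }

    // Map a nucleotide character to its 2-bit code (case-insensitive).
    private static long encode(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return 0L;
            case 'c': return 1L;
            case 'g': return 2L;
            case 't': return 3L;
            default:
                throw new IllegalArgumentException("Unsupported symbol: " + c);
        }
    }

    // Decode the base at position i back to a lower-case character.
    public char charAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("Index: " + i);
        }
        int shift = (i % BASES_PER_WORD) * 2;
        int code = (int) ((words[i / BASES_PER_WORD] >>> shift) & 3L);
        return "acgt".charAt(code);
    }

    public int length() {
        return length;
    }

    // Bytes used by the packed array itself (excluding object overhead).
    public int packedBytes() {
        return words.length * 8;
    }

    public static void main(String[] args) {
        PackedDna dna = new PackedDna("acgtACGTtt");
        System.out.println(dna.length() + " bases in "
                + dna.packedBytes() + " packed bytes");
    }
}
```

With a scheme like this, the symbols need about a quarter of a byte 
each, instead of one byte per character in the FASTA file or two bytes 
per char in a Java String.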

As Mark said, the quick fix is to run java with the "-Xmx<foo>m" switch 
- replace <foo> with the number of megabytes to allow, e.g. "-Xmx500m". 
It's an upper limit on how far the heap can grow before the process runs 
out of memory, and won't require the VM to use that amount of space.
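
If you want to see the difference between what the VM has reserved and 
what your objects actually occupy, something like the following may 
help. It is a minimal sketch using the standard java.lang.Runtime 
methods; the class name HeapReport is just a placeholder of mine, and 
the figures are approximate because garbage that has not yet been 
collected still counts as used.

```java
/**
 * Prints the heap limit, the heap currently reserved by the VM, and the
 * part of it that is in use. HeapReport is a placeholder name.
 */
public class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Heap occupied by objects (live or not yet collected), in bytes.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    static String report() {
        Runtime rt = Runtime.getRuntime();
        return "max (the -Xmx limit): " + (rt.maxMemory() / MB) + " MB, "
                + "reserved by the VM: " + (rt.totalMemory() / MB) + " MB, "
                + "in use: " + (usedBytes(rt) / MB) + " MB";
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it as "java -Xmx500m HeapReport" should report a maximum of 
roughly 500 MB while the reserved and in-use figures stay far lower. 
Calling report() before and after loading your sequences gives a 
better picture of real usage than top does.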

Matthew

Murat Tasan wrote:
> I've just started using biojava to help in my development of sequence
> analysis and searching projects, but I have run into a huge (no pun
> intended) problem.  I make a call to SeqIOTools.readFasta(...), to read a
> FASTA file of approximately 14MB in size.  Because the file is only 14MB
> or so, I figure getting a SequenceDB object from it will AT WORST take up
> twice that in memory (~30MB) (with all of the extra information associated
> with sequences... although my FASTA file has just sequence data).
> Instead, my virtual machine eventually crashes out as I run out of memory.
> I watched the execution using 'top' and witnessed over 75MB being
> allocated to the running process.
> 
> Is there a more efficient implementation of this?  Better yet, can anyone
> tell me why so much space is being taken up for only 14MB of sequence
> data?
> 
> Thanks for any help!!!!
> 
> Murat
> 


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk


