[Biojava-dev] SequenceDB way too big!!!

Matthew Pocock matthew_pocock at yahoo.co.uk
Mon Feb 10 15:48:52 EST 2003


Ok.

We've done some poking and some swearing, and David Huen has written some
code. Before David's modifications, I had to use -Xmx35m to load a
15-megabase FASTA sequence. Now I only need -Xmx5m to make it run!!! This
is on Windows 2k with the 1.4.1 VM. The newer, shinier code is in CVS and
will be part of rc2.
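
For anyone wondering why the in-memory representation can matter this
much, here is a minimal sketch. It is an illustration only and does not
describe what BioJava or David's change actually does: the class name
PackedDna and its methods are made up for this example. It assumes
unambiguous DNA (A, C, G, T only), which fits in 2 bits per base, so 15
megabases would pack into under 4 MB, whereas one 4-byte object reference
per base on a 32-bit VM would need about 60 MB.

```java
// Hypothetical illustration, not BioJava code: stores A/C/G/T in 2 bits each.
public class PackedDna {
    // The index of each letter in this string is its 2-bit code.
    private static final String ALPHABET = "ACGT";

    /** Packs an unambiguous DNA string into 4 bases per byte. */
    public static byte[] pack(String seq) {
        byte[] out = new byte[(seq.length() + 3) / 4];
        for (int i = 0; i < seq.length(); i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(seq.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException(
                        "Not A/C/G/T at position " + i + ": " + seq.charAt(i));
            }
            // Base i lives in byte i/4, at bit offset (i%4)*2.
            out[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
        return out;
    }

    /** Recovers the first 'length' bases from a packed array. */
    public static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int code = (packed[i / 4] >> ((i % 4) * 2)) & 0x3;
            sb.append(ALPHABET.charAt(code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = "ACGTTGCAAC";
        byte[] packed = pack(seq);
        System.out.println(seq.length() + " bases -> " + packed.length + " bytes");
        System.out.println(unpack(packed, seq.length()));
    }
}
```

The trade-off is that ambiguity symbols such as N do not fit in 2 bits,
so a real implementation would need a wider encoding or a fallback for
those.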

Well done, David.

Murat, does this help you?

Matthew

> Murat Tasan wrote:
> 
>> I've just started using biojava to help in my development of sequence
>> analysis and searching projects, but I have run into a huge (no pun
>> intended) problem.  I make a call to SeqIOTools.readFasta(...), to read a
>> FASTA file of approximately 14MB in size.  Because the file is only 14MB
>> or so, I figure getting a SequenceDB object from it will AT WORST take up
>> twice that in memory (~30MB) (with all of the extra information
>> associated with sequences... although my FASTA file has just sequence
>> data).  Instead, my virtual machine eventually crashes out as I run out
>> of memory.  I watched the execution using 'top' and witnessed over 75MB
>> being allocated to the running process.
>>
>> Is there a more efficient implementation of this?  Better yet, can anyone
>> tell me why so much space is being taken up for only 14MB of sequence
>> data?
>>
>> Thanks for any help!!!!
>>
>> Murat
>>
> 
> 


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk


