[Biojava-dev] SequenceDB way too big!!!

Murat Tasan tasan at eecs.cwru.edu
Mon Feb 10 12:57:35 EST 2003


i'll take a look later today and see the changes.  thanks much for all of
this!  i'd love to help as well by poking around the source, but i just
don't seem to have the time at the moment.

i'll get back to you regarding what i see with memory usage.
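
for checking memory usage from inside the JVM rather than from the outside with 'top', a minimal sketch along the following lines might help. the class name HeapProbe and the 15,000,000-element char array are hypothetical stand-ins for whatever actually loads the sequences; only the standard java.lang.Runtime calls are used, and the numbers are approximate because gc() is only a hint to the VM.

```java
public class HeapProbe {

    /** Approximate number of heap bytes in use, after asking the VM to collect garbage. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // a request only; the VM is free to ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();

        // Hypothetical stand-in for the real loading step (e.g. reading a FASTA file).
        char[] residues = new char[15000000];

        long after = usedHeapBytes();
        residues[0] = 'a'; // keep the array reachable until after the measurement

        System.out.println("approx. heap growth in MB: "
                + (after - before) / (1024 * 1024));
    }
}
```

taking the same two readings around the real loading call, instead of the placeholder array, gives a rough figure for what the loaded data costs on the heap.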

thanks again!

murat

On Mon, 10 Feb 2003, Matthew Pocock wrote:

> Ok.
>
> We've done some poking, some swearing and David Huen has written some
> code. Before David's modifications, I had to use -Xmx35m to load in a 15
> megabase fasta sequence. Now, I need -Xmx5m to make it run!!! This is
> on Windows 2k with the 1.4.1 VM. The newer, shinier code is in CVS and
> will be part of rc2.
>
> Well done David.
>
> Murat, does this help you?
>
> Matthew
>
> > Murat Tasan wrote:
> >
> >> I've just started using biojava to help in my development of sequence
> >> analysis and searching projects, but I have run into a huge (no pun
> >> intended) problem.  I make a call to SeqIOTools.readFasta(...), to read a
> >> FASTA file of approximately 14MB in size.  Because the file is only 14MB
> >> or so, I figure getting a SequenceDB object from it will AT WORST take up
> >> twice that in memory (~30MB) (with all of the extra information
> >> associated with sequences... although my FASTA file has just sequence
> >> data).  Instead, my virtual machine eventually crashes out as I run out
> >> of memory.  I watched the execution using 'top' and witnessed over 75MB
> >> being allocated to the running process.
> >>
> >> Is there a more efficient implementation of this?  Better yet, can anyone
> >> tell me why so much space is being taken up for only 14MB of sequence
> >> data?
> >>
> >> Thanks for any help!!!!
> >>
> >> Murat
> >>
> >
> >
>
>
>
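
As a rough illustration of why the in-memory size of a sequence can differ so much from the file size: a Java char takes 2 bytes, so 15,000,000 bases held as chars need about 30 MB, while 2 bits per base would need about 3.75 MB. The sketch below packs unambiguous DNA into 2 bits per base. It is an assumption for illustration only, not a description of the code that went into CVS; the class PackedDna and its methods are hypothetical and are not part of the BioJava API.

```java
public class PackedDna {

    private final int[] words; // 16 bases per 32-bit int, 2 bits each
    private final int length;  // number of bases stored

    public PackedDna(String seq) {
        length = seq.length();
        words = new int[(length + 15) / 16];
        for (int i = 0; i < length; i++) {
            int code = encode(seq.charAt(i));
            // word index is i / 16; bit offset within the word is (i % 16) * 2
            words[i >>> 4] |= code << ((i & 15) * 2);
        }
    }

    /** Maps a, c, g, t (either case) to 0..3; anything else is rejected. */
    static int encode(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return 0;
            case 'c': return 1;
            case 'g': return 2;
            case 't': return 3;
            default:
                throw new IllegalArgumentException("not an unambiguous DNA base: " + c);
        }
    }

    /** Returns the base at position i (0-based) as a lower-case character. */
    public char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", length " + length);
        }
        int code = (words[i >>> 4] >>> ((i & 15) * 2)) & 3;
        return "acgt".charAt(code);
    }

    public int length() {
        return length;
    }

    /** Bytes used by the packed array itself, ignoring object headers. */
    public int storageBytes() {
        return words.length * 4;
    }
}
```

For example, new PackedDna("acgtacgtacgtacgtgg") stores 18 bases in two ints. A packing like this cannot represent ambiguity codes such as n, which is one reason a general-purpose library may not use it for every sequence.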

-- 
Murat Tasan
mxt6 at po.cwru.edu
tasan at eecs.cwru.edu
http://genomics.cwru.edu


