[Bioperl-l] Memory in Perl

John Peden johnp@molbiol.ox.ac.uk
Thu, 31 Aug 2000 12:10:26 +0100


> Has anybody in the Bioperl community found a workaround for perl's
> inefficient memory handling?
>
> I am trying to write some genome-scale applications in Perl, but I am
> having the following problems:
>
> 1. Reading more than 1 000 000 BLAST scores into hashes takes > 100 MB of
> memory, i.e. ca. 100 bytes per hash entry. Is there a way to make working
> with large hashes more efficient?

In my experience, Perl's arrays and hashes are not memory-efficient enough
for genome-sized problems.

It takes more coding, but you could pack the scores into strings and
manipulate them there; strings are a lot more memory-efficient for storing
simple types of data.
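
To illustrate, here is a minimal sketch of the packed-string idea (the
set_score/get_score helpers are names I made up for this example): a million
single-precision floats packed end-to-end occupy about 4 MB in one scalar,
versus roughly 100 MB in a hash.

    use strict;

    my $size   = 4;                            # bytes per packed float
    my $scores = "\0" x (1_000_000 * $size);   # one flat, preallocated string

    sub set_score {
        my ($i, $score) = @_;
        substr($scores, $i * $size, $size) = pack('f', $score);
    }

    sub get_score {
        my ($i) = @_;
        return unpack('f', substr($scores, $i * $size, $size));
    }

    set_score(42, 1.5e-30);
    print get_score(42), "\n";

The trade-off is that you manage the index arithmetic yourself, and lookups
by name require mapping names to integer indices first.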

Alternatively, bypass Perl's hashes altogether and store the scores in an
external database such as Berkeley DB, accessed through a Perl database
interface; the last time I did this I used the AnyDBM_File module.
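
For example, a minimal sketch of the tied-hash approach (the file name
blast_scores is just illustrative; AnyDBM_File will pick up whichever DBM
library your perl was built with):

    use strict;
    use Fcntl;
    use AnyDBM_File;

    my %scores;
    tie(%scores, 'AnyDBM_File', 'blast_scores', O_RDWR|O_CREAT, 0644)
        or die "Cannot tie blast_scores: $!";

    # keys and values live on disk, not in RAM
    $scores{'gene00001:gene04711'} = '1.5e-30';
    print $scores{'gene00001:gene04711'}, "\n";

    untie %scores;

Since only the pairs you actually store take up space, this also sidesteps
the sparse 50000 x 50000 matrix problem below; the cost is a disk access
per lookup instead of a RAM access.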


> A numerical 2D array is not a good solution either, because the full
> array would be enormous (50 000 genes x 50 000 genes, for example), and
> most of its slots would stay empty and unused.
>
> 2. Huge memory consumption wouldn't be a problem for our hardware - there
> is still plenty of RAM left - but perl reports "Out of Memory" after
> growing to about 125 MB in size.

Sounds like the Digital UNIX per-process datasize limit; try removing it in
your shell before running the script:

    unlimit datasize          # csh/tcsh
    ulimit -d unlimited       # sh/ksh


> Is there a perl install-time option to change the limit of memory usage?
> Is this a system-dependent feature? We have Alphas with ca 1GB RAM + 3GB
> swap.
> Is perl able to use swap memory?

Yes - perl will use swap like any other process. There is no perl
install-time memory limit; the cap comes from the operating system, so
raising the per-process limits as above should be all you need.

John

John Peden
MRC Haematology
IMM, Oxford, UK
+44 1865 222350