[Bioperl-l] Re: [Bioclusters] BioPerl and memory handling

Ian Korf iankorf at mac.com
Tue Nov 30 10:57:45 EST 2004


Perl does give memory back to the OS. If I do

my $dna = 'N' x 100000000;

the memory footprint is 192 MB.

undef $dna;

restores half the memory. This is not within a subroutine, but within 
the main program.
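
For anyone who wants to check the same thing at the C level on Linux, where the rusage fields used in the quoted program below are not populated, here is a minimal sketch. It assumes a Linux-style /proc/self/statm is available; resident_bytes() and measure_alloc_free() are hypothetical helper names of mine, not part of any library. On glibc, a request this large is normally served by mmap() and unmapped again on free(), which is one way a process can hand memory back, but other allocators and platforms may behave differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size of this process in bytes, read from
   /proc/self/statm (Linux-specific assumption).
   Returns -1 if the file is missing or cannot be parsed. */
long resident_bytes(void) {
    FILE *f = fopen("/proc/self/statm", "r");
    long total_pages, resident_pages;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &total_pages, &resident_pages) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return resident_pages * sysconf(_SC_PAGESIZE);
}

/* Allocate nbytes, write to every byte, then free the block.
   rss[0], rss[1] and rss[2] receive the resident size before the
   allocation, after the writes, and after free().
   Returns 0 on success, -1 if malloc() fails. */
int measure_alloc_free(size_t nbytes, long rss[3]) {
    volatile char *p;   /* volatile so the writes are not optimised away */
    size_t i;

    rss[0] = resident_bytes();

    p = malloc(nbytes);
    if (p == NULL)
        return -1;

    /* Use the memory so the pages really become resident */
    for (i = 0; i < nbytes; i++)
        p[i] = 'N';
    rss[1] = resident_bytes();

    free((void *)p);
    rss[2] = resident_bytes();

    return 0;
}
```

Calling measure_alloc_free(100000000, rss) from main() and printing the three values shows whether the footprint after free() falls back toward the starting figure on a given system.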

On Nov 30, 2004, at 1:57 AM, Tim Cutts wrote:

>
> On 29 Nov 2004, at 11:32 pm, Ian Korf wrote:
>> Here's something odd. The following labeled block looks like it 
>> should use no memory.
>>
>> 	BLOCK: {
>> 		my  $FOO = 'N' x 100000000;
>> 	}
>>
>> The weird thing is that after executing the block, the memory 
>> footprint is still 192 MB, as if it hadn't been garbage collected.
>
> Perl's garbage collection does not give the memory back to the OS; it 
> just marks the allocated memory for internal reuse by subsequent 
> allocations within perl.
>
> This is actually true of most UNIX programs; it is not unique to 
> perl.  free() does not necessarily give the memory back to the 
> operating system; it often just marks it for re-use by the current 
> process the next time it calls malloc().  In that case the memory 
> doesn't become available to the OS until the program exits.
>
> This is one reason why garbage collecting languages like perl and java 
> should not be relied on to keep memory under control; GC does *not* 
> absolve the programmer from the need to keep their memory usage tight.
>
> Consider the following C program (which you need to run on an OS which 
> actually populates all the contents of the rusage struct - Linux does 
> not, and neither does MacOS X, but Tru64 does):
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/time.h>
> #include <sys/resource.h>
>
> /* Print the integral memory fields of struct rusage.  #x stringifies
>    the label, so the quotes around it appear in the output. */
> #define PRINT_RESOURCES(x) do { getrusage(RUSAGE_SELF, &r);\
>     printf(#x "\n\nShared: %ld\nUnshared: %ld\nStack: %ld\n\n",\
>            r.ru_ixrss, r.ru_idrss, r.ru_isrss); } while (0)
>
> int main(void) {
>
>    char *p;
>    struct rusage r;
>    int i;
>
>    PRINT_RESOURCES("Program start");
>
>    p = malloc(100000000);
>    if (p == NULL) {
>        perror("malloc");
>        return 1;
>    }
>
>    /* Use the memory */
>    for (i = 0; i<100000000; i++)
>         p[i] = 'N';
>
>    PRINT_RESOURCES("After malloc");
>
>    free(p);
>
>    PRINT_RESOURCES("After free");
>
>    return 0;
>
> }
>
> The output on this Tru64 machine is:
>
> 09:46:26 tjrc at ecs2d:~$ ./memtest
> "Program start"
>
> Shared: 0
> Unshared: 0
> Stack: 0
>
> "After malloc"
>
> Shared: 19
> Unshared: 116577
> Stack: 19
>
> "After free"
>
> Shared: 19
> Unshared: 116577
> Stack: 19
>
> As you can see, free() does not actually release the memory from the 
> process back to the operating system.
>>
>
>> sub foo {my $FOO = 'N' x 100000000}
>> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s
>>
>> sub bar {my $BAR = 'N' x 100000000; undef $BAR}
>> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s
>>
>> The increase from 1 sec to 21 sec system CPU time is all the extra 
>> memory allocation and freeing associated with the undef statement. 
>> Why the user time is less in the undef example is a mystery to me.
>
> I can explain this.  You're forgetting that the value of the last 
> statement evaluated in a perl subroutine is always its return value, 
> even if you don't specify 'return'.  So if you allocate 100MB of Ns, 
> as in the first case, and then return it (which you do because the 
> allocation is the last statement in the subroutine), you actually 
> force perl to *copy* that lexically scoped variable each time the 
> routine is called.  That's why the program uses 200MB of memory, not 
> 100MB.
>
> In the second version, by explicitly freeing the memory, perl never 
> has to copy the return value, so its memory footprint is half.
>
> Using undef has not actually freed any memory at all, it's just 
> changed the return value from the function and stopped perl doubling 
> its memory use.
>
> The lesson here is therefore to be very careful in perl subroutines 
> whose return value you don't care about: make sure the return value 
> is something tiny.   Perl has no equivalent to a C void function.
>
> Tim
>
> -- 
> Dr Tim Cutts
> Informatics Systems Group, Wellcome Trust Sanger Institute
> GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>


