[Bioperl-l] Help with threads and shared variable

Jonathan Crabtree jonathancrabtree at gmail.com
Wed Dec 24 13:03:25 EST 2008


Hi Marco,

I can't be exactly sure what's going wrong in your examples, since you
haven't posted the crucial "make_uge_array" function, or told us which
version of Perl you are using.  However, I'd guess that perhaps you're
creating a multi-dimensional array without sharing anything except the
top-level array.  Here is a short test program, which runs correctly
on perl 5.8.8 and may help to illustrate how the Perl threads::shared
module expects you to create and share nested data structures.  You
have to share every nested reference explicitly, and the order of the
sharing calls matters: share() empties an array or hash that already
holds data, so share each container first and fill it afterwards:

#!/usr/bin/perl

use strict;
use warnings;
use threads;
use threads::shared;

# threads::shared test/demo program
# creates a shared 2-dimensional array and checks that it can be seen
# in a thread
# tested in perl v5.8.8 built for i486-linux-gnu-thread-multi

## ----------------------------------------
## globals
## ----------------------------------------

# set the width and height of the 2d array to this value:
my $ARRAY_SIZE = 10;

## ----------------------------------------
## main program
## ----------------------------------------

# calls to &share take place in here, so a shared value is returned
my $array = &make_shared_array();

# print array contents before running thread
print "shared array before running thread:\n";
&check_and_print_array($array);

# run thread
my $thr = threads->create(\&do_the_job, $array);

my $retval = $thr->join();
print "join() returned: $retval\n";

# print array contents after running thread
print "shared array after running thread:\n";
&check_and_print_array($array);

exit(0);

## ----------------------------------------
## subroutines
## ----------------------------------------

sub make_shared_array {
    # outermost array object must be made shared first; the leading &
    # bypasses share()'s prototype so it accepts an anonymous array
    my $shared = &share([]);

    for (my $i = 0;$i < $ARRAY_SIZE;++$i) {
        # each of the rows must be explicitly shared
        my $row = &share([]);
        # and then added to the containing array
        $shared->[$i] = $row;
        # assign each cell a unique integer for verification purposes
        my $base = $i * $ARRAY_SIZE;
        for (my $j = 0;$j < $ARRAY_SIZE;++$j) {
            $row->[$j] = $base + $j;
        }
    }
    return $shared;
}

# print out the array, checking that its dimensions match what we expect
sub check_and_print_array {
    my $arr = shift;
    die "not an array" if ((ref $arr) ne 'ARRAY');
    my $nr = scalar(@$arr);
    die "wrong number of rows in array" if ($nr != $ARRAY_SIZE);

    for (my $i = 0;$i < $nr;++$i) {
	my $row = $arr->[$i];
	die "row $i not an array" if ((ref $row) ne 'ARRAY');
	my $nc = scalar(@$row);
	die "wrong number of columns in row $i" if ($nc != $ARRAY_SIZE);
	
	for (my $j = 0;$j < $nc;++$j) {
	    my $val = $row->[$j];
	    printf("%10s", $val);
	}

	print "\n";
    }
}

# work to execute in the thread
sub do_the_job {
    my $var = shift;

    # print the array once more in the thread
    print "shared array in thread:\n";
    &check_and_print_array($var);

    return "do_the_job returned ok";
}

When I run it (on Ubuntu) the output looks like this:

shared array before running thread:
         0         1         2         3         4         5         6         7         8         9
        10        11        12        13        14        15        16        17        18        19
        20        21        22        23        24        25        26        27        28        29
        30        31        32        33        34        35        36        37        38        39
        40        41        42        43        44        45        46        47        48        49
        50        51        52        53        54        55        56        57        58        59
        60        61        62        63        64        65        66        67        68        69
        70        71        72        73        74        75        76        77        78        79
        80        81        82        83        84        85        86        87        88        89
        90        91        92        93        94        95        96        97        98        99
shared array in thread:
         0         1         2         3         4         5         6         7         8         9
        10        11        12        13        14        15        16        17        18        19
        20        21        22        23        24        25        26        27        28        29
        30        31        32        33        34        35        36        37        38        39
        40        41        42        43        44        45        46        47        48        49
        50        51        52        53        54        55        56        57        58        59
        60        61        62        63        64        65        66        67        68        69
        70        71        72        73        74        75        76        77        78        79
        80        81        82        83        84        85        86        87        88        89
        90        91        92        93        94        95        96        97        98        99
join() returned: do_the_job returned ok
shared array after running thread:
         0         1         2         3         4         5         6         7         8         9
        10        11        12        13        14        15        16        17        18        19
        20        21        22        23        24        25        26        27        28        29
        30        31        32        33        34        35        36        37        38        39
        40        41        42        43        44        45        46        47        48        49
        50        51        52        53        54        55        56        57        58        59
        60        61        62        63        64        65        66        67        68        69
        70        71        72        73        74        75        76        77        78        79
        80        81        82        83        84        85        86        87        88        89
        90        91        92        93        94        95        96        97        98        99
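
For the flat (one-dimensional) case described in the original
question, the same "share first, fill afterwards" rule should apply.
The following is only a minimal sketch, not something I ran alongside
the program above: $N_ITEMS and sum_items are names made up for the
illustration, the array is far smaller than 2,000,000 items, and a
perl built with thread support is assumed.  The first part relies on
the threads::shared documentation, which says that data already held
by an array or hash is lost when it is shared:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use threads;
use threads::shared;

# hypothetical size, much smaller than the array in the question
my $N_ITEMS = 1000;

# 1. share an array that already holds data; according to the
#    threads::shared documentation its contents are lost
my @filled = (1 .. $N_ITEMS);
my $size_before = scalar(@filled);
share(@filled);
my $size_after = scalar(@filled);
print "populated array: $size_before items before share(), ",
    "$size_after after\n";

# 2. declare the array shared first and fill it afterwards
my @big :shared;
for (my $i = 0;$i < $N_ITEMS;++$i) {
    $big[$i] = $i * 2;
}

# work to execute in the thread: reads the shared array through the
# reference it was given
sub sum_items {
    my $arr = shift;
    my $sum = 0;
    foreach my $val (@$arr) {
        $sum += $val;
    }
    return $sum;
}

# pass a reference to the shared array rather than the array itself
my $thr = threads->create(\&sum_items, \@big);
my $sum = $thr->join();
print "sum computed in thread: $sum\n";
```

Passing \@big hands the thread a reference to the one shared array,
so the thread should not need its own private copy of the data.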

I haven't verified that doing this actually yields the memory savings
you're looking for, but I don't see why it shouldn't.  Hope this
helps,

Jonathan

On Sat, Dec 20, 2008 at 6:10 PM, Blanchette, Marco
<MAB at stowers-institute.org> wrote:
> Dear all,
>
> I am not sure this is the best place to post this question, but I don't really know where else to go... So, let's give it a shot.
>
> I am using Perl threads to successfully multi-thread several of my computing jobs on my workstation. My current problem is that I need to run multiple processes on the same humongous array (more than 2x10e6 items). The computing time for each iteration is not very long, but I have a lot of iterations to do, and every time a thread is created I pass the huge array to the function and a fresh copy of the array is created. Thus, a huge amount of resources (time and memory) is wasted creating these data structures, which are used by each thread but not modified.
>
> The logical alternative is to use shared memory, where all threads would have access to the same copy of the huge array. In principle Perl provides such a mechanism through the module threads::shared, but I am unable to understand how to use the shared variables.
>
> Does anyone have experience to share on threads::shared? Here are a couple of unsuccessful attempts to use that module:
>
>
> ### first example
> my $var :shared; #create a shared scalar
> $var = make_uge_array; #return a pointer to a huge array and try to assign it to the shared pointer
> my $thr = threads->create(\&doTheJob,$var); #spawn a thread
> $thr->join; #Wait for the thread to return
> ### Generate the following error
> ### Invalid value for shared scalar at ...
>
> ### second example
> my $var = make_uge_array; #return a pointer to a huge array
> print scalar(@{$var}), "\n"; #print 2,000,000
>
> share($var);
> print scalar(@{$var}), "\n"; #print 0
>
> my $thr = threads->create(\&doTheJob,$var); #spawn a thread
> $thr->join; #Wait for the thread to return
>
> ### third example
> my @array :shared; #create a shared array
> make_uge_array(\@array); #pass a ref to the array to a function that populates it with 2,000,000 items
> print scalar(@array), "\n"; #print 2,000,000
>
> my $thr = threads->create(\&doTheJob,$var); #spawn a thread
> $thr->join; #Wait for the thread to return
>
> sub doTheJob{ scalar(@_), "\n"} ## prints 0
>
> Finally, I tried passing a ref to the huge shared array to the thread creation utility, but the main process never stopped at the join() call; it bailed out with the thread still running.
>
> Any suggestion will be appreciated.
>
> Also, feel free to suggest a better place to post this request.
>
> Many thanks,
>
> Marco
>
> --
> Marco Blanchette, Ph.D.
> Assistant Investigator
> Stowers Institute for Medical Research
> 1000 East 50th St.
>
> Kansas City, MO 64110
>
> Tel: 816-926-4071
> Cell: 816-726-8419
> Fax: 816-926-2018
>