[Bioperl-l] Help with threads and shared variable

Blanchette, Marco MAB at stowers-institute.org
Sat Dec 20 18:10:08 EST 2008


Dear all,

I am not sure this is the best place to post this question, but I don't really know where else to go... So, let's give it a shot.

I am using Perl threads to run several of my computing jobs in parallel on my workstation, and that works well. My current problem is that I need to run many jobs against the same humongous array (more than 2 x 10^6 items). Each iteration takes little computing time, but I have a lot of iterations to do, and every time a thread is created I pass the huge array to the function and a fresh copy of the array is made. A huge amount of resources (time and memory) is therefore wasted creating data structures that each thread uses but never modifies.

The logical alternative is to use shared memory, where all threads would have access to the same copy of the huge array. In principle Perl provides such a mechanism through the threads::shared module, but I am unable to understand how to use shared variables.

Does anyone have experience to share on threads::shared? Here are a few unsuccessful attempts to use that module:


### first example
my $var :shared; #create a shared scalar
$var = make_uge_array; #returns a reference to a huge array; trying to assign it to the shared scalar
my $thr = threads->create(\&doTheJob,$var); #spawn a thread
$thr->join; #Wait for the thread to return
### Generate the following error
### Invalid value for shared scalar at ...

### second example
my $var = make_uge_array; #return a pointer to a huge array
print scalar(@{$var}), "\n"; #print 2,000,000

share($var);
print scalar(@{$var}), "\n"; #print 0

my $thr = threads->create(\&doTheJob,$var); #spawn a thread
$thr->join; #Wait for the thread to return

### third example
my @array :shared; #create a shared array
make_uge_array(\@array); #pass a ref to the array to a function that populates it with 2,000,000 items
print scalar(@array), "\n"; #print 2,000,000

my $thr = threads->create(\&doTheJob,$var); #spawn a thread
$thr->join; #Wait for the thread to return

sub doTheJob{ print scalar(@_), "\n"} ## prints 0

Finally, I tried passing a ref to the huge shared array to threads->create, but the main process never stopped at the join() call; it bailed out with the thread still running.
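
For reference, the pattern these attempts are aiming at can be sketched as below. This is only a minimal sketch, assuming a Perl built with ithreads support: the array is declared :shared before it is filled, it is populated in place, and each thread receives a reference to it plus an index range. The names fill_array and do_the_job, the array size and the four-way split are hypothetical stand-ins, not the original make_uge_array or doTheJob. As far as the threads::shared documentation goes, calling share() on an already populated array or array ref discards its contents, and a shared scalar may only hold a reference to data that is itself shared, which would be consistent with the behaviour in the first two examples.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical stand-in for make_uge_array: fills the shared array in place
# instead of building a private array and returning a reference to it.
sub fill_array {
    my ($aref, $n) = @_;
    @$aref = (1 .. $n);
    return;
}

# Hypothetical worker: reads a slice of the shared array through the
# reference it was given and returns the sum of that slice.
sub do_the_job {
    my ($aref, $from, $to) = @_;
    my $sum = 0;
    $sum += $aref->[$_] for $from .. $to;
    return $sum;
}

# Declare the array as shared BEFORE populating it.
my @data :shared;
fill_array(\@data, 1000);

# Spawn four threads; each gets a reference to the same shared array
# and its own index range, so no per-thread copy of the data is made.
my @workers;
for my $i (0 .. 3) {
    my $thr = threads->create(\&do_the_job, \@data, $i * 250, $i * 250 + 249);
    push @workers, $thr;
}

# Collect the partial sums from every thread.
my $total = 0;
$total += $_->join for @workers;
print "$total\n";
```

Element access on a shared array is generally slower than on an ordinary array, so whether this beats copying depends on how much work each thread does per element.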

Any suggestion will be appreciated.

Also, feel free to suggest a better place to post this request.

Many thanks,

Marco

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018


