[Biojava-dev] Why BJ3 should be multithreaded

Andreas Prlic ap3 at sanger.ac.uk
Wed Apr 9 10:40:58 UTC 2008


Hi,

I like the idea of supporting multiple threads. One concern: when
running BioJava on our compute farm, I am pretty sure our admins
won't be happy if BJ uses more than a single CPU, unless it runs on
special hardware. So there should be a BJ-wide configuration
mechanism that lets the user set how many CPUs are used (the
default could be all of them).
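
A minimal sketch of such a library-wide setting, assuming a system property is an acceptable place for it. The class name `ThreadConfig` and the property name `biojava.threads` are hypothetical and not part of BioJava; the default is every available CPU, as suggested above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical library-wide setting for how many worker threads may be used. */
final class ThreadConfig {
    /** Hypothetical property name; not an existing BioJava setting. */
    static final String PROPERTY = "biojava.threads";

    private ThreadConfig() {}

    /** Returns the configured thread count, defaulting to all available CPUs. */
    static int threadCount() {
        int available = Runtime.getRuntime().availableProcessors();
        String value = System.getProperty(PROPERTY);
        if (value == null) {
            return available;
        }
        try {
            // Never go below one thread, whatever was requested.
            return Math.max(1, Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            return available; // unparsable value: fall back to the default
        }
    }

    /** Creates a pool sized by the setting; the caller must shut it down. */
    static ExecutorService newPool() {
        return Executors.newFixedThreadPool(threadCount());
    }
}
```

On a shared farm node the job could then be started with `java -Dbiojava.threads=1 ...`, while a dedicated machine simply leaves the property unset.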

Andreas


On 9 Apr 2008, at 09:28, Andy Yates wrote:

> Lo,
>
> This is the kind of problem Java 7 is attempting to solve with the
> fork-join framework (which really is a rip-off of Google's
> MapReduce). There are two ways of looking at thread safety & how to
> implement it:
>
> * Packages which could be threaded or want to be threaded are  
> programmed with threading in mind using items from the  
> util.concurrent package to split, queue & work with data points.
>
> * Packages can be created as required & have data passed to them
> for processing in a stateless manner, much in the same way
> servlet engines and a lot of web frameworks run.
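
As an illustration of the first style (code written with threading in mind, splitting the data itself), here is a minimal sketch using fork/join as it later shipped in java.util.concurrent (Java 7 onwards). The class name `SumTask` and the threshold value are assumptions chosen for the example, not anything proposed in this thread.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Sums a slice of an int array, splitting it in two until slices are small. */
final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // illustrative cut-off, untuned
    private final int[] data;
    private final int from;
    private final int to; // exclusive

    SumTask(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum directly in the current thread.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;
    }

    /** Sums the whole array on a pool with the given parallelism (must be > 0). */
    static long sum(int[] data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller only sees `SumTask.sum(array, n)`; the splitting and queueing stay inside the package, which is the point of this first approach.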
>
> The first way does mean we can support environments with useful  
> multi-threaded support (no point in threading on a single CPU/core  
> box) from the word go. The second way would require some plumbing  
> on the user's behalf but this would be very easy plumbing; the  
> majority of which we could write (like wrapping things in instances  
> of Callables).
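
A rough sketch of what that plumbing for the second style might look like: a stateless method does the work, and a small helper wraps each call in a Callable and runs the batch on a thread pool. `GcCounter` and its methods are hypothetical names invented for this example; only the java.util.concurrent calls are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stateless worker: all input arrives as an argument. */
final class GcCounter {
    private GcCounter() {}

    /** Counts G and C characters in one sequence; keeps no state between calls. */
    static int countGc(String sequence) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                count++;
            }
        }
        return count;
    }

    /** The plumbing: wrap each call in a Callable and run them all on a pool. */
    static List<Integer> countAll(List<String> sequences, int threads) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (final String seq : sequences) {
            tasks.add(() -> countGc(seq));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Integer> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `countGc` touches no shared state, the helper needs no locking; thread safety comes from the design rather than from synchronization.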
>
> Anyway my 2p worth :)
>
> Andy
>
> Mark Schreiber wrote:
>> Hi -
>> I was just playing with threads to see how efficient they are on  
>> one of our old 4 CPU IBM servers.  The following fairly naive  
>> program splits a large array of numbers and sums them all up.  The  
>> multi-threaded version is 2.5 times faster even allowing for  
>> thread overhead. The program could be even better if I made more
>> use of the Java 1.5 java.util.concurrent package.
>> Similar tasks in biojava would include training distributions,
>> which should see similar performance improvements. Much of the
>> current biojava doesn't make use of threads and, worse, requires
>> the developer to manage all the thread safety themselves.
>> - Mark
>> package concurrent;
>>
>> import java.util.concurrent.atomic.AtomicLong;
>>
>> /**
>>  * This program demos the use of threads to sum a large array of integers.
>>  * @author Mark Schreiber
>>  */
>> public class ThreadedAdder {
>>     static int processors = Runtime.getRuntime().availableProcessors();
>>     int bigNumber = 10000000;
>>     int[] bigArray = new int[bigNumber * processors];
>>
>>     public ThreadedAdder(){
>>         //make a big array of integers (10 000 000 numbers for each processor)
>>         for(int i = 0; i < bigArray.length; i++){
>>             //random integer from 0 to 99
>>             bigArray[i] = (int)(Math.random() * 100.0);
>>         }
>>     }
>>
>>     public void singleThreadedAdd(){
>>         //long rather than int: the total can exceed Integer.MAX_VALUE
>>         //on machines with many CPUs
>>         long result = 0;
>>         //single threaded sum
>>         long start = System.currentTimeMillis();
>>         for(int number : bigArray){
>>             result += number;
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+result);
>>     }
>>
>>     public void multiThreadedAdd() throws InterruptedException{
>>         //shared total; each thread adds its partial sum exactly once
>>         AtomicLong total = new AtomicLong();
>>         long start = System.currentTimeMillis();
>>         AddingThread[] threads = new AddingThread[processors];
>>         for(int i = 0; i < threads.length; i++){
>>             //each thread sums its own slice of bigNumber elements
>>             threads[i] = new AddingThread("Thread "+i, i * bigNumber, total);
>>             System.out.println(threads[i].getName()+" starting");
>>             threads[i].start();
>>         }
>>         for(Thread thread : threads){
>>             //make sure everyone is finished
>>             thread.join();
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+total);
>>     }
>>
>>     /**
>>      * @param args the command line arguments
>>      */
>>     public static void main(String[] args) throws Exception{
>>         //how many processors do I have?
>>         System.out.println("Available processors = "+processors);
>>         System.out.println("Initializing number array");
>>         ThreadedAdder adder = new ThreadedAdder();
>>         System.out.println("single thread add");
>>         adder.singleThreadedAdd();
>>         System.out.println("multi thread add");
>>         adder.multiThreadedAdd();
>>     }
>>
>>     public class AddingThread extends Thread{
>>         //a slice total is at most 10 000 000 * 99, which fits in an int
>>         int internalTotal = 0;
>>         int offSet = 0;
>>         AtomicLong callBackTotal;
>>
>>         public AddingThread(String name, int offSet, AtomicLong callBackTotal){
>>             super(name);
>>             this.offSet = offSet;
>>             this.callBackTotal = callBackTotal;
>>         }
>>
>>         @Override
>>         public void run(){
>>             for(int i = offSet; i < offSet + bigNumber; i++){
>>                 internalTotal += bigArray[i];
>>             }
>>             callBackTotal.addAndGet(internalTotal);
>>             System.out.println(this.getName()+" complete");
>>         }
>>     }
>> }
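
Mark's remark that training distributions should benefit in the same way can be sketched with the java.util.concurrent classes he mentions. The example below is only a stand-in for real distribution training: it counts A, C, G and T over a list of plain strings, with each task counting into its own private array so that no locking is needed until the partial counts are merged. `ParallelCounts` and its methods are hypothetical names, not BioJava API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for distribution training: counts A, C, G and T. */
final class ParallelCounts {
    private static final String ALPHABET = "ACGT";

    private ParallelCounts() {}

    /** Counts symbols in sequences[from, to) into a private array. */
    private static long[] countSlice(List<String> sequences, int from, int to) {
        long[] counts = new long[ALPHABET.length()];
        for (int s = from; s < to; s++) {
            String seq = sequences.get(s);
            for (int i = 0; i < seq.length(); i++) {
                int idx = ALPHABET.indexOf(seq.charAt(i));
                if (idx >= 0) {
                    counts[idx]++; // symbols outside ACGT are ignored
                }
            }
        }
        return counts;
    }

    /** Splits the sequences across tasks and merges the partial counts. */
    static long[] count(final List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> parts = new ArrayList<>();
            int chunk = (sequences.size() + threads - 1) / threads; // ceiling division
            for (int from = 0; from < sequences.size(); from += chunk) {
                final int start = from;
                final int end = Math.min(sequences.size(), from + chunk);
                Callable<long[]> task = () -> countSlice(sequences, start, end);
                parts.add(pool.submit(task));
            }
            long[] total = new long[ALPHABET.length()];
            for (Future<long[]> part : parts) {
                long[] counts = part.get();
                for (int i = 0; i < total.length; i++) {
                    total[i] += counts[i];
                }
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Dividing each entry of the returned array by the sum of all entries would give the symbol frequencies. Whether this is faster than a single thread depends on the data size and the hardware, so it would need measuring in the same way as the adder above.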

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------






