From ap3 at sanger.ac.uk  Sat Apr  5 08:53:53 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Sat, 5 Apr 2008 13:53:53 +0100
Subject: [Biojava-dev] preparations for release 1.6 - svn freeze
In-Reply-To: <78ECCE6A-F8CC-45AA-862B-F7D8BFC65EA0@sanger.ac.uk>
References: <78ECCE6A-F8CC-45AA-862B-F7D8BFC65EA0@sanger.ac.uk>
Message-ID: <65617D9D-0E0C-476F-A515-1222733DA9C2@sanger.ac.uk>

Hi,

In preparation for the 1.6 release, please do not commit any new  
features into svn from now until the release. Javadoc improvements  
are still welcome.

There were 2 patches end of last week, regarding the Genetic  
Algorithms and PDB file header parsing. I suggest to give those a  
week to make sure they are fine and target next weekend for the release.

Andreas


On 26 Mar 2008, at 13:55, Andreas Prlic wrote:

> Hi,
>
> The biojava 1.6 release candidate 1 has been available now for a  
> while and I would like to proceed with releasing the final biojava  
> 1.6.
>
> I ran doccheck on the latest SVN and we still could do with  some  
> javadoc improvements:
> http://www.spice-3d.org/doccheck/biojava-svn/biojava/ 
> PackageStatistics.html
>
> Please commit any remaining bug fixes to SVN until
>
> Friday, April 4th 18:00 GMT
>
> I will do the release (and SVN branch) after that.
>
> Cheers,
> Andreas
>
>
>
> ---------------------------------------------------------------------- 
> -
>
> Andreas Prlic      Wellcome Trust Sanger Institute
>                               Hinxton, Cambridge CB10 1SA, UK
>                               +44 (0) 1223 49 6891
>
> ---------------------------------------------------------------------- 
> -
>
>
>
>
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome  
> ResearchLimited, a charity registered in England with number  
> 1021457 and acompany registered in England with number 2742969,  
> whose registeredoffice is 215 Euston Road, London, NW1  
> 2BE._______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From ap3 at sanger.ac.uk  Wed Apr  9 06:40:58 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Wed, 9 Apr 2008 11:40:58 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FC7E3B.9000106@ebi.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
Message-ID: <77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>

Hi,

I like the idea of having support for multiple threads. Only thing  
is, when running BioJava on our compute farm, I am pretty sure our  
admins won't be happy if BJ would use more than just a single CPU,  
unless run on special hardware. As such there should be a BJ wide  
configuration management, which would allow to determine how many  
CPUs to be used (and the default could be all of them).

Andreas


On 9 Apr 2008, at 09:28, Andy Yates wrote:

> Lo,
>
> This is the kind of problem Java7 is attempting to solve with the  
> fork-join framework (which really is a rip-off of Google's  
> MapReduce). There's two ways of looking at thread safety & how to  
> implement it:
>
> * Packages which could be threaded or want to be threaded are  
> programmed with threading in mind using items from the  
> util.concurrent package to split, queue & work with data points.
>
> * Packages can be created as required & have data to process passed  
> to them for processing in a stateless manner; much in the same way  
> servlet engines and a lot of web frameworks run
>
> The first way does mean we can support environments with useful  
> multi-threaded support (no point in threading on a single CPU/core  
> box) from the word go. The second way would require some plumbing  
> on the user's behalf but this would be very easy plumbing; the  
> majority of which we could write (like wrapping things in instances  
> of Callables).
>
> Anyway my 2p worth :)
>
> Andy
>
> Mark Schreiber wrote:
>> Hi -
>> I was just playing with threads to see how efficient they are on  
>> one of our old 4 CPU IBM servers.  The following fairly naive  
>> program splits a large array of numbers and sums them all up.  The  
>> multi-threaded version is 2.5 times faster even allowing for  
>> thread overhead. The program could be even better if I make more  
>> use of the java1.5 concurrent package.
>> Similar tasks in biojava would be include training distributions  
>> which should see similar performance improvements. Much of the  
>> current biojava doesn't make use of threads and worse, requires  
>> the developer to manage all the thread safety themselves.
>> - Mark
>> /*
>>  * To change this template, choose Tools | Templates
>>  * and open the template in the editor.
>>  */
>> package concurrent;
>> import java.util.concurrent.atomic.AtomicInteger;
>> /**
>>  * This program demo's the use of threads to sum a large array of  
>> integers.
>>  * @author Mark Schreiber
>>  */
>> public class ThreadedAdder {
>>     static int processors = Runtime.getRuntime 
>> ().availableProcessors();
>>     int bigNumber = 10000000;
>>     int[] bigArray = new int[bigNumber * processors];
>>         public ThreadedAdder(){
>>         //make a big array of integers (10 000 000 numbers for  
>> each processor)
>>         for(int i = 0; i < bigArray.length; i++){
>>             //random number between 1 and 100
>>             bigArray[i] = (int)(Math.random() * 100.0);
>>         }
>>     }
>>     public void singleThreadedAdd(){
>>         int result = 0;
>>               //single threaded sum
>>         long start = System.currentTimeMillis();
>>         for(int number : bigArray){
>>             result += number;
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+result);
>>             }
>>         public void multiThreadedAdd() throws InterruptedException{
>>         AtomicInteger total = new AtomicInteger();
>>         long start = System.currentTimeMillis();
>>         AddingThread[] threads = new AddingThread[processors];
>>         for(int i = 0; i < threads.length; i++){
>>             threads[i] = new AddingThread("Thread "+i, i *  
>> bigNumber, total);
>>             System.out.println(threads[i].getName()+" starting");
>>             threads[i].start();
>>         }
>>         for(Thread thread : threads){
>>             //make sure everyone is finished
>>             thread.join();
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+total);
>>     }
>>         /**
>>      * @param args the command line arguments
>>      */
>>     public static void main(String[] args) throws Exception{
>>         //how many processors do I have?
>>         System.out.println("Available processors = "+processors);
>>         System.out.println("Initializing number array");
>>         ThreadedAdder adder = new ThreadedAdder();
>>                 System.out.println("single thread add");
>>         adder.singleThreadedAdd();
>>         System.out.println("multi thread add");
>>         adder.multiThreadedAdd();
>>     }
>>     public class AddingThread extends Thread{
>>         int internalTotal = 0;
>>         int offSet = 0;
>>         AtomicInteger callBackTotal;
>>                 public AddingThread(String name, int offSet,  
>> AtomicInteger callBackTotal){
>>             super(name);
>>             this.offSet = offSet;
>>             this.callBackTotal = callBackTotal;
>>         }
>>                 @Override
>>         public void run(){
>>             for(int i = offSet; i < offSet + bigNumber; i++){
>>                 internalTotal += bigArray[i];
>>             }
>>             callBackTotal.addAndGet(internalTotal);
>>             System.out.println(this.getName()+" complete");
>>         }
>>     }
>> }

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From ayates at ebi.ac.uk  Wed Apr  9 07:03:19 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 09 Apr 2008 12:03:19 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
Message-ID: <47FCA277.2020401@ebi.ac.uk>


Most the time any kind of farm management software (like LSF & please 
correct me if I'm wrong) looks at the amount of CPU time a process takes 
up and the number of threads it detects; not only the number of 
processes you have in a queue. So a multi-threaded biojava should not 
pose a problem to these systems. Not to mention with the newer multiple 
core computers; threaded software is becoming the only way to take full 
advantage of the available power.

Where you would want to ignore multi-threading is if you are in a queue 
like LSF and your x number of Java processes all get chucked onto the 
same machine. Then if you've got so many processor hungry operations all 
trying to create threads ... well it's not going to behave as optimally 
as you might hope.

Personally though I'd still air on the side of caution WRT 
multi-threading and not to have it as part of the default tools but as 
an Object I can instantiate to do my multi-threading work (so it's a 
choice at the user's level rather than the framework level). Then using 
the Java5 executor framework we let users submit work to pools of 
threads to do their work. Couple this with forcing us to pass around 
immutable messages between threads/callables (since values shared by 
threads are probably the number one cause of **** ups) you'll have one 
heck of a kick-ass scalable framework ;-)

Andy

Andreas Prlic wrote:
> Hi,
> 
> I like the idea of having support for multiple threads. Only thing is, 
> when running BioJava on our compute farm, I am pretty sure our admins 
> won't be happy if BJ would use more than just a single CPU, unless run 
> on special hardware. As such there should be a BJ wide configuration 
> management, which would allow to determine how many CPUs to be used (and 
> the default could be all of them).
> 
> Andreas
> 
> 
> On 9 Apr 2008, at 09:28, Andy Yates wrote:
> 
>> Lo,
>>
>> This is the kind of problem Java7 is attempting to solve with the 
>> fork-join framework (which really is a rip-off of Google's MapReduce). 
>> There's two ways of looking at thread safety & how to implement it:
>>
>> * Packages which could be threaded or want to be threaded are 
>> programmed with threading in mind using items from the util.concurrent 
>> package to split, queue & work with data points.
>>
>> * Packages can be created as required & have data to process passed to 
>> them for processing in a stateless manner; much in the same way 
>> servlet engines and a lot of web frameworks run
>>
>> The first way does mean we can support environments with useful 
>> multi-threaded support (no point in threading on a single CPU/core 
>> box) from the word go. The second way would require some plumbing on 
>> the user's behalf but this would be very easy plumbing; the majority 
>> of which we could write (like wrapping things in instances of Callables).
>>
>> Anyway my 2p worth :)
>>
>> Andy
>>
>> Mark Schreiber wrote:
>>> Hi -
>>> I was just playing with threads to see how efficient they are on one 
>>> of our old 4 CPU IBM servers.  The following fairly naive program 
>>> splits a large array of numbers and sums them all up.  The 
>>> multi-threaded version is 2.5 times faster even allowing for thread 
>>> overhead. The program could be even better if I make more use of the 
>>> java1.5 concurrent package.
>>> Similar tasks in biojava would be include training distributions 
>>> which should see similar performance improvements. Much of the 
>>> current biojava doesn't make use of threads and worse, requires the 
>>> developer to manage all the thread safety themselves.
>>> - Mark
>>> /*
>>>  * To change this template, choose Tools | Templates
>>>  * and open the template in the editor.
>>>  */
>>> package concurrent;
>>> import java.util.concurrent.atomic.AtomicInteger;
>>> /**
>>>  * This program demo's the use of threads to sum a large array of 
>>> integers.
>>>  * @author Mark Schreiber
>>>  */
>>> public class ThreadedAdder {
>>>     static int processors = Runtime.getRuntime().availableProcessors();
>>>     int bigNumber = 10000000;
>>>     int[] bigArray = new int[bigNumber * processors];
>>>         public ThreadedAdder(){
>>>         //make a big array of integers (10 000 000 numbers for each 
>>> processor)
>>>         for(int i = 0; i < bigArray.length; i++){
>>>             //random number between 1 and 100
>>>             bigArray[i] = (int)(Math.random() * 100.0);
>>>         }
>>>     }
>>>     public void singleThreadedAdd(){
>>>         int result = 0;
>>>               //single threaded sum
>>>         long start = System.currentTimeMillis();
>>>         for(int number : bigArray){
>>>             result += number;
>>>         }
>>>         long time = System.currentTimeMillis() - start;
>>>         System.out.println("Calculation time = "+time+" ms");
>>>         System.out.println("total = "+result);
>>>             }
>>>         public void multiThreadedAdd() throws InterruptedException{
>>>         AtomicInteger total = new AtomicInteger();
>>>         long start = System.currentTimeMillis();
>>>         AddingThread[] threads = new AddingThread[processors];
>>>         for(int i = 0; i < threads.length; i++){
>>>             threads[i] = new AddingThread("Thread "+i, i * bigNumber, 
>>> total);
>>>             System.out.println(threads[i].getName()+" starting");
>>>             threads[i].start();
>>>         }
>>>         for(Thread thread : threads){
>>>             //make sure everyone is finished
>>>             thread.join();
>>>         }
>>>         long time = System.currentTimeMillis() - start;
>>>         System.out.println("Calculation time = "+time+" ms");
>>>         System.out.println("total = "+total);
>>>     }
>>>         /**
>>>      * @param args the command line arguments
>>>      */
>>>     public static void main(String[] args) throws Exception{
>>>         //how many processors do I have?
>>>         System.out.println("Available processors = "+processors);
>>>         System.out.println("Initializing number array");
>>>         ThreadedAdder adder = new ThreadedAdder();
>>>                 System.out.println("single thread add");
>>>         adder.singleThreadedAdd();
>>>         System.out.println("multi thread add");
>>>         adder.multiThreadedAdd();
>>>     }
>>>     public class AddingThread extends Thread{
>>>         int internalTotal = 0;
>>>         int offSet = 0;
>>>         AtomicInteger callBackTotal;
>>>                 public AddingThread(String name, int offSet, 
>>> AtomicInteger callBackTotal){
>>>             super(name);
>>>             this.offSet = offSet;
>>>             this.callBackTotal = callBackTotal;
>>>         }
>>>                 @Override
>>>         public void run(){
>>>             for(int i = offSet; i < offSet + bigNumber; i++){
>>>                 internalTotal += bigArray[i];
>>>             }
>>>             callBackTotal.addAndGet(internalTotal);
>>>             System.out.println(this.getName()+" complete");
>>>         }
>>>     }
>>> }
> 
> -----------------------------------------------------------------------
> 
> Andreas Prlic      Wellcome Trust Sanger Institute
>                               Hinxton, Cambridge CB10 1SA, UK
>                               +44 (0) 1223 49 6891
> 
> -----------------------------------------------------------------------
> 
> 
> 
> 

From markjschreiber at gmail.com  Wed Apr  9 07:45:16 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Apr 2008 19:45:16 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FCA277.2020401@ebi.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
	<47FCA277.2020401@ebi.ac.uk>
Message-ID: <93b45ca50804090445t5e04c555ue7ce8ff90d852c97@mail.gmail.com>

Andy is right on this, a JVM can use at most the available CPUs on one
machine (and sometimes not even that).

Unless there is a very sophisticated farm management system that makes it
look like all 100 cores are on the same machine then there is no chance that
the JVM can take over more than one machine (unless you start another whole
JVM from within your program).

On Wed, Apr 9, 2008 at 7:03 PM, Andy Yates <ayates at ebi.ac.uk> wrote:

>
>
> Most the time any kind of farm management software (like LSF & please
> correct me if I'm wrong) looks at the amount of CPU time a process takes up
> and the number of threads it detects; not only the number of processes you
> have in a queue. So a multi-threaded biojava should not pose a problem to
> these systems. Not to mention with the newer multiple core computers;
> threaded software is becoming the only way to take full advantage of the
> available power.
>
> Where you would want to ignore multi-threading is if you are in a queue
> like LSF and your x number of Java processes all get chucked onto the same
> machine. Then if you've got so many processor hungry operations all trying
> to create threads ... well it's not going to behave as optimally as you
> might hope.
>
> Personally though I'd still air on the side of caution WRT multi-threading
> and not to have it as part of the default tools but as an Object I can
> instantiate to do my multi-threading work (so it's a choice at the user's
> level rather than the framework level). Then using the Java5 executor
> framework we let users submit work to pools of threads to do their work.
> Couple this with forcing us to pass around immutable messages between
> threads/callables (since values shared by threads are probably the number
> one cause of **** ups) you'll have one heck of a kick-ass scalable framework
> ;-)
>
> Andy
>
>
> Andreas Prlic wrote:
>
> > Hi,
> >
> > I like the idea of having support for multiple threads. Only thing is,
> > when running BioJava on our compute farm, I am pretty sure our admins won't
> > be happy if BJ would use more than just a single CPU, unless run on special
> > hardware. As such there should be a BJ wide configuration management, which
> > would allow to determine how many CPUs to be used (and the default could be
> > all of them).
> >
> > Andreas
> >
> >
> > On 9 Apr 2008, at 09:28, Andy Yates wrote:
> >
> > Lo,
> > >
> > > This is the kind of problem Java7 is attempting to solve with the
> > > fork-join framework (which really is a rip-off of Google's MapReduce).
> > > There's two ways of looking at thread safety & how to implement it:
> > >
> > > * Packages which could be threaded or want to be threaded are
> > > programmed with threading in mind using items from the util.concurrent
> > > package to split, queue & work with data points.
> > >
> > > * Packages can be created as required & have data to process passed to
> > > them for processing in a stateless manner; much in the same way servlet
> > > engines and a lot of web frameworks run
> > >
> > > The first way does mean we can support environments with useful
> > > multi-threaded support (no point in threading on a single CPU/core box) from
> > > the word go. The second way would require some plumbing on the user's behalf
> > > but this would be very easy plumbing; the majority of which we could write
> > > (like wrapping things in instances of Callables).
> > >
> > > Anyway my 2p worth :)
> > >
> > > Andy
> > >
> > > Mark Schreiber wrote:
> > >
> > > > Hi -
> > > > I was just playing with threads to see how efficient they are on one
> > > > of our old 4 CPU IBM servers.  The following fairly naive program splits a
> > > > large array of numbers and sums them all up.  The multi-threaded version is
> > > > 2.5 times faster even allowing for thread overhead. The program could be
> > > > even better if I make more use of the java1.5 concurrent package.
> > > > Similar tasks in biojava would be include training distributions
> > > > which should see similar performance improvements. Much of the current
> > > > biojava doesn't make use of threads and worse, requires the developer to
> > > > manage all the thread safety themselves.
> > > > - Mark
> > > > /*
> > > >  * To change this template, choose Tools | Templates
> > > >  * and open the template in the editor.
> > > >  */
> > > > package concurrent;
> > > > import java.util.concurrent.atomic.AtomicInteger;
> > > > /**
> > > >  * This program demo's the use of threads to sum a large array of
> > > > integers.
> > > >  * @author Mark Schreiber
> > > >  */
> > > > public class ThreadedAdder {
> > > >    static int processors =
> > > > Runtime.getRuntime().availableProcessors();
> > > >    int bigNumber = 10000000;
> > > >    int[] bigArray = new int[bigNumber * processors];
> > > >        public ThreadedAdder(){
> > > >        //make a big array of integers (10 000 000 numbers for each
> > > > processor)
> > > >        for(int i = 0; i < bigArray.length; i++){
> > > >            //random number between 1 and 100
> > > >            bigArray[i] = (int)(Math.random() * 100.0);
> > > >        }
> > > >    }
> > > >    public void singleThreadedAdd(){
> > > >        int result = 0;
> > > >              //single threaded sum
> > > >        long start = System.currentTimeMillis();
> > > >        for(int number : bigArray){
> > > >            result += number;
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+result);
> > > >            }
> > > >        public void multiThreadedAdd() throws InterruptedException{
> > > >        AtomicInteger total = new AtomicInteger();
> > > >        long start = System.currentTimeMillis();
> > > >        AddingThread[] threads = new AddingThread[processors];
> > > >        for(int i = 0; i < threads.length; i++){
> > > >            threads[i] = new AddingThread("Thread "+i, i * bigNumber,
> > > > total);
> > > >            System.out.println(threads[i].getName()+" starting");
> > > >            threads[i].start();
> > > >        }
> > > >        for(Thread thread : threads){
> > > >            //make sure everyone is finished
> > > >            thread.join();
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+total);
> > > >    }
> > > >        /**
> > > >     * @param args the command line arguments
> > > >     */
> > > >    public static void main(String[] args) throws Exception{
> > > >        //how many processors do I have?
> > > >        System.out.println("Available processors = "+processors);
> > > >        System.out.println("Initializing number array");
> > > >        ThreadedAdder adder = new ThreadedAdder();
> > > >                System.out.println("single thread add");
> > > >        adder.singleThreadedAdd();
> > > >        System.out.println("multi thread add");
> > > >        adder.multiThreadedAdd();
> > > >    }
> > > >    public class AddingThread extends Thread{
> > > >        int internalTotal = 0;
> > > >        int offSet = 0;
> > > >        AtomicInteger callBackTotal;
> > > >                public AddingThread(String name, int offSet,
> > > > AtomicInteger callBackTotal){
> > > >            super(name);
> > > >            this.offSet = offSet;
> > > >            this.callBackTotal = callBackTotal;
> > > >        }
> > > >                @Override
> > > >        public void run(){
> > > >            for(int i = offSet; i < offSet + bigNumber; i++){
> > > >                internalTotal += bigArray[i];
> > > >            }
> > > >            callBackTotal.addAndGet(internalTotal);
> > > >            System.out.println(this.getName()+" complete");
> > > >        }
> > > >    }
> > > > }
> > > >
> > >
> > -----------------------------------------------------------------------
> >
> > Andreas Prlic      Wellcome Trust Sanger Institute
> >                              Hinxton, Cambridge CB10 1SA, UK
> >                              +44 (0) 1223 49 6891
> >
> > -----------------------------------------------------------------------
> >
> >
> >
> >
> >

From markjschreiber at gmail.com  Wed Apr  9 07:54:06 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Apr 2008 19:54:06 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FCA277.2020401@ebi.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
	<47FCA277.2020401@ebi.ac.uk>
Message-ID: <93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>

I'm not too sure which option I prefer, multi-threading by default (ie
all handled by the packages) or stateless immutable classes and
messages that can be multi-threaded.

There are arguments for both.  The former is recommended in a book I
am currently reading on concurrency which was written by the authors
of the java 1.5 concurrency package.  Essentially the classes can be
designed ahead of time to be thread safe and mutability (sometimes a
good thing) can be done with this in mind.

On the other hand stateless and immutable stuff is often safe enough
to put into a thread although _only_ as long as operations are truely
atomic.  Take for example Servlets and stateless Session Beans. They
are pretty thread safe by nescessity (use in app servers) but just
because they are stateless doens't mean you can't accedentally right
one that gives you stale data or a race condition.

In both cases thread safety needs to be designed from the start.

Currently BioJava is neither of these things and I imagine things will
start getting pretty interesting if you try to multi-thread a biojava
program right now.

- Mark

On Wed, Apr 9, 2008 at 7:03 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>
>
> Most the time any kind of farm management software (like LSF & please correct me if I'm wrong) looks at the amount of CPU time a process takes up and the number of threads it detects; not only the number of processes you have in a queue. So a multi-threaded biojava should not pose a problem to these systems. Not to mention with the newer multiple core computers; threaded software is becoming the only way to take full advantage of the available power.
>
> Where you would want to ignore multi-threading is if you are in a queue like LSF and your x number of Java processes all get chucked onto the same machine. Then if you've got so many processor hungry operations all trying to create threads ... well it's not going to behave as optimally as you might hope.
>
> Personally though I'd still air on the side of caution WRT multi-threading and not to have it as part of the default tools but as an Object I can instantiate to do my multi-threading work (so it's a choice at the user's level rather than the framework level). Then using the Java5 executor framework we let users submit work to pools of threads to do their work. Couple this with forcing us to pass around immutable messages between threads/callables (since values shared by threads are probably the number one cause of **** ups) you'll have one heck of a kick-ass scalable framework ;-)
>
> Andy
>
>
>
>
> Andreas Prlic wrote:
>
> > Hi,
> >
> > I like the idea of having support for multiple threads. Only thing is, when running BioJava on our compute farm, I am pretty sure our admins won't be happy if BJ would use more than just a single CPU, unless run on special hardware. As such there should be a BJ wide configuration management, which would allow to determine how many CPUs to be used (and the default could be all of them).
> >
> > Andreas
> >
> >
> > On 9 Apr 2008, at 09:28, Andy Yates wrote:
> >
> >
> > > Lo,
> > >
> > > This is the kind of problem Java7 is attempting to solve with the fork-join framework (which really is a rip-off of Google's MapReduce). There's two ways of looking at thread safety & how to implement it:
> > >
> > > * Packages which could be threaded or want to be threaded are programmed with threading in mind using items from the util.concurrent package to split, queue & work with data points.
> > >
> > > * Packages can be created as required & have data to process passed to them for processing in a stateless manner; much in the same way servlet engines and a lot of web frameworks run
> > >
> > > The first way does mean we can support environments with useful multi-threaded support (no point in threading on a single CPU/core box) from the word go. The second way would require some plumbing on the user's behalf but this would be very easy plumbing; the majority of which we could write (like wrapping things in instances of Callables).
> > >
> > > Anyway my 2p worth :)
> > >
> > > Andy
> > >
> > > Mark Schreiber wrote:
> > >
> > > > Hi -
> > > > I was just playing with threads to see how efficient they are on one of our old 4 CPU IBM servers.  The following fairly naive program splits a large array of numbers and sums them all up.  The multi-threaded version is 2.5 times faster even allowing for thread overhead. The program could be even better if I make more use of the java1.5 concurrent package.
> > > > Similar tasks in biojava would be include training distributions which should see similar performance improvements. Much of the current biojava doesn't make use of threads and worse, requires the developer to manage all the thread safety themselves.
> > > > - Mark
> > > > /*
> > > >  * To change this template, choose Tools | Templates
> > > >  * and open the template in the editor.
> > > >  */
> > > > package concurrent;
> > > > import java.util.concurrent.atomic.AtomicInteger;
> > > > /**
> > > >  * This program demo's the use of threads to sum a large array of integers.
> > > >  * @author Mark Schreiber
> > > >  */
> > > > public class ThreadedAdder {
> > > >    static int processors = Runtime.getRuntime().availableProcessors();
> > > >    int bigNumber = 10000000;
> > > >    int[] bigArray = new int[bigNumber * processors];
> > > >        public ThreadedAdder(){
> > > >        //make a big array of integers (10 000 000 numbers for each processor)
> > > >        for(int i = 0; i < bigArray.length; i++){
> > > >            //random number between 1 and 100
> > > >            bigArray[i] = (int)(Math.random() * 100.0);
> > > >        }
> > > >    }
> > > >    public void singleThreadedAdd(){
> > > >        int result = 0;
> > > >              //single threaded sum
> > > >        long start = System.currentTimeMillis();
> > > >        for(int number : bigArray){
> > > >            result += number;
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+result);
> > > >            }
> > > >        public void multiThreadedAdd() throws InterruptedException{
> > > >        AtomicInteger total = new AtomicInteger();
> > > >        long start = System.currentTimeMillis();
> > > >        AddingThread[] threads = new AddingThread[processors];
> > > >        for(int i = 0; i < threads.length; i++){
> > > >            threads[i] = new AddingThread("Thread "+i, i * bigNumber, total);
> > > >            System.out.println(threads[i].getName()+" starting");
> > > >            threads[i].start();
> > > >        }
> > > >        for(Thread thread : threads){
> > > >            //make sure everyone is finished
> > > >            thread.join();
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+total);
> > > >    }
> > > >        /**
> > > >     * @param args the command line arguments
> > > >     */
> > > >    public static void main(String[] args) throws Exception{
> > > >        //how many processors do I have?
> > > >        System.out.println("Available processors = "+processors);
> > > >        System.out.println("Initializing number array");
> > > >        ThreadedAdder adder = new ThreadedAdder();
> > > >                System.out.println("single thread add");
> > > >        adder.singleThreadedAdd();
> > > >        System.out.println("multi thread add");
> > > >        adder.multiThreadedAdd();
> > > >    }
> > > >    public class AddingThread extends Thread{
> > > >        int internalTotal = 0;
> > > >        int offSet = 0;
> > > >        AtomicInteger callBackTotal;
> > > >                public AddingThread(String name, int offSet, AtomicInteger callBackTotal){
> > > >            super(name);
> > > >            this.offSet = offSet;
> > > >            this.callBackTotal = callBackTotal;
> > > >        }
> > > >                @Override
> > > >        public void run(){
> > > >            for(int i = offSet; i < offSet + bigNumber; i++){
> > > >                internalTotal += bigArray[i];
> > > >            }
> > > >            callBackTotal.addAndGet(internalTotal);
> > > >            System.out.println(this.getName()+" complete");
> > > >        }
> > > >    }
> > > > }
> > > >
> > >
> >
> > -----------------------------------------------------------------------
> >
> > Andreas Prlic      Wellcome Trust Sanger Institute
> >                              Hinxton, Cambridge CB10 1SA, UK
> >                              +44 (0) 1223 49 6891
> >
> > -----------------------------------------------------------------------
> >
> >
> >
> >
> >
>

From markjschreiber at gmail.com  Wed Apr  9 09:12:52 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Apr 2008 21:12:52 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
	<47FCA277.2020401@ebi.ac.uk>
	<93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
Message-ID: <93b45ca50804090612x7ba0b3b2jbb8d1e031e030dc4@mail.gmail.com>

> > Personally though I'd still air on the side of caution WRT multi-threading and not to have it as part of the default tools but as an Object I can instantiate to do my multi-threading work (so it's a choice at the user's level rather than the framework level). Then using the Java5 executor framework we let users submit work to pools of threads to do their work. Couple this with forcing us to pass around immutable messages between threads/callables (since values shared by threads are probably the number one cause of **** ups) you'll have one heck of a kick-ass scalable framework ;-)
> >
> > Andy


One area where you could get an interesting mixture of stateless and
synchronized access to a mutable would be threaded parsing of large
sequence files.  In my experience the BioJava parsers are not normally
I/O bound due to all the object building they do.  Given this a
filereader could for example read a feature block and hand it off to a
threaded stateless feature handler which produces a Feature object and
then adds it (synchronized) to the BioJava Sequence that is being
built. As long as I/O doesn't limit then you would get improved
parsing performance.  It would also be a case where the threading
should happen internally as it could be pretty hard to coordinate the
process from the outside.

This also highlights the difference between encapsulation and
immutability. Even if access to variables is controlled by package and
protected setters the class is still mutable (but not by the user).
Immutability can only be achieved by not providing any setter methods
which has obvious severe limitations.  Currently BioJava Sequence
objects have restricted mutability (use of Edit objects) but are
certainly not immutable.

Again messages need not be immutable as long as they have appropriate
locks and or synchronized getters and setters.  Many java frameworks
work best when messages or DTO's are beans (with parameterless
constructors and public getters and setters), being able to use these
is often very desirable. These beans can still be threadsafe if you
code them right.

- Mark

From ayates at ebi.ac.uk  Wed Apr  9 10:00:29 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 09 Apr 2008 15:00:29 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>	
	<47FC7E3B.9000106@ebi.ac.uk>	
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>	
	<47FCA277.2020401@ebi.ac.uk>
	<93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
Message-ID: <47FCCBFD.1030805@ebi.ac.uk>

I admit mutability is a good thing sometimes (and as Java programmers is 
the way we've been taught to work in).

Oh I've triggered more than enough race conditions working with so 
called 'stateless' services assuming too much about how stateless they 
were (or more to the point how stateful I had made them). Anyway yes 
race conditions can occur anywhere in any bit of code but the majority 
of time I see them appearing when 'static' is used.

Yeah I would be worried about someone making a multi-threded app with 
BJ. Not impossible (far from it) but I can imagine a few edge cases 
coming in.

Andy

Mark Schreiber wrote:
> I'm not too sure which option I prefer, multi-threading by default (ie
> all handled by the packages) or stateless immutable classes and
> messages that can be multi-threaded.
> 
> There are arguments for both.  The former is recommended in a book I
> am currently reading on concurrency which was written by the authors
> of the java 1.5 concurrency package.  Essentially the classes can be
> designed ahead of time to be thread safe and mutability (sometimes a
> good thing) can be done with this in mind.
> 
> On the other hand stateless and immutable stuff is often safe enough
> to put into a thread although _only_ as long as operations are truely
> atomic.  Take for example Servlets and stateless Session Beans. They
> are pretty thread safe by nescessity (use in app servers) but just
> because they are stateless doens't mean you can't accedentally right
> one that gives you stale data or a race condition.
> 
> In both cases thread safety needs to be designed from the start.
> 
> Currently BioJava is neither of these things and I imagine things will
> start getting pretty interesting if you try to multi-thread a biojava
> program right now.
> 
> - Mark
> 
> On Wed, Apr 9, 2008 at 7:03 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>
>> Most the time any kind of farm management software (like LSF & please correct me if I'm wrong) looks at the amount of CPU time a process takes up and the number of threads it detects; not only the number of processes you have in a queue. So a multi-threaded biojava should not pose a problem to these systems. Not to mention with the newer multiple core computers; threaded software is becoming the only way to take full advantage of the available power.
>>
>> Where you would want to ignore multi-threading is if you are in a queue like LSF and your x number of Java processes all get chucked onto the same machine. Then if you've got so many processor hungry operations all trying to create threads ... well it's not going to behave as optimally as you might hope.
>>
>> Personally though I'd still air on the side of caution WRT multi-threading and not to have it as part of the default tools but as an Object I can instantiate to do my multi-threading work (so it's a choice at the user's level rather than the framework level). Then using the Java5 executor framework we let users submit work to pools of threads to do their work. Couple this with forcing us to pass around immutable messages between threads/callables (since values shared by threads are probably the number one cause of **** ups) you'll have one heck of a kick-ass scalable framework ;-)
>>
>> Andy
>>
>>
>>
>>
>> Andreas Prlic wrote:
>>
>>> Hi,
>>>
>>> I like the idea of having support for multiple threads. Only thing is, when running BioJava on our compute farm, I am pretty sure our admins won't be happy if BJ would use more than just a single CPU, unless run on special hardware. As such there should be a BJ wide configuration management, which would allow to determine how many CPUs to be used (and the default could be all of them).
>>>
>>> Andreas
>>>
>>>
>>> On 9 Apr 2008, at 09:28, Andy Yates wrote:
>>>
>>>
>>>> Lo,
>>>>
>>>> This is the kind of problem Java7 is attempting to solve with the fork-join framework (which really is a rip-off of Google's MapReduce). There's two ways of looking at thread safety & how to implement it:
>>>>
>>>> * Packages which could be threaded or want to be threaded are programmed with threading in mind using items from the util.concurrent package to split, queue & work with data points.
>>>>
>>>> * Packages can be created as required & have data to process passed to them for processing in a stateless manner; much in the same way servlet engines and a lot of web frameworks run
>>>>
>>>> The first way does mean we can support environments with useful multi-threaded support (no point in threading on a single CPU/core box) from the word go. The second way would require some plumbing on the user's behalf but this would be very easy plumbing; the majority of which we could write (like wrapping things in instances of Callables).
>>>>
>>>> Anyway my 2p worth :)
>>>>
>>>> Andy
>>>>
>>>> Mark Schreiber wrote:
>>>>
>>>>> Hi -
>>>>> I was just playing with threads to see how efficient they are on one of our old 4 CPU IBM servers.  The following fairly naive program splits a large array of numbers and sums them all up.  The multi-threaded version is 2.5 times faster even allowing for thread overhead. The program could be even better if I make more use of the java1.5 concurrent package.
>>>>> Similar tasks in biojava would be include training distributions which should see similar performance improvements. Much of the current biojava doesn't make use of threads and worse, requires the developer to manage all the thread safety themselves.
>>>>> - Mark
>>>>> /*
>>>>>  * To change this template, choose Tools | Templates
>>>>>  * and open the template in the editor.
>>>>>  */
>>>>> package concurrent;
>>>>> import java.util.concurrent.atomic.AtomicInteger;
>>>>> /**
>>>>>  * This program demo's the use of threads to sum a large array of integers.
>>>>>  * @author Mark Schreiber
>>>>>  */
>>>>> public class ThreadedAdder {
>>>>>    static int processors = Runtime.getRuntime().availableProcessors();
>>>>>    int bigNumber = 10000000;
>>>>>    int[] bigArray = new int[bigNumber * processors];
>>>>>        public ThreadedAdder(){
>>>>>        //make a big array of integers (10 000 000 numbers for each processor)
>>>>>        for(int i = 0; i < bigArray.length; i++){
>>>>>            //random number between 1 and 100
>>>>>            bigArray[i] = (int)(Math.random() * 100.0);
>>>>>        }
>>>>>    }
>>>>>    public void singleThreadedAdd(){
>>>>>        int result = 0;
>>>>>              //single threaded sum
>>>>>        long start = System.currentTimeMillis();
>>>>>        for(int number : bigArray){
>>>>>            result += number;
>>>>>        }
>>>>>        long time = System.currentTimeMillis() - start;
>>>>>        System.out.println("Calculation time = "+time+" ms");
>>>>>        System.out.println("total = "+result);
>>>>>            }
>>>>>        public void multiThreadedAdd() throws InterruptedException{
>>>>>        AtomicInteger total = new AtomicInteger();
>>>>>        long start = System.currentTimeMillis();
>>>>>        AddingThread[] threads = new AddingThread[processors];
>>>>>        for(int i = 0; i < threads.length; i++){
>>>>>            threads[i] = new AddingThread("Thread "+i, i * bigNumber, total);
>>>>>            System.out.println(threads[i].getName()+" starting");
>>>>>            threads[i].start();
>>>>>        }
>>>>>        for(Thread thread : threads){
>>>>>            //make sure everyone is finished
>>>>>            thread.join();
>>>>>        }
>>>>>        long time = System.currentTimeMillis() - start;
>>>>>        System.out.println("Calculation time = "+time+" ms");
>>>>>        System.out.println("total = "+total);
>>>>>    }
>>>>>        /**
>>>>>     * @param args the command line arguments
>>>>>     */
>>>>>    public static void main(String[] args) throws Exception{
>>>>>        //how many processors do I have?
>>>>>        System.out.println("Available processors = "+processors);
>>>>>        System.out.println("Initializing number array");
>>>>>        ThreadedAdder adder = new ThreadedAdder();
>>>>>                System.out.println("single thread add");
>>>>>        adder.singleThreadedAdd();
>>>>>        System.out.println("multi thread add");
>>>>>        adder.multiThreadedAdd();
>>>>>    }
>>>>>    public class AddingThread extends Thread{
>>>>>        int internalTotal = 0;
>>>>>        int offSet = 0;
>>>>>        AtomicInteger callBackTotal;
>>>>>                public AddingThread(String name, int offSet, AtomicInteger callBackTotal){
>>>>>            super(name);
>>>>>            this.offSet = offSet;
>>>>>            this.callBackTotal = callBackTotal;
>>>>>        }
>>>>>                @Override
>>>>>        public void run(){
>>>>>            for(int i = offSet; i < offSet + bigNumber; i++){
>>>>>                internalTotal += bigArray[i];
>>>>>            }
>>>>>            callBackTotal.addAndGet(internalTotal);
>>>>>            System.out.println(this.getName()+" complete");
>>>>>        }
>>>>>    }
>>>>> }
>>>>>
>>> -----------------------------------------------------------------------
>>>
>>> Andreas Prlic      Wellcome Trust Sanger Institute
>>>                              Hinxton, Cambridge CB10 1SA, UK
>>>                              +44 (0) 1223 49 6891
>>>
>>> -----------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>

From ayates at ebi.ac.uk  Wed Apr  9 10:09:33 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 09 Apr 2008 15:09:33 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <93b45ca50804090612x7ba0b3b2jbb8d1e031e030dc4@mail.gmail.com>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>	
	<47FC7E3B.9000106@ebi.ac.uk>	
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>	
	<47FCA277.2020401@ebi.ac.uk>	
	<93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
	<93b45ca50804090612x7ba0b3b2jbb8d1e031e030dc4@mail.gmail.com>
Message-ID: <47FCCE1D.8050107@ebi.ac.uk>

That is an interesting bit of usage. You could queue the events out from 
the feature builders into the thread/callable which constructs the final 
Sequence object quite easily. Yeah very very true :)

The majority of objects are mutable in BJ I think. I'm not saying this 
is a bad thing nor suggesting everything needs to be immutable :). It's 
more about making sure only one thread is working on one object at a 
given point in the program. If there are going to be mutable objects 
hanging around then Queues are probably the best way to work with them.

Andy

> 
> One area where you could get an interesting mixture of stateless and 
> synchronized access to a mutable would be threaded parsing of large 
> sequence files.  In my experience the BioJava parsers are not
> normally I/O bound due to all the object building they do.  Given
> this a filereader could for example read a feature block and hand it
> off to a threaded stateless feature handler which produces a Feature
> object and then adds it (synchronized) to the BioJava Sequence that
> is being built. As long as I/O doesn't limit then you would get
> improved parsing performance.  It would also be a case where the
> threading should happen internally as it could be pretty hard to
> coordinate the process from the outside.
> 
> This also highlights the difference between encapsulation and 
> immutability. Even if access to variables is controlled by package
> and protected setters the class is still mutable (but not by the
> user). Immutability can only be achieved by not providing any setter
> methods which has obvious severe limitations.  Currently BioJava
> Sequence objects have restricted mutability (use of Edit objects) but
> are certainly not immutable.
> 
> Again messages need not be immutable as long as they have appropriate
>  locks and or synchronized getters and setters.  Many java frameworks
>  work best when messages or DTO's are beans (with parameterless 
> constructors and public getters and setters), being able to use these
>  is often very desirable. These beans can still be threadsafe if you 
> code them right.
> 
> - Mark

From heuermh at acm.org  Wed Apr  9 12:34:40 2008
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 9 Apr 2008 12:34:40 -0400 (EDT)
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FCCE1D.8050107@ebi.ac.uk>
Message-ID: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>

On Wed, 9 Apr 2008, Andy Yates wrote:

> That is an interesting bit of usage. You could queue the events out from
> the feature builders into the thread/callable which constructs the final
> Sequence object quite easily. Yeah very very true :)
>
> The majority of objects are mutable in BJ I think. I'm not saying this
> is a bad thing nor suggesting everything needs to be immutable :). It's
> more about making sure only one thread is working on one object at a
> given point in the program. If there are going to be mutable objects
> hanging around then Queues are probably the best way to work with them.

I am going to crib directly from the book I think Mark was referring to
earlier:

 - It's the mutable state, stupid

  All concurrency issues boil down to coordinating access to mutable
state.  The less mutable state, the easier it is to ensure thread safety.

 - Make fields final unless they need to be mutable

 - Immutable objects are automatically thread-safe

  Immutable objects simplify concurrent programming tremendously.  They
are simper and safer, and can be shared freely without locking or
defensive copying.

"Java Concurrency in Practice", Goetz et al., 2006, p110.
http://www.javaconcurrencyinpractice.com/


The Immutable with Copy Mutators pattern provides "setter"-like methods
that return copies of the immutable object:

  /**
   * Return a copy of this foo with the bar set to <code>bar</code>.
   *
   * <p>Foo is immutable, so there are no set methods.  Instead, this
   * method returns a new instance of Foo copied from <code>this</code>
   * with the value of bar changed.</p>
   *
   * @param bar bar for the copy of this foo
   * @return a copy of this fo with the bar set to <code>bar</code>
   */
  public Foo withBar(final Bar bar)
  {
    Foo copy = new Foo(..., bar);
    return copy;
  }

This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
to generate classes in this style at

http://tinyurl.com/6n2nhp


> > Mark Schreiber wrote:
> > One area where you could get an interesting mixture of stateless and
> > synchronized access to a mutable would be threaded parsing of large
> > sequence files.  In my experience the BioJava parsers are not
> > normally I/O bound due to all the object building they do.  Given
> > this a filereader could for example read a feature block and hand it
> > off to a threaded stateless feature handler which produces a Feature
> > object and then adds it (synchronized) to the BioJava Sequence that
> > is being built. As long as I/O doesn't limit then you would get
> > improved parsing performance.  It would also be a case where the
> > threading should happen internally as it could be pretty hard to
> > coordinate the process from the outside.
> >
> > This also highlights the difference between encapsulation and
> > immutability. Even if access to variables is controlled by package
> > and protected setters the class is still mutable (but not by the
> > user). Immutability can only be achieved by not providing any setter
> > methods which has obvious severe limitations.  Currently BioJava
> > Sequence objects have restricted mutability (use of Edit objects) but
> > are certainly not immutable.
> >
> > Again messages need not be immutable as long as they have appropriate
> >  locks and or synchronized getters and setters.  Many java frameworks
> >  work best when messages or DTO's are beans (with parameterless
> > constructors and public getters and setters), being able to use these
> >  is often very desirable. These beans can still be threadsafe if you
> > code them right.

What might that look like?

I have to think in most cases (DTOs, form beans, etc) are safe only
because the container is managing the lifecycle of those beans.


Perhaps we might want to copy some of this discussion to

http://biojava.org/wiki/Talk:BioJava3_Design

or a new page about concurrency issues when we are finished.

   michael


From ayates at ebi.ac.uk  Thu Apr 10 04:36:41 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 10 Apr 2008 09:36:41 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>
References: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>
Message-ID: <47FDD199.4010606@ebi.ac.uk>

All of that looks very reasonable to me; I really should get round to 
reading that book soon :). The only thing that worries me about the 
constructor copy is object churn but as far as I'm aware that is a worry 
from the older days of Java & doesn't hold up with the later VMs.

It seems as we have two use-cases for concurrency in the 'newer' biojava:

* Using concurrency to speed up a process which is not CPU limited & is 
part of the core API

* Using concurrency to speed up a process which is CPU limited but can 
be sped up on machines with more that one core

Each scenario needs a different way of 'triggering' the concurrency. The 
first as people have said some kind of System property might be a good 
way to either enable multiple threads or disable it completely; this 
also needs to be designed with good concurrent practice in mind from the 
start. The second way is by user intention i.e. they use the 
multi-threaded pyhlogenetics package.

Does that sound okay?

Andy

Michael Heuer wrote:
> On Wed, 9 Apr 2008, Andy Yates wrote:
> 
>> That is an interesting bit of usage. You could queue the events out from
>> the feature builders into the thread/callable which constructs the final
>> Sequence object quite easily. Yeah very very true :)
>>
>> The majority of objects are mutable in BJ I think. I'm not saying this
>> is a bad thing nor suggesting everything needs to be immutable :). It's
>> more about making sure only one thread is working on one object at a
>> given point in the program. If there are going to be mutable objects
>> hanging around then Queues are probably the best way to work with them.
> 
> I am going to crib directly from the book I think Mark was referring to
> earlier:
> 
>  - It's the mutable state, stupid
> 
>   All concurrency issues boil down to coordinating access to mutable
> state.  The less mutable state, the easier it is to ensure thread safety.
> 
>  - Make fields final unless they need to be mutable
> 
>  - Immutable objects are automatically thread-safe
> 
>   Immutable objects simplify concurrent programming tremendously.  They
> are simper and safer, and can be shared freely without locking or
> defensive copying.
> 
> "Java Concurrency in Practice", Goetz et al., 2006, p110.
> http://www.javaconcurrencyinpractice.com/
> 
> 
> The Immutable with Copy Mutators pattern provides "setter"-like methods
> that return copies of the immutable object:
> 
>   /**
>    * Return a copy of this foo with the bar set to <code>bar</code>.
>    *
>    * <p>Foo is immutable, so there are no set methods.  Instead, this
>    * method returns a new instance of Foo copied from <code>this</code>
>    * with the value of bar changed.</p>
>    *
>    * @param bar bar for the copy of this foo
>    * @return a copy of this fo with the bar set to <code>bar</code>
>    */
>   public Foo withBar(final Bar bar)
>   {
>     Foo copy = new Foo(..., bar);
>     return copy;
>   }
> 
> This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
> to generate classes in this style at
> 
> http://tinyurl.com/6n2nhp
> 
> 
>>> Mark Schreiber wrote:
>>> One area where you could get an interesting mixture of stateless and
>>> synchronized access to a mutable would be threaded parsing of large
>>> sequence files.  In my experience the BioJava parsers are not
>>> normally I/O bound due to all the object building they do.  Given
>>> this a filereader could for example read a feature block and hand it
>>> off to a threaded stateless feature handler which produces a Feature
>>> object and then adds it (synchronized) to the BioJava Sequence that
>>> is being built. As long as I/O doesn't limit then you would get
>>> improved parsing performance.  It would also be a case where the
>>> threading should happen internally as it could be pretty hard to
>>> coordinate the process from the outside.
>>>
>>> This also highlights the difference between encapsulation and
>>> immutability. Even if access to variables is controlled by package
>>> and protected setters the class is still mutable (but not by the
>>> user). Immutability can only be achieved by not providing any setter
>>> methods which has obvious severe limitations.  Currently BioJava
>>> Sequence objects have restricted mutability (use of Edit objects) but
>>> are certainly not immutable.
>>>
>>> Again messages need not be immutable as long as they have appropriate
>>>  locks and or synchronized getters and setters.  Many java frameworks
>>>  work best when messages or DTO's are beans (with parameterless
>>> constructors and public getters and setters), being able to use these
>>>  is often very desirable. These beans can still be threadsafe if you
>>> code them right.
> 
> What might that look like?
> 
> I have to think in most cases (DTOs, form beans, etc) are safe only
> because the container is managing the lifecycle of those beans.
> 
> 
> Perhaps we might want to copy some of this discussion to
> 
> http://biojava.org/wiki/Talk:BioJava3_Design
> 
> or a new page about concurrency issues when we are finished.
> 
>    michael

From markjschreiber at gmail.com  Thu Apr 10 07:40:44 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 10 Apr 2008 19:40:44 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FDD199.4010606@ebi.ac.uk>
References: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>
	<47FDD199.4010606@ebi.ac.uk>
Message-ID: <93b45ca50804100440u5afacfa0o650ed162aef6a9c1@mail.gmail.com>

> * Using concurrency to speed up a process which is not CPU limited & is part
> of the core API
>

Do you have a specific example in mind? Something blocking that needs
to be non-blocking? The parseing example could be one (as i/o blocks
during parsing) but I think it actually might be CPU limited as well.


> * Using concurrency to speed up a process which is CPU limited but can be
> sped up on machines with more that one core
>

Yes. Seems almost ever modern machine is dual core nowadays, we should
take advantage of this.

> Each scenario needs a different way of 'triggering' the concurrency. The
> first as people have said some kind of System property might be a good way
> to either enable multiple threads or disable it completely; this also needs
> to be designed with good concurrent practice in mind from the start. The

It would be good to make it configurable via the presence of a
properties file or similar. Default could be to use all available
processors, which can be determined from the Runtime object. This
approach would let users control how much of their machines grunt is
used for heavy lifting.

This approach would also allow users to test and tune for any
installation. In recent tests I have noticed that a task has to be
reasonably expensive to be worth spawning more threads (to get a
quicker run time). The definition of expensive really depends on the
machine. One task on an old linux 4 CPU machine got a 2 fold speed up
by using all CPUs. The exact same task on a new dual core laptop
actually slowed down as the thread spawning was slower than the
calculation. A much harder calculation on this machine did improve
with threading.  Control of this via a property would let you set the
appropriate strategy on any deployment.

> second way is by user intention i.e. they use the multi-threaded
> pyhlogenetics package.
>

Some packages should be threaded even if there is only one processor
to prevent blocking. For example parsing should spawn at least one
thread that is seperate from the i/o thread even on a single CPU
system. Much as swing is threaded to prevent GUI blocking.

- Mark


> Does that sound okay?
>
> Andy
>
>
>
> Michael Heuer wrote:
> > On Wed, 9 Apr 2008, Andy Yates wrote:
> >
> >
> > > That is an interesting bit of usage. You could queue the events out from
> > > the feature builders into the thread/callable which constructs the final
> > > Sequence object quite easily. Yeah very very true :)
> > >
> > > The majority of objects are mutable in BJ I think. I'm not saying this
> > > is a bad thing nor suggesting everything needs to be immutable :). It's
> > > more about making sure only one thread is working on one object at a
> > > given point in the program. If there are going to be mutable objects
> > > hanging around then Queues are probably the best way to work with them.
> > >
> >
> > I am going to crib directly from the book I think Mark was referring to
> > earlier:
> >
> >  - It's the mutable state, stupid
> >
> >  All concurrency issues boil down to coordinating access to mutable
> > state.  The less mutable state, the easier it is to ensure thread safety.
> >
> >  - Make fields final unless they need to be mutable
> >
> >  - Immutable objects are automatically thread-safe
> >
> >  Immutable objects simplify concurrent programming tremendously.  They
> > are simper and safer, and can be shared freely without locking or
> > defensive copying.
> >
> > "Java Concurrency in Practice", Goetz et al., 2006, p110.
> > http://www.javaconcurrencyinpractice.com/
> >
> >
> > The Immutable with Copy Mutators pattern provides "setter"-like methods
> > that return copies of the immutable object:
> >
> >  /**
> >   * Return a copy of this foo with the bar set to <code>bar</code>.
> >   *
> >   * <p>Foo is immutable, so there are no set methods.  Instead, this
> >   * method returns a new instance of Foo copied from <code>this</code>
> >   * with the value of bar changed.</p>
> >   *
> >   * @param bar bar for the copy of this foo
> >   * @return a copy of this fo with the bar set to <code>bar</code>
> >   */
> >  public Foo withBar(final Bar bar)
> >  {
> >    Foo copy = new Foo(..., bar);
> >    return copy;
> >  }
> >
> > This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
> > to generate classes in this style at
> >
> > http://tinyurl.com/6n2nhp
> >
> >
> >
> > >
> > > > Mark Schreiber wrote:
> > > > One area where you could get an interesting mixture of stateless and
> > > > synchronized access to a mutable would be threaded parsing of large
> > > > sequence files.  In my experience the BioJava parsers are not
> > > > normally I/O bound due to all the object building they do.  Given
> > > > this a filereader could for example read a feature block and hand it
> > > > off to a threaded stateless feature handler which produces a Feature
> > > > object and then adds it (synchronized) to the BioJava Sequence that
> > > > is being built. As long as I/O doesn't limit then you would get
> > > > improved parsing performance.  It would also be a case where the
> > > > threading should happen internally as it could be pretty hard to
> > > > coordinate the process from the outside.
> > > >
> > > > This also highlights the difference between encapsulation and
> > > > immutability. Even if access to variables is controlled by package
> > > > and protected setters the class is still mutable (but not by the
> > > > user). Immutability can only be achieved by not providing any setter
> > > > methods which has obvious severe limitations.  Currently BioJava
> > > > Sequence objects have restricted mutability (use of Edit objects) but
> > > > are certainly not immutable.
> > > >
> > > > Again messages need not be immutable as long as they have appropriate
> > > >  locks and or synchronized getters and setters.  Many java frameworks
> > > >  work best when messages or DTO's are beans (with parameterless
> > > > constructors and public getters and setters), being able to use these
> > > >  is often very desirable. These beans can still be threadsafe if you
> > > > code them right.
> > > >
> > >
> >
> > What might that look like?
> >
> > I have to think in most cases (DTOs, form beans, etc) are safe only
> > because the container is managing the lifecycle of those beans.
> >
> >
> > Perhaps we might want to copy some of this discussion to
> >
> > http://biojava.org/wiki/Talk:BioJava3_Design
> >
> > or a new page about concurrency issues when we are finished.
> >
> >   michael
> >
>

From ap3 at sanger.ac.uk  Sun Apr 13 14:02:41 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Sun, 13 Apr 2008 19:02:41 +0100
Subject: [Biojava-dev] biojava 1.6 released
Message-ID: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>


Biojava 1.6 has been released and is available from http:// 
biojava.org/wiki/BioJava:Download

Biojava 1.6 offers more functionality and stability over the previous  
official releases. BioJava now depends on Java 1.5+. We highly  
recommend you to upgrade as soon as possible.

In detail, the phylo package org.biojavax.bio.phylo was improved and  
expanded by our GSOC'07 student Boh-Yun Lee. It now contains fully- 
functional Nexus and Phylip parsers, and tools for calculating UPGMA  
and Neighbour Joining, Jukes-Kantor and Kimura Two Parameter, and MP.  
It uses JGraphT to represent parsed trees.

The PDB file parser was improved by Jules Jacobsen for better dealing  
with PDB header records. Andreas Draeger provided several patches for  
improving the Genetic Algorithm modules. Additionally this release  
contains numerous bug fixes and documentation improvements.

Thanks to the entire biojava community for making this possible!

Happy Biojava-ing,

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From darin.london at duke.edu  Tue Apr 29 12:48:33 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 29 Apr 2008 12:48:33 -0400
Subject: [Biojava-dev] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200804291648.m3TGmXk7020802@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

This is a reminder to submit your proposals for talks to the BOSC submission system before May 11.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a full paper.  The small Abstract text should be a summary, while the longer abstract (should provide more details, including the open-source license requirement details)
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

			 
From ap3 at sanger.ac.uk  Wed Apr 30 06:49:21 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Wed, 30 Apr 2008 11:49:21 +0100
Subject: [Biojava-dev] new uniprot file format
Message-ID: <00FA5524-C0B6-4293-84B8-496934B56398@sanger.ac.uk>

Hi,

There is a change in the uniprot file format coming up beginning of July

http://ca.expasy.org/sprot/relnotes/sp_soon.html

Having had a quick look at the code I think we will need a patch to  
allow access to the EC numbers and other sub-category data...

Cheers,
Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From thpar at psb.ugent.be  Mon Apr 14 05:28:03 2008
From: thpar at psb.ugent.be (Thomas Van Parys)
Date: Mon, 14 Apr 2008 09:28:03 -0000
Subject: [Biojava-dev] [Biojava-l] biojava 1.6 released
In-Reply-To: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
References: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
Message-ID: <48032123.7010803@psb.ugent.be>

Andreas Prlic schreef:
> 
> Biojava 1.6 has been released and is available from 
> http://biojava.org/wiki/BioJava:Download
> 

Hi,

Thanks for the new release, but is there any chance that there's 
something wrong with the download?
Firefox hangs when trying to download and wget gives me a jar file that 
doesn't contain the source code.

http://www.biojava.org/download/bj16/all/biojava-1.6-all.jar


regards,
Thomas

-- 
==================================================================
Thomas Van Parys
Tel:+32 (0)9 331 36 95                        fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
thomas.vanparys at psb.ugent.be    http://bioinformatics.psb.ugent.be
==================================================================

From Stefan.Pinkernell at awi.de  Mon Apr 14 07:06:02 2008
From: Stefan.Pinkernell at awi.de (Stefan Pinkernell)
Date: Mon, 14 Apr 2008 11:06:02 -0000
Subject: [Biojava-dev] biojava 1.6 released
In-Reply-To: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
References: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
Message-ID: <48033501.4050804@awi.de>

Dear all,
I just loaded the new Biojava 1.6 package (biojava-all.jar) but it seems 
the sources are missing. Where can I find them?

Best regards,

   Stefan

Andreas Prlic schrieb:
>
> Biojava 1.6 has been released and is available from 
> http://biojava.org/wiki/BioJava:Download
>


From ap3 at sanger.ac.uk  Sat Apr  5 12:53:53 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Sat, 5 Apr 2008 13:53:53 +0100
Subject: [Biojava-dev] preparations for release 1.6 - svn freeze
In-Reply-To: <78ECCE6A-F8CC-45AA-862B-F7D8BFC65EA0@sanger.ac.uk>
References: <78ECCE6A-F8CC-45AA-862B-F7D8BFC65EA0@sanger.ac.uk>
Message-ID: <65617D9D-0E0C-476F-A515-1222733DA9C2@sanger.ac.uk>

Hi,

In preparation for the 1.6 release, please do not commit any new  
features into svn from now until the release. Javadoc improvements  
are still welcome.

There were 2 patches end of last week, regarding the Genetic  
Algorithms and PDB file header parsing. I suggest to give those a  
week to make sure they are fine and target next weekend for the release.

Andreas


On 26 Mar 2008, at 13:55, Andreas Prlic wrote:

> Hi,
>
> The biojava 1.6 release candidate 1 has been available now for a  
> while and I would like to proceed with releasing the final biojava  
> 1.6.
>
> I ran doccheck on the latest SVN and we still could do with  some  
> javadoc improvements:
> http://www.spice-3d.org/doccheck/biojava-svn/biojava/ 
> PackageStatistics.html
>
> Please commit any remaining bug fixes to SVN until
>
> Friday, April 4th 18:00 GMT
>
> I will do the release (and SVN branch) after that.
>
> Cheers,
> Andreas
>
>
>
> ---------------------------------------------------------------------- 
> -
>
> Andreas Prlic      Wellcome Trust Sanger Institute
>                               Hinxton, Cambridge CB10 1SA, UK
>                               +44 (0) 1223 49 6891
>
> ---------------------------------------------------------------------- 
> -
>
>
>
>
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome  
> ResearchLimited, a charity registered in England with number  
> 1021457 and acompany registered in England with number 2742969,  
> whose registeredoffice is 215 Euston Road, London, NW1  
> 2BE._______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From ap3 at sanger.ac.uk  Wed Apr  9 10:40:58 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Wed, 9 Apr 2008 11:40:58 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FC7E3B.9000106@ebi.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
Message-ID: <77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>

Hi,

I like the idea of having support for multiple threads. Only thing  
is, when running BioJava on our compute farm, I am pretty sure our  
admins won't be happy if BJ would use more than just a single CPU,  
unless run on special hardware. As such there should be a BJ wide  
configuration management, which would allow to determine how many  
CPUs to be used (and the default could be all of them).

Andreas


On 9 Apr 2008, at 09:28, Andy Yates wrote:

> Lo,
>
> This is the kind of problem Java7 is attempting to solve with the  
> fork-join framework (which really is a rip-off of Google's  
> MapReduce). There's two ways of looking at thread safety & how to  
> implement it:
>
> * Packages which could be threaded or want to be threaded are  
> programmed with threading in mind using items from the  
> util.concurrent package to split, queue & work with data points.
>
> * Packages can be created as required & have data to process passed  
> to them for processing in a stateless manner; much in the same way  
> servlet engines and a lot of web frameworks run
>
> The first way does mean we can support environments with useful  
> multi-threaded support (no point in threading on a single CPU/core  
> box) from the word go. The second way would require some plumbing  
> on the user's behalf but this would be very easy plumbing; the  
> majority of which we could write (like wrapping things in instances  
> of Callables).
>
> Anyway my 2p worth :)
>
> Andy
>
> Mark Schreiber wrote:
>> Hi -
>> I was just playing with threads to see how efficient they are on  
>> one of our old 4 CPU IBM servers.  The following fairly naive  
>> program splits a large array of numbers and sums them all up.  The  
>> multi-threaded version is 2.5 times faster even allowing for  
>> thread overhead. The program could be even better if I make more  
>> use of the java1.5 concurrent package.
>> Similar tasks in biojava would be include training distributions  
>> which should see similar performance improvements. Much of the  
>> current biojava doesn't make use of threads and worse, requires  
>> the developer to manage all the thread safety themselves.
>> - Mark
>> /*
>>  * To change this template, choose Tools | Templates
>>  * and open the template in the editor.
>>  */
>> package concurrent;
>> import java.util.concurrent.atomic.AtomicInteger;
>> /**
>>  * This program demo's the use of threads to sum a large array of  
>> integers.
>>  * @author Mark Schreiber
>>  */
>> public class ThreadedAdder {
>>     static int processors = Runtime.getRuntime 
>> ().availableProcessors();
>>     int bigNumber = 10000000;
>>     int[] bigArray = new int[bigNumber * processors];
>>         public ThreadedAdder(){
>>         //make a big array of integers (10 000 000 numbers for  
>> each processor)
>>         for(int i = 0; i < bigArray.length; i++){
>>             //random number between 1 and 100
>>             bigArray[i] = (int)(Math.random() * 100.0);
>>         }
>>     }
>>     public void singleThreadedAdd(){
>>         int result = 0;
>>               //single threaded sum
>>         long start = System.currentTimeMillis();
>>         for(int number : bigArray){
>>             result += number;
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+result);
>>             }
>>         public void multiThreadedAdd() throws InterruptedException{
>>         AtomicInteger total = new AtomicInteger();
>>         long start = System.currentTimeMillis();
>>         AddingThread[] threads = new AddingThread[processors];
>>         for(int i = 0; i < threads.length; i++){
>>             threads[i] = new AddingThread("Thread "+i, i *  
>> bigNumber, total);
>>             System.out.println(threads[i].getName()+" starting");
>>             threads[i].start();
>>         }
>>         for(Thread thread : threads){
>>             //make sure everyone is finished
>>             thread.join();
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+total);
>>     }
>>         /**
>>      * @param args the command line arguments
>>      */
>>     public static void main(String[] args) throws Exception{
>>         //how many processors do I have?
>>         System.out.println("Available processors = "+processors);
>>         System.out.println("Initializing number array");
>>         ThreadedAdder adder = new ThreadedAdder();
>>                 System.out.println("single thread add");
>>         adder.singleThreadedAdd();
>>         System.out.println("multi thread add");
>>         adder.multiThreadedAdd();
>>     }
>>     public class AddingThread extends Thread{
>>         int internalTotal = 0;
>>         int offSet = 0;
>>         AtomicInteger callBackTotal;
>>                 public AddingThread(String name, int offSet,  
>> AtomicInteger callBackTotal){
>>             super(name);
>>             this.offSet = offSet;
>>             this.callBackTotal = callBackTotal;
>>         }
>>                 @Override
>>         public void run(){
>>             for(int i = offSet; i < offSet + bigNumber; i++){
>>                 internalTotal += bigArray[i];
>>             }
>>             callBackTotal.addAndGet(internalTotal);
>>             System.out.println(this.getName()+" complete");
>>         }
>>     }
>> }

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From ayates at ebi.ac.uk  Wed Apr  9 11:03:19 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 09 Apr 2008 12:03:19 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
Message-ID: <47FCA277.2020401@ebi.ac.uk>


Most the time any kind of farm management software (like LSF & please 
correct me if I'm wrong) looks at the amount of CPU time a process takes 
up and the number of threads it detects; not only the number of 
processes you have in a queue. So a multi-threaded biojava should not 
pose a problem to these systems. Not to mention with the newer multiple 
core computers; threaded software is becoming the only way to take full 
advantage of the available power.

Where you would want to ignore multi-threading is if you are in a queue 
like LSF and your x number of Java processes all get chucked onto the 
same machine. Then if you've got so many processor hungry operations all 
trying to create threads ... well it's not going to behave as optimally 
as you might hope.

Personally though I'd still air on the side of caution WRT 
multi-threading and not to have it as part of the default tools but as 
an Object I can instantiate to do my multi-threading work (so it's a 
choice at the user's level rather than the framework level). Then using 
the Java5 executor framework we let users submit work to pools of 
threads to do their work. Couple this with forcing us to pass around 
immutable messages between threads/callables (since values shared by 
threads are probably the number one cause of **** ups) you'll have one 
heck of a kick-ass scalable framework ;-)

Andy

Andreas Prlic wrote:
> Hi,
> 
> I like the idea of having support for multiple threads. Only thing is, 
> when running BioJava on our compute farm, I am pretty sure our admins 
> won't be happy if BJ would use more than just a single CPU, unless run 
> on special hardware. As such there should be a BJ wide configuration 
> management, which would allow to determine how many CPUs to be used (and 
> the default could be all of them).
> 
> Andreas
> 
> 
> On 9 Apr 2008, at 09:28, Andy Yates wrote:
> 
>> Lo,
>>
>> This is the kind of problem Java7 is attempting to solve with the 
>> fork-join framework (which really is a rip-off of Google's MapReduce). 
>> There's two ways of looking at thread safety & how to implement it:
>>
>> * Packages which could be threaded or want to be threaded are 
>> programmed with threading in mind using items from the util.concurrent 
>> package to split, queue & work with data points.
>>
>> * Packages can be created as required & have data to process passed to 
>> them for processing in a stateless manner; much in the same way 
>> servlet engines and a lot of web frameworks run
>>
>> The first way does mean we can support environments with useful 
>> multi-threaded support (no point in threading on a single CPU/core 
>> box) from the word go. The second way would require some plumbing on 
>> the user's behalf but this would be very easy plumbing; the majority 
>> of which we could write (like wrapping things in instances of Callables).
>>
>> Anyway my 2p worth :)
>>
>> Andy
>>
>> Mark Schreiber wrote:
>>> Hi -
>>> I was just playing with threads to see how efficient they are on one 
>>> of our old 4 CPU IBM servers.  The following fairly naive program 
>>> splits a large array of numbers and sums them all up.  The 
>>> multi-threaded version is 2.5 times faster even allowing for thread 
>>> overhead. The program could be even better if I make more use of the 
>>> java1.5 concurrent package.
>>> Similar tasks in biojava would be include training distributions 
>>> which should see similar performance improvements. Much of the 
>>> current biojava doesn't make use of threads and worse, requires the 
>>> developer to manage all the thread safety themselves.
>>> - Mark
>>> /*
>>>  * To change this template, choose Tools | Templates
>>>  * and open the template in the editor.
>>>  */
>>> package concurrent;
>>> import java.util.concurrent.atomic.AtomicInteger;
>>> /**
>>>  * This program demo's the use of threads to sum a large array of 
>>> integers.
>>>  * @author Mark Schreiber
>>>  */
>>> public class ThreadedAdder {
>>>     static int processors = Runtime.getRuntime().availableProcessors();
>>>     int bigNumber = 10000000;
>>>     int[] bigArray = new int[bigNumber * processors];
>>>         public ThreadedAdder(){
>>>         //make a big array of integers (10 000 000 numbers for each 
>>> processor)
>>>         for(int i = 0; i < bigArray.length; i++){
>>>             //random number between 1 and 100
>>>             bigArray[i] = (int)(Math.random() * 100.0);
>>>         }
>>>     }
>>>     public void singleThreadedAdd(){
>>>         int result = 0;
>>>               //single threaded sum
>>>         long start = System.currentTimeMillis();
>>>         for(int number : bigArray){
>>>             result += number;
>>>         }
>>>         long time = System.currentTimeMillis() - start;
>>>         System.out.println("Calculation time = "+time+" ms");
>>>         System.out.println("total = "+result);
>>>             }
>>>         public void multiThreadedAdd() throws InterruptedException{
>>>         AtomicInteger total = new AtomicInteger();
>>>         long start = System.currentTimeMillis();
>>>         AddingThread[] threads = new AddingThread[processors];
>>>         for(int i = 0; i < threads.length; i++){
>>>             threads[i] = new AddingThread("Thread "+i, i * bigNumber, 
>>> total);
>>>             System.out.println(threads[i].getName()+" starting");
>>>             threads[i].start();
>>>         }
>>>         for(Thread thread : threads){
>>>             //make sure everyone is finished
>>>             thread.join();
>>>         }
>>>         long time = System.currentTimeMillis() - start;
>>>         System.out.println("Calculation time = "+time+" ms");
>>>         System.out.println("total = "+total);
>>>     }
>>>         /**
>>>      * @param args the command line arguments
>>>      */
>>>     public static void main(String[] args) throws Exception{
>>>         //how many processors do I have?
>>>         System.out.println("Available processors = "+processors);
>>>         System.out.println("Initializing number array");
>>>         ThreadedAdder adder = new ThreadedAdder();
>>>                 System.out.println("single thread add");
>>>         adder.singleThreadedAdd();
>>>         System.out.println("multi thread add");
>>>         adder.multiThreadedAdd();
>>>     }
>>>     public class AddingThread extends Thread{
>>>         int internalTotal = 0;
>>>         int offSet = 0;
>>>         AtomicInteger callBackTotal;
>>>                 public AddingThread(String name, int offSet, 
>>> AtomicInteger callBackTotal){
>>>             super(name);
>>>             this.offSet = offSet;
>>>             this.callBackTotal = callBackTotal;
>>>         }
>>>                 @Override
>>>         public void run(){
>>>             for(int i = offSet; i < offSet + bigNumber; i++){
>>>                 internalTotal += bigArray[i];
>>>             }
>>>             callBackTotal.addAndGet(internalTotal);
>>>             System.out.println(this.getName()+" complete");
>>>         }
>>>     }
>>> }
> 
> -----------------------------------------------------------------------
> 
> Andreas Prlic      Wellcome Trust Sanger Institute
>                               Hinxton, Cambridge CB10 1SA, UK
>                               +44 (0) 1223 49 6891
> 
> -----------------------------------------------------------------------
> 
> 
> 
> 


From markjschreiber at gmail.com  Wed Apr  9 11:45:16 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Apr 2008 19:45:16 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FCA277.2020401@ebi.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
	<47FCA277.2020401@ebi.ac.uk>
Message-ID: <93b45ca50804090445t5e04c555ue7ce8ff90d852c97@mail.gmail.com>

Andy is right on this, a JVM can use at most the available CPUs on one
machine (and sometimes not even that).

Unless there is a very sophisticated farm management system that makes it
look like all 100 cores are on the same machine then there is no chance that
the JVM can take over more than one machine (unless you start another whole
JVM from within your program).

On Wed, Apr 9, 2008 at 7:03 PM, Andy Yates <ayates at ebi.ac.uk> wrote:

>
>
> Most the time any kind of farm management software (like LSF & please
> correct me if I'm wrong) looks at the amount of CPU time a process takes up
> and the number of threads it detects; not only the number of processes you
> have in a queue. So a multi-threaded biojava should not pose a problem to
> these systems. Not to mention with the newer multiple core computers;
> threaded software is becoming the only way to take full advantage of the
> available power.
>
> Where you would want to ignore multi-threading is if you are in a queue
> like LSF and your x number of Java processes all get chucked onto the same
> machine. Then if you've got so many processor hungry operations all trying
> to create threads ... well it's not going to behave as optimally as you
> might hope.
>
> Personally though I'd still air on the side of caution WRT multi-threading
> and not to have it as part of the default tools but as an Object I can
> instantiate to do my multi-threading work (so it's a choice at the user's
> level rather than the framework level). Then using the Java5 executor
> framework we let users submit work to pools of threads to do their work.
> Couple this with forcing us to pass around immutable messages between
> threads/callables (since values shared by threads are probably the number
> one cause of **** ups) you'll have one heck of a kick-ass scalable framework
> ;-)
>
> Andy
>
>
> Andreas Prlic wrote:
>
> > Hi,
> >
> > I like the idea of having support for multiple threads. Only thing is,
> > when running BioJava on our compute farm, I am pretty sure our admins won't
> > be happy if BJ would use more than just a single CPU, unless run on special
> > hardware. As such there should be a BJ wide configuration management, which
> > would allow to determine how many CPUs to be used (and the default could be
> > all of them).
> >
> > Andreas
> >
> >
> > On 9 Apr 2008, at 09:28, Andy Yates wrote:
> >
> > Lo,
> > >
> > > This is the kind of problem Java7 is attempting to solve with the
> > > fork-join framework (which really is a rip-off of Google's MapReduce).
> > > There's two ways of looking at thread safety & how to implement it:
> > >
> > > * Packages which could be threaded or want to be threaded are
> > > programmed with threading in mind using items from the util.concurrent
> > > package to split, queue & work with data points.
> > >
> > > * Packages can be created as required & have data to process passed to
> > > them for processing in a stateless manner; much in the same way servlet
> > > engines and a lot of web frameworks run
> > >
> > > The first way does mean we can support environments with useful
> > > multi-threaded support (no point in threading on a single CPU/core box) from
> > > the word go. The second way would require some plumbing on the user's behalf
> > > but this would be very easy plumbing; the majority of which we could write
> > > (like wrapping things in instances of Callables).
> > >
> > > Anyway my 2p worth :)
> > >
> > > Andy
> > >
> > > Mark Schreiber wrote:
> > >
> > > > Hi -
> > > > I was just playing with threads to see how efficient they are on one
> > > > of our old 4 CPU IBM servers.  The following fairly naive program splits a
> > > > large array of numbers and sums them all up.  The multi-threaded version is
> > > > 2.5 times faster even allowing for thread overhead. The program could be
> > > > even better if I make more use of the java1.5 concurrent package.
> > > > Similar tasks in biojava would be include training distributions
> > > > which should see similar performance improvements. Much of the current
> > > > biojava doesn't make use of threads and worse, requires the developer to
> > > > manage all the thread safety themselves.
> > > > - Mark
> > > > /*
> > > >  * To change this template, choose Tools | Templates
> > > >  * and open the template in the editor.
> > > >  */
> > > > package concurrent;
> > > > import java.util.concurrent.atomic.AtomicInteger;
> > > > /**
> > > >  * This program demo's the use of threads to sum a large array of
> > > > integers.
> > > >  * @author Mark Schreiber
> > > >  */
> > > > public class ThreadedAdder {
> > > >    static int processors =
> > > > Runtime.getRuntime().availableProcessors();
> > > >    int bigNumber = 10000000;
> > > >    int[] bigArray = new int[bigNumber * processors];
> > > >        public ThreadedAdder(){
> > > >        //make a big array of integers (10 000 000 numbers for each
> > > > processor)
> > > >        for(int i = 0; i < bigArray.length; i++){
> > > >            //random number between 1 and 100
> > > >            bigArray[i] = (int)(Math.random() * 100.0);
> > > >        }
> > > >    }
> > > >    public void singleThreadedAdd(){
> > > >        int result = 0;
> > > >              //single threaded sum
> > > >        long start = System.currentTimeMillis();
> > > >        for(int number : bigArray){
> > > >            result += number;
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+result);
> > > >            }
> > > >        public void multiThreadedAdd() throws InterruptedException{
> > > >        AtomicInteger total = new AtomicInteger();
> > > >        long start = System.currentTimeMillis();
> > > >        AddingThread[] threads = new AddingThread[processors];
> > > >        for(int i = 0; i < threads.length; i++){
> > > >            threads[i] = new AddingThread("Thread "+i, i * bigNumber,
> > > > total);
> > > >            System.out.println(threads[i].getName()+" starting");
> > > >            threads[i].start();
> > > >        }
> > > >        for(Thread thread : threads){
> > > >            //make sure everyone is finished
> > > >            thread.join();
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+total);
> > > >    }
> > > >        /**
> > > >     * @param args the command line arguments
> > > >     */
> > > >    public static void main(String[] args) throws Exception{
> > > >        //how many processors do I have?
> > > >        System.out.println("Available processors = "+processors);
> > > >        System.out.println("Initializing number array");
> > > >        ThreadedAdder adder = new ThreadedAdder();
> > > >                System.out.println("single thread add");
> > > >        adder.singleThreadedAdd();
> > > >        System.out.println("multi thread add");
> > > >        adder.multiThreadedAdd();
> > > >    }
> > > >    public class AddingThread extends Thread{
> > > >        int internalTotal = 0;
> > > >        int offSet = 0;
> > > >        AtomicInteger callBackTotal;
> > > >                public AddingThread(String name, int offSet,
> > > > AtomicInteger callBackTotal){
> > > >            super(name);
> > > >            this.offSet = offSet;
> > > >            this.callBackTotal = callBackTotal;
> > > >        }
> > > >                @Override
> > > >        public void run(){
> > > >            for(int i = offSet; i < offSet + bigNumber; i++){
> > > >                internalTotal += bigArray[i];
> > > >            }
> > > >            callBackTotal.addAndGet(internalTotal);
> > > >            System.out.println(this.getName()+" complete");
> > > >        }
> > > >    }
> > > > }
> > > >
> > >
> > -----------------------------------------------------------------------
> >
> > Andreas Prlic      Wellcome Trust Sanger Institute
> >                              Hinxton, Cambridge CB10 1SA, UK
> >                              +44 (0) 1223 49 6891
> >
> > -----------------------------------------------------------------------
> >
> >
> >
> >
> >


From markjschreiber at gmail.com  Wed Apr  9 11:54:06 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Apr 2008 19:54:06 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FCA277.2020401@ebi.ac.uk>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
	<47FCA277.2020401@ebi.ac.uk>
Message-ID: <93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>

I'm not too sure which option I prefer, multi-threading by default (ie
all handled by the packages) or stateless immutable classes and
messages that can be multi-threaded.

There are arguments for both.  The former is recommended in a book I
am currently reading on concurrency which was written by the authors
of the java 1.5 concurrency package.  Essentially the classes can be
designed ahead of time to be thread safe and mutability (sometimes a
good thing) can be done with this in mind.

On the other hand stateless and immutable stuff is often safe enough
to put into a thread although _only_ as long as operations are truely
atomic.  Take for example Servlets and stateless Session Beans. They
are pretty thread safe by nescessity (use in app servers) but just
because they are stateless doens't mean you can't accedentally right
one that gives you stale data or a race condition.

In both cases thread safety needs to be designed from the start.

Currently BioJava is neither of these things and I imagine things will
start getting pretty interesting if you try to multi-thread a biojava
program right now.

- Mark

On Wed, Apr 9, 2008 at 7:03 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>
>
> Most the time any kind of farm management software (like LSF & please correct me if I'm wrong) looks at the amount of CPU time a process takes up and the number of threads it detects; not only the number of processes you have in a queue. So a multi-threaded biojava should not pose a problem to these systems. Not to mention with the newer multiple core computers; threaded software is becoming the only way to take full advantage of the available power.
>
> Where you would want to ignore multi-threading is if you are in a queue like LSF and your x number of Java processes all get chucked onto the same machine. Then if you've got so many processor hungry operations all trying to create threads ... well it's not going to behave as optimally as you might hope.
>
> Personally though I'd still air on the side of caution WRT multi-threading and not to have it as part of the default tools but as an Object I can instantiate to do my multi-threading work (so it's a choice at the user's level rather than the framework level). Then using the Java5 executor framework we let users submit work to pools of threads to do their work. Couple this with forcing us to pass around immutable messages between threads/callables (since values shared by threads are probably the number one cause of **** ups) you'll have one heck of a kick-ass scalable framework ;-)
>
> Andy
>
>
>
>
> Andreas Prlic wrote:
>
> > Hi,
> >
> > I like the idea of having support for multiple threads. Only thing is, when running BioJava on our compute farm, I am pretty sure our admins won't be happy if BJ would use more than just a single CPU, unless run on special hardware. As such there should be a BJ wide configuration management, which would allow to determine how many CPUs to be used (and the default could be all of them).
> >
> > Andreas
> >
> >
> > On 9 Apr 2008, at 09:28, Andy Yates wrote:
> >
> >
> > > Lo,
> > >
> > > This is the kind of problem Java7 is attempting to solve with the fork-join framework (which really is a rip-off of Google's MapReduce). There's two ways of looking at thread safety & how to implement it:
> > >
> > > * Packages which could be threaded or want to be threaded are programmed with threading in mind using items from the util.concurrent package to split, queue & work with data points.
> > >
> > > * Packages can be created as required & have data to process passed to them for processing in a stateless manner; much in the same way servlet engines and a lot of web frameworks run
> > >
> > > The first way does mean we can support environments with useful multi-threaded support (no point in threading on a single CPU/core box) from the word go. The second way would require some plumbing on the user's behalf but this would be very easy plumbing; the majority of which we could write (like wrapping things in instances of Callables).
> > >
> > > Anyway my 2p worth :)
> > >
> > > Andy
> > >
> > > Mark Schreiber wrote:
> > >
> > > > Hi -
> > > > I was just playing with threads to see how efficient they are on one of our old 4 CPU IBM servers.  The following fairly naive program splits a large array of numbers and sums them all up.  The multi-threaded version is 2.5 times faster even allowing for thread overhead. The program could be even better if I make more use of the java1.5 concurrent package.
> > > > Similar tasks in biojava would be include training distributions which should see similar performance improvements. Much of the current biojava doesn't make use of threads and worse, requires the developer to manage all the thread safety themselves.
> > > > - Mark
> > > > /*
> > > >  * To change this template, choose Tools | Templates
> > > >  * and open the template in the editor.
> > > >  */
> > > > package concurrent;
> > > > import java.util.concurrent.atomic.AtomicInteger;
> > > > /**
> > > >  * This program demo's the use of threads to sum a large array of integers.
> > > >  * @author Mark Schreiber
> > > >  */
> > > > public class ThreadedAdder {
> > > >    static int processors = Runtime.getRuntime().availableProcessors();
> > > >    int bigNumber = 10000000;
> > > >    int[] bigArray = new int[bigNumber * processors];
> > > >        public ThreadedAdder(){
> > > >        //make a big array of integers (10 000 000 numbers for each processor)
> > > >        for(int i = 0; i < bigArray.length; i++){
> > > >            //random number between 1 and 100
> > > >            bigArray[i] = (int)(Math.random() * 100.0);
> > > >        }
> > > >    }
> > > >    public void singleThreadedAdd(){
> > > >        int result = 0;
> > > >              //single threaded sum
> > > >        long start = System.currentTimeMillis();
> > > >        for(int number : bigArray){
> > > >            result += number;
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+result);
> > > >            }
> > > >        public void multiThreadedAdd() throws InterruptedException{
> > > >        AtomicInteger total = new AtomicInteger();
> > > >        long start = System.currentTimeMillis();
> > > >        AddingThread[] threads = new AddingThread[processors];
> > > >        for(int i = 0; i < threads.length; i++){
> > > >            threads[i] = new AddingThread("Thread "+i, i * bigNumber, total);
> > > >            System.out.println(threads[i].getName()+" starting");
> > > >            threads[i].start();
> > > >        }
> > > >        for(Thread thread : threads){
> > > >            //make sure everyone is finished
> > > >            thread.join();
> > > >        }
> > > >        long time = System.currentTimeMillis() - start;
> > > >        System.out.println("Calculation time = "+time+" ms");
> > > >        System.out.println("total = "+total);
> > > >    }
> > > >        /**
> > > >     * @param args the command line arguments
> > > >     */
> > > >    public static void main(String[] args) throws Exception{
> > > >        //how many processors do I have?
> > > >        System.out.println("Available processors = "+processors);
> > > >        System.out.println("Initializing number array");
> > > >        ThreadedAdder adder = new ThreadedAdder();
> > > >                System.out.println("single thread add");
> > > >        adder.singleThreadedAdd();
> > > >        System.out.println("multi thread add");
> > > >        adder.multiThreadedAdd();
> > > >    }
> > > >    public class AddingThread extends Thread{
> > > >        int internalTotal = 0;
> > > >        int offSet = 0;
> > > >        AtomicInteger callBackTotal;
> > > >                public AddingThread(String name, int offSet, AtomicInteger callBackTotal){
> > > >            super(name);
> > > >            this.offSet = offSet;
> > > >            this.callBackTotal = callBackTotal;
> > > >        }
> > > >                @Override
> > > >        public void run(){
> > > >            for(int i = offSet; i < offSet + bigNumber; i++){
> > > >                internalTotal += bigArray[i];
> > > >            }
> > > >            callBackTotal.addAndGet(internalTotal);
> > > >            System.out.println(this.getName()+" complete");
> > > >        }
> > > >    }
> > > > }
> > > >
> > >
> >
> > -----------------------------------------------------------------------
> >
> > Andreas Prlic      Wellcome Trust Sanger Institute
> >                              Hinxton, Cambridge CB10 1SA, UK
> >                              +44 (0) 1223 49 6891
> >
> > -----------------------------------------------------------------------
> >
> >
> >
> >
> >
>


From markjschreiber at gmail.com  Wed Apr  9 13:12:52 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Apr 2008 21:12:52 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>
	<47FC7E3B.9000106@ebi.ac.uk>
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>
	<47FCA277.2020401@ebi.ac.uk>
	<93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
Message-ID: <93b45ca50804090612x7ba0b3b2jbb8d1e031e030dc4@mail.gmail.com>

> > Personally though I'd still air on the side of caution WRT multi-threading and not to have it as part of the default tools but as an Object I can instantiate to do my multi-threading work (so it's a choice at the user's level rather than the framework level). Then using the Java5 executor framework we let users submit work to pools of threads to do their work. Couple this with forcing us to pass around immutable messages between threads/callables (since values shared by threads are probably the number one cause of **** ups) you'll have one heck of a kick-ass scalable framework ;-)
> >
> > Andy


One area where you could get an interesting mixture of stateless and
synchronized access to a mutable would be threaded parsing of large
sequence files.  In my experience the BioJava parsers are not normally
I/O bound due to all the object building they do.  Given this a
filereader could for example read a feature block and hand it off to a
threaded stateless feature handler which produces a Feature object and
then adds it (synchronized) to the BioJava Sequence that is being
built. As long as I/O doesn't limit then you would get improved
parsing performance.  It would also be a case where the threading
should happen internally as it could be pretty hard to coordinate the
process from the outside.

This also highlights the difference between encapsulation and
immutability. Even if access to variables is controlled by package and
protected setters the class is still mutable (but not by the user).
Immutability can only be achieved by not providing any setter methods
which has obvious severe limitations.  Currently BioJava Sequence
objects have restricted mutability (use of Edit objects) but are
certainly not immutable.

Again messages need not be immutable as long as they have appropriate
locks and or synchronized getters and setters.  Many java frameworks
work best when messages or DTO's are beans (with parameterless
constructors and public getters and setters), being able to use these
is often very desirable. These beans can still be threadsafe if you
code them right.

- Mark


From ayates at ebi.ac.uk  Wed Apr  9 14:00:29 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 09 Apr 2008 15:00:29 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>	
	<47FC7E3B.9000106@ebi.ac.uk>	
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>	
	<47FCA277.2020401@ebi.ac.uk>
	<93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
Message-ID: <47FCCBFD.1030805@ebi.ac.uk>

I admit mutability is a good thing sometimes (and as Java programmers is 
the way we've been taught to work in).

Oh I've triggered more than enough race conditions working with so 
called 'stateless' services assuming too much about how stateless they 
were (or more to the point how stateful I had made them). Anyway yes 
race conditions can occur anywhere in any bit of code but the majority 
of time I see them appearing when 'static' is used.

Yeah I would be worried about someone making a multi-threded app with 
BJ. Not impossible (far from it) but I can imagine a few edge cases 
coming in.

Andy

Mark Schreiber wrote:
> I'm not too sure which option I prefer, multi-threading by default (ie
> all handled by the packages) or stateless immutable classes and
> messages that can be multi-threaded.
> 
> There are arguments for both.  The former is recommended in a book I
> am currently reading on concurrency which was written by the authors
> of the java 1.5 concurrency package.  Essentially the classes can be
> designed ahead of time to be thread safe and mutability (sometimes a
> good thing) can be done with this in mind.
> 
> On the other hand stateless and immutable stuff is often safe enough
> to put into a thread although _only_ as long as operations are truely
> atomic.  Take for example Servlets and stateless Session Beans. They
> are pretty thread safe by nescessity (use in app servers) but just
> because they are stateless doens't mean you can't accedentally right
> one that gives you stale data or a race condition.
> 
> In both cases thread safety needs to be designed from the start.
> 
> Currently BioJava is neither of these things and I imagine things will
> start getting pretty interesting if you try to multi-thread a biojava
> program right now.
> 
> - Mark
> 
> On Wed, Apr 9, 2008 at 7:03 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>
>> Most the time any kind of farm management software (like LSF & please correct me if I'm wrong) looks at the amount of CPU time a process takes up and the number of threads it detects; not only the number of processes you have in a queue. So a multi-threaded biojava should not pose a problem to these systems. Not to mention with the newer multiple core computers; threaded software is becoming the only way to take full advantage of the available power.
>>
>> Where you would want to ignore multi-threading is if you are in a queue like LSF and your x number of Java processes all get chucked onto the same machine. Then if you've got so many processor hungry operations all trying to create threads ... well it's not going to behave as optimally as you might hope.
>>
>> Personally though I'd still air on the side of caution WRT multi-threading and not to have it as part of the default tools but as an Object I can instantiate to do my multi-threading work (so it's a choice at the user's level rather than the framework level). Then using the Java5 executor framework we let users submit work to pools of threads to do their work. Couple this with forcing us to pass around immutable messages between threads/callables (since values shared by threads are probably the number one cause of **** ups) you'll have one heck of a kick-ass scalable framework ;-)
>>
>> Andy
>>
>>
>>
>>
>> Andreas Prlic wrote:
>>
>>> Hi,
>>>
>>> I like the idea of having support for multiple threads. Only thing is, when running BioJava on our compute farm, I am pretty sure our admins won't be happy if BJ would use more than just a single CPU, unless run on special hardware. As such there should be a BJ wide configuration management, which would allow to determine how many CPUs to be used (and the default could be all of them).
>>>
>>> Andreas
>>>
>>>
>>> On 9 Apr 2008, at 09:28, Andy Yates wrote:
>>>
>>>
>>>> Lo,
>>>>
>>>> This is the kind of problem Java7 is attempting to solve with the fork-join framework (which really is a rip-off of Google's MapReduce). There's two ways of looking at thread safety & how to implement it:
>>>>
>>>> * Packages which could be threaded or want to be threaded are programmed with threading in mind using items from the util.concurrent package to split, queue & work with data points.
>>>>
>>>> * Packages can be created as required & have data to process passed to them for processing in a stateless manner; much in the same way servlet engines and a lot of web frameworks run
>>>>
>>>> The first way does mean we can support environments with useful multi-threaded support (no point in threading on a single CPU/core box) from the word go. The second way would require some plumbing on the user's behalf but this would be very easy plumbing; the majority of which we could write (like wrapping things in instances of Callables).
>>>>
>>>> Anyway my 2p worth :)
>>>>
>>>> Andy
>>>>
>>>> Mark Schreiber wrote:
>>>>
>>>>> Hi -
>>>>> I was just playing with threads to see how efficient they are on one of our old 4 CPU IBM servers.  The following fairly naive program splits a large array of numbers and sums them all up.  The multi-threaded version is 2.5 times faster even allowing for thread overhead. The program could be even better if I make more use of the java1.5 concurrent package.
>>>>> Similar tasks in biojava would be include training distributions which should see similar performance improvements. Much of the current biojava doesn't make use of threads and worse, requires the developer to manage all the thread safety themselves.
>>>>> - Mark
>>>>> /*
>>>>>  * To change this template, choose Tools | Templates
>>>>>  * and open the template in the editor.
>>>>>  */
>>>>> package concurrent;
>>>>> import java.util.concurrent.atomic.AtomicInteger;
>>>>> /**
>>>>>  * This program demo's the use of threads to sum a large array of integers.
>>>>>  * @author Mark Schreiber
>>>>>  */
>>>>> public class ThreadedAdder {
>>>>>    static int processors = Runtime.getRuntime().availableProcessors();
>>>>>    int bigNumber = 10000000;
>>>>>    int[] bigArray = new int[bigNumber * processors];
>>>>>        public ThreadedAdder(){
>>>>>        //make a big array of integers (10 000 000 numbers for each processor)
>>>>>        for(int i = 0; i < bigArray.length; i++){
>>>>>            //random number between 1 and 100
>>>>>            bigArray[i] = (int)(Math.random() * 100.0);
>>>>>        }
>>>>>    }
>>>>>    public void singleThreadedAdd(){
>>>>>        int result = 0;
>>>>>              //single threaded sum
>>>>>        long start = System.currentTimeMillis();
>>>>>        for(int number : bigArray){
>>>>>            result += number;
>>>>>        }
>>>>>        long time = System.currentTimeMillis() - start;
>>>>>        System.out.println("Calculation time = "+time+" ms");
>>>>>        System.out.println("total = "+result);
>>>>>            }
>>>>>        public void multiThreadedAdd() throws InterruptedException{
>>>>>        AtomicInteger total = new AtomicInteger();
>>>>>        long start = System.currentTimeMillis();
>>>>>        AddingThread[] threads = new AddingThread[processors];
>>>>>        for(int i = 0; i < threads.length; i++){
>>>>>            threads[i] = new AddingThread("Thread "+i, i * bigNumber, total);
>>>>>            System.out.println(threads[i].getName()+" starting");
>>>>>            threads[i].start();
>>>>>        }
>>>>>        for(Thread thread : threads){
>>>>>            //make sure everyone is finished
>>>>>            thread.join();
>>>>>        }
>>>>>        long time = System.currentTimeMillis() - start;
>>>>>        System.out.println("Calculation time = "+time+" ms");
>>>>>        System.out.println("total = "+total);
>>>>>    }
>>>>>        /**
>>>>>     * @param args the command line arguments
>>>>>     */
>>>>>    public static void main(String[] args) throws Exception{
>>>>>        //how many processors do I have?
>>>>>        System.out.println("Available processors = "+processors);
>>>>>        System.out.println("Initializing number array");
>>>>>        ThreadedAdder adder = new ThreadedAdder();
>>>>>                System.out.println("single thread add");
>>>>>        adder.singleThreadedAdd();
>>>>>        System.out.println("multi thread add");
>>>>>        adder.multiThreadedAdd();
>>>>>    }
>>>>>    public class AddingThread extends Thread{
>>>>>        int internalTotal = 0;
>>>>>        int offSet = 0;
>>>>>        AtomicInteger callBackTotal;
>>>>>                public AddingThread(String name, int offSet, AtomicInteger callBackTotal){
>>>>>            super(name);
>>>>>            this.offSet = offSet;
>>>>>            this.callBackTotal = callBackTotal;
>>>>>        }
>>>>>                @Override
>>>>>        public void run(){
>>>>>            for(int i = offSet; i < offSet + bigNumber; i++){
>>>>>                internalTotal += bigArray[i];
>>>>>            }
>>>>>            callBackTotal.addAndGet(internalTotal);
>>>>>            System.out.println(this.getName()+" complete");
>>>>>        }
>>>>>    }
>>>>> }
>>>>>
>>> -----------------------------------------------------------------------
>>>
>>> Andreas Prlic      Wellcome Trust Sanger Institute
>>>                              Hinxton, Cambridge CB10 1SA, UK
>>>                              +44 (0) 1223 49 6891
>>>
>>> -----------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>


From ayates at ebi.ac.uk  Wed Apr  9 14:09:33 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 09 Apr 2008 15:09:33 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <93b45ca50804090612x7ba0b3b2jbb8d1e031e030dc4@mail.gmail.com>
References: <93b45ca50804090051h42632f43u6b977574c54853c7@mail.gmail.com>	
	<47FC7E3B.9000106@ebi.ac.uk>	
	<77FF0938-D653-490F-8933-B34306068727@sanger.ac.uk>	
	<47FCA277.2020401@ebi.ac.uk>	
	<93b45ca50804090454j2f0ff061gbf3ddb1a247610@mail.gmail.com>
	<93b45ca50804090612x7ba0b3b2jbb8d1e031e030dc4@mail.gmail.com>
Message-ID: <47FCCE1D.8050107@ebi.ac.uk>

That is an interesting bit of usage. You could queue the events out from 
the feature builders into the thread/callable which constructs the final 
Sequence object quite easily. Yeah very very true :)

The majority of objects are mutable in BJ I think. I'm not saying this 
is a bad thing nor suggesting everything needs to be immutable :). It's 
more about making sure only one thread is working on one object at a 
given point in the program. If there are going to be mutable objects 
hanging around then Queues are probably the best way to work with them.

Andy

> 
> One area where you could get an interesting mixture of stateless and 
> synchronized access to a mutable would be threaded parsing of large 
> sequence files.  In my experience the BioJava parsers are not
> normally I/O bound due to all the object building they do.  Given
> this a filereader could for example read a feature block and hand it
> off to a threaded stateless feature handler which produces a Feature
> object and then adds it (synchronized) to the BioJava Sequence that
> is being built. As long as I/O doesn't limit then you would get
> improved parsing performance.  It would also be a case where the
> threading should happen internally as it could be pretty hard to
> coordinate the process from the outside.
> 
> This also highlights the difference between encapsulation and 
> immutability. Even if access to variables is controlled by package
> and protected setters the class is still mutable (but not by the
> user). Immutability can only be achieved by not providing any setter
> methods which has obvious severe limitations.  Currently BioJava
> Sequence objects have restricted mutability (use of Edit objects) but
> are certainly not immutable.
> 
> Again messages need not be immutable as long as they have appropriate
>  locks and or synchronized getters and setters.  Many java frameworks
>  work best when messages or DTO's are beans (with parameterless 
> constructors and public getters and setters), being able to use these
>  is often very desirable. These beans can still be threadsafe if you 
> code them right.
> 
> - Mark


From heuermh at acm.org  Wed Apr  9 16:34:40 2008
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 9 Apr 2008 12:34:40 -0400 (EDT)
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FCCE1D.8050107@ebi.ac.uk>
Message-ID: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>

On Wed, 9 Apr 2008, Andy Yates wrote:

> That is an interesting bit of usage. You could queue the events out from
> the feature builders into the thread/callable which constructs the final
> Sequence object quite easily. Yeah very very true :)
>
> The majority of objects are mutable in BJ I think. I'm not saying this
> is a bad thing nor suggesting everything needs to be immutable :). It's
> more about making sure only one thread is working on one object at a
> given point in the program. If there are going to be mutable objects
> hanging around then Queues are probably the best way to work with them.

I am going to crib directly from the book I think Mark was referring to
earlier:

 - It's the mutable state, stupid

  All concurrency issues boil down to coordinating access to mutable
state.  The less mutable state, the easier it is to ensure thread safety.

 - Make fields final unless they need to be mutable

 - Immutable objects are automatically thread-safe

  Immutable objects simplify concurrent programming tremendously.  They
are simper and safer, and can be shared freely without locking or
defensive copying.

"Java Concurrency in Practice", Goetz et al., 2006, p110.
http://www.javaconcurrencyinpractice.com/


The Immutable with Copy Mutators pattern provides "setter"-like methods
that return copies of the immutable object:

  /**
   * Return a copy of this foo with the bar set to <code>bar</code>.
   *
   * <p>Foo is immutable, so there are no set methods.  Instead, this
   * method returns a new instance of Foo copied from <code>this</code>
   * with the value of bar changed.</p>
   *
   * @param bar bar for the copy of this foo
   * @return a copy of this fo with the bar set to <code>bar</code>
   */
  public Foo withBar(final Bar bar)
  {
    Foo copy = new Foo(..., bar);
    return copy;
  }

This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
to generate classes in this style at

http://tinyurl.com/6n2nhp


> > Mark Schreiber wrote:
> > One area where you could get an interesting mixture of stateless and
> > synchronized access to a mutable would be threaded parsing of large
> > sequence files.  In my experience the BioJava parsers are not
> > normally I/O bound due to all the object building they do.  Given
> > this a filereader could for example read a feature block and hand it
> > off to a threaded stateless feature handler which produces a Feature
> > object and then adds it (synchronized) to the BioJava Sequence that
> > is being built. As long as I/O doesn't limit then you would get
> > improved parsing performance.  It would also be a case where the
> > threading should happen internally as it could be pretty hard to
> > coordinate the process from the outside.
> >
> > This also highlights the difference between encapsulation and
> > immutability. Even if access to variables is controlled by package
> > and protected setters the class is still mutable (but not by the
> > user). Immutability can only be achieved by not providing any setter
> > methods which has obvious severe limitations.  Currently BioJava
> > Sequence objects have restricted mutability (use of Edit objects) but
> > are certainly not immutable.
> >
> > Again messages need not be immutable as long as they have appropriate
> >  locks and or synchronized getters and setters.  Many java frameworks
> >  work best when messages or DTO's are beans (with parameterless
> > constructors and public getters and setters), being able to use these
> >  is often very desirable. These beans can still be threadsafe if you
> > code them right.

What might that look like?

I have to think in most cases (DTOs, form beans, etc) are safe only
because the container is managing the lifecycle of those beans.


Perhaps we might want to copy some of this discussion to

http://biojava.org/wiki/Talk:BioJava3_Design

or a new page about concurrency issues when we are finished.

   michael


From ayates at ebi.ac.uk  Thu Apr 10 08:36:41 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 10 Apr 2008 09:36:41 +0100
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>
References: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>
Message-ID: <47FDD199.4010606@ebi.ac.uk>

All of that looks very reasonable to me; I really should get round to 
reading that book soon :). The only thing that worries me about the 
constructor copy is object churn but as far as I'm aware that is a worry 
from the older days of Java & doesn't hold up with the later VMs.

It seems as we have two use-cases for concurrency in the 'newer' biojava:

* Using concurrency to speed up a process which is not CPU limited & is 
part of the core API

* Using concurrency to speed up a process which is CPU limited but can 
be sped up on machines with more that one core

Each scenario needs a different way of 'triggering' the concurrency. The 
first as people have said some kind of System property might be a good 
way to either enable multiple threads or disable it completely; this 
also needs to be designed with good concurrent practice in mind from the 
start. The second way is by user intention i.e. they use the 
multi-threaded pyhlogenetics package.

Does that sound okay?

Andy

Michael Heuer wrote:
> On Wed, 9 Apr 2008, Andy Yates wrote:
> 
>> That is an interesting bit of usage. You could queue the events out from
>> the feature builders into the thread/callable which constructs the final
>> Sequence object quite easily. Yeah very very true :)
>>
>> The majority of objects are mutable in BJ I think. I'm not saying this
>> is a bad thing nor suggesting everything needs to be immutable :). It's
>> more about making sure only one thread is working on one object at a
>> given point in the program. If there are going to be mutable objects
>> hanging around then Queues are probably the best way to work with them.
> 
> I am going to crib directly from the book I think Mark was referring to
> earlier:
> 
>  - It's the mutable state, stupid
> 
>   All concurrency issues boil down to coordinating access to mutable
> state.  The less mutable state, the easier it is to ensure thread safety.
> 
>  - Make fields final unless they need to be mutable
> 
>  - Immutable objects are automatically thread-safe
> 
>   Immutable objects simplify concurrent programming tremendously.  They
> are simper and safer, and can be shared freely without locking or
> defensive copying.
> 
> "Java Concurrency in Practice", Goetz et al., 2006, p110.
> http://www.javaconcurrencyinpractice.com/
> 
> 
> The Immutable with Copy Mutators pattern provides "setter"-like methods
> that return copies of the immutable object:
> 
>   /**
>    * Return a copy of this foo with the bar set to <code>bar</code>.
>    *
>    * <p>Foo is immutable, so there are no set methods.  Instead, this
>    * method returns a new instance of Foo copied from <code>this</code>
>    * with the value of bar changed.</p>
>    *
>    * @param bar bar for the copy of this foo
>    * @return a copy of this fo with the bar set to <code>bar</code>
>    */
>   public Foo withBar(final Bar bar)
>   {
>     Foo copy = new Foo(..., bar);
>     return copy;
>   }
> 
> This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
> to generate classes in this style at
> 
> http://tinyurl.com/6n2nhp
> 
> 
>>> Mark Schreiber wrote:
>>> One area where you could get an interesting mixture of stateless and
>>> synchronized access to a mutable would be threaded parsing of large
>>> sequence files.  In my experience the BioJava parsers are not
>>> normally I/O bound due to all the object building they do.  Given
>>> this a filereader could for example read a feature block and hand it
>>> off to a threaded stateless feature handler which produces a Feature
>>> object and then adds it (synchronized) to the BioJava Sequence that
>>> is being built. As long as I/O doesn't limit then you would get
>>> improved parsing performance.  It would also be a case where the
>>> threading should happen internally as it could be pretty hard to
>>> coordinate the process from the outside.
>>>
>>> This also highlights the difference between encapsulation and
>>> immutability. Even if access to variables is controlled by package
>>> and protected setters the class is still mutable (but not by the
>>> user). Immutability can only be achieved by not providing any setter
>>> methods which has obvious severe limitations.  Currently BioJava
>>> Sequence objects have restricted mutability (use of Edit objects) but
>>> are certainly not immutable.
>>>
>>> Again messages need not be immutable as long as they have appropriate
>>>  locks and or synchronized getters and setters.  Many java frameworks
>>>  work best when messages or DTO's are beans (with parameterless
>>> constructors and public getters and setters), being able to use these
>>>  is often very desirable. These beans can still be threadsafe if you
>>> code them right.
> 
> What might that look like?
> 
> I have to think in most cases (DTOs, form beans, etc) are safe only
> because the container is managing the lifecycle of those beans.
> 
> 
> Perhaps we might want to copy some of this discussion to
> 
> http://biojava.org/wiki/Talk:BioJava3_Design
> 
> or a new page about concurrency issues when we are finished.
> 
>    michael


From markjschreiber at gmail.com  Thu Apr 10 11:40:44 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 10 Apr 2008 19:40:44 +0800
Subject: [Biojava-dev] Why BJ3 should be multithreaded
In-Reply-To: <47FDD199.4010606@ebi.ac.uk>
References: <Pine.GSO.4.44.0804091148510.10808-100000@shell3.shore.net>
	<47FDD199.4010606@ebi.ac.uk>
Message-ID: <93b45ca50804100440u5afacfa0o650ed162aef6a9c1@mail.gmail.com>

> * Using concurrency to speed up a process which is not CPU limited & is part
> of the core API
>

Do you have a specific example in mind? Something blocking that needs
to be non-blocking? The parseing example could be one (as i/o blocks
during parsing) but I think it actually might be CPU limited as well.


> * Using concurrency to speed up a process which is CPU limited but can be
> sped up on machines with more that one core
>

Yes. Seems almost ever modern machine is dual core nowadays, we should
take advantage of this.

> Each scenario needs a different way of 'triggering' the concurrency. The
> first as people have said some kind of System property might be a good way
> to either enable multiple threads or disable it completely; this also needs
> to be designed with good concurrent practice in mind from the start. The

It would be good to make it configurable via the presence of a
properties file or similar. Default could be to use all available
processors, which can be determined from the Runtime object. This
approach would let users control how much of their machines grunt is
used for heavy lifting.

This approach would also allow users to test and tune for any
installation. In recent tests I have noticed that a task has to be
reasonably expensive to be worth spawning more threads (to get a
quicker run time). The definition of expensive really depends on the
machine. One task on an old linux 4 CPU machine got a 2 fold speed up
by using all CPUs. The exact same task on a new dual core laptop
actually slowed down as the thread spawning was slower than the
calculation. A much harder calculation on this machine did improve
with threading.  Control of this via a property would let you set the
appropriate strategy on any deployment.

> second way is by user intention i.e. they use the multi-threaded
> pyhlogenetics package.
>

Some packages should be threaded even if there is only one processor
to prevent blocking. For example parsing should spawn at least one
thread that is seperate from the i/o thread even on a single CPU
system. Much as swing is threaded to prevent GUI blocking.

- Mark


> Does that sound okay?
>
> Andy
>
>
>
> Michael Heuer wrote:
> > On Wed, 9 Apr 2008, Andy Yates wrote:
> >
> >
> > > That is an interesting bit of usage. You could queue the events out from
> > > the feature builders into the thread/callable which constructs the final
> > > Sequence object quite easily. Yeah very very true :)
> > >
> > > The majority of objects are mutable in BJ I think. I'm not saying this
> > > is a bad thing nor suggesting everything needs to be immutable :). It's
> > > more about making sure only one thread is working on one object at a
> > > given point in the program. If there are going to be mutable objects
> > > hanging around then Queues are probably the best way to work with them.
> > >
> >
> > I am going to crib directly from the book I think Mark was referring to
> > earlier:
> >
> >  - It's the mutable state, stupid
> >
> >  All concurrency issues boil down to coordinating access to mutable
> > state.  The less mutable state, the easier it is to ensure thread safety.
> >
> >  - Make fields final unless they need to be mutable
> >
> >  - Immutable objects are automatically thread-safe
> >
> >  Immutable objects simplify concurrent programming tremendously.  They
> > are simper and safer, and can be shared freely without locking or
> > defensive copying.
> >
> > "Java Concurrency in Practice", Goetz et al., 2006, p110.
> > http://www.javaconcurrencyinpractice.com/
> >
> >
> > The Immutable with Copy Mutators pattern provides "setter"-like methods
> > that return copies of the immutable object:
> >
> >  /**
> >   * Return a copy of this foo with the bar set to <code>bar</code>.
> >   *
> >   * <p>Foo is immutable, so there are no set methods.  Instead, this
> >   * method returns a new instance of Foo copied from <code>this</code>
> >   * with the value of bar changed.</p>
> >   *
> >   * @param bar bar for the copy of this foo
> >   * @return a copy of this fo with the bar set to <code>bar</code>
> >   */
> >  public Foo withBar(final Bar bar)
> >  {
> >    Foo copy = new Foo(..., bar);
> >    return copy;
> >  }
> >
> > This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
> > to generate classes in this style at
> >
> > http://tinyurl.com/6n2nhp
> >
> >
> >
> > >
> > > > Mark Schreiber wrote:
> > > > One area where you could get an interesting mixture of stateless and
> > > > synchronized access to a mutable would be threaded parsing of large
> > > > sequence files.  In my experience the BioJava parsers are not
> > > > normally I/O bound due to all the object building they do.  Given
> > > > this a filereader could for example read a feature block and hand it
> > > > off to a threaded stateless feature handler which produces a Feature
> > > > object and then adds it (synchronized) to the BioJava Sequence that
> > > > is being built. As long as I/O doesn't limit then you would get
> > > > improved parsing performance.  It would also be a case where the
> > > > threading should happen internally as it could be pretty hard to
> > > > coordinate the process from the outside.
> > > >
> > > > This also highlights the difference between encapsulation and
> > > > immutability. Even if access to variables is controlled by package
> > > > and protected setters the class is still mutable (but not by the
> > > > user). Immutability can only be achieved by not providing any setter
> > > > methods which has obvious severe limitations.  Currently BioJava
> > > > Sequence objects have restricted mutability (use of Edit objects) but
> > > > are certainly not immutable.
> > > >
> > > > Again messages need not be immutable as long as they have appropriate
> > > >  locks and or synchronized getters and setters.  Many java frameworks
> > > >  work best when messages or DTO's are beans (with parameterless
> > > > constructors and public getters and setters), being able to use these
> > > >  is often very desirable. These beans can still be threadsafe if you
> > > > code them right.
> > > >
> > >
> >
> > What might that look like?
> >
> > I have to think in most cases (DTOs, form beans, etc) are safe only
> > because the container is managing the lifecycle of those beans.
> >
> >
> > Perhaps we might want to copy some of this discussion to
> >
> > http://biojava.org/wiki/Talk:BioJava3_Design
> >
> > or a new page about concurrency issues when we are finished.
> >
> >   michael
> >
>


From ap3 at sanger.ac.uk  Sun Apr 13 18:02:41 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Sun, 13 Apr 2008 19:02:41 +0100
Subject: [Biojava-dev] biojava 1.6 released
Message-ID: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>


Biojava 1.6 has been released and is available from http:// 
biojava.org/wiki/BioJava:Download

Biojava 1.6 offers more functionality and stability over the previous  
official releases. BioJava now depends on Java 1.5+. We highly  
recommend you to upgrade as soon as possible.

In detail, the phylo package org.biojavax.bio.phylo was improved and  
expanded by our GSOC'07 student Boh-Yun Lee. It now contains fully- 
functional Nexus and Phylip parsers, and tools for calculating UPGMA  
and Neighbour Joining, Jukes-Kantor and Kimura Two Parameter, and MP.  
It uses JGraphT to represent parsed trees.

The PDB file parser was improved by Jules Jacobsen for better dealing  
with PDB header records. Andreas Draeger provided several patches for  
improving the Genetic Algorithm modules. Additionally this release  
contains numerous bug fixes and documentation improvements.

Thanks to the entire biojava community for making this possible!

Happy Biojava-ing,

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From darin.london at duke.edu  Tue Apr 29 16:48:33 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 29 Apr 2008 12:48:33 -0400
Subject: [Biojava-dev] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200804291648.m3TGmXk7020802@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

This is a reminder to submit your proposals for talks to the BOSC submission system before May 11.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a full paper.  The small Abstract text should be a summary, while the longer abstract (should provide more details, including the open-source license requirement details)
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

			 
From ap3 at sanger.ac.uk  Wed Apr 30 10:49:21 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Wed, 30 Apr 2008 11:49:21 +0100
Subject: [Biojava-dev] new uniprot file format
Message-ID: <00FA5524-C0B6-4293-84B8-496934B56398@sanger.ac.uk>

Hi,

There is a change in the uniprot file format coming up beginning of July

http://ca.expasy.org/sprot/relnotes/sp_soon.html

Having had a quick look at the code I think we will need a patch to  
allow access to the EC numbers and other sub-category data...

Cheers,
Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From thpar at psb.ugent.be  Mon Apr 14 09:28:03 2008
From: thpar at psb.ugent.be (Thomas Van Parys)
Date: Mon, 14 Apr 2008 09:28:03 -0000
Subject: [Biojava-dev] [Biojava-l] biojava 1.6 released
In-Reply-To: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
References: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
Message-ID: <48032123.7010803@psb.ugent.be>

Andreas Prlic schreef:
> 
> Biojava 1.6 has been released and is available from 
> http://biojava.org/wiki/BioJava:Download
> 

Hi,

Thanks for the new release, but is there any chance that there's 
something wrong with the download?
Firefox hangs when trying to download and wget gives me a jar file that 
doesn't contain the source code.

http://www.biojava.org/download/bj16/all/biojava-1.6-all.jar


regards,
Thomas

-- 
==================================================================
Thomas Van Parys
Tel:+32 (0)9 331 36 95                        fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
thomas.vanparys at psb.ugent.be    http://bioinformatics.psb.ugent.be
==================================================================


From Stefan.Pinkernell at awi.de  Mon Apr 14 11:06:02 2008
From: Stefan.Pinkernell at awi.de (Stefan Pinkernell)
Date: Mon, 14 Apr 2008 11:06:02 -0000
Subject: [Biojava-dev] biojava 1.6 released
In-Reply-To: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
References: <0A060667-C24C-4D41-8D10-ED1D449A5F62@sanger.ac.uk>
Message-ID: <48033501.4050804@awi.de>

Dear all,
I just loaded the new Biojava 1.6 package (biojava-all.jar) but it seems 
the sources are missing. Where can I find them?

Best regards,

   Stefan

Andreas Prlic schrieb:
>
> Biojava 1.6 has been released and is available from 
> http://biojava.org/wiki/BioJava:Download
>