[Biojava-dev] Why BJ3 should be multithreaded
Andy Yates
ayates at ebi.ac.uk
Thu Apr 10 08:36:41 UTC 2008
All of that looks very reasonable to me; I really should get round to
reading that book soon :). The only thing that worries me about the
constructor-based copying is object churn, but as far as I'm aware that
worry dates from the older days of Java and doesn't hold up with later VMs.
It seems as though we have two use cases for concurrency in the 'newer' biojava:
* Using concurrency to speed up a process which is not CPU limited & is
part of the core API
* Using concurrency to speed up a process which is CPU limited but can
be sped up on machines with more than one core
Each scenario needs a different way of 'triggering' the concurrency. For
the first, as people have said, some kind of System property might be a
good way to either enable multiple threads or disable them completely;
this also needs to be designed with good concurrent practice in mind from
the start. The second is triggered by user intention, i.e. they choose to
use the multi-threaded phylogenetics package.
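A minimal Java sketch of the first use case, assuming a hypothetical System property named "biojava.threads" (not an existing BioJava setting). When the property is unset or invalid, the work runs sequentially on the caller's thread. Otherwise it is spread over a fixed-size pool. Either way, results come back in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical switch for the "core API" use case: a System property decides
 * whether internal work runs on a thread pool or on the caller's thread.
 * The property name "biojava.threads" is an assumption, not a real setting.
 */
final class ConcurrencyConfig {
    static final String THREADS_PROPERTY = "biojava.threads";

    private ConcurrencyConfig() {}

    /** Worker-thread count; unset, malformed, or values below 1 mean single-threaded. */
    static int threadCount() {
        String value = System.getProperty(THREADS_PROPERTY);
        if (value == null) {
            return 1; // assumed default: concurrency stays off unless explicitly enabled
        }
        try {
            return Math.max(1, Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            return 1; // malformed value: fall back to single-threaded
        }
    }

    /** Runs the tasks and returns their results in submission order. */
    static <T> List<T> runAll(List<? extends Callable<T>> tasks) throws Exception {
        List<T> results = new ArrayList<>();
        int n = threadCount();
        if (n == 1) {
            for (Callable<T> task : tasks) {
                results.add(task.call()); // concurrency disabled: plain sequential loop
            }
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task's failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

For example, starting the JVM with -Dbiojava.threads=4 would enable a four-thread pool. Defaulting to single-threaded is itself an assumption; defaulting to Runtime.getRuntime().availableProcessors() would be the obvious alternative for CPU-bound work.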
Does that sound okay?
Andy
Michael Heuer wrote:
> On Wed, 9 Apr 2008, Andy Yates wrote:
>
>> That is an interesting bit of usage. You could queue the events out from
>> the feature builders into the thread/callable which constructs the final
>> Sequence object quite easily. Yeah very very true :)
>>
>> The majority of objects are mutable in BJ I think. I'm not saying this
>> is a bad thing nor suggesting everything needs to be immutable :). It's
>> more about making sure only one thread is working on one object at a
>> given point in the program. If there are going to be mutable objects
>> hanging around then Queues are probably the best way to work with them.
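The queue-based approach described above can be sketched as a producer/consumer pair. This assumes a simplified, hypothetical FeatureEvent type rather than BioJava's real classes. Any number of builder threads put events on a BlockingQueue, and a single consumer thread is the only code that ever touches the mutable feature list, so that list needs no locking. A sentinel ("poison pill") event tells the consumer to stop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;

/** Hypothetical event emitted by a feature builder (not a BioJava type). */
final class FeatureEvent {
    final String type;
    final int start;
    final int end;

    FeatureEvent(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

/**
 * Consumer that owns the mutable state: builder threads only put events on
 * the queue, and only this thread ever touches the feature list.
 */
final class SequenceAssembler implements Runnable {
    /** Sentinel telling the consumer that no more events will arrive. */
    static final FeatureEvent END = new FeatureEvent("END", -1, -1);

    private final BlockingQueue<FeatureEvent> events;
    private final List<FeatureEvent> features = new ArrayList<>(); // confined to the consumer thread

    SequenceAssembler(BlockingQueue<FeatureEvent> events) {
        this.events = events;
    }

    @Override
    public void run() {
        try {
            FeatureEvent event;
            while ((event = events.take()) != END) { // blocks until an event is available
                features.add(event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop consuming
        }
    }

    /** Call only after joining the consumer thread; join() makes its writes visible. */
    List<FeatureEvent> features() {
        return Collections.unmodifiableList(features);
    }
}
```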
>
> I am going to crib directly from the book I think Mark was referring to
> earlier:
>
> - It's the mutable state, stupid
>
> All concurrency issues boil down to coordinating access to mutable
> state. The less mutable state, the easier it is to ensure thread safety.
>
> - Make fields final unless they need to be mutable
>
> - Immutable objects are automatically thread-safe
>
> Immutable objects simplify concurrent programming tremendously. They
> are simpler and safer, and can be shared freely without locking or
> defensive copying.
>
> "Java Concurrency in Practice", Goetz et al., 2006, p110.
> http://www.javaconcurrencyinpractice.com/
>
>
> The Immutable with Copy Mutators pattern provides "setter"-like methods
> that return copies of the immutable object:
>
> /**
>  * Return a copy of this foo with the bar set to <code>bar</code>.
>  *
>  * <p>Foo is immutable, so there are no set methods. Instead, this
>  * method returns a new instance of Foo copied from <code>this</code>
>  * with the value of bar changed.</p>
>  *
>  * @param bar bar for the copy of this foo
>  * @return a copy of this foo with the bar set to <code>bar</code>
>  */
> public Foo withBar(final Bar bar)
> {
>     Foo copy = new Foo(..., bar);
>     return copy;
> }
>
> This is used in JodaTime, JSR-310, and elsewhere. I have a template I use
> to generate classes in this style at
>
> http://tinyurl.com/6n2nhp
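A complete, compilable version of the Immutable with Copy Mutators pattern might look like the following. Interval is a hypothetical class (not part of BioJava, Joda-Time, or JSR-310) chosen only to illustrate the style. The class and its fields are final, there are no setters, and each withX method returns a new instance while the original stays unchanged.

```java
import java.util.Objects;

/**
 * Hypothetical immutable class in the "Immutable with Copy Mutators" style:
 * final class, final fields, no setters; each withX method returns a new copy.
 */
final class Interval {
    private final String name;
    private final int start;
    private final int end;

    Interval(String name, int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start " + start + " > end " + end);
        }
        this.name = Objects.requireNonNull(name, "name");
        this.start = start;
        this.end = end;
    }

    String getName() { return name; }
    int getStart() { return start; }
    int getEnd() { return end; }

    /** Returns a copy of this interval with the name set to {@code name}. */
    Interval withName(String name) { return new Interval(name, start, end); }

    /** Returns a copy of this interval with the start set to {@code start}. */
    Interval withStart(int start) { return new Interval(name, start, end); }

    /** Returns a copy of this interval with the end set to {@code end}. */
    Interval withEnd(int end) { return new Interval(name, start, end); }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Interval)) return false;
        Interval other = (Interval) o;
        return start == other.start && end == other.end && name.equals(other.name);
    }

    @Override
    public int hashCode() { return Objects.hash(name, start, end); }

    @Override
    public String toString() { return name + ":" + start + ".." + end; }
}
```

Validating in the constructor means every copy produced by a withX method is checked too. Because nothing can change after construction, an Interval can be handed to other threads without locking or defensive copying, which is the property the quoted passage describes.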
>
>
>>> Mark Schreiber wrote:
>>> One area where you could get an interesting mixture of stateless and
>>> synchronized access to a mutable would be threaded parsing of large
>>> sequence files. In my experience the BioJava parsers are not
>>> normally I/O bound, because of all the object building they do. Given
>>> this, a file reader could, for example, read a feature block and hand it
>>> off to a threaded, stateless feature handler, which produces a Feature
>>> object and then adds it (synchronized) to the BioJava Sequence that
>>> is being built. As long as I/O doesn't become the bottleneck, you would
>>> get improved parsing performance. It would also be a case where the
>>> threading should happen internally, as it could be pretty hard to
>>> coordinate the process from the outside.
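The parsing scenario above could look roughly like this. All class names are hypothetical stand-ins, and the one-line "gene 1..100" block format is a toy assumption rather than a real GenBank or EMBL parser. The reader loop stays on the calling thread, a stateless handler shared by all workers turns each block into a feature, and the sequence under construction guards its list with synchronized methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical parsed feature (not the BioJava Feature interface). */
final class ParsedFeature {
    final String type;
    final int start;
    final int end;

    ParsedFeature(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical shared mutable object being built; all access goes through its lock. */
final class SequenceUnderConstruction {
    private final List<ParsedFeature> features = new ArrayList<>();

    synchronized void addFeature(ParsedFeature feature) {
        features.add(feature);
    }

    synchronized List<ParsedFeature> snapshot() {
        return new ArrayList<>(features);
    }
}

/** Stateless handler: no fields, so one instance is safely shared by all workers. */
final class FeatureBlockHandler {
    /** Parses a toy block such as "gene 1..100" (an assumed format). */
    ParsedFeature parse(String block) {
        String[] parts = block.trim().split("\\s+");
        String[] range = parts[1].split("\\.\\.");
        return new ParsedFeature(parts[0], Integer.parseInt(range[0]), Integer.parseInt(range[1]));
    }
}

final class ThreadedParser {
    static SequenceUnderConstruction parse(List<String> blocks, int threads)
            throws InterruptedException, ExecutionException {
        SequenceUnderConstruction sequence = new SequenceUnderConstruction();
        FeatureBlockHandler handler = new FeatureBlockHandler();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String block : blocks) { // the "reader" stays on the calling thread
                pending.add(pool.submit(() -> sequence.addFeature(handler.parse(block))));
            }
            for (Future<?> task : pending) {
                task.get(); // waits for completion and rethrows any parse failure
            }
        } finally {
            pool.shutdown();
        }
        return sequence;
    }
}
```

One design consequence: features are added in completion order, not file order. If order matters, an alternative is to have the workers return features from Callables and let the reader thread add them in order from the futures; the sequence would then be touched by a single thread and would need no lock.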
>>>
>>> This also highlights the difference between encapsulation and
>>> immutability. Even if access to variables is controlled by package-private
>>> and protected setters, the class is still mutable (just not by the
>>> user). Immutability can only be achieved by not providing any setter
>>> methods, which has obvious severe limitations. Currently BioJava
>>> Sequence objects have restricted mutability (use of Edit objects) but
>>> are certainly not immutable.
>>>
>>> Again, messages need not be immutable as long as they have appropriate
>>> locks and/or synchronized getters and setters. Many Java frameworks
>>> work best when messages or DTOs are beans (with parameterless
>>> constructors and public getters and setters), so being able to use
>>> these is often very desirable. These beans can still be thread-safe if
>>> you code them right.
>
> What might that look like?
>
> I have to think that in most cases (DTOs, form beans, etc.) they are
> safe only because the container is managing the lifecycle of those beans.
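Taking up the question of what a thread-safe bean might look like, here is one way to write it, assuming a hypothetical FeatureMessage type. It keeps the JavaBean shape (public no-arg constructor, public getters and setters) but guards every field with the object's intrinsic lock, so it does not rely on a container managing its lifecycle. Individually synchronized accessors still do not make multi-field updates atomic, so the sketch adds a setRange method that changes both ends under one lock.

```java
/**
 * Hypothetical JavaBean-style message: no-arg constructor plus getters and
 * setters, with every field guarded by the object's intrinsic lock.
 * Package-private here only so the sketch compiles in any file; a real bean
 * would normally be a public class in its own file.
 */
final class FeatureMessage {
    private String type; // guarded by this
    private int start;   // guarded by this
    private int end;     // guarded by this

    public FeatureMessage() {}

    public synchronized String getType() { return type; }
    public synchronized void setType(String type) { this.type = type; }

    public synchronized int getStart() { return start; }
    public synchronized void setStart(int start) { this.start = start; }

    public synchronized int getEnd() { return end; }
    public synchronized void setEnd(int end) { this.end = end; }

    /**
     * Synchronized getters and setters alone do not make multi-field updates
     * atomic; this method changes both ends under a single lock acquisition.
     */
    public synchronized void setRange(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start " + start + " > end " + end);
        }
        this.start = start;
        this.end = end;
    }

    /** Reads both ends under one lock, so it never sees a half-applied setRange. */
    public synchronized int length() {
        return end - start + 1;
    }
}
```

The fields need not be volatile, since acquiring and releasing the same lock also guarantees visibility. The cost is that callers composing several getter calls (for example getStart() followed by getEnd()) can still observe values from two different updates, which is one reason the immutable style quoted above is often simpler.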
>
>
> Perhaps we might want to copy some of this discussion to
>
> http://biojava.org/wiki/Talk:BioJava3_Design
>
> or a new page about concurrency issues when we are finished.
>
> michael