[Biojava-dev] Plans for next biojava release - modularization

Andy Yates ayates at ebi.ac.uk
Tue May 12 08:27:52 UTC 2009


I agree with Mark.

Later versions of the Java environment will make concurrent programming
easier, not to mention languages already available on the JVM (Scala &
Clojure) that make it very easy indeed. Our goal in biojava must be to
write code which will behave well in one of these environments.

I don't want us to fall into the trap of earlier biojava, where we wrote
our own implementations of things like database connection pooling data
sources (sorry, I don't mean to pick on any one part of the code, but it
highlights very well what we should avoid). We're
bioinformaticians/engineers; let's do what we do best and work well
within our chosen field. Let other people like Doug Lea deal with the
pain that is concurrent programming & the like :)

Andy

mark.schreiber at novartis.com wrote:
> Hi -
> 
> This was one thing we discussed previously with respect to biojava 3. 
> Generally I support the idea because almost all computers are now 
> multi-core and as you say cloud or utility computing is already a reality.
> 
> However, I tend to think that biojava should not control threading or 
> concurrency. This should be done by the developer. This is because 
> sometimes multithreading can be fast on a slow computer but slow on a fast 
> computer (due to the overhead in spawning threads), so programs need to be 
> tunable. Also, Java app servers and things like Sun Grid Engine, EC2, etc. 
> don't like people attempting to control their own threads.  What BioJava 
> should do is expose granular and thread-safe operations that can be 
> threaded or form discrete tasks on a utility grid or complete in 
> SessionBeans on an App server.  For example it would be better if BioJava 
> had a single threaded method to calculate the GC of a single sequence 
> rather than a multi-threaded method that calculates the GC of multiple 
> sequences.  This would let the developer make a multithreaded version if 
> desired or distribute multiple tasks based on the single threaded version 
> to a compute cloud (and let the cloud manage all the tasks).
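
A minimal sketch of the GC example described above, assuming sequences are
plain Strings. GcSketch, gcContent and gcContentParallel are hypothetical
names for illustration, not existing BioJava API. The library-style method is
single-threaded and stateless; the caller chooses the thread count using
java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of any BioJava release.
final class GcSketch {

    private GcSketch() {}

    // Fine-grained, single-threaded operation. It touches no shared state,
    // so it is safe to call from any number of threads.
    static double gcContent(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    // What a developer might layer on top: the caller picks the pool size.
    static List<Double> gcContentParallel(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : sequences) {
                Callable<Double> task = () -> gcContent(s);
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get()); // results keep the input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("GC task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The same gcContent call could equally be wrapped as one task per sequence for
a grid or cloud scheduler, since it carries no threading assumptions of its
own.
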
> 
> Possibly the best situation would be to have the single threaded fine 
> grain operations that let developers or grid engines control threading and 
> then higher level APIs that do it for you (or good cookbook examples that 
> show you how to do it).  Another idea that was discussed was the use of 
> properties files to allow people to set how many CPUs they wanted to make 
> available to the JVM or name packages that can or cannot use threading.
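
One way the properties idea could look, as a sketch only. The property name
"biojava.threads" and the class ThreadConfigSketch are assumptions made up for
this example; no such setting exists in BioJava. Missing or invalid values
fall back to the number of processors the JVM reports.

```java
import java.util.Properties;

// Hypothetical configuration helper; the property name is an assumption.
final class ThreadConfigSketch {

    static final String THREADS_PROPERTY = "biojava.threads";

    private ThreadConfigSketch() {}

    // Returns the configured thread count, or the number of available
    // processors when the property is absent, non-numeric, or not positive.
    static int configuredThreads(Properties props) {
        int fallback = Runtime.getRuntime().availableProcessors();
        String raw = props.getProperty(THREADS_PROPERTY);
        if (raw == null) {
            return fallback;
        }
        try {
            int n = Integer.parseInt(raw.trim());
            return n > 0 ? n : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

A Properties object loaded from a file with Properties.load(...) or taken from
System.getProperties() could be passed in unchanged.
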
> 
> Finally, there are lots of times when it is highly desirable to use Java 
> beans because they play well with dozens of Java APIs; however, beans don't 
> work well with threads because they have public setter methods.  I would 
> like to see a lot more bean use in a future BioJava because it would make 
> life so much easier but a lot of care would need to be taken to make sure 
> thread safety is preserved.  There are many patterns that can be used such 
> as synchronization locks etc to make things thread safe so I think this 
> can be achieved as long as we are disciplined and consider that all 
> methods may be used in a multi-threaded application (even if we write the 
> method as a single thread).  If there are code checkers that make 
> suggestions on thread safety it would be great to have these as part of 
> the standard build process.  Good documentation would go a long way as 
> well.  Are there unit test patterns that can catch these problems as well? 
>  Suggestions would be great.
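
As an illustration of the locking pattern mentioned above, here is a small
bean-style class whose related fields are always read and written under one
lock. RangeBean and its properties are hypothetical, invented for this sketch.
It assumes the two fields must stay consistent with each other, which is why
they are set together instead of through separate setStart/setEnd methods.

```java
// Hypothetical bean-style class for illustration; not an existing BioJava type.
final class RangeBean {

    private final Object lock = new Object();
    private int start; // guarded by lock
    private int end;   // guarded by lock

    public int getStart() {
        synchronized (lock) {
            return start;
        }
    }

    public int getEnd() {
        synchronized (lock) {
            return end;
        }
    }

    // Both fields change under one lock, so no reader can observe a
    // half-updated range.
    public void setRange(int newStart, int newEnd) {
        if (newStart > newEnd) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        synchronized (lock) {
            start = newStart;
            end = newEnd;
        }
    }

    // Derived value computed from a consistent snapshot of both fields.
    public int getLength() {
        synchronized (lock) {
            return end - start;
        }
    }
}
```

The trade-off is that a combined setter is less friendly to tools that expect
one setter per property, so this is a judgment call per class.
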
> 
> Progress Listener patterns are good but it depends on the situation and 
> might be better handled in high level APIs or left to the developer.  For 
> example in your NJ code a progress listener would be good if someone fed 
> 1000 sequences into the method but not if they only put in 10. Also code 
> running on an old machine might need a progress listener but the same 
> problem on a new machine may complete almost instantly.  Probably a 
> pluggable listener would be the way to go.  Also it might be possible to 
> do this using the new JDK APIs that let you take a peek at the stack 
> trace. Even if your NJ method didn't allow for a progress listener a 
> developer could still make one by looking at the method calls in the 
> stack. As long as your NJ method called other methods internally for each 
> sequence (quite likely) it would be possible to observe the cycle of 
> method calls from the stack.  This might make it possible to have a very 
> general BioJava progress listener that can be told to count the number of 
> times a method is called in the stack. The name of the method would be the 
> argument.  If the application runs in a Java App server you can also do 
> this very easily with a method Interceptor.
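
A rough sketch of the stack-peeking idea, using Thread.getStackTrace() (in the
JDK since Java 5). StackPeekSketch, isExecuting and processOne are hypothetical
names. Note the limitation: a stack trace only shows where a thread is at the
moment it is sampled, so an observer polling this way can estimate progress
but cannot count every call exactly.

```java
// Hypothetical helper for illustration; not part of BioJava.
final class StackPeekSketch {

    private StackPeekSketch() {}

    // True if the given thread currently has a frame for the named method
    // anywhere on its call stack.
    static boolean isExecuting(Thread thread, String methodName) {
        for (StackTraceElement frame : thread.getStackTrace()) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    // Stand-in for a per-sequence step inside a long-running method.
    // It inspects its own thread, so its own frame is on the stack.
    static boolean processOne() {
        return isExecuting(Thread.currentThread(), "processOne");
    }
}
```

In practice a separate monitor thread would call isExecuting on the worker
thread at intervals and report what it sees.
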
> 
> - Mark
> 
> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
> 
>> Andreas
>>
>> Another theme that should be considered is providing a multi-threaded
>> version of any module with a long run time. This would have a couple of
>> elements. A progress listener interface should be standard, where core
>> code would send progress updates to listeners that can be used by
>> external code to display feedback to the user. I did this with the
>> Neighbor Joining code for tree construction and it provides needed
>> feedback in a GUI. Without it, the user gets frustrated because they
>> don't know whether the code they are about to execute will take 10
>> minutes or 8 hours to complete, and they think the software is not
>> working. The reverse is
>> also true for canceling an operation where you want to have core code
>> stop processing a long running loop. Once the code has completed then
>> the listener interface for process complete is called allowing the next
>> step in the external code to continue. The developer would have the
>> choice to call the "process" method or run it in a thread and wait for
>> the callback complete method to be called. 
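
The listener and cancel behaviour described above might be sketched as
follows. ProgressListener and LongTaskSketch are hypothetical names; this is
not the actual Neighbor Joining code. One assumption made here: complete() is
only called when all items were processed, not after a cancel.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical listener interface for illustration.
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

// Hypothetical long-running task that reports progress and can be cancelled.
final class LongTaskSketch {

    // Atomic flag so cancel() may be called from another thread (e.g. a GUI).
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    void cancel() {
        cancelled.set(true);
    }

    // Processes up to `total` items and returns how many were finished.
    int process(int total, ProgressListener listener) {
        int done = 0;
        while (done < total && !cancelled.get()) {
            // The real work for item number `done` would go here.
            done++;
            listener.progress(done, total);
        }
        if (done == total) {
            listener.complete();
        }
        return done;
    }
}
```

A caller can invoke process(...) directly and block, or submit it to its own
thread or executor and wait for complete() to fire.
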
>>
>> This is the first step in the ability to have the core/long running
>> processes take advantage of multiple threads to complete the
>> computational task faster. Not all code can be parallelized easily but
>> if the algorithm can take advantage of running in parallel then it
>> should. This then opens the door to cloud computing frameworks that
>> extend the multi-threaded concepts in Java across a cluster, e.g.
>> http://www.terracotta.org/. If we put an emphasis on having code that
>> runs well in a thread we are one step closer to an architecture that can
>> run in a cloud. The computational problems are only going to get bigger,
>> and with approaches like Amazon EC2 and http://www.eucalyptus.com/,
>> computational IO cycles are going to be cheap as long as the
>> software/libraries can easily take advantage of them.
>>
>> Thanks
>>
>> Scooter
>>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org
>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
>> Prlic
>> Sent: Monday, May 11, 2009 12:27 AM
>> To: biojava-dev
>> Subject: [Biojava-dev] Plans for next biojava release - modularization
>>
>> Hi biojava-devs,
>>
>> It is time to start working on the next biojava release.  I would
>> like to modularize the current code base and apply some of the ideas
>> that have emerged around Richard's "biojava 3" code. In principle the
>> idea is that all changes should be backwards compatible with the
>> interfaces provided by the current biojava 1.7 release.  Backwards
>> compatibility shall only be broken if the functionality is being
>> replaced with something that works better, and the change is documented
>> accordingly. For the build functionality I would suggest sticking with
>> what Richard's biojava 3 code base already provides. Since we will
>> try to be backwards compatible, all code development should be part of
>> the biojava-trunk, and the first step will be to move the ant-build
>> scripts to a maven build process. Following this procedure will allow
>> us to use e.g. the code refactoring tools provided by Eclipse, which
>> should come in handy.
>>
>> The modules I would like to see should provide self-contained
>> functionality, and cross dependencies should be restricted to a
>> minimum. I would suggest having the following modules:
>>
>> biojava-core: Contains everything that cannot easily be modularized
>> or for which nobody volunteers to become module maintainer.
>> biojava-phylogeny: Scooter expressed some interest in providing such a
>> module and becoming package maintainer for it.
>> biojava-structure: Everything protein structure related. I would be
>> package maintainer.
>> biojava-blast: Blast parsing is a frequently requested functionality
>> and it would be good to have this code self-contained. A package
>> maintainer for this will still need to be nominated at a later stage.
>> Any suggestions for other modules?
>>
>> Let me know what you think about this.
>>
>> Andreas
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev


