From harryzs1981 at gmail.com Wed May 6 09:13:42 2009
From: harryzs1981 at gmail.com (sheng zhao)
Date: Wed, 6 May 2009 15:13:42 +0200
Subject: [Biojava-dev] Biojava-doc in chm forma
Message-ID: <3d23b1eb0905060613m643adf87sdef55a05a083dd51@mail.gmail.com>

Hi

Where can I find Biojava-doc in chm format?

Thanks!
harry

From andreas at sdsc.edu Mon May 11 00:26:58 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 10 May 2009 21:26:58 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
Message-ID: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>

Hi biojava-devs,

It is time to start working on the next biojava release. I would like to modularize the current code base and apply some of the ideas that have emerged around Richard's "biojava 3" code. In principle, all changes should be backwards compatible with the interfaces provided by the current biojava 1.7 release. Backwards compatibility should only be broken if the functionality is being replaced with something that works better, and the change is documented accordingly. For the build functionality I would suggest sticking with what Richard's biojava 3 code base already provides. Since we will try to be backwards compatible, all code development should be part of the biojava-trunk, and the first step will be to move the ant build scripts to a maven build process. Following this procedure will allow us to use e.g. the code refactoring tools provided by Eclipse, which should come in handy.

The modules I would like to see should provide self-contained functionality, and cross dependencies should be restricted to a minimum. I would suggest having the following modules:

biojava-core: Contains everything that cannot easily be modularized, or for which nobody volunteers to become a module maintainer.
biojava-phylogeny: Scooter expressed some interest in providing such a module and becoming package maintainer for it.
biojava-structure: Everything protein structure related. I would be package maintainer.
biojava-blast: Blast parsing is a frequently requested functionality and it would be good to have this code self-contained. A package maintainer for this will still need to be nominated at a later stage.

Any suggestions for other modules?

Let me know what you think about this.

Andreas

From HWillis at scripps.edu Mon May 11 09:50:58 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 11 May 2009 09:50:58 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>

Andreas

Another theme that should be considered is providing a multi-threaded version of any module with a long run time. This would have a couple of elements. A progress listener interface should be standard, where core code sends progress messages to listeners that external code can use to display feedback to the user. I did this with the Neighbor Joining code for tree construction and it provides needed feedback in a GUI. Without it, the user gets frustrated because they don't know whether the code they are about to execute will take 10 minutes or 8 hours to complete, and they think the software is not working. The reverse is also true for canceling an operation, where you want the core code to stop processing a long running loop. Once the code has completed, the listener's "process complete" method is called, allowing the next step in the external code to continue. The developer would have the choice to call the "process" method directly or to run it in a thread and wait for the completion callback.

This is the first step in the ability to have the core/long running processes take advantage of multiple threads to complete the computational task faster.
Not all code can be parallelized easily, but if the algorithm can take advantage of running in parallel then it should. This then opens up a couple of cloud computing frameworks that extend the multi-threaded concepts in Java across a cluster (http://www.terracotta.org/). If we put an emphasis on having code that runs well in a thread, we are one step closer to an architecture that can run in a cloud. The computational problems are only going to get bigger, and with approaches such as Amazon EC2 and http://www.eucalyptus.com/, computational IO cycles are going to be cheap as long as the software/libraries can easily take advantage of them.

Thanks

Scooter

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From andreas at sdsc.edu Mon May 11 18:53:14 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 11 May 2009 15:53:14 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>

Hi Scooter,

I like the idea of supporting multiple threads and parallelizing code where possible. Is there a reference implementation that you would recommend for how progress listeners should be implemented? I suppose the neighbor joining code you mention is not part of biojava...

Andreas

From HWillis at scripps.edu Mon May 11 20:34:11 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 11 May 2009 20:34:11 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu> <59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C84F@FLMAIL1.fl.ad.scripps.edu>

Andreas

This is what I put together for the tree code as the interface. In the loop code of the algorithm you simply call the appropriate progress method; this could be cleaned up to have a single progress method taking a float for percentage complete. Passing the instance of NJTree was required for this specific case because all the work was done when the NJTree class was instantiated. It really should be cleaned up so that it has a process method and is runnable in a thread if needed. The progress listener could be generic for all long running classes.

I have wrapped the NJTree code in a TreeConstructor class, which bridges the biojava framework and allows the NJTree code to be replaced by something that is compatible with the BioJava open source license if needed. I am still playing around with performance optimizations and need to see if Jalview would contribute the NJTree code to BioJava. If not, I would do my own implementation, as the algorithm is not difficult.

I was also thinking that we could have Java code that provides functionality such as Blast by making a web service call to an external publicly supported service.
Instead of parsing Blast result flat files, you can call an external service (http://www.ebi.ac.uk/Tools/webservices/services/wublast) via web services and get well-structured results.

Scooter

package org.biojavax.phylo;

import org.biojavax.phylo.jalview.NJTree;

/**
 * Callbacks fired by the NJTree code while a tree is being built.
 *
 * @author willishf
 */
public interface NJTreeProgressListener {

    public void progress(NJTree njtree, String state, int percentageComplete);

    public void progress(NJTree njtree, String state, int currentCount, int totalCount);

    public void complete(NJTree njtree);

    public void canceled(NJTree njtree);
}

**********************************************************************************************
This code could be abstracted out into a base class or simply added into a class that needs to
notify external listeners
**********************************************************************************************

    Vector<NJTreeProgressListener> progessListenerVector = new Vector<NJTreeProgressListener>();

    public void addProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.add(treeProgessListener);
        }
    }

    public void removeProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.remove(treeProgessListener);
        }
    }

    public void broadcastComplete() {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.complete(this);
        }
    }

    public void updateProgress(String state, int percentage) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this, state, percentage);
        }
    }

    public void updateProgress(String state, int currentCount, int totalCount) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this, state, currentCount, totalCount);
        }
    }

***************************************************************************************

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package org.biojavax.phylo;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Vector;
import org.biojava.bio.BioException;
import org.biojavax.phylo.jalview.NJTreeNew;
import org.biojavax.phylo.jalview.TreeConstructionAlgorithm;
import org.biojavax.phylo.jalview.TreeType;
import org.biojava.bio.seq.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojavax.phylo.jalview.NJSequence;
import org.biojavax.phylo.jalview.NJTree;

/**
 * Bridges BioJava sequences to the NJTree code; can be called directly or run as a thread.
 *
 * @author willishf
 */
public class TreeConstructor extends Thread {

    NJTree njtree = null;
    NJSequence[] sequences = null;
    TreeType treeType;
    TreeConstructionAlgorithm treeConstructionAlgorithm;
    NJTreeProgressListener treeProgessListener;

    public TreeConstructor(SequenceIterator iter, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;

        // Copy every sequence from the iterator into the NJSequence form used by NJTree
        ArrayList<NJSequence> sequenceArray = new ArrayList<NJSequence>();
        while (iter.hasNext()) {
            try {
                Sequence seq = iter.nextSequence();
                NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
                sequenceArray.add(njsequence);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        sequences = new NJSequence[sequenceArray.size()];
        sequenceArray.toArray(sequences);
    }

    public TreeConstructor(Vector<RichSequence> sequenceVector, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;

        sequences = new NJSequence[sequenceVector.size()];
        int index = 0;
        for (RichSequence seq : sequenceVector) {
            NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
            sequences[index] = njsequence;
            index++;
        }
    }

    public void cancel() {
        if (njtree != null) {
            njtree.cancel();
        }
    }

    // All of the work happens inside the NJTree constructor
    public void process() throws Exception {
        njtree = new NJTree(sequences, treeType, treeConstructionAlgorithm, treeProgessListener);
    }

    @Override
    public void run() {
        try {
            process();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public String getNewickString() {
        if (njtree != null) {
            return njtree.toString();
        }
        return "";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            args = new String[3];
            args[0] = "C:\\MutualInformation\\project\\hiv\\hiv-genes-genome.fasta";
        }

        try {
            //prepare a BufferedReader for file io
            BufferedReader br = new BufferedReader(new FileReader(args[0]));
            SimpleNamespace ns = new SimpleNamespace("biojava");

            // You can use any of the convenience methods found in the BioJava 1.6 API
            RichSequenceIterator rsi = RichSequence.IOTools.readFastaProtein(br, ns);
            long readTime = System.currentTimeMillis();
            TreeConstructor treeConstructor = new TreeConstructor(rsi, TreeType.NJ, TreeConstructionAlgorithm.PID, new ProgessListenerStub());
            treeConstructor.process();
            long treeTime = System.currentTimeMillis();
            String newick = treeConstructor.getNewickString();

            System.out.println("Tree time " + (treeTime - readTime));
            System.out.println(newick);
        } catch (FileNotFoundException ex) {
            //can't find file specified by args[0]
            ex.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

From mark.schreiber at novartis.com Tue May 12 01:26:33 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 12 May 2009 13:26:33 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

Hi -

This was one thing we discussed previously with respect to biojava 3. Generally I support the idea because almost all computers are now multi-core and, as you say, cloud or utility computing is already a reality.

However, I tend to think that biojava should not control threading or concurrency. This should be done by the developer. This is because sometimes multithreading can be fast on a slow computer but slow on a fast computer (due to the overhead in spawning threads), so programs need to be tunable. Also, Java app servers and things like Sun Grid Engine, EC2 etc. don't like people attempting to control their own threads. What BioJava should do is expose granular and thread-safe operations that can be threaded, or form discrete tasks on a utility grid, or complete in SessionBeans on an app server. For example, it would be better if BioJava had a single-threaded method to calculate the GC of a single sequence rather than a multi-threaded method that calculates the GC of multiple sequences. This would let the developer make a multithreaded version if desired, or distribute multiple tasks based on the single-threaded version to a compute cloud (and let the cloud manage all the tasks).
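
A minimal sketch of the GC example above, for illustration only. It assumes sequences are plain strings, and the names GcContent, gcFraction and gcFractions are hypothetical, not part of any BioJava release. The library-style method handles one sequence and holds no state; the caller decides whether and how to parallelize, here with a standard java.util.concurrent thread pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not BioJava API): a fine-grained, stateless GC calculation. */
final class GcContent {

    private GcContent() { }

    /** Single-threaded: fraction of G/C symbols in one sequence (0.0 for an empty one). */
    static double gcFraction(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    /** Caller-side parallelism: the caller picks the pool size; results keep input order. */
    static List<Double> gcFractions(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (final String s : sequences) {
                tasks.add(() -> gcFraction(s));
            }
            List<Double> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted tasks
            for (Future<Double> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while computing GC", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("GC task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [1.0, 0.0, 0.5]
        System.out.println(gcFractions(Arrays.asList("GGCC", "ATAT", "ATGC"), 2));
    }
}
```

Because gcFraction touches no shared state, the same method could equally be submitted as one task per sequence to a grid or app server instead of a local pool; only gcFractions would change.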

Possibly the best situation would be to have single-threaded, fine-grained operations that let developers or grid engines control threading, and then higher-level APIs that do it for you (or good cookbook examples that show you how to do it). Another idea that was discussed was the use of properties files to let people set how many CPUs they want to make available to the JVM, or to name packages that can or cannot use threading.

Finally, there are lots of times when it is highly desirable to use Java beans because they play well with dozens of Java APIs; however, beans don't work well with threads because they have public setter methods. I would like to see a lot more bean use in a future BioJava because it would make life so much easier, but a lot of care would need to be taken to make sure thread safety is preserved. There are many patterns, such as synchronization locks, that can be used to make things thread safe, so I think this can be achieved as long as we are disciplined and consider that all methods may be used in a multi-threaded application (even if we write the method as a single thread). If there are code checkers that make suggestions on thread safety, it would be great to have these as part of the standard build process. Good documentation would go a long way as well. Are there unit test patterns that can catch these problems as well? Suggestions would be great.

Progress Listener patterns are good, but it depends on the situation, and they might be better handled in high-level APIs or left to the developer. For example, in your NJ code a progress listener would be good if someone fed 1000 sequences into the method but not if they only put in 10. Also, code running on an old machine might need a progress listener but the same problem on a new machine may complete almost instantly. Probably a pluggable listener would be the way to go. Also, it might be possible to do this using the new JDK APIs that let you take a peek at the stack trace.
Even if your NJ method didn't allow for a progress listener a developer could still make one by looking at the method calls in the stack. As long as your NJ method called other methods internally for each sequence (quite likely) it would be possible to observe the cycle of method calls from the stack. This might make it possible to have a very general BioJava progress listener that can be told to count the number of times a method is called in the stack. The name of the method would be the argument. If the application runs in a Java App server you can also do this very easily with a method Interceptor.

- Mark
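
As an illustration of the pluggable listener discussed in this thread, here is a minimal sketch with a single progress method taking a fraction complete, plus completion and cancellation callbacks. The names ProgressListener and LongRunningTask are hypothetical and do not come from BioJava or from the NJTree code. The task is a plain Runnable, so the caller can invoke run() directly or hand it to a thread, an executor or an app server.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical generic listener: one progress callback plus completion and cancellation. */
interface ProgressListener {
    void progress(String state, float fractionComplete);
    void complete();
    void canceled();
}

/** Hypothetical long-running task; it does not create threads itself. */
final class LongRunningTask implements Runnable {
    // Thread-safe collections so listeners can be added and cancel() called from other threads
    private final List<ProgressListener> listeners = new CopyOnWriteArrayList<>();
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
    private final int totalSteps;

    LongRunningTask(int totalSteps) {
        this.totalSteps = totalSteps;
    }

    void addProgressListener(ProgressListener listener) {
        if (listener != null) {
            listeners.add(listener);
        }
    }

    void removeProgressListener(ProgressListener listener) {
        listeners.remove(listener);
    }

    /** Asks the loop in run() to stop before its next step. */
    void cancel() {
        cancelRequested.set(true);
    }

    @Override
    public void run() {
        for (int step = 1; step <= totalSteps; step++) {
            if (cancelRequested.get()) {
                for (ProgressListener l : listeners) {
                    l.canceled();
                }
                return;
            }
            // ... one unit of real work would go here ...
            for (ProgressListener l : listeners) {
                l.progress("working", (float) step / totalSteps);
            }
        }
        for (ProgressListener l : listeners) {
            l.complete();
        }
    }

    public static void main(String[] args) {
        LongRunningTask task = new LongRunningTask(4);
        task.addProgressListener(new ProgressListener() {
            public void progress(String state, float f) { System.out.println(state + " " + f); }
            public void complete() { System.out.println("complete"); }
            public void canceled() { System.out.println("canceled"); }
        });
        task.run(); // or: new Thread(task).start();
    }
}
```

A task with no registered listeners pays only for the cancellation check, so short jobs are not slowed down by reporting that nobody asked for.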
From ayates at ebi.ac.uk Tue May 12 04:27:52 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 12 May 2009 09:27:52 +0100
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To:
References:
Message-ID: <4A093308.4030409@ebi.ac.uk>

I agree with Mark. Later versions of the Java environment will make concurrent programming easier, not to mention languages already available on the VM (Scala & Clojure) that make it very easy indeed. Our goal in biojava must be to write code which will behave well in one of these environments. I don't want us to fall into the trap of earlier biojava, where we wrote things like our own implementations of database connection pooling data sources (sorry, I don't mean to pick on any one part of the code but it highlights very well what we should avoid).

We're bioinformaticians/engineers; let's do what we do best and work well within our chosen field. Let other people like Doug Lea deal with the pain that is concurrent programming & the like :)

Andy

mark.schreiber at novartis.com wrote:
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3. Generally I support the idea because almost all computers are now multi-core and, as you say, cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or concurrency. This should be done by the developer. This is because sometimes multithreading can be fast on a slow computer but slow on a fast computer (due to the overhead in spawning threads) so programs need to be tunable. Also Java app servers and things like Sun Grid Engine, EC2 etc don't like people attempting to control their own threads. What BioJava should do is expose granular and thread-safe operations that can be threaded or form discrete tasks on a utility grid or complete in SessionBeans on an App server. For example it would be better if BioJava had a single threaded method to calculate the GC content of a single sequence rather than a multi-threaded method that calculates the GC content of multiple sequences. This would let the developer make a multithreaded version if desired or distribute multiple tasks based on the single threaded version to a compute cloud (and let the cloud manage all the tasks).
>
> Possibly the best situation would be to have the single threaded fine grain operations that let developers or grid engines control threading and then higher level APIs that do it for you (or good cookbook examples that show you how to do it). Another idea that was discussed was the use of properties files to allow people to set how many CPUs they wanted to make available to the JVM or name packages that can or cannot use threading.
>
> Finally, there are lots of times when it is highly desirable to use Java beans because they play well with dozens of Java APIs; however, beans don't work well with threads because they have public setter methods. I would like to see a lot more bean use in a future BioJava because it would make life so much easier but a lot of care would need to be taken to make sure thread safety is preserved. There are many patterns that can be used, such as synchronization locks etc, to make things thread safe so I think this can be achieved as long as we are disciplined and consider that all methods may be used in a multi-threaded application (even if we write the method as a single thread). If there are code checkers that make suggestions on thread safety it would be great to have these as part of the standard build process. Good documentation would go a long way as well. Are there unit test patterns that can catch these problems as well? Suggestions would be great.
>
> Progress Listener patterns are good but it depends on the situation and might be better handled in high level APIs or left to the developer.
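
A minimal sketch of the split Mark describes for GC content: the library offers a single-threaded, stateless method, and the calling code decides whether and how to run it in parallel. The names `GcContent`, `gcFraction` and `gcFractions` are hypothetical and not part of any BioJava API, and sequences are plain `String`s purely for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class GcContent {

    /** Library side (hypothetical): single-threaded and stateless, so safe to call from any thread. */
    static double gcFraction(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    /** Caller side: the developer chooses the thread count (or hands the tasks to a grid instead). */
    static List<Double> gcFractions(List<String> sequences, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (final String s : sequences) {
                tasks.add(() -> gcFraction(s));
            }
            List<Double> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted tasks
            for (Future<Double> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> seqs = Arrays.asList("GGCC", "ATAT", "ATGC");
        System.out.println(gcFractions(seqs, 2)); // prints [1.0, 0.0, 0.5]
    }
}
```

Because `gcFraction` keeps no state, the same method could equally be wrapped as one task per sequence on a compute grid; only `gcFractions` would change.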
> For example, in your NJ code a progress listener would be good if someone fed 1000 sequences into the method but not if they only put in 10. Also, code running on an old machine might need a progress listener but the same problem on a new machine may complete almost instantly. Probably a pluggable listener would be the way to go. Also it might be possible to do this using the new JDK APIs that let you take a peek at the stack trace. Even if your NJ method didn't allow for a progress listener a developer could still make one by looking at the method calls in the stack. As long as your NJ method called other methods internally for each sequence (quite likely) it would be possible to observe the cycle of method calls from the stack. This might make it possible to have a very general BioJava progress listener that can be told to count the number of times a method is called in the stack. The name of the method would be the argument. If the application runs in a Java App server you can also do this very easily with a method Interceptor.
>
> - Mark

From holland at eaglegenomics.com Tue May 12 04:26:26 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 12 May 2009 09:26:26 +0100
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
Message-ID: <1242116786.7101.7.camel@buzzybee>

The BJ3 code contains only as much code as is needed to represent sequences and to parse/write simple FASTA. It should be viewed as a concept. In particular the file parsing mechanism is quite flexible (if a little complex) but easily wrapped with simple one-liner utility methods to provide end-users with easier-to-use APIs.

Sequence representation in BJ3 is done via the Collections API. It's set up in such a way that you can write something yourself that implements the List API and behaves like a List but internally uses a more compact or even offline storage mechanism to represent the sequence. This allows you to reuse sequences wherever Lists can be used, e.g. in Iterators or foreach-loops.

Everything written so far has been documented here:

http://biojava.org/wiki/BioJava3:HowTo

cheers,
Richard

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From HWillis at scripps.edu Tue May 12 09:34:51 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 09:34:51 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To:
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>

Mark

It is a challenge knowing where to draw the line. Allowing both options is a reasonable approach. The implementation of the algorithm is key to allowing it to be multi-threaded or to run in parallel. One approach is to provide a standard interface in which process() waits for the result/return value and runs in the parent thread. To run the algorithm in a thread you can have a startProcess() where you can add yourself as a progress listener, and when the complete() method is called you can call getResults(). You can then also have the corresponding stopProcess(), which would set an internal value to cause all threads to quit. There are lots of ways to tackle the problem; the key is to start talking about it and, at a minimum, take advantage of multiple cores, where the external code can set the number of cores to use. You can get a dual quad core machine these days for < $1000 but most software implementations are not designed to take advantage of it.

The real question is what exists today in the BioJava API that is considered long running in a normal use case and thus is a candidate to be run in parallel. It may not be an issue in existing BioJava code. When I first started using BioJava I went looking for BLAST code only to find a BLAST parser.
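
A minimal sketch of the process() / startProcess() / stopProcess() / getResults() idea outlined in this message. Everything here is an assumption made for illustration: `ProgressListener` and `SummingTask` are hypothetical names, not BioJava types, and summing an array stands in for a genuinely long-running algorithm such as tree construction.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical listener; not an existing BioJava interface. */
interface ProgressListener<R> {
    void progress(int done, int total);
    void complete(R result);
}

/** Stand-in for a long-running algorithm (here: summing an array). */
class SummingTask {
    private final int[] data;
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile Long result; // stays null until the task finishes

    SummingTask(int[] data) {
        this.data = data.clone();
    }

    /** Blocking variant: runs in the caller's thread; returns null if stopped early. */
    Long process(ProgressListener<Long> listener) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            if (stopRequested.get()) {
                return null; // cooperative cancellation
            }
            sum += data[i];
            if (listener != null) {
                listener.progress(i + 1, data.length);
            }
        }
        result = sum;
        if (listener != null) {
            listener.complete(result);
        }
        return result;
    }

    /** Threaded variant: returns immediately; complete() fires on the worker thread. */
    Thread startProcess(final ProgressListener<Long> listener) {
        Thread worker = new Thread(() -> process(listener), "summing-task");
        worker.start();
        return worker;
    }

    /** Asks the loop to stop at its next iteration. */
    void stopProcess() {
        stopRequested.set(true);
    }

    Long getResults() {
        return result;
    }
}

class ProcessDemo {
    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch finished = new CountDownLatch(1);
        SummingTask task = new SummingTask(new int[] {1, 2, 3, 4});
        task.startProcess(new ProgressListener<Long>() {
            public void progress(int done, int total) {
                System.out.println(done + "/" + total);
            }
            public void complete(Long r) {
                finished.countDown();
            }
        });
        finished.await();
        System.out.println(task.getResults()); // prints 10
    }
}
```

Keeping process() as a plain blocking call means callers who manage their own threads, or run inside an app server or grid, can ignore startProcess() entirely.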
I wanted to do a Multiple Sequence Alignment and it turns out that the BioJava code calls CLUSTALW as an external process under the covers. I also needed code to construct trees from an MSA and found the summer of code project that was only focused on representing the tree.

It would be nice to have a BLAST implementation in Java optimized to run on a cluster, but who has time to rewrite BLAST in Java when you can do a BLAST search via the web and focus on parsing the results? BioJava needs a BLAST API that makes a web services call to an external service and returns structured results in core BioJava structures. It is probably not difficult to do a Java version of CLUSTALW, but again we can push the work out to http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results back in BioJava structures.

I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web service -> BioJava code. I haven't done the research but it seems that http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to expose common biology computational services. If multiple external services are offering BLAST via web services where each picked a different implementation then BioJava could provide abstraction to different services.

Thanks

Scooter

From andreas at sdsc.edu Tue May 12 19:52:51 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 16:52:51 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <1242116786.7101.7.camel@buzzybee>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> <1242116786.7101.7.camel@buzzybee>
Message-ID: <59a41c430905121652s7c548985xd9261734b42a4182@mail.gmail.com>

Hi Richard,

Do you think the BJ3 code could form the beginning of a new biojava-sequence module and become part of the next release?

Andreas

From andreas at sdsc.edu Tue May 12 19:59:11 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 16:59:11 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>

Hi Scooter,

About your suggestion for the blast webservice client code: in principle I like the idea, and we have had questions on the mailing list regarding this in the past. The only thing is, I think there is already some client code in Java available: http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp but I am not sure how good that Java client library is....
Besides this, there is the need for work on our blast parser library and if you are interested in working on that you are welcome. As I mentioned, I think this should become its own module, due to the popularity of that code. Andreas On Tue, May 12, 2009 at 6:34 AM, Scooter Willis wrote: > Mark > > > > It is a challenge on knowing where to draw the line. Allowing both options > is a reasonable approach. The implementation of the algorithm is key to > allow it to be multi-threaded or being able to run in parallel. One approach > is to provide a standard interface such as process() would wait for the > result/return value and run in the parent thread. To run the algorithm in a > thread you can have a startProcess() where you can add yourself as a > progress listener and when complete() method is called you can call > getResults(). You can then also have the corresponding stopProcess() which > would set an internal value to cause all threads to quit. ?Lots of ways to > tackle the problem the key is to start talking about it and at minimum take > advantage of multiple-cores where the external code can set the number of > cores to use. You can get a dual quad core machine these days for < $1000 > but most software implementations are not designed to take advantage of it. > > > > The real question is what exists today in the BioJava API that is considered > long running in normal use case and thus is a candidate to be run in > parallel. It may not be an issue in existing BioJava code. When I first > started using BioJava I went looking for BLAST code only to find a BLAST > parser. I wanted to do a Multiple Sequence Alignment and turns out that > Biojava code calls CLUSTALW as an external processor under the covers. ?I > also needed code to construct trees from an MSA and found the summer of code > project that was only focused on representing the tree. 
>
> It would be nice to have a BLAST implementation in Java optimized to run on
> a cluster, but who has time to rewrite BLAST in Java when you can do a BLAST
> search via the web and focus on parsing the results? BioJava needs a BLAST
> API that makes a web services call to an external service and returns
> structured results in core BioJava structures. It is probably not difficult
> to do a Java version of CLUSTALW, but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
> back in BioJava structures.
>
> I can sign up for doing BLAST web service -> BioJava and CLUSTALW web
> service -> BioJava code. I haven't done the research, but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology computational services. If multiple external services
> are offering BLAST via web services, each with a different implementation,
> then BioJava could provide an abstraction over the different services.
>
> Thanks
>
> Scooter
>
> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> Sent: Tuesday, May 12, 2009 1:27 AM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev
> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3.
> Generally I support the idea because almost all computers are now
> multi-core and, as you say, cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or
> concurrency. This should be done by the developer. This is because sometimes
> multithreading can be fast on a slow computer but slow on a fast computer
> (due to the overhead in spawning threads), so programs need to be tunable.
> Also, Java app servers and things like Sun Grid Engine, EC2 etc. don't like
> people attempting to control their own threads.
> What BioJava should do is
> expose granular and thread-safe operations that can be threaded, form
> discrete tasks on a utility grid, or complete in SessionBeans on an app
> server. For example, it would be better if BioJava had a single-threaded
> method to calculate the GC of a single sequence rather than a multi-threaded
> method that calculates the GC of multiple sequences. This would let the
> developer make a multithreaded version if desired or distribute multiple
> tasks based on the single-threaded version to a compute cloud (and let the
> cloud manage all the tasks).
>
> Possibly the best situation would be to have single-threaded, fine-grained
> operations that let developers or grid engines control threading, and then
> higher-level APIs that do it for you (or good cookbook examples that show
> you how to do it). Another idea that was discussed was the use of
> properties files to allow people to set how many CPUs they wanted to make
> available to the JVM, or to name packages that can or cannot use threading.
>
> Finally, there are lots of times when it is highly desirable to use Java
> beans because they play well with dozens of Java APIs; however, beans don't
> work well with threads because they have public setter methods. I would
> like to see a lot more bean use in a future BioJava because it would make
> life so much easier, but a lot of care would need to be taken to make sure
> thread safety is preserved. There are many patterns that can be used, such
> as synchronization locks, to make things thread safe, so I think this can
> be achieved as long as we are disciplined and consider that all methods may
> be used in a multi-threaded application (even if we write the method as a
> single thread). If there are code checkers that make suggestions on thread
> safety, it would be great to have these as part of the standard build
> process. Good documentation would go a long way as well.
> Are there unit
> test patterns that can catch these problems as well? Suggestions would be
> great.
>
> Progress listener patterns are good, but it depends on the situation, and
> they might be better handled in high-level APIs or left to the developer. For
> example, in your NJ code a progress listener would be good if someone fed
> 1000 sequences into the method but not if they only put in 10. Also, code
> running on an old machine might need a progress listener, but the same
> problem on a new machine may complete almost instantly. Probably a
> pluggable listener would be the way to go. Also, it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener, a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely), it would be possible to observe the cycle of method calls from the
> stack. This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument. If the application
> runs in a Java app server you can also do this very easily with a method
> interceptor.
>
> - Mark
>
> _________________________
>
> CONFIDENTIALITY NOTICE
>
> The information contained in this e-mail message is intended only for the
> exclusive use of the individual or entity named above and may contain
> information that is privileged, confidential or exempt from disclosure under
> applicable law. If the reader of this message is not the intended recipient,
> or the employee or agent responsible for delivery of the message to the
> intended recipient, you are hereby notified that any dissemination,
> distribution or copying of this communication is strictly prohibited. If you
> have received this communication in error, please notify the sender
> immediately by e-mail and delete the material from any computer. Thank you.

From HWillis at scripps.edu Tue May 12 20:13:45 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 20:13:45 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu>

Andreas

The goal for BioJava could be to provide a wrapper for the
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp Java code so that
inputs/outputs are BioJava. I think they are using Axis for the client web
services code.
If BioJava 3 is going to be Java 6 minimum, then it is easier to use
the Java 6 SOAP processing capabilities by pointing at the WSDL and
generating the Java code for the client side. This cuts down on the
additional third-party libraries that are required. I try to stay out of
the legacy file parsing business whenever possible.

Scooter
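
Mark's suggestion quoted earlier in this thread, a single-threaded GC method with threading left to the caller, could be sketched roughly as follows. GcContent and GcDemo are hypothetical names made up for this illustration, sequences are plain strings rather than BioJava sequence objects, and the pool size is whatever the caller chooses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical library-side code: one small, stateless, thread-safe operation. */
class GcContent {
    /** Fraction of G and C characters in one sequence (0.0 for an empty one). */
    static double gcFraction(String seq) {
        if (seq.isEmpty()) return 0.0;
        int gc = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == 'G' || c == 'C') gc++;
        }
        return (double) gc / seq.length();
    }
}

/** Hypothetical caller-side code: the caller, not the library, owns the threads. */
class GcDemo {
    static List<Double> gcOfAll(List<String> seqs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : seqs) {
                Callable<Double> job = () -> GcContent.gcFraction(s);
                futures.add(pool.submit(job));
            }
            List<Double> out = new ArrayList<>();
            for (Future<Double> f : futures) out.add(f.get()); // keeps input order
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(gcOfAll(Arrays.asList("GGCC", "ATAT", "ATGC"), 2));
    }
}
```

Because gcFraction() touches no shared state, the same method could equally be handed to a grid engine or an app server as a discrete task, which is the tunability Mark argues for.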
From mark.schreiber at novartis.com Tue May 12 20:09:31 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 13 May 2009 08:09:31 +0800 Subject: [Biojava-dev] Plans for next biojava release - modularization In-Reply-To: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com> Message-ID: A while back I gave Richard some code that uses JAXB to objectify (and deobjectify) BLAST XML output. This might be useful for parsing BLAST results from the webservices which normally use BLAST XML. I could probably dig it up again if needed (it was autogenerated anyway). It would probably be a good object model for BLAST output if people want to parse other types of BLAST output (such as flatfile, but who would want to do that!). The BLAST XML seems to accommodate strange flavours of BLAST such as PSI-BLAST etc and also has been much more stable than the default flat file output. - Mark Andreas Prlic Sent by: biojava-dev-bounces at lists.open-bio.org 05/13/2009 08:02 AM To Scooter Willis cc biojava-dev Subject Re: [Biojava-dev] Plans for next biojava release - modularization Hi Scooter, about your suggestion for the blast webservice client code: In principle I like the idea and we have had questions on the mailing list regarding this in the past. Only thing is I think there is already some client code in java available: http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp but I am not sure how good that Java client library is.... Besides this, there is the need for work on our blast parser library and if you are interested in working on that you are welcome. As I mentioned, I think this should become its own module, due to the popularity of that code. Andreas On Tue, May 12, 2009 at 6:34 AM, Scooter Willis wrote: > Mark > > > > It is a challenge on knowing where to draw the line. Allowing both options > is a reasonable approach. The implementation of the algorithm is key to > allow it to be multi-threaded or being able to run in parallel. 
One approach > is to provide a standard interface such as process() would wait for the > result/return value and run in the parent thread. To run the algorithm in a > thread you can have a startProcess() where you can add yourself as a > progress listener and when complete() method is called you can call > getResults(). You can then also have the corresponding stopProcess() which > would set an internal value to cause all threads to quit. Lots of ways to > tackle the problem the key is to start talking about it and at minimum take > advantage of multiple-cores where the external code can set the number of > cores to use. You can get a dual quad core machine these days for < $1000 > but most software implementations are not designed to take advantage of it. > > > > The real question is what exists today in the BioJava API that is considered > long running in normal use case and thus is a candidate to be run in > parallel. It may not be an issue in existing BioJava code. When I first > started using BioJava I went looking for BLAST code only to find a BLAST > parser. I wanted to do a Multiple Sequence Alignment and turns out that > Biojava code calls CLUSTALW as an external processor under the covers. I > also needed code to construct trees from an MSA and found the summer of code > project that was only focused on representing the tree. > > > > It would be nice to have a BLAST implementation in Java optimized to run on > a cluster but who has time to rewrite BLAST in Java when you can do BLAST > search via the web and focus on parsing the results. BioJava needs a BLAST > API that makes a web services call to an external service and gets returns > structured results in core BioJava structures. Probably not difficult to do > a Java version of CLUSTALW but again we can push the work out to > http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results > back returned in BioJava structures. 
> > > > I can signup for doing a BLAST web service -> BioJava and a CLUSTALW web > service -> BioJava code. I haven?t done the research but it seems that > http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to > expose common biology computational services. If multiple external services > are offering BLAST via web services where each picked a different > implementation then BioJava could provide abstraction to different services. > > > > Thanks > > Scooter > > > > From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com] > Sent: Tuesday, May 12, 2009 1:27 AM > To: Scooter Willis > Cc: Andreas Prlic; biojava-dev > Subject: Re: [Biojava-dev] Plans for next biojava release - modularization > > > > Hi - > > This was one thing we discussed previously with respect to biojava 3. > Generally I support the idea because almost all computers are now > multi-core and as you say cloud or utility computing is already a reality. > > However, I tend to think that biojava should not control threading or > concurrency. This should be done by the developer. This is because sometimes > mutithreading can be fast on a slow computer but slow on a fast computer > (due to the overhead in spawning threads) so programs need to be tunable. > Also Java app servers and things like Sun Grid Engine, EC2 etc don't like > people attempting to control their own threads. What BioJava should do is > expose granular and thread-safe operations that can be threaded or form > discrete tasks on a utility grid or complete in SessionBeans on an App > server. For example it would be better if BioJava had a single threaded > method to calculate the GC of a single sequence rather than a multi-threaded > method that calculates the GC of multiple sequences. This would let the > developer make a multithreaded version if desired or distribute multiple > tasks based on the single threaded version to a compute cloud (and let the > cloud manage all the tasks). 
> > Possibly the best situation would be to have the single threaded fine grain > operations that let developers or grid engines control threading and then > higher level APIs that do it for you (or good cookbook examples that show > you how to do it). Another idea that was discussed was the use of > properties files to allow people to set how many CPUs they wanted to make > available to the JVM or name packages that can or cannot use threading. > > Finally, there are lots of times when it is highly desirable to use Java > beans because they play well with dozens of Java api's however beans don't > work well with threads because they have public setter methods. I would > like to see a lot more bean use in a future BioJava because it would make > life so much easier but a lot of care would need to be taken to make sure > thread safety is preserved. There are many patterns that can be used such > as synchronization locks etc to make things thread safe so I think this can > be achieved as long as we are disciplined and consider that all methods may > be used in a multi-threaded application (even if we write the method as a > single thread). If there are code checkers that make suggestions on thread > safety it would be great to have these as part of the standard build > process. Good documentation would go a long way as well. Are there unit > test patterns that can catch these problems as well? Suggestions would be > great. > > Progress Listener patterns are good but it depends on the situation and > might be better handled in high level APIs or left to the developer. For > example in your NJ code a progress listener would be good if someone fed > 1000 sequences into the method but not if they only put in 10. Also code > running on an old machine might need a progress listener but the same > problem on a new machine may complete almost instantly. Probably a > pluggable listener would be the way to go. 
> Also it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely) it would be possible to observe the cycle of method calls from the
> stack. This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument. If the application
> runs in a Java App server you can also do this very easily with a method
> Interceptor.
>
> - Mark
>
> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
>
>> Andreas
>>
>> Another theme that should be considered is providing a multi-thread
>> version of any module with long run time. This would have a couple
>> elements. A progress listener interface should be standard where core
>> code would update progress messages to listeners that can be used by
>> external code to display feedback to the user. I did this with the
>> Neighbor Joining code for tree construction and it provides needed
>> feedback in a GUI. If not the user gets frustrated because they don't
>> know the code they are about to execute may take 10 minutes or 8 hours
>> to complete and they think the software is not working. The reverse is
>> also true for canceling an operation where you want to have core code
>> stop processing a long running loop. Once the code has completed then
>> the listener interface for process complete is called allowing the next
>> step in the external code to continue. The developer would have the
>> choice to call the "process" method or run it in a thread and wait for
>> the callback complete method to be called.
>>
>> This is the first step in the ability to have the core/long running
>> processes take advantage of multiple threads to complete the
>> computational task faster. Not all code can be parallelized easily but
>> if the algorithm can take advantage of running in parallel then it
>> should. This then opens up a couple of cloud computing frameworks that
>> extend the multi-threaded concepts in Java across a cluster
>> http://www.terracotta.org/. If we put an emphasis on having code that
>> runs well in a thread we are one step closer to an architecture that can
>> run in a cloud. The computational problems are only going to get bigger
>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
>> computational IO cycles are going to be cheap as long as the
>> software/libraries can easily take advantage of it.
>>
>> Thanks
>>
>> Scooter
>>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org
>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
>> Prlic
>> Sent: Monday, May 11, 2009 12:27 AM
>> To: biojava-dev
>> Subject: [Biojava-dev] Plans for next biojava release - modularization
>>
>> Hi biojava-devs,
>>
>> It is time to start working on the next biojava release. I would
>> like to modularize the current code base and apply some of the ideas
>> that have emerged around Richard's "biojava 3" code. In principle the
>> idea is that all changes should be backwards compatible with the
>> interfaces provided by the current biojava 1.7 release. Backwards
>> compatibility shall only be broken if the functionality is being
>> replaced with something that works better, and gets documented
>> accordingly. For the build functionality I would suggest to stick with
>> what Richard's biojava 3 code base already is providing. Since we will
>> try to be backwards compatible all code development should be part of
>> the biojava-trunk and the first step will be to move the ant-build
>> scripts to a maven build process.
>> Following this procedure will allow
>> to use e.g. the code refactoring tools provided by Eclipse, which
>> should come in handy.
>>
>> The modules I would like to see should provide self-contained
>> functionality and cross dependencies should be restricted to a
>> minimum. I would suggest to have the following modules:
>>
>> biojava-core: Contains everything that can not easily be modularized
>> or nobody volunteers to become a module maintainer.
>> biojava-phylogeny: Scooter expressed some interest in providing such a
>> module and becoming package maintainer for it.
>> biojava-structure: Everything protein structure related. I would be
>> package maintainer.
>> biojava-blast: Blast parsing is a frequently requested functionality
>> and it would be good to have this code self-contained. A package
>> maintainer for this still will need to be nominated at a later stage.
>> Any suggestions for other modules?
>>
>> Let me know what you think about this.
>>
>> Andreas
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> _________________________
>
> CONFIDENTIALITY NOTICE
>
> The information contained in this e-mail message is intended only for the
> exclusive use of the individual or entity named above and may contain
> information that is privileged, confidential or exempt from disclosure under
> applicable law. If the reader of this message is not the intended recipient,
> or the employee or agent responsible for delivery of the message to the
> intended recipient, you are hereby notified that any dissemination,
> distribution or copying of this communication is strictly prohibited.
> If you have received this communication in error, please notify the sender
> immediately by e-mail and delete the material from any computer. Thank you.
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From HWillis at scripps.edu Tue May 12 20:23:30 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 20:23:30 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C858@FLMAIL1.fl.ad.scripps.edu>

Andreas

A follow up point related to Mark's comment could be that parsing blast
output would not be required, or would be less important, if we provide a
clean BioJava API to make the web service call with BioJava data structure
inputs and give back BioJava data structure outputs. This saves the step of
the user doing the web query, file save, parse etc. It would be interesting
to know how many users run their own BLAST server for privacy reasons.

Scooter

-----Original Message-----
From: Scooter Willis
Sent: Tue 5/12/2009 8:13 PM
To: Andreas Prlic
Cc: biojava-dev
Subject: RE: [Biojava-dev] Plans for next biojava release - modularization

Andreas

The goal for BioJava could be to provide a wrapper for the
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp java code so that
inputs/outputs are BioJava. I think they are using Axis for the client web
services code. If BioJava 3 is going to be Java 6 minimum then it is easier
to use the Java 6 SOAP processing capabilities by pointing to the WSDL code
and generating the Java code for the client side.
This cuts down on the additional external third-party libraries that are
required. I try to stay out of the legacy file parsing business whenever
possible.

Scooter

-----Original Message-----
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Tue 5/12/2009 7:59 PM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release - modularization

Hi Scooter,

about your suggestion for the blast webservice client code: In
principle I like the idea and we have had questions on the mailing
list regarding this in the past. Only thing is I think there is
already some client code in java available:
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
but I am not sure how good that Java client library is....

Besides this, there is the need for work on our blast parser library
and if you are interested in working on that you are welcome. As I
mentioned, I think this should become its own module, due to the
popularity of that code.

Andreas

On Tue, May 12, 2009 at 6:34 AM, Scooter Willis wrote:
> Mark
>
> It is a challenge on knowing where to draw the line. Allowing both options
> is a reasonable approach. The implementation of the algorithm is key to
> allow it to be multi-threaded or being able to run in parallel. One approach
> is to provide a standard interface such as process() would wait for the
> result/return value and run in the parent thread. To run the algorithm in a
> thread you can have a startProcess() where you can add yourself as a
> progress listener and when complete() method is called you can call
> getResults(). You can then also have the corresponding stopProcess() which
> would set an internal value to cause all threads to quit. Lots of ways to
> tackle the problem; the key is to start talking about it and at minimum take
> advantage of multiple-cores where the external code can set the number of
> cores to use.
> You can get a dual quad core machine these days for < $1000
> but most software implementations are not designed to take advantage of it.
>
> The real question is what exists today in the BioJava API that is considered
> long running in normal use case and thus is a candidate to be run in
> parallel. It may not be an issue in existing BioJava code. When I first
> started using BioJava I went looking for BLAST code only to find a BLAST
> parser. I wanted to do a Multiple Sequence Alignment and it turns out that
> Biojava code calls CLUSTALW as an external process under the covers. I
> also needed code to construct trees from an MSA and found the summer of code
> project that was only focused on representing the tree.
>
> It would be nice to have a BLAST implementation in Java optimized to run on
> a cluster but who has time to rewrite BLAST in Java when you can do BLAST
> search via the web and focus on parsing the results. BioJava needs a BLAST
> API that makes a web services call to an external service and returns
> structured results in core BioJava structures. Probably not difficult to do
> a Java version of CLUSTALW but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
> back returned in BioJava structures.
>
> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web
> service -> BioJava code. I haven't done the research but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology computational services. If multiple external services
> are offering BLAST via web services where each picked a different
> implementation then BioJava could provide abstraction to different services.
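
The process() / startProcess() / stopProcess() / getResults() interface Scooter describes could look roughly like the sketch below. Only the method names come from the email; the signatures, the `ProgressListener` interface and the `LongRunningTask` class are assumptions for illustration, and the "work" is a stand-in loop that sums squares rather than real BioJava code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

/** Hypothetical callback interface; signatures are assumed, not taken from BioJava. */
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

/** Hypothetical long-running task; summing squares stands in for real work. */
final class LongRunningTask {
    private final int steps;
    private final List<ProgressListener> listeners = new CopyOnWriteArrayList<>();
    private volatile boolean stopRequested; // the "internal value" checked by the loop
    private volatile Long results;          // null until process() has finished

    LongRunningTask(int steps) {
        this.steps = steps;
    }

    void addProgressListener(ProgressListener listener) {
        listeners.add(listener);
    }

    /** Blocking variant: runs in the caller's thread and returns the result. */
    long process() {
        long sum = 0;
        for (int i = 1; i <= steps && !stopRequested; i++) {
            sum += (long) i * i;
            for (ProgressListener l : listeners) {
                l.progress(i, steps);
            }
        }
        results = sum;
        // In this sketch complete() fires even after a stop request.
        for (ProgressListener l : listeners) {
            l.complete();
        }
        return sum;
    }

    /** Non-blocking variant: runs process() in a new thread and returns at once. */
    void startProcess() {
        new Thread(this::process, "long-running-task").start();
    }

    /** Asks the loop to quit at the next iteration. */
    void stopProcess() {
        stopRequested = true;
    }

    Long getResults() {
        return results;
    }
}

final class LongRunningTaskDemo {
    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch finished = new CountDownLatch(1);
        LongRunningTask task = new LongRunningTask(5);
        task.addProgressListener(new ProgressListener() {
            public void progress(int done, int total) {
                System.out.println("step " + done + " of " + total);
            }
            public void complete() {
                finished.countDown();
            }
        });
        task.startProcess();   // returns immediately
        finished.await();      // wait for the complete() callback
        System.out.println("result: " + task.getResults());
    }
}
```

The stop flag is `volatile` so that a stopProcess() call from another thread is seen by the loop; a real implementation would also have to decide whether complete() should fire after a cancellation.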

From andreas at sdsc.edu Tue May 12 20:45:54 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 17:45:54 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To:
References: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>

The point about the auto-generated code actually raises another
question for me: How shall we deal with auto-generated code? I also
have some code that is currently not part of BioJava, but it might be
useful for other people: It allows parsing uniprot XML files and
serializing / de-serializing the objects to a database using EJBs,
hibernate and the uniprot XML files. How far should biojava go in
supporting such auto generated or semi-auto generated code?

A

On Tue, May 12, 2009 at 5:09 PM, wrote:
>
> A while back I gave Richard some code that uses JAXB to objectify (and
> deobjectify) BLAST XML output. This might be useful for parsing BLAST
> results from the webservices which normally use BLAST XML. I could probably
> dig it up again if needed (it was autogenerated anyway).
>
> It would probably be a good object model for BLAST output if people want to
> parse other types of BLAST output (such as flatfile, but who would want to
> do that!). The BLAST XML seems to accommodate strange flavours of BLAST
> such as PSI-BLAST etc and also has been much more stable than the default
> flat file output.
>
> - Mark
>
> Andreas Prlic
> Sent by: biojava-dev-bounces at lists.open-bio.org
>
> 05/13/2009 08:02 AM
>
> To
> Scooter Willis
> cc
> biojava-dev
> Subject
> Re: [Biojava-dev] Plans for next biojava release - modularization
>
> Hi Scooter,
>
> about your suggestion for the blast webservice client code: In
> principle I like the idea and we have had questions on the mailing
> list regarding this in the past.
Only thing is I think there is > already some client code in java available: > http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp > but I am not sure how good that Java client library is.... > > Besides this, there is the need for work on our blast parser library > and if you are interested in working on that you are welcome. As I > mentioned, I think this should become its own module, due to the > popularity of that code. > > Andreas > > > > > On Tue, May 12, 2009 at 6:34 AM, Scooter Willis wrote: >> Mark >> >> >> >> It is a challenge on knowing where to draw the line. Allowing both options >> is a reasonable approach. The implementation of the algorithm is key to >> allow it to be multi-threaded or being able to run in parallel. One >> approach >> is to provide a standard interface such as process() would wait for the >> result/return value and run in the parent thread. To run the algorithm in >> a >> thread you can have a startProcess() where you can add yourself as a >> progress listener and when complete() method is called you can call >> getResults(). You can then also have the corresponding stopProcess() which >> would set an internal value to cause all threads to quit. ?Lots of ways to >> tackle the problem the key is to start talking about it and at minimum >> take >> advantage of multiple-cores where the external code can set the number of >> cores to use. You can get a dual quad core machine these days for < $1000 >> but most software implementations are not designed to take advantage of >> it. >> >> >> >> The real question is what exists today in the BioJava API that is >> considered >> long running in normal use case and thus is a candidate to be run in >> parallel. It may not be an issue in existing BioJava code. When I first >> started using BioJava I went looking for BLAST code only to find a BLAST >> parser. I wanted to do a Multiple Sequence Alignment and turns out that >> Biojava code calls CLUSTALW as an external processor under the covers. 
?I >> also needed code to construct trees from an MSA and found the summer of >> code >> project that was only focused on representing the tree. >> >> >> >> It would be nice to have a BLAST implementation in Java optimized to run >> on >> a cluster but who has time to rewrite BLAST in Java when you can do BLAST >> search via the web and focus on parsing the results. BioJava needs a BLAST >> API that makes a web services call to an external service and gets returns >> structured results in core BioJava structures. Probably not difficult to >> do >> a Java version of CLUSTALW but again we can push the work out to >> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the >> results >> back returned in BioJava structures. >> >> >> >> I can signup for doing a BLAST web service -> BioJava and a CLUSTALW web >> service -> BioJava code. I haven?t done the research but it seems that >> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to >> expose common biology ?computational services. If multiple external >> services >> are offering BLAST via web services where each picked a different >> implementation then BioJava could provide abstraction to different >> services. >> >> >> >> Thanks >> >> Scooter >> >> >> >> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com] >> Sent: Tuesday, May 12, 2009 1:27 AM >> To: Scooter Willis >> Cc: Andreas Prlic; biojava-dev >> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization >> >> >> >> Hi - >> >> This was one thing we discussed previously with respect to biojava 3. >> ?Generally I support the idea because almost all computers are now >> multi-core and as you say cloud or utility computing is already a reality. >> >> However, I tend to think that biojava should not control threading or >> concurrency. This should be done by the developer. 
This is because >> sometimes >> mutithreading can be fast on a slow computer but slow on a fast computer >> (due to the overhead in spawning threads) so programs need to be tunable. >> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like >> people attempting to control their own threads. ?What BioJava should do is >> expose granular and thread-safe operations that can be threaded or form >> discrete tasks on a utility grid or complete in SessionBeans on an App >> server. ?For example it would be better if BioJava had a single threaded >> method to calculate the GC of a single sequence rather than a >> multi-threaded >> method that calculates the GC of multiple sequences. ?This would let the >> developer make a multithreaded version if desired or distribute multiple >> tasks based on the single threaded version to a compute cloud (and let the >> cloud manage all the tasks). >> >> Possibly the best situation would be to have the single threaded fine >> grain >> operations that let developers or grid engines control threading and then >> higher level APIs that do it for you (or good cookbook examples that show >> you how to do it). ?Another idea that was discussed was the use of >> properties files to allow people to set how many CPUs they wanted to make >> available to the JVM or name packages that can or cannot use threading. >> >> Finally, there are lots of times when it is highly desirable to use Java >> beans because they play well with dozens of Java api's however beans don't >> work well with threads because they have public setter methods. ?I would >> like to see a lot more bean use in a future BioJava because it would make >> life so much easier but a lot of care would need to be taken to make sure >> thread safety is preserved. 
?There are many patterns that can be used such >> as synchronization locks etc to make things thread safe so I think this >> can >> be achieved as long as we are disciplined and consider that all methods >> may >> be used in a multi-threaded application (even if we write the method as a >> single thread). ?If there are code checkers that make suggestions on >> thread >> safety it would be great to have these as part of the standard build >> process. ?Good documentation would go a long way as well. ?Are there unit >> test patterns that can catch these problems as well? ?Suggestions would be >> great. >> >> Progress Listener patterns are good but it depends on the situation and >> might be better handled in high level APIs or left to the developer. ?For >> example in your NJ code a progress listener would be good if someone fed >> 1000 sequences into the method but not if they only put in 10. Also code >> running on an old machine might need a progress listener but the same >> problem on a new machine may complete almost instantly. ?Probably a >> pluggable listener would be the way to go. ?Also it might be possible to >> do >> this using the new JDK APIs that let you take a peek at the stack trace. >> Even if your NJ method didn't allow for a progress listener a developer >> could still make one by looking at the method calls in the stack. As long >> as >> your NJ method called other methods internally for each sequence (quite >> likely) it would be possible to observe the cycle of method calls from the >> stack. ?This might make it possible to have a very general BioJava >> progress >> listener that can be told to count the number of times a method is called >> in >> the stack. The name of the method would be the argument. ?If the >> application >> runs in a Java App server you can also do this very easily with a method >> Interceptor. 
>>
>> - Mark
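
The progress listener and cancellation ideas quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: ProgressListener, LongTask and all of their methods are hypothetical names, not existing BioJava API, and the loop body stands in for real work such as one neighbor-joining step. The task itself stays single threaded; the caller chooses between the blocking run() and a thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

public class ProgressSketch {

    /** Hypothetical callback interface; not an existing BioJava type. */
    public interface ProgressListener {
        void progress(int done, int total);
        void complete(boolean cancelled);
    }

    /** Stand-in for a long-running computation made of equal steps. */
    public static class LongTask implements Runnable {
        private final int totalSteps;
        private final List<ProgressListener> listeners =
                new CopyOnWriteArrayList<ProgressListener>();
        private final AtomicBoolean cancelled = new AtomicBoolean(false);
        private volatile int stepsDone = 0;

        public LongTask(int totalSteps) { this.totalSteps = totalSteps; }

        public void addListener(ProgressListener l) { listeners.add(l); }

        /** Asks the loop to stop before its next step. */
        public void cancel() { cancelled.set(true); }

        public int getStepsDone() { return stepsDone; }

        /** Blocking call: does the work in the caller's thread. */
        @Override
        public void run() {
            for (int i = 1; i <= totalSteps && !cancelled.get(); i++) {
                // One unit of real work would go here.
                stepsDone = i;
                for (ProgressListener l : listeners) {
                    l.progress(i, totalSteps);
                }
            }
            for (ProgressListener l : listeners) {
                l.complete(cancelled.get());
            }
        }

        /** Non-blocking call: the caller, not the library, opts into a thread. */
        public Thread startInThread() {
            Thread t = new Thread(this, "long-task");
            t.start();
            return t;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LongTask task = new LongTask(5);
        task.addListener(new ProgressListener() {
            public void progress(int done, int total) {
                System.out.println(done + "/" + total);
            }
            public void complete(boolean cancelled) {
                System.out.println(cancelled ? "cancelled" : "done");
            }
        });
        task.startInThread().join();
    }
}
```

Because listeners are pluggable, a caller with only ten sequences can simply register none, which is one way to reconcile the two positions in the thread.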
From mark.schreiber at novartis.com Tue May 12 22:15:27 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 May 2009 10:15:27 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>
Message-ID:

Hi -

I think it depends on whether the code is going to be auto-generated at each build or only once.

I have autogenerated Entity classes for BioSQL tables. My recommendation would be that these be used for JPA mapping to BioSQL from BioJava. I think these only need be generated once (unless the BioSQL schema changes), especially as the autogeneration didn't quite catch some of the subtleties of the schema. They can also be in their own module, not the core.

Classes that map to XML or webservice clients can be autogenerated from XML schema, DTD or WSDL once or at every build (automatically from ANT and probably Maven). In these cases it may pay to do it with every build because these classes are completely boilerplate code and should never need to be manually modified. Also it means the code for these utility classes will never be in the code base and it will not be possible for someone to change it accidentally (and the code base will be smaller). Only the XSD or WSDL will be in subversion (and any higher level code that makes use of the boilerplate client code). Improvements in the boilerplate code or changes that come with updates to JAXB and similar will automatically appear at the next build (when we change JAXB versions).

Conceptually the BLAST XML parsing module may consist of only the BLAST XSD (or DTD) and a high-level biojava interface like the following:

    import java.io.Serializable;
    import java.net.URL;

    public interface BlastParser {
        // An implementation calls the generated boilerplate code...
        Serializable[] parseBlast(URL url);

        // An implementation calls the generated boilerplate code...
        Serializable[] parseBlast(String blastXMLOutput);
    }

The code for the bit that does the JAXB marshalling etc could be generated at build time. The Serializable array would be the objects that JAXB generates. Probably they would be a more specific stub that implements Serializable (eg BlastResult or similar depending on the XSD).

I think it really comes down to a question of how much the generated code is boilerplate code that will never be changed. If it is not 'modifiable' then it can be generated at build. If the autogenerated code is an outline of a class where method bodies need to be filled in or customized then it should not be autogenerated at build time. A good example would be JUnit classes that can be autogenerated to give you a template that will compile and run but probably will not perform a sensible test. The developer of the test could autogenerate the template but would then need to make the test sensible. At that point the test should be in the code base and should not be regenerated at build time.

- Mark

biojava-dev-bounces at lists.open-bio.org wrote on 05/13/2009 08:45:54 AM:

> The point with the auto-generated code raises actually another
> question to me: How shall we deal with auto-generated code?
>
> I also have some code that is currently not part of BioJava, but it
> might be useful for other people: It allows to parse uniprot XML files
> and serialize / de-serialize the objects to a database using EJBs,
> hibernate and the uniprot XML files.
>
> How far should biojava go in supporting such auto generated or
> semi-auto generated code?
> A
>
>
> On Tue, May 12, 2009 at 5:09 PM, wrote:
> >
> > A while back I gave Richard some code that uses JAXB to objectify (and
> > deobjectify) BLAST XML output. This might be useful for parsing BLAST
> > results from the webservices which normally use BLAST XML. I could probably
> > dig it up again if needed (it was autogenerated anyway).
> > > > It would probably be a good object model for BLAST output if people want to > > parse other types of BLAST output (such as flatfile, but who would want to > > do that!). The BLAST XML seems to accommodate strange flavours of BLAST > > such as PSI-BLAST etc and also has been much more stable than the default > > flat file output. > > > > - Mark > > > > > > > > Andreas Prlic > > Sent by: biojava-dev-bounces at lists.open-bio.org > > > > 05/13/2009 08:02 AM > > > > To > > Scooter Willis > > cc > > biojava-dev > > Subject > > Re: [Biojava-dev] Plans for next biojava release - modularization > > > > > > > > > > Hi Scooter, > > > > about your suggestion for the blast webservice client code: In > > principle I like the idea and we have had questions on the mailing > > list regarding this in the past. Only thing is I think there is > > already some client code in java available: > > http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp > > but I am not sure how good that Java client library is.... > > > > Besides this, there is the need for work on our blast parser library > > and if you are interested in working on that you are welcome. As I > > mentioned, I think this should become its own module, due to the > > popularity of that code. > > > > Andreas > > > > > > > > > > On Tue, May 12, 2009 at 6:34 AM, Scooter Willis wrote: > >> Mark > >> > >> > >> > >> It is a challenge on knowing where to draw the line. Allowing both options > >> is a reasonable approach. The implementation of the algorithm is key to > >> allow it to be multi-threaded or being able to run in parallel. One > >> approach > >> is to provide a standard interface such as process() would wait for the > >> result/return value and run in the parent thread. To run the algorithm in > >> a > >> thread you can have a startProcess() where you can add yourself as a > >> progress listener and when complete() method is called you can call > >> getResults(). 
You can then also have the corresponding stopProcess() which > >> would set an internal value to cause all threads to quit. Lots of ways to > >> tackle the problem the key is to start talking about it and at minimum > >> take > >> advantage of multiple-cores where the external code can set the number of > >> cores to use. You can get a dual quad core machine these days for < $1000 > >> but most software implementations are not designed to take advantage of > >> it. > >> > >> > >> > >> The real question is what exists today in the BioJava API that is > >> considered > >> long running in normal use case and thus is a candidate to be run in > >> parallel. It may not be an issue in existing BioJava code. When I first > >> started using BioJava I went looking for BLAST code only to find a BLAST > >> parser. I wanted to do a Multiple Sequence Alignment and turns out that > >> Biojava code calls CLUSTALW as an external processor under the covers. I > >> also needed code to construct trees from an MSA and found the summer of > >> code > >> project that was only focused on representing the tree. > >> > >> > >> > >> It would be nice to have a BLAST implementation in Java optimized to run > >> on > >> a cluster but who has time to rewrite BLAST in Java when you can do BLAST > >> search via the web and focus on parsing the results. BioJava needs a BLAST > >> API that makes a web services call to an external service and gets returns > >> structured results in core BioJava structures. Probably not difficult to > >> do > >> a Java version of CLUSTALW but again we can push the work out to > >> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the > >> results > >> back returned in BioJava structures. > >> > >> > >> > >> I can signup for doing a BLAST web service -> BioJava and a CLUSTALW web > >> service -> BioJava code. 
> >> I haven't done the research but it seems that
> >> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> >> expose common biology computational services. If multiple external
> >> services are offering BLAST via web services where each picked a different
> >> implementation then BioJava could provide abstraction to different
> >> services.
> >>
> >> Thanks
> >>
> >> Scooter
> >>
> >> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> >> Sent: Tuesday, May 12, 2009 1:27 AM
> >> To: Scooter Willis
> >> Cc: Andreas Prlic; biojava-dev
> >> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
> >>
> >> Hi -
> >>
> >> This was one thing we discussed previously with respect to biojava 3.
> >> Generally I support the idea because almost all computers are now
> >> multi-core and as you say cloud or utility computing is already a reality.
> >>
> >> However, I tend to think that biojava should not control threading or
> >> concurrency. This should be done by the developer. This is because
> >> sometimes multithreading can be fast on a slow computer but slow on a fast
> >> computer (due to the overhead in spawning threads) so programs need to be
> >> tunable. Also Java app servers and things like Sun Grid Engine, EC2 etc
> >> don't like people attempting to control their own threads. What BioJava
> >> should do is expose granular and thread-safe operations that can be
> >> threaded or form discrete tasks on a utility grid or complete in
> >> SessionBeans on an App server. For example it would be better if BioJava
> >> had a single threaded method to calculate the GC of a single sequence
> >> rather than a multi-threaded method that calculates the GC of multiple
> >> sequences.
This would let the > >> developer make a multithreaded version if desired or distribute multiple > >> tasks based on the single threaded version to a compute cloud (and let the > >> cloud manage all the tasks). > >> > >> Possibly the best situation would be to have the single threaded fine > >> grain > >> operations that let developers or grid engines control threading and then > >> higher level APIs that do it for you (or good cookbook examples that show > >> you how to do it). Another idea that was discussed was the use of > >> properties files to allow people to set how many CPUs they wanted to make > >> available to the JVM or name packages that can or cannot use threading. > >> > >> Finally, there are lots of times when it is highly desirable to use Java > >> beans because they play well with dozens of Java api's however beans don't > >> work well with threads because they have public setter methods. I would > >> like to see a lot more bean use in a future BioJava because it would make > >> life so much easier but a lot of care would need to be taken to make sure > >> thread safety is preserved. There are many patterns that can be used such > >> as synchronization locks etc to make things thread safe so I think this > >> can > >> be achieved as long as we are disciplined and consider that all methods > >> may > >> be used in a multi-threaded application (even if we write the method as a > >> single thread). If there are code checkers that make suggestions on > >> thread > >> safety it would be great to have these as part of the standard build > >> process. Good documentation would go a long way as well. Are there unit > >> test patterns that can catch these problems as well? Suggestions would be > >> great. > >> > >> Progress Listener patterns are good but it depends on the situation and > >> might be better handled in high level APIs or left to the developer. 
> >> For example in your NJ code a progress listener would be good if someone
> >> fed 1000 sequences into the method but not if they only put in 10. Also
> >> code running on an old machine might need a progress listener but the same
> >> problem on a new machine may complete almost instantly. Probably a
> >> pluggable listener would be the way to go. Also it might be possible to do
> >> this using the new JDK APIs that let you take a peek at the stack trace.
> >> Even if your NJ method didn't allow for a progress listener a developer
> >> could still make one by looking at the method calls in the stack. As long
> >> as your NJ method called other methods internally for each sequence (quite
> >> likely) it would be possible to observe the cycle of method calls from the
> >> stack. This might make it possible to have a very general BioJava progress
> >> listener that can be told to count the number of times a method is called
> >> in the stack. The name of the method would be the argument. If the
> >> application runs in a Java App server you can also do this very easily
> >> with a method Interceptor.
> >>
> >> - Mark

From msmoot at ucsd.edu Thu May 21 19:47:22 2009
From: msmoot at ucsd.edu (Mike Smoot)
Date: Thu, 21 May 2009 16:47:22 -0700
Subject: [Biojava-dev] an outsider's take on Biojava 3
Message-ID:

Hi Everyone,

I thought I'd respond to Andreas' request for participation in the BioJava 3 design discussions that he made last week on the normal BioJava list.
I'm the lead developer on the Cytoscape project (http://cytoscape.org), so I thought I'd provide some perspective on what a project using BioJava might look for in BioJava 3.

Basically, I'd just like to voice my strong support for the "Basic Principles" listed here: http://biojava.org/wiki/BioJava3_Design. Finer granularity of jars, acyclic dependencies, and the separation of API and implementation are precisely the things we're doing in Cytoscape 3. The first two points will go a long way towards making it easier to use specific parts of the library without needing everything at once. The third point will allow alternative implementations of certain interfaces, which is one approach to dealing with issues like parallel vs. non-parallel versions of algorithms. Maven also sounds great.

If I could add one bullet to the list, it would be to add OSGi metadata to the jars to allow easy integration with OSGi-based projects (such as Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven plugins to make this dead simple and it wouldn't impact anyone not using OSGi.

Please take that with a large grain of salt, I just thought you might appreciate an outsider's perspective!

thanks,
Mike

--
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego

From markjschreiber at gmail.com Thu May 21 22:59:14 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 22 May 2009 10:59:14 +0800
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To:
References:
Message-ID: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>

Thanks for the comments. The OSGi system sounds interesting. I think we should consider it.
I have also added two more recommendations for the Design Principles:

From markjschreiber at gmail.com Thu May 21 23:01:57 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 22 May 2009 11:01:57 +0800
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
References: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
Message-ID: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>

Sorry, sent before I said what the new principles were.

1. Extensive use of the Logging API
2. (At the risk of having a fatwa declared against me) Most biojava exceptions should derive from RuntimeException and be unchecked

See the wiki page for more details.

- Mark

From holland at eaglegenomics.com Fri May 22 05:02:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 May 2009 10:02:43 +0100
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
References: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com> <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
Message-ID: <1242982963.10413.6.camel@buzzybee>

RuntimeException is good for things that can't be recovered from. If the user has provided bad coordinates or invalid sequence, that's a recoverable error (because there's a chance that they came from user input via a user interface, which can be corrected and retried). Even file parsing exceptions should be recoverable - the user can move on to the next record without borking the entire file (we already see broken records quite a lot in Genbank downloads).
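
As a minimal sketch of the recoverable case described above: a checked exception forces the caller to decide what happens to a broken record, so one bad entry need not abort the whole file. Everything named here (RecordFormatException, parseRecord, parseAll, and the toy rule that a record is a string of A, C, G and T) is a hypothetical illustration, not BioJava code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoverableParseSketch {

    /** Hypothetical checked exception for bad input data; not a BioJava class. */
    public static class RecordFormatException extends Exception {
        public RecordFormatException(String message) {
            super(message);
        }
    }

    /** Toy record parser: accepts only strings made of A, C, G and T. */
    public static String parseRecord(String raw) throws RecordFormatException {
        if (raw == null || !raw.matches("[ACGT]+")) {
            throw new RecordFormatException("not a DNA sequence: " + raw);
        }
        return raw;
    }

    /** Parses every record, collecting errors instead of aborting the file. */
    public static List<String> parseAll(List<String> rawRecords, List<String> errors) {
        List<String> good = new ArrayList<String>();
        for (String raw : rawRecords) {
            try {
                good.add(parseRecord(raw));
            } catch (RecordFormatException e) {
                // Recoverable: note the problem and move on to the next record.
                errors.add(e.getMessage());
            }
        }
        return good;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<String>();
        List<String> good = parseAll(Arrays.asList("ACGT", "AC?T", "GGCC"), errors);
        System.out.println(good.size() + " parsed, " + errors.size() + " skipped");
    }
}
```

With an unchecked exception the same loop would still work, but nothing would remind the caller to write the catch block; that trade-off is the point under discussion.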
But, for things like being unable to call out to Blast, or being unable to convert DNA to Protein because of a misconfiguration internally somewhere, I agree that RuntimeExceptions are probably best. These are unrecoverable and indicate that changes need to be made to the programming code or BioJava itself.

So in my mind then RuntimeExceptions are good for highlighting programming errors, but not good for errors relating to invalid input data.

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andreas at sdsc.edu Mon May 25 00:22:09 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 24 May 2009 21:22:09 -0700
Subject: [Biojava-dev] next steps
Message-ID: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>

Hi,

While talking about design requirements, I think we also need to think pragmatically about how much time we will have to refactor code vs. re-writing modules from scratch. To get started with the next steps, I suggest the following procedure:

First thing will be to move to Maven. Then components should be refactored into independent sub-modules.
Then the submodules can be improved to follow the new design guidelines. Once we have reached a certain stability with the re-organized code base, we will make the next release.

Any comments? If there is general agreement about this, I would take the next step and replace the Ant build system with a Maven-based one.

Andreas

From andreas at sdsc.edu Mon May 25 11:14:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 25 May 2009 08:14:06 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905250814p2cfcc627h477e688637f50ccb@mail.gmail.com>

> build some sort of graph relationship tool. It is also easy enough to start
> dragging packages around to different projects in netbeans and resolve
> compiler errors.

Yes, same for Eclipse. The Eclipse Maven plugin can auto-convert a project to Maven (quite easy). I have played around with it and it was easy to get a mavenized BioJava with the dependencies correctly converted. That's why I thought it might be the first step. You suggest doing the modularization first and then the Maven metadata. I still have to figure out how to make independent submodules as part of Maven in Eclipse... let me play around a bit more and see how it goes.

The package list sounds good and Java 1.6 too.

Andreas

From HWillis at scripps.edu Mon May 25 10:48:50 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 25 May 2009 10:48:50 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>

Andreas

I was looking at the biojava code yesterday to see how easy it would be to divide up into functionally grouped jars based on package hierarchy. I tried to find some refactoring tools that would give a network graph view of class relationships. It is simple enough to parse source for import statements and build some sort of graph relationship tool.
It is also easy enough to start dragging packages around to different projects in NetBeans and resolve compiler errors.

The advantage of smaller, tightly grouped functional jars is that it allows more frequent minor releases without updating and releasing the entire biojava package. It also allows individuals to own a smaller block of code for unit tests, documentation and examples.

With Maven, tracking multiple parts and pieces and their relationships becomes less of an issue. I think we need to divide the code into a reasonable approximation of the jars before doing the metadata for Maven.

Looking at the current package structure, this is an attempt at grouping jars. I do not have enough code familiarity with all of biojava, so this is strictly based on package names.

biojava-core - Any classes that organize data structures, which would probably include org.biojava.bio.seq.*. Any utility classes that can be used by other packages, org.biojava.utils.*

biojava-structure - org.biojava.bio.structure.*

biojava-gui - org.biojava.bio.gui

biojava-phylo - A package that has a few classes for viewing tree structures using the jgrapht-jdk package. I need to play with the code and see if it actually uses the graph generated by jgrapht for anything special. I have code that will render a tree as a simple graphic. I have used jgrapht for other projects, so it is not a bad "graphing" package for network diagrams. It could be refactored out.

Not sure how to tackle the org.biojava.bio.program package, as it seems to have lots of distinct functional code.

biojava-ws-blast - A web service approach to doing blast. The api would hide the web services call.

biojava-blast - Blast parsing code. We could have one package for anything blast related.

biojava-ws-clustalw - A web services approach to doing clustalw multiple sequence alignment. The api would hide the web services call.

biojava-alignment - Code for doing sequence alignment. We could have one package for anything alignment related.

Does anyone know if you can get usage statistics from Maven as to what jar files are being downloaded? This would provide good statistics on what code is being used, which will help focus improvements in documentation etc.

I assume we are going to make Java 1.6 the minimum requirement moving forward? This simplifies some of the web services api requirements and minimizes the number of external packages that are required.

Scooter

From msmoot at ucsd.edu Mon May 25 13:07:57 2009
From: msmoot at ucsd.edu (Mike Smoot)
Date: Mon, 25 May 2009 10:07:57 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote:
>
> I was looking at the biojava code yesterday to see how easy it would be to
> divide up into functionally grouped jars based on package hierarchy. I tried
> to find some refactoring tools that would give a network graph view of class
> relationships. It is simple enough to parse source for import statements and
> build some sort of graph relationship tool. It is also easy enough to start
> dragging packages around to different projects in netbeans and resolve
> compiler errors.
>

JDepend is a nice tool for evaluating package dependencies.

http://www.clarkware.com/software/JDepend.html

Mike

--
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego

From HWillis at scripps.edu Mon May 25 18:59:10 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 25 May 2009 18:59:10 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>

I attached the JDepend output for BioJava. This will help with the circular dependencies: core classes should not have dependencies on other packages, and if they do, it should be refactored into the core class.
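
[Editor's sketch, not part of the original thread.] The approach mentioned in this thread, parsing source for import statements to build a package graph and then looking for circular dependencies of the kind JDepend reports, might look roughly like the following. The class name ImportGraph and the demo.* package names are hypothetical and chosen only for illustration. Static imports and fully qualified names used without an import are ignored, so this is much cruder than JDepend itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of BioJava or JDepend. */
final class ImportGraph {
    private static final Pattern PACKAGE =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // Captures the package part of "import a.b.C;" or "import a.b.*;".
    // Static imports do not match this pattern and are therefore skipped.
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+([\\w.]+)\\.[\\w*]+\\s*;", Pattern.MULTILINE);

    // package name -> packages it imports (sorted for repeatable output)
    private final Map<String, Set<String>> edges = new TreeMap<>();

    /** Records the package-level dependencies declared in one source file. */
    void addSource(String source) {
        Matcher pkg = PACKAGE.matcher(source);
        if (!pkg.find()) {
            return; // files in the default package are ignored
        }
        String from = pkg.group(1);
        Set<String> targets = edges.computeIfAbsent(from, k -> new TreeSet<>());
        Matcher imp = IMPORT.matcher(source);
        while (imp.find()) {
            String to = imp.group(1);
            if (!to.equals(from)) {
                targets.add(to);
            }
        }
    }

    /** Returns one cycle as a path whose first and last entries match, or an empty list. */
    List<String> findCycle() {
        Set<String> done = new HashSet<>();
        Deque<String> path = new ArrayDeque<>();
        for (String start : edges.keySet()) {
            List<String> cycle = visit(start, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    // Depth-first search; "path" holds the packages on the current search branch.
    private List<String> visit(String pkg, Set<String> done, Deque<String> path) {
        if (done.contains(pkg)) {
            return Collections.emptyList();
        }
        if (path.contains(pkg)) {
            // pkg is already on the current branch: cut the branch at its first occurrence
            List<String> onPath = new ArrayList<>(path);
            List<String> cycle =
                    new ArrayList<>(onPath.subList(onPath.indexOf(pkg), onPath.size()));
            cycle.add(pkg);
            return cycle;
        }
        path.addLast(pkg);
        for (String next : edges.getOrDefault(pkg, Collections.<String>emptySet())) {
            List<String> cycle = visit(next, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.removeLast();
        done.add(pkg);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ImportGraph graph = new ImportGraph();
        // Made-up sources; they do not describe the real BioJava package layout.
        graph.addSource("package demo.core;\nimport demo.gui.TreePanel;\n");
        graph.addSource("package demo.gui;\nimport demo.core.Node;\nimport java.util.List;\n");
        System.out.println(graph.findCycle());
    }
}
```

Run on the two made-up sources, main prints [demo.core, demo.gui, demo.core]. In practice one would feed addSource the contents of every .java file under the source tree; a real tool such as JDepend works from compiled classes and also sees dependencies that never appear in an import line.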
Scooter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.xml
Type: text/xml
Size: 567706 bytes
Desc: report.xml
URL:

From andreas at sdsc.edu Thu May 28 00:31:15 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 27 May 2009 21:31:15 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>

Hi Scooter,

quick update: There is also an Eclipse plugin for JDepend that provides a user interface to browse through the dependencies.

As I already mentioned earlier, I made some quick progress with the Maven plugin to convert the project to Maven and create a first pom.
At the moment I am testing how best to create sub-projects that should contain the modules. The plugin does not seem to make it easy to create new modules, so I agree with your earlier suggestion that it is best to modularize first and then mavenize second. Should we create a branch in svn and play around with refactoring there? Once we are happy with it, we can switch that branch to become the trunk.

Andreas

From juberpatel at gmail.com Thu May 28 03:09:29 2009
From: juberpatel at gmail.com (juber patel)
Date: Thu, 28 May 2009 12:39:29 +0530
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID:

just a small observation:

Maven may not be easy to use, and a switch to Maven should be made only after some consideration. I have personally not used it, but have seen people on the Mahout list struggling with Maven. Its utility may not justify its complexity.

juber

--
Juber Patel http://juberpatel.googlepages.com

From holland at eaglegenomics.com Thu May 28 02:55:28 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 28 May 2009 07:55:28 +0100
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <1243493728.5260.1.camel@buzzybee>

I found when creating modules for the testbed biojava3 that it was easier to do it by hand. Only two things need to be done: first, a list of modules needs to be added to the parent pom.xml of the project; then each module needs its own pom.xml referencing the parent pom.xml. Once created this way, it only takes a project refresh in Eclipse/NetBeans for the new module to show up.

See the example pom.xmls under the old biojava3 branch for details.

cheers,
Richard

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk Thu May 28 04:16:05 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 28 May 2009 09:16:05 +0100
Subject: [Biojava-dev] next steps
In-Reply-To:
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <4A1E4845.8080906@ebi.ac.uk>

Maven's big plus points are easy integration into just about any IDE and its transitive dependency management capability. On a project like BioJava (we need people to get set up and running quickly over a wide range of development environments) these two points really make it one of the only viable choices I would use.

This isn't to say the other build systems (rake, raven, gant, gradle, ant) are not as good or better, just that they do not fit the bill as well.

Andy

From james at carmanconsulting.com Thu May 28 05:37:53 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 05:37:53 -0400
Subject: [Biojava-dev] next steps
In-Reply-To:
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID:

Maven really isn't that hard. I have no idea what the Mahout folks are having trouble with, but I'm sure it can be addressed. Maven's benefits greatly outweigh its complexity (which isn't that high, IMHO). If you folks want a hand "mavenizing" your project, I wouldn't mind helping.

From HWillis at scripps.edu Thu May 28 09:10:43 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 09:10:43 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>

Maven should be viewed as an additional option for developers: once a version of BioJava is released, the Maven repository is updated, and we need to make sure all the metadata and dependency information is correct. This doesn't mean that BioJava development needs to be done in Maven; it is simply another way to get the jars after they have been released.

BioJava as a single jar is not that hard to integrate into your project, given that we provide the handful of external jar files it needs as part of the download. Other projects I have worked with package only the jar for that project and then give you web links to download 10 other external projects, and that is a pain. You go to each website to figure out the download process and find that they are now all at different releases. In that situation Maven is a great solution, because the developers of biojava took the time to get the exact versions of the jar files from external packages referenced properly and did not leave it to the "customer" to figure out.

If we use Apache Commons as a model, I personally would rather grab the package of interest, say biojava-blast, and add it into my development environment.
Maven is an Apache project yet when you go to http://commons.apache.org/ and grab the component of interest Maven isn't even listed as an option. This is probably because it is an overkill for a single jar. Doesn't mean that you can't get commons jar's via maven when you load a larger project. In our case we may have a couple components where it can get a little complicated by external jar dependencies. Using biojava-blast as an example where it has a web service client that is either using axis or the latest greatest sun JSR. The project I am importing biojava-blast via Maven into already uses axis but an older version because everything works and I haven't needed to do the upgrade. Maven may make the integration step easier but it doesn't solve the problem where I as the developer now need to do something to resolve the version conflicts. So I view Maven as a nice option for developers who are a big fan of Maven and makes them smile when they can grab the code they need from BioJava via Maven. We should plan on having an apache commons like page to download and publish the jars in maven as well. Scooter ________________________________ From: biojava-dev-bounces at lists.open-bio.org on behalf of James Carman Sent: Thu 5/28/2009 5:37 AM To: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] next steps Maven really isn't that hard. I have no idea what the Mahout folks are having troubles with, but I'm sure it can be addressed. Maven't benefits greatly outweigh its complexity (which isn't that high, IMHO). If you folks want a hand "mavenizing" your project, I wouldn't mind helping. On Thu, May 28, 2009 at 3:09 AM, juber patel wrote: > just a small observation: > > Maven may not be easy to use and switch to maven should be done after > some consideration. I have personally not used it, but have seen > people on the Mahout list struggling with maven. Its utility may not > justify its complexity. 
> > juber > > > On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic wrote: >> Hi Scooter, >> >> quick update: There is also an eclipse plugin for JDepend, that >> provides a user interface to browse thought the dependencies. >> >> As I already mentioned earlier, I had some quick progress with the >> maven plugin to convert the project to maven and create a first pom. >> At the moment I am testing how best to create sub-projects that >> should contain the modules. The plugin does not seem to make it easy >> to create new modules, so I agree with your earlier suggestion that it >> is best to modularize first and the mavenize 2nd... Should we create a >> branch in svn and play around with refactoring there and once we are >> happy with it we can switch that branch to become the trunk? >> >> Andreas >> >> >> >> >> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote: >>> I attached the JDepend output for BioJava. This will help on the circular >>> dependencies where core classes should not have dependencies on other >>> packages and if they do it should be refactored into the core class. >>> >>> Scooter >>> ________________________________ >>> From: mike.smoot at gmail.com on behalf of Mike Smoot >>> Sent: Mon 5/25/2009 1:07 PM >>> To: Scooter Willis >>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org >>> Subject: Re: [Biojava-dev] next steps >>> >>> >>> >>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: >>>> >>>> I was looking at the biojava code yesterday to see how easy it would be to >>>> divide up into functionally grouped jars based on package hierarchy. I tried >>>> to find some refactoring tools that would give a network graph view of class >>>> relationships. It is simple enough to parse source for import statements and >>>> build some sort of graph relationship tool. It is also easy enough to start >>>> dragging packages around to different projects in netbeans and resolve >>>> compiler errors. 
>>>
>>> JDepend is a nice tool for evaluating package dependencies.
>>>
>>> http://www.clarkware.com/software/JDepend.html
>>>
>>> Mike
>>>
>>> --
>>> ____________________________________________________________
>>> Michael Smoot, Ph.D.               Bioengineering Department
>>> tel: 858-822-4756         University of California San Diego
>
> --
> Juber Patel     http://juberpatel.googlepages.com

From HWillis at scripps.edu Thu May 28 09:37:27 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 09:37:27 -0400
Subject: [Biojava-dev] BioJava BLAST web services
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C863@FLMAIL1.fl.ad.scripps.edu>

I am planning to test a couple of BLAST web service interfaces (assuming more than one exists) to see what they truly have in common and how that would affect a BJ3 front end to multiple providers. My assumption is that they will be the same. I noticed that with the NCBI BLAST implementation the user is required to pass an email address as part of the web service call. They are concerned about abuse from external processes, and they only allow one sequence per request. Same-same but different is always fun!

From Wikipedia, the following are listed as BLAST resources, more than one of which may offer a web service interface. Should BioJava3 try to support more than one?
Thanks

Scooter

Variations of BLAST

* WU-BLAST - the original gapping BLAST with statistics, developed and maintained by Warren Gish at Washington University in St. Louis
* EBI's BLAST Services - EBI's main BLAST services page
* FSA-BLAST - a new, faster but still accurate version of NCBI BLAST based on recently published algorithmic improvements
* NBIC mpiBLAST - at the Netherlands Bioinformatics Centre
* Parallel BLAST - a dual scheduling BLAST tested on the Blue Gene/L
* mpiBLAST - open-source parallel BLAST
* A/G BLAST - implementation for PowerPC G4/G5 processors and Mac OS X, from Apple Computer's Advanced Computation Group and Genentech
* STRAP - the protein workbench STRAP contains a comfortable BLAST front-end with a cache for BLAST results

Commercial versions

* ThermoBLAST by DNA Software Inc. - scans entire genomes quickly and accurately, combining the power of BLAST with the most advanced thermodynamics parameters
* PatternHunter - an alternative software which provides similar functionality to BLAST while claiming increased speed and sensitivity
* KoriBlast - a reliable graphical environment dedicated to sequence data mining. KoriBlast combines BLAST searches with advanced data management capabilities and a state-of-the-art graphical user interface.
* microbial identification BLAST - a quality controlled database for in-vitro diagnostics. SepsiTest combines broad-range PCR using ultra-pure reagents with BLAST searches in a quality controlled environment.
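
The idea of one front end over several BLAST providers could look roughly like the Java sketch below. It is only an illustration under stated assumptions: `BlastService`, `BlastRequest`, and `OneSequencePerCallService` are hypothetical names, not BioJava classes, and no real NCBI or EBI endpoint is contacted. The stub only models the two constraints mentioned in the message above (an e-mail address is required, and each call carries a single sequence).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Small demo driver for the hypothetical front end. */
final class BlastFrontEndSketch {
    public static void main(String[] args) {
        BlastService service = new OneSequencePerCallService();
        List<String> jobIds = service.submit(
                new BlastRequest(Arrays.asList("ACGT", "GGCC"), "user@example.org"));
        System.out.println(jobIds);
    }
}

/** Hypothetical request: query sequences plus an optional contact e-mail. */
final class BlastRequest {
    final List<String> sequences;
    final String email; // may be null when a provider does not ask for it

    BlastRequest(List<String> sequences, String email) {
        this.sequences = new ArrayList<>(sequences);
        this.email = email;
    }
}

/** Hypothetical provider-neutral interface; one implementation per service. */
interface BlastService {
    /** Submits all sequences and returns one job id per sequence. */
    List<String> submit(BlastRequest request);
}

/** Stub provider that requires an e-mail and sends one sequence per call. */
final class OneSequencePerCallService implements BlastService {
    private int callCount = 0;

    int callCount() {
        return callCount;
    }

    @Override
    public List<String> submit(BlastRequest request) {
        if (request.email == null || request.email.isEmpty()) {
            throw new IllegalArgumentException("this provider requires an e-mail address");
        }
        List<String> jobIds = new ArrayList<>();
        for (String sequence : request.sequences) {
            // A real client would make one web service call per sequence here.
            callCount++;
            jobIds.add("job-" + callCount);
        }
        return jobIds;
    }
}
```

Calling code depends only on `BlastService`, so a second provider with different rules would be another implementation behind the same method.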
From james at carmanconsulting.com Thu May 28 09:45:23 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 09:45:23 -0400
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

I would say that you should use the Apache Commons projects as a model (I'm an Apache Commons PMC member, so I'm a bit biased). The Maven-generated site will include information on the dependencies (including whether they are optional and where you can get them, provided the other projects play nicely and include that information). And, yes, when you *do* use Maven, it will download all required transitive dependencies for you and add them to your classpath automagically. That's why it's so nice. Well, that's one of the MANY reasons it's so nice. The release plugin also saves a LOT of headaches, if you ask me (once you get it configured properly).

From andreas at sdsc.edu Thu May 28 12:53:33 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 28 May 2009 09:53:33 -0700
Subject: [Biojava-dev] hierarchical vs flat module organisation
Message-ID: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>

Hi,

from the different posts it seems there are two types of suggestions for how to organize modules: hierarchical vs. flat. I wonder if the best way to organize this is to mix the designs. There could be a few top-level modules like core, webservices, phylo, structure. These would be equivalent to projects in the workspace. These can then contain submodules like

webservices-blast-ebi
webservices-blast-ncbi
webservices-whatever

or

structure-core
structure-viewers

The submodules would be sub-folders in the projects.

Any thoughts on that?

Andreas

From HWillis at scripps.edu Thu May 28 14:09:32 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 14:09:32 -0400
Subject: [Biojava-dev] hierarchical vs flat module organisation
References: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C867@FLMAIL1.fl.ad.scripps.edu>

Andreas

I think the organization should make the most sense to the user of BioJava and should be functionally grouped. I show up looking for specific biology algorithms or code: BLAST, sequence alignment, tree construction, etc. In that module I would then find different features that I can explore to solve the problem. The question becomes whether I would pick a module based on how it solved the problem.
Given that BioJava does not have a native way to do BLAST, and the developer does not want to deal with all the configuration, the BLAST web service call simply becomes the only choice. Parsing a BLAST output and making a BLAST web service call should give the same structured result, against which I would then use other BioJava APIs. I think we should group by function, and that gives the developer a collection of tools to work with.

Scooter

________________________________
From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic
Sent: Thu 5/28/2009 12:53 PM
To: biojava-dev
Subject: [Biojava-dev] hierarchical vs flat module organisation

Hi,

from the different posts it seems there are two types of suggestions for how to organize modules: hierarchical vs. flat. I wonder if the best way to organize this is to mix the designs. There could be a few top-level modules like core, webservices, phylo, structure. These would be equivalent to projects in the workspace. These can then contain submodules like

webservices-blast-ebi
webservices-blast-ncbi
webservices-whatever

or

structure-core
structure-viewers

The submodules would be sub-folders in the projects.

Any thoughts on that?
Andreas

From HWillis at scripps.edu Thu May 28 13:57:27 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 13:57:27 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>

Andreas

I think each jar probably needs its own svn trunk. This is how Apache Commons is set up. The advantage is that everything is modularized, with nicely defined boundaries on dependencies. If you have one source tree that builds multiple jars, then it becomes very easy to grab a class from another jar, forcing additional dependencies.

You also don't need to worry about a single user having access to the entire source tree. If you have a new developer who wants to get involved with a specific interest, then it is easy to give him access to that package without worrying about breaking other packages.

Do you think we should call the functional groupings packages or modules or something else?

If you take a whack at the refactoring based on X number of modules, then you could check each one into a different Subversion trunk. Each module will probably have a dependency on biojava-core, which will also be a separate Subversion trunk. In NetBeans I would set up a project for each, and then I can add the biojava-core project as an external project dependency. This also allows each module to be released independently and more frequently. We probably need to come up with a versioning convention that is part of the jar name.
Not sure if any of the ant build tools automate the upticking of major/minor version number when packaging jars. For the user of biojava they would download a single jar for the module of interest where the download contains all the external jars that are required including biojava-core. For maven that would be done via POM. As part of the refactoring now is the time to make any major namespace changes you want to make. I assume that eclipse refactoring makes this easy. Check all the code in and BioJava3 has begun! Scooter ________________________________ From: andreas.prlic at gmail.com on behalf of Andreas Prlic Sent: Thu 5/28/2009 12:31 AM To: Scooter Willis Cc: biojava-dev Subject: Re: [Biojava-dev] next steps Hi Scooter, quick update: There is also an eclipse plugin for JDepend, that provides a user interface to browse thought the dependencies. As I already mentioned earlier, I had some quick progress with the maven plugin to convert the project to maven and create a first pom. At the moment I am testing how best to create sub-projects that should contain the modules. The plugin does not seem to make it easy to create new modules, so I agree with your earlier suggestion that it is best to modularize first and the mavenize 2nd... Should we create a branch in svn and play around with refactoring there and once we are happy with it we can switch that branch to become the trunk? Andreas On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote: > I attached the JDepend output for BioJava. This will help on the circular > dependencies where core classes should not have dependencies on other > packages and if they do it should be refactored into the core class. 
> > Scooter > ________________________________ > From: mike.smoot at gmail.com on behalf of Mike Smoot > Sent: Mon 5/25/2009 1:07 PM > To: Scooter Willis > Cc: Andreas Prlic; biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] next steps > > > > On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: >> >> I was looking at the biojava code yesterday to see how easy it would be to >> divide up into functionally grouped jars based on package hierarchy. I tried >> to find some refactoring tools that would give a network graph view of class >> relationships. It is simple enough to parse source for import statements and >> build some sort of graph relationship tool. It is also easy enough to start >> dragging packages around to different projects in netbeans and resolve >> compiler errors. > > JDepend is a nice tool for evaluating package dependencies. > > http://www.clarkware.com/software/JDepend.html > > > Mike > > -- > ____________________________________________________________ > Michael Smoot, Ph.D. Bioengineering Department > tel: 858-822-4756 University of California San Diego > From andreas.prlic at gmail.com Fri May 29 00:53:22 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Thu, 28 May 2009 21:53:22 -0700 Subject: [Biojava-dev] next steps In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu> References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu> Message-ID: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com> > I think each jar probably needs its own svn trunk. This is how apache > commons is setup. 
The advantage of this is that everything is modularized > with nice defined boundaries on dependencies. If you have once source tree > that builds multiple jars then it becomes very easy to grab a class from > another jar and forcing additional dependencies. sounds good. Guess it might be good not to have too many .jar files in the end as well. > You also don't need to worry about a single user having access to the entire > source tree. If you have a new developer who wants to get involved with a > specific interest then easy to give him access to that package without > worrying about breaking other packages. might be useful in the future. For now I think it is good enough to give developers write access to all of biojava. > > Do you think we should call the functional grouping packages or modules or > something else? What about: we call a toplevel project, a package. A package can then consist of several modules. Not sure if we should have a jar per package or per module. > If you take a wack at the refactoring based on X number of modules then you > could check each one in a different subversion trunk. Each module will > probably have a dependency on biojava-core which will also be a separate > subversion trunk. In Netbeans I would setup a project for each and then I > can add the biojava-core project as an external project dependency. Sounds good and you would do the same in eclipse. This > also allows each module to be released independently and more frequently. We > probably need to come up with a versioning convention that is part of the > jar name. I think we should stick to the maven naming conventions. http://maven.apache.org/guides/mini/guide-naming-conventions.html e.g. groupId org.biojava.phylo for the phylogenetic package artifactId biojava-phylo version 3.0.0 or 3.0.0-SNAPSHOT if it is a nightly build Not sure if any of the ant build tools automate the upticking of > major/minor version number when packaging jars. 
Not sure about ant, but Maven has a built-in release plugin. If it is set up correctly you can just write

mvn release:prepare

and the release is prepared...

> As part of the refactoring now is the time to make any major namespace
> changes you want to make. I assume that eclipse refactoring makes this easy.

Namespace changes are tricky. In principle I don't want to break backwards compatibility with the existing code base. On the other hand, having package names starting with org.biojava.structure rather than org.biojava.bio.structure would be simpler. If in doubt, I am for backwards compatibility. One case where I would like to see a change is the core BLAST parsing modules: org.biojava.bio.program.sax does not indicate at all that it has to do with BLAST.

Andreas

From heuermh at acm.org Fri May 29 12:29:04 2009
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 29 May 2009 12:29:04 -0400 (EDT)
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com>
Message-ID:

Andreas Prlic wrote:

> > I think each jar probably needs its own svn trunk. This is how apache
> > commons is setup. The advantage of this is that everything is modularized
> > with nice defined boundaries on dependencies. If you have once source tree
> > that builds multiple jars then it becomes very easy to grab a class from
> > another jar and forcing additional dependencies.
>
> sounds good. Guess it might be good not to have too many .jar files
> in the end as well.
>
> > You also don't need to worry about a single user having access to the entire
> > source tree. If you have a new developer who wants to get involved with a
> > specific interest then easy to give him access to that package without
> > worrying about breaking other packages.
>
> might be useful in the future. For now I think it is good enough to
> give developers write access to all of biojava.
> > > > > > Do you think we should call the functional grouping packages or modules or > > something else? > > What about: we call a toplevel project, a package. A package can then > consist of several modules. Not sure if we should have a jar per > package or per module. > > > > If you take a wack at the refactoring based on X number of modules then you > > could check each one in a different subversion trunk. Each module will > > probably have a dependency on biojava-core which will also be a separate > > subversion trunk. In Netbeans I would setup a project for each and then I > > can add the biojava-core project as an external project dependency. > > Sounds good and you would do the same in eclipse. > > This > > also allows each module to be released independently and more frequently. We > > probably need to come up with a versioning convention that is part of the > > jar name. > > I think we should stick to the maven naming conventions. > http://maven.apache.org/guides/mini/guide-naming-conventions.html > e.g. > groupId org.biojava.phylo for the phylogenetic package > artifactId biojava-phylo > version 3.0.0 or 3.0.0-SNAPSHOT if it is a nightly build > > > Not sure if any of the ant build tools automate the upticking of > > major/minor version number when packaging jars. > > Not sure about ant, but maven has a built in release plugin. if it is > set up correctly you can just write > mvn release:prepare > and the release is being prepared... > > > > As part of the refactoring now is the time to make any major namespace > > changes you want to make. I assume that eclipse refactoring makes this easy. > > Namespace changes are tricky. In principle I don;t want to break > backwards compatibility with the existing code base. On the other side > having package names starting with org.biojava.structure, rather than > org.biojava.bio.structure would be simpler. If in doubt I am for > backwards compatibility. 
One case where I would like to see a change > is the core blast parsing modules. org.biojava.bio.program.sax does > not indicate at all that this has to do with blast. From xuxiang at sibs.ac.cn Sun May 31 21:54:46 2009 From: xuxiang at sibs.ac.cn (xuxiang) Date: Mon, 1 Jun 2009 09:54:46 +0800 Subject: [Biojava-dev] Next Generation Sequencing Message-ID: <200906010954385937117@sibs.ac.cn> Hi all, I am doing something about sequencing data from Illumina Genome Analyzer (Next Generation Sequencing). Are there any tools in BioJava for analyzing Next Generation Sequencing data? 2009-06-01 xuxiang From harryzs1981 at gmail.com Wed May 6 13:13:42 2009 From: harryzs1981 at gmail.com (sheng zhao) Date: Wed, 6 May 2009 15:13:42 +0200 Subject: [Biojava-dev] Biojava-doc in chm forma Message-ID: <3d23b1eb0905060613m643adf87sdef55a05a083dd51@mail.gmail.com> Hi Where can I find Biojava-doc in chm format?? Thanks ! harry From andreas at sdsc.edu Mon May 11 04:26:58 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 10 May 2009 21:26:58 -0700 Subject: [Biojava-dev] Plans for next biojava release - modularization Message-ID: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> Hi biojava-devs, It is time to start working on the next biojava release. I would like to modularize the current code base and apply some of the ideas that have emerged around Richard's "biojava 3" code. In principle the idea is that all changes should be backwards compatible with the interfaces provided by the current biojava 1.7 release. Backwards compatibility shall only be broken if the functionality is being replaced with something that works better, and gets documented accordingly. For the build functionality I would suggest to stick with what Richard's biojava 3 code base already is providing. Since we will try to be backwards compatible all code development should be part of the biojava-trunk and the first step will be to move the ant-build scripts to a maven build process. 
Following this procedure will allow to use e.g. the code refactoring tools provided by Eclipse, which should come in handy. The modules I would like to see should provide self-contained functionality and cross dependencies should be restricted to a minimum. I would suggest to have the following modules: biojava-core: Contains everything that can not easily be modularized or nobody volunteers to become a module maintainer. biojava-phylogeny: Scooter expressed some interested to provide such a module and become package maintainer for it. biojava-structure: Everything protein structure related. I would be package maintainer. biojava-blast: Blast parsing is a frequently requested functionality and it would be good to have this code self-contained. A package maintainer for this still will need to be nominated at a later stage. Any suggestions for other modules? Let me know what you think about this. Andreas From HWillis at scripps.edu Mon May 11 13:50:58 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Mon, 11 May 2009 09:50:58 -0400 Subject: [Biojava-dev] Plans for next biojava release - modularization In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu> Andreas Another theme that should be considered is providing a multi-thread version of any module with long run time. This would have a couple elements. A progress listener interface should be standard where core code would update progress messages to listeners that can be used by external code to display feedback to the user. I did this with the Neighbor Joining code for tree construction and it provides needed feedback in a GUI. If not the user gets frustrated because they don't know the code they are about to execute may take 10 minutes or 8 hours to complete and they think the software is not working. 
The reverse is also true for canceling an operation, where you want the core code to stop processing a long-running loop. Once the code has completed, the listener's process-complete method is called, allowing the next step in the external code to continue. The developer would have the choice to call the "process" method directly or run it in a thread and wait for the complete callback to be called.

This is the first step toward having the core/long-running processes take advantage of multiple threads to complete the computational task faster. Not all code can be parallelized easily, but if the algorithm can take advantage of running in parallel then it should. This then opens up a couple of cloud computing frameworks that extend the multi-threaded concepts in Java across a cluster, e.g. http://www.terracotta.org/. If we put an emphasis on having code that runs well in a thread we are one step closer to an architecture that can run in a cloud. The computational problems are only going to get bigger, and with Amazon EC2 and http://www.eucalyptus.com/ approaches computational IO cycles are going to be cheap as long as the software/libraries can easily take advantage of them.

Thanks

Scooter

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From andreas at sdsc.edu Mon May 11 22:53:14 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 11 May 2009 15:53:14 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>

Hi Scooter,

I like the idea of supporting multiple threads and parallelizing code where possible.
Is there a reference implementation that you would recommend for how progress listeners should be implemented? I suppose the neighbor joining code you mention is not part of biojava...

Andreas

From HWillis at scripps.edu Tue May 12 00:34:11 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 11 May 2009 20:34:11 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu> <59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C84F@FLMAIL1.fl.ad.scripps.edu>

Andreas

This is what I put together for the tree code as the interface. In the loop code of the algorithm you simply call the appropriate progress method; it could be cleaned up to have one progress method and a float for percentage complete. Passing the instance of NJTree was required for this specific case because all the work is done when the NJTree class is instantiated. It really should be cleaned up so that it has a process method and is runnable in a thread if needed. The progress listener could be generic for all long-running classes.

I have wrapped the NJTree code in a TreeConstructor class, which bridges to the biojava framework and allows the NJTree code to be replaced by something that is compatible with the BioJava open source license if needed. I am still playing around with performance optimizations and need to see if Jalview would contribute the NJTree code to BioJava.
If not, I would do my own implementation as the algorithm is not difficult.

I was also thinking that we could have Java code that provides functionality such as Blast by making a web service call to an external, publicly supported service. Instead of parsing Blast result flat files you can call an external service such as http://www.ebi.ac.uk/Tools/webservices/services/wublast via web services and get well-structured results.

Scooter

package org.biojavax.phylo;

import org.biojavax.phylo.jalview.NJTree;

/**
 * Callbacks reporting progress, completion and cancellation of a
 * tree construction run.
 *
 * @author willishf
 */
public interface NJTreeProgressListener {
    public void progress(NJTree njtree, String state, int percentageComplete);
    public void progress(NJTree njtree, String state, int currentCount, int totalCount);
    public void complete(NJTree njtree);
    public void canceled(NJTree njtree);
}

**********************************************************************************************
This code could be abstracted out into a base class or simply added into a class that needs to notify external listeners
**********************************************************************************************

    // Registered listeners; Vector is synchronized, so add/remove are thread-safe.
    Vector<NJTreeProgressListener> progessListenerVector = new Vector<NJTreeProgressListener>();

    public void addProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.add(treeProgessListener);
        }
    }

    public void removeProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.remove(treeProgessListener);
        }
    }

    // Tell every listener that processing has finished.
    public void broadcastComplete() {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.complete(this);
        }
    }

    // Report progress as a percentage.
    public void updateProgress(String state, int percentage) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this, state, percentage);
        }
    }

    // Report progress as "currentCount of totalCount".
    public void updateProgress(String state, int currentCount, int totalCount) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this, state, currentCount, totalCount);
        }
    }

***************************************************************************************

package org.biojavax.phylo;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Vector;
import org.biojava.bio.BioException;
import org.biojavax.phylo.jalview.NJTreeNew;
import org.biojavax.phylo.jalview.TreeConstructionAlgorithm;
import org.biojavax.phylo.jalview.TreeType;
import org.biojava.bio.seq.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojavax.phylo.jalview.NJSequence;
import org.biojavax.phylo.jalview.NJTree;

/**
 * Bridges BioJava sequences to the NJTree code. Call process() to build
 * the tree in the current thread, or start() to build it in its own thread.
 *
 * @author willishf
 */
public class TreeConstructor extends Thread {

    NJTree njtree = null;
    NJSequence[] sequences = null;
    TreeType treeType;
    TreeConstructionAlgorithm treeConstructionAlgorithm;
    NJTreeProgressListener treeProgessListener;

    public TreeConstructor(SequenceIterator iter, TreeType _treeType,
            TreeConstructionAlgorithm _treeConstructionAlgorithm,
            NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        // Copy each BioJava sequence into the NJSequence form used by NJTree.
        ArrayList<NJSequence> sequenceArray = new ArrayList<NJSequence>();
        while (iter.hasNext()) {
            try {
                Sequence seq = iter.nextSequence();
                NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
                sequenceArray.add(njsequence);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        sequences = new NJSequence[sequenceArray.size()];
        sequenceArray.toArray(sequences);
    }

    public TreeConstructor(Vector<RichSequence> sequenceVector, TreeType _treeType,
            TreeConstructionAlgorithm _treeConstructionAlgorithm,
            NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        sequences = new NJSequence[sequenceVector.size()];
        int index = 0;
        for (RichSequence seq : sequenceVector) {
            NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
            sequences[index] = njsequence;
            index++;
        }
    }

    // Ask a running tree construction to stop.
    public void cancel() {
        if (njtree != null) {
            njtree.cancel();
        }
    }

    // All the work happens in the NJTree constructor.
    public void process() throws Exception {
        njtree = new NJTree(sequences, treeType, treeConstructionAlgorithm, treeProgessListener);
    }

    @Override
    public void run() {
        try {
            process();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Returns the tree in Newick format, or "" if no tree has been built yet.
    public String getNewickString() {
        if (njtree != null) {
            return njtree.toString();
        }
        return "";
    }

    public static void main(String[] args) {
        // Fall back to a local test file when no path is given.
        if (args.length == 0) {
            args = new String[3];
            args[0] = "C:\\MutualInformation\\project\\hiv\\hiv-genes-genome.fasta";
        }
        try {
            // prepare a BufferedReader for file io
            BufferedReader br = new BufferedReader(new FileReader(args[0]));
            SimpleNamespace ns = new SimpleNamespace("biojava");

            // You can use any of the convenience methods found in the BioJava 1.6 API
            RichSequenceIterator rsi = RichSequence.IOTools.readFastaProtein(br, ns);
            long readTime = System.currentTimeMillis();
            TreeConstructor treeConstructor = new TreeConstructor(rsi, TreeType.NJ,
                    TreeConstructionAlgorithm.PID, new ProgessListenerStub());
            treeConstructor.process();
            long treeTime = System.currentTimeMillis();
            String newick = treeConstructor.getNewickString();

            System.out.println("Tree time " + (treeTime - readTime));
            System.out.println(newick);
        } catch (FileNotFoundException ex) {
            // can't find file specified by args[0]
            ex.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

From mark.schreiber at novartis.com Tue May 12 05:26:33 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 12 May 2009 13:26:33 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

Hi -

This was one thing we discussed previously with respect to biojava 3. Generally I support the idea because almost all computers are now multi-core and, as you say, cloud or utility computing is already a reality.

However, I tend to think that biojava should not control threading or concurrency. This should be done by the developer. This is because sometimes multithreading can be fast on a slow computer but slow on a fast computer (due to the overhead in spawning threads), so programs need to be tunable. Also, Java app servers and things like Sun Grid Engine, EC2 etc. don't like people attempting to control their own threads. What BioJava should do is expose granular and thread-safe operations that can be threaded, form discrete tasks on a utility grid, or complete in SessionBeans on an app server. For example, it would be better if BioJava had a single-threaded method to calculate the GC of a single sequence rather than a multi-threaded method that calculates the GC of multiple sequences.
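
A minimal sketch of that split, offered as an illustration rather than as BioJava code: the class and method names here (GcContentSketch, gcFraction, gcFractions) are hypothetical, and sequences are assumed to be plain strings rather than BioJava sequence objects. The library side is a single-threaded, stateless method; the caller decides whether and how to run it in parallel, here with a java.util.concurrent thread pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical names throughout; this is not part of any BioJava release.
class GcContentSketch {

    /**
     * Library side: single-threaded and stateless, so it is safe to call
     * from any number of threads. Returns the fraction of G and C
     * characters (case-insensitive), or 0.0 for an empty sequence.
     */
    public static double gcFraction(String sequence) {
        if (sequence == null || sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    /**
     * Caller side: the developer picks the number of threads.
     * Results come back in the same order as the input sequences.
     */
    public static List<Double> gcFractions(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<Future<Double>>();
            for (final String sequence : sequences) {
                futures.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        return gcFraction(sequence);
                    }
                }));
            }
            List<Double> results = new ArrayList<Double>();
            for (Future<Double> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for GC results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("GC calculation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> sequences = Arrays.asList("GGCC", "ATAT", "ATGC");
        System.out.println(gcFractions(sequences, 2)); // prints [1.0, 0.0, 0.5]
    }
}
```

Because gcFraction keeps no shared state, the same method could just as well be submitted as one task per sequence to a grid engine or called from a session bean, with no threading code inside the library.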
This would let the developer make a multithreaded version if desired, or distribute multiple tasks based on the single-threaded version to a compute cloud (and let the cloud manage all the tasks).

Possibly the best situation would be to have single-threaded, fine-grained operations that let developers or grid engines control threading, and then higher-level APIs that do it for you (or good cookbook examples that show you how to do it). Another idea that was discussed was the use of properties files to allow people to set how many CPUs they want to make available to the JVM, or to name packages that can or cannot use threading.

Finally, there are lots of times when it is highly desirable to use Java beans because they play well with dozens of Java APIs; however, beans don't work well with threads because they have public setter methods. I would like to see a lot more bean use in a future BioJava because it would make life so much easier, but a lot of care would need to be taken to make sure thread safety is preserved. There are many patterns, such as synchronization locks, that can be used to make things thread safe, so I think this can be achieved as long as we are disciplined and consider that all methods may be used in a multi-threaded application (even if we write the method as a single thread). If there are code checkers that make suggestions on thread safety it would be great to have these as part of the standard build process. Good documentation would go a long way as well. Are there unit test patterns that can catch these problems? Suggestions would be great.

Progress listener patterns are good, but it depends on the situation, and they might be better handled in high-level APIs or left to the developer. For example, in your NJ code a progress listener would be good if someone fed 1000 sequences into the method but not if they only put in 10.
Also code running on an old machine might need a progress listener but the same problem on a new machine may complete almost instantly. Probably a pluggable listener would be the way to go. Also it might be possible to do this using the new JDK APIs that let you take a peek at the stack trace. Even if your NJ method didn't allow for a progress listener a developer could still make one by looking at the method calls in the stack. As long as your NJ method called other methods internally for each sequence (quite likely) it would be possible to observe the cycle of method calls from the stack. This might make it possible to have a very general BioJava progress listener that can be told to count the number of times a method is called in the stack. The name of the method would be the argument. If the application runs in a Java App server you can also do this very easily with a method Interceptor. - Mark biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM: > Andreas > > Another theme that should be considered is providing a multi-thread > version of any module with long run time. This would have a couple > elements. A progress listener interface should be standard where core > code would update progress messages to listeners that can be used by > external code to display feedback to the user. I did this with the > Neighbor Joining code for tree construction and it provides needed > feedback in a GUI. If not the user gets frustrated because they don't > know the code they are about to execute may take 10 minutes or 8 hours > to complete and they think the software is not working. The reverse is > also true for canceling an operation where you want to have core code > stop processing a long running loop. Once the code has completed then > the listener interface for process complete is called allowing the next > step in the external code to continue. 
The developer would have the > choice to call the "process" method or run it in a thread and wait for > the callback complete method to be called. > > This is the first step in the ability to have the core/long running > processes take advantage of multiple threads to complete the > computational task faster. Not all code can be parallelized easily but > if the algorithm can take advantage of running in parallel then it > should. This then opens up a couple of cloud computing frameworks that > extend the multi-threaded concepts in Java across a cluster > http://www.terracotta.org/. If we put an emphasis on having code that > runs well in a thread we are one step closer to an architecture that can > run in a cloud. The computational problems are only going to get bigger > and with Amazon EC2 and http://www.eucalyptus.com/ approaches > computational IO cycles are going to be cheap as long as the > software/libraries can easily take advantage of it. > > Thanks > > Scooter > > -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas > Prlic > Sent: Monday, May 11, 2009 12:27 AM > To: biojava-dev > Subject: [Biojava-dev] Plans for next biojava release - modularization > > Hi biojava-devs, > > It is time to start working on the next biojava release. I would > like to modularize the current code base and apply some of the ideas > that have emerged around Richard's "biojava 3" code. In principle the > idea is that all changes should be backwards compatible with the > interfaces provided by the current biojava 1.7 release. Backwards > compatibility shall only be broken if the functionality is being > replaced with something that works better, and gets documented > accordingly. For the build functionality I would suggest to stick with > what Richard's biojava 3 code base already is providing. 
Since we will > try to be backwards compatible all code development should be part of > the biojava-trunk and the first step will be to move the ant-build > scripts to a maven build process. Following this procedure will allow > to use e.g. the code refactoring tools provided by Eclipse, which > should come in handy. > > The modules I would like to see should provide self-contained > functionality and cross dependencies should be restricted to a > minimum. I would suggest to have the following modules: > > biojava-core: Contains everything that can not easily be modularized > or nobody volunteers to become a module maintainer. > biojava-phylogeny: Scooter expressed some interested to provide such a > module and become package maintainer for it. > biojava-structure: Everything protein structure related. I would be > package maintainer. > biojava-blast: Blast parsing is a frequently requested functionality > and it would be good to have this code self-contained. A package > maintainer for this still will need to be nominated at a later stage. > Any suggestions for other modules? > > Let me know what you think about this. > > Andreas > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From ayates at ebi.ac.uk Tue May 12 08:27:52 2009 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 12 May 2009 09:27:52 +0100 Subject: [Biojava-dev] Plans for next biojava release - modularization In-Reply-To: References: Message-ID: <4A093308.4030409@ebi.ac.uk> I agree with Mark. Later versions of the Java environment will make concurrent programming easier not to mention languages already available on the VM (Scala & Clojure) that make it very easy indeed. Our goal in biojava must be to write code which will behave well in one of these environments. I don't want us to fall into the trap of earlier biojava where things like own implementations of database connection pooling data sources (sorry I don't mean to pick on any one part of the code but it highlights very well what we should avoid). We're bioinformaticians/engineers; lets do what we do best and work well within our chosen field. Let other people like Doug Lea deal with the pain that is concurrent programming & the alike :) Andy mark.schreiber at novartis.com wrote: > Hi - > > This was one thing we discussed previously with respect to biojava 3. > Generally I support the idea because almost all computers are now > multi-core and as you say cloud or utility computing is already a reality. > > However, I tend to think that biojava should not control threading or > concurrency. This should be done by the developer. This is because > sometimes mutithreading can be fast on a slow computer but slow on a fast > computer (due to the overhead in spawning threads) so programs need to be > tunable. 
Also Java app servers and things like Sun Grid Engine, EC2 etc > don't like people attempting to control their own threads. What BioJava > should do is expose granular and thread-safe operations that can be > threaded or form discrete tasks on a utility grid or complete in > SessionBeans on an App server. For example it would be better if BioJava > had a single threaded method to calculate the GC of a single sequence > rather than a multi-threaded method that calculates the GC of multiple > sequences. This would let the developer make a multithreaded version if > desired or distribute multiple tasks based on the single threaded version > to a compute cloud (and let the cloud manage all the tasks). > > Possibly the best situation would be to have the single threaded fine > grain operations that let developers or grid engines control threading and > then higher level APIs that do it for you (or good cookbook examples that > show you how to do it). Another idea that was discussed was the use of > properties files to allow people to set how many CPUs they wanted to make > available to the JVM or name packages that can or cannot use threading. > > Finally, there are lots of times when it is highly desirable to use Java > beans because they play well with dozens of Java api's however beans don't > work well with threads because they have public setter methods. I would > like to see a lot more bean use in a future BioJava because it would make > life so much easier but a lot of care would need to be taken to make sure > thread safety is preserved. There are many patterns that can be used such > as synchronization locks etc to make things thread safe so I think this > can be achieved as long as we are disciplined and consider that all > methods may be used in a multi-threaded application (even if we write the > method as a single thread). 
If there are code checkers that make > suggestions on thread safety it would be great to have these as part of > the standard build process. Good documentation would go a long way as > well. Are there unit test patterns that can catch these problems as well? > Suggestions would be great. > > Progress Listener patterns are good but it depends on the situation and > might be better handled in high level APIs or left to the developer. For > example in your NJ code a progress listener would be good if someone fed > 1000 sequences into the method but not if they only put in 10. Also code > running on an old machine might need a progress listener but the same > problem on a new machine may complete almost instantly. Probably a > pluggable listener would be the way to go. Also it might be possible to > do this using the new JDK APIs that let you take a peek at the stack > trace. Even if your NJ method didn't allow for a progress listener a > developer could still make one by looking at the method calls in the > stack. As long as your NJ method called other methods internally for each > sequence (quite likely) it would be possible to observe the cycle of > method calls from the stack. This might make it possible to have a very > general BioJava progress listener that can be told to count the number of > times a method is called in the stack. The name of the method would be the > argument. If the application runs in a Java App server you can also do > this very easily with a method Interceptor. > > - Mark > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at eaglegenomics.com Tue May 12 08:26:26 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 12 May 2009 09:26:26 +0100 Subject: [Biojava-dev] Plans for next biojava release - modularization In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> Message-ID: <1242116786.7101.7.camel@buzzybee> The BJ3 code contains only as much code as is needed to represent sequences and to parse/write simple FASTA. It should be viewed as a concept.
In particular the file parsing mechanism is quite flexible (if a little complex) but easily wrapped with simple one-liner utility methods to provide end-users with easier-to-use APIs. Sequence representation in BJ3 is done via the Collections API. It's set up in such a way that you can write something yourself that implements the List API and behaves like a List but internally uses a more compact or even offline storage mechanism to represent the sequence. This allows you to reuse sequences wherever Lists can be used, e.g. in Iterators or foreach-loops. Everything written so far has been documented here: http://biojava.org/wiki/BioJava3:HowTo cheers, Richard -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From HWillis at scripps.edu Tue May 12 13:34:51 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 12 May 2009 09:34:51 -0400 Subject: [Biojava-dev] Plans for next biojava release - modularization In-Reply-To: References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu> Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu> Mark It is a challenge on knowing where to draw the line. Allowing both options is a reasonable approach. The implementation of the algorithm is key to allow it to be multi-threaded or being able to run in parallel. One approach is to provide a standard interface such as process() would wait for the result/return value and run in the parent thread. To run the algorithm in a thread you can have a startProcess() where you can add yourself as a progress listener and when complete() method is called you can call getResults(). You can then also have the corresponding stopProcess() which would set an internal value to cause all threads to quit.
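The process() / startProcess() / stopProcess() / getResults() interface described above could be sketched roughly as follows. This is only one possible reading of the proposal: the class `LongRunningTaskSketch`, the `ProgressListener` interface and its two callbacks are hypothetical names, and a simple running sum stands in for real work such as Neighbor Joining.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class LongRunningTaskSketch {

    /** Hypothetical listener interface; the method names are assumptions. */
    public interface ProgressListener {
        void progress(int done, int total);
        void complete();
    }

    private final List<ProgressListener> listeners = new CopyOnWriteArrayList<>();
    private final int totalSteps;
    private volatile boolean stopRequested; // the "internal value" checked by the loop
    private volatile long result;

    public LongRunningTaskSketch(int totalSteps) {
        this.totalSteps = totalSteps;
    }

    public void addProgressListener(ProgressListener listener) {
        listeners.add(listener);
    }

    /** Blocking variant: runs in the caller's thread and returns the result. */
    public long process() {
        long sum = 0;
        for (int i = 1; i <= totalSteps && !stopRequested; i++) {
            sum += i; // stand-in for one unit of real work
            for (ProgressListener l : listeners) {
                l.progress(i, totalSteps);
            }
        }
        result = sum;
        // Called on normal completion and after a stop request alike.
        for (ProgressListener l : listeners) {
            l.complete();
        }
        return sum;
    }

    /** Non-blocking variant: runs process() in a background thread. */
    public void startProcess() {
        stopRequested = false;
        new Thread(() -> process(), "long-running-task").start();
    }

    /** Asks the loop to quit at the next iteration. */
    public void stopProcess() {
        stopRequested = true;
    }

    public long getResults() {
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        LongRunningTaskSketch task = new LongRunningTaskSketch(100);
        CountDownLatch done = new CountDownLatch(1);
        task.addProgressListener(new ProgressListener() {
            public void progress(int d, int t) {
                if (d % 25 == 0) {
                    System.out.println(d + "/" + t);
                }
            }
            public void complete() {
                done.countDown();
            }
        });
        task.startProcess();
        done.await();
        System.out.println("result = " + task.getResults()); // result = 5050
    }
}
```

A GUI would typically update a progress bar from `progress(...)` and wire a Cancel button to `stopProcess()`.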
There are lots of ways to tackle the problem; the key is to start talking about it and at a minimum take advantage of multiple cores, where the external code can set the number of cores to use. You can get a dual quad-core machine these days for < $1000 but most software implementations are not designed to take advantage of it. The real question is what exists today in the BioJava API that is considered long running in a normal use case and thus is a candidate to be run in parallel. It may not be an issue in existing BioJava code. When I first started using BioJava I went looking for BLAST code only to find a BLAST parser. I wanted to do a Multiple Sequence Alignment and it turns out that the BioJava code calls CLUSTALW as an external process under the covers. I also needed code to construct trees from an MSA and found the summer of code project that was only focused on representing the tree. It would be nice to have a BLAST implementation in Java optimized to run on a cluster, but who has time to rewrite BLAST in Java when you can do a BLAST search via the web and focus on parsing the results? BioJava needs a BLAST API that makes a web services call to an external service and returns structured results in core BioJava structures. It is probably not difficult to do a Java version of CLUSTALW, but again we can push the work out to http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results back in BioJava structures. I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web service -> BioJava code. I haven't done the research but it seems that http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to expose common biology computational services. If multiple external services are offering BLAST via web services where each picked a different implementation then BioJava could provide an abstraction over the different services.
Thanks Scooter From andreas at sdsc.edu Tue May 12 23:52:51 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 12 May 2009 16:52:51 -0700 Subject: [Biojava-dev] Plans for next biojava release - modularization In-Reply-To: <1242116786.7101.7.camel@buzzybee> References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com> <1242116786.7101.7.camel@buzzybee> Message-ID: <59a41c430905121652s7c548985xd9261734b42a4182@mail.gmail.com> Hi Richard, Do you think the BJ3 code could form the beginning of a new biojava-sequence module and can become part of the next release?
Andreas From andreas at sdsc.edu Tue May 12 23:59:11 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 12 May 2009 16:59:11 -0700 Subject: [Biojava-dev] Plans for next biojava release - modularization In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu> References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu> Message-ID: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com> Hi Scooter, about your suggestion for the blast webservice client code: In principle I like the idea and we have had questions on the mailing list regarding this in the past.
The only thing is, I think there is already some Java client code available: http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp but I am not sure how good that client library is. Besides this, our BLAST parser library needs work, and if you are interested in working on that you are welcome. As I mentioned, I think this should become its own module, due to the popularity of that code. Andreas On Tue, May 12, 2009 at 6:34 AM, Scooter Willis wrote: > Mark > > > > It is a challenge on knowing where to draw the line. Allowing both options > is a reasonable approach. The implementation of the algorithm is key to > allow it to be multi-threaded or being able to run in parallel. One approach > is to provide a standard interface such as process() would wait for the > result/return value and run in the parent thread. To run the algorithm in a > thread you can have a startProcess() where you can add yourself as a > progress listener and when complete() method is called you can call > getResults(). You can then also have the corresponding stopProcess() which > would set an internal value to cause all threads to quit. Lots of ways to > tackle the problem the key is to start talking about it and at minimum take > advantage of multiple-cores where the external code can set the number of > cores to use. You can get a dual quad core machine these days for < $1000 > but most software implementations are not designed to take advantage of it. > > > > The real question is what exists today in the BioJava API that is considered > long running in normal use case and thus is a candidate to be run in > parallel. It may not be an issue in existing BioJava code. When I first > started using BioJava I went looking for BLAST code only to find a BLAST > parser. I wanted to do a Multiple Sequence Alignment and turns out that > Biojava code calls CLUSTALW as an external processor under the covers.
I > also needed code to construct trees from an MSA and found the summer of code > project that was only focused on representing the tree. > > > > It would be nice to have a BLAST implementation in Java optimized to run on > a cluster but who has time to rewrite BLAST in Java when you can do BLAST > search via the web and focus on parsing the results. BioJava needs a BLAST > API that makes a web services call to an external service and returns > structured results in core BioJava structures. Probably not difficult to do > a Java version of CLUSTALW but again we can push the work out to > http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results > back returned in BioJava structures. > > > > I can signup for doing a BLAST web service -> BioJava and a CLUSTALW web > service -> BioJava code. I haven't done the research but it seems that > http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to > expose common biology computational services. If multiple external services > are offering BLAST via web services where each picked a different > implementation then BioJava could provide abstraction to different services. > > > > Thanks > > Scooter > > > > From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com] > Sent: Tuesday, May 12, 2009 1:27 AM > To: Scooter Willis > Cc: Andreas Prlic; biojava-dev > Subject: Re: [Biojava-dev] Plans for next biojava release - modularization > > > > Hi - > > This was one thing we discussed previously with respect to biojava 3. > Generally I support the idea because almost all computers are now > multi-core and as you say cloud or utility computing is already a reality. > > However, I tend to think that biojava should not control threading or > concurrency. This should be done by the developer. This is because sometimes > multithreading can be fast on a slow computer but slow on a fast computer > (due to the overhead in spawning threads) so programs need to be tunable.
> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like > people attempting to control their own threads. What BioJava should do is > expose granular and thread-safe operations that can be threaded or form > discrete tasks on a utility grid or complete in SessionBeans on an App > server. For example it would be better if BioJava had a single threaded > method to calculate the GC of a single sequence rather than a multi-threaded > method that calculates the GC of multiple sequences. This would let the > developer make a multithreaded version if desired or distribute multiple > tasks based on the single threaded version to a compute cloud (and let the > cloud manage all the tasks). > > Possibly the best situation would be to have the single threaded fine grain > operations that let developers or grid engines control threading and then > higher level APIs that do it for you (or good cookbook examples that show > you how to do it). Another idea that was discussed was the use of > properties files to allow people to set how many CPUs they wanted to make > available to the JVM or name packages that can or cannot use threading. > > Finally, there are lots of times when it is highly desirable to use Java > beans because they play well with dozens of Java api's however beans don't > work well with threads because they have public setter methods. I would > like to see a lot more bean use in a future BioJava because it would make > life so much easier but a lot of care would need to be taken to make sure > thread safety is preserved. There are many patterns that can be used such > as synchronization locks etc to make things thread safe so I think this can > be achieved as long as we are disciplined and consider that all methods may > be used in a multi-threaded application (even if we write the method as a > single thread).
If there are code checkers that make suggestions on thread > safety it would be great to have these as part of the standard build > process. Good documentation would go a long way as well. Are there unit > test patterns that can catch these problems as well? Suggestions would be > great. > > Progress Listener patterns are good but it depends on the situation and > might be better handled in high level APIs or left to the developer. For > example in your NJ code a progress listener would be good if someone fed > 1000 sequences into the method but not if they only put in 10. Also code > running on an old machine might need a progress listener but the same > problem on a new machine may complete almost instantly. Probably a > pluggable listener would be the way to go. Also it might be possible to do > this using the new JDK APIs that let you take a peek at the stack trace. > Even if your NJ method didn't allow for a progress listener a developer > could still make one by looking at the method calls in the stack. As long as > your NJ method called other methods internally for each sequence (quite > likely) it would be possible to observe the cycle of method calls from the > stack. This might make it possible to have a very general BioJava progress > listener that can be told to count the number of times a method is called in > the stack. The name of the method would be the argument. If the application > runs in a Java App server you can also do this very easily with a method > Interceptor. > > - Mark > > biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM: > >> Andreas >> >> Another theme that should be considered is providing a multi-thread >> version of any module with long run time. This would have a couple >> elements. A progress listener interface should be standard where core >> code would update progress messages to listeners that can be used by >> external code to display feedback to the user.
I did this with the >> Neighbor Joining code for tree construction and it provides needed >> feedback in a GUI. If not the user gets frustrated because they don't >> know the code they are about to execute may take 10 minutes or 8 hours >> to complete and they think the software is not working. The reverse is >> also true for canceling an operation where you want to have core code >> stop processing a long running loop. Once the code has completed then >> the listener interface for process complete is called allowing the next >> step in the external code to continue. The developer would have the >> choice to call the "process" method or run it in a thread and wait for >> the callback complete method to be called. >> >> This is the first step in the ability to have the core/long running >> processes take advantage of multiple threads to complete the >> computational task faster. Not all code can be parallelized easily but >> if the algorithm can take advantage of running in parallel then it >> should. This then opens up a couple of cloud computing frameworks that >> extend the multi-threaded concepts in Java across a cluster >> http://www.terracotta.org/. If we put an emphasis on having code that >> runs well in a thread we are one step closer to an architecture that can >> run in a cloud. The computational problems are only going to get bigger >> and with Amazon EC2 and http://www.eucalyptus.com/ approaches >> computational IO cycles are going to be cheap as long as the >> software/libraries can easily take advantage of it. >> >> Thanks >> >> Scooter >> >> -----Original Message----- >> From: biojava-dev-bounces at lists.open-bio.org >> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas >> Prlic >> Sent: Monday, May 11, 2009 12:27 AM >> To: biojava-dev >> Subject: [Biojava-dev] Plans for next biojava release - modularization >> >> Hi biojava-devs, >> >> It is time to start working on the next biojava release. 
?I ?would >> like to modularize the current code base and apply some of the ideas >> that have emerged around Richard's "biojava 3" code. In principle the >> idea is that all changes should be backwards compatible with the >> interfaces provided by the current biojava 1.7 release. ?Backwards >> compatibility shall only be broken if the functionality is being >> replaced with something that works better, and gets documented >> accordingly. For the build functionality I would suggest to stick with >> what Richard's biojava 3 code base already is providing. Since we will >> try to be backwards compatible all code development should be part of >> the biojava-trunk and the first step will be to move the ant-build >> scripts to a maven build process. Following this procedure will allow >> to use e.g. the code refactoring tools provided by Eclipse, which >> should come in handy. >> >> The modules I would like to see should provide self-contained >> functionality and cross dependencies should be restricted to a >> minimum. I would suggest to have the following modules: >> >> biojava-core: Contains everything that can not easily be modularized >> or nobody volunteers to become a module maintainer. >> biojava-phylogeny: Scooter expressed some interested to provide such a >> module and become package maintainer for it. >> biojava-structure: Everything protein structure related. I would be >> package maintainer. >> biojava-blast: Blast parsing is a frequently requested functionality >> and it would be good to have this code self-contained. A package >> maintainer for this still will need to be nominated at a later stage. >> Any suggestions for other modules? >> >> Let me know what you think about this. 
>> >> Andreas >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _________________________ > > CONFIDENTIALITY NOTICE > > The information contained in this e-mail message is intended only for the > exclusive use of the individual or entity named above and may contain > information that is privileged, confidential or exempt from disclosure under > applicable law. If the reader of this message is not the intended recipient, > or the employee or agent responsible for delivery of the message to the > intended recipient, you are hereby notified that any dissemination, > distribution or copying of this communication is strictly prohibited. If you > have received this communication in error, please notify the sender > immediately by e-mail and delete the material from any computer. ?Thank you. From HWillis at scripps.edu Wed May 13 00:13:45 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 12 May 2009 20:13:45 -0400 Subject: [Biojava-dev] Plans for next biojava release - modularization References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu> <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com> Message-ID: <061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu> Andreas The goal for BioJava could be to provide a wrapper for the http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp java code so that inputs/outputs are BioJava. I think they are using Axis for the client web services code. 
If BioJava 3 is going to be Java 6 minimum then it is easier to use the Java 6 SOAP processing capabilities by pointing to the WSDL code and generating the Java code for the client side. This cuts down on the additional external 3rd parties that are required. I try to stay out of the legacy file parsing business whenever possible.

Scooter

-----Original Message-----
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Tue 5/12/2009 7:59 PM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release - modularization

Hi Scooter,

about your suggestion for the blast webservice client code: In principle I like the idea and we have had questions on the mailing list regarding this in the past. Only thing is I think there is already some client code in java available: http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp but I am not sure how good that Java client library is....

Besides this, there is the need for work on our blast parser library and if you are interested in working on that you are welcome. As I mentioned, I think this should become its own module, due to the popularity of that code.

Andreas
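For anyone following the threading discussion in this thread, the process() / startProcess() / stopProcess() idea, combined with the suggestion of a single threaded GC method per sequence, could be sketched roughly as below. This is only an illustration under assumptions: GcTask, ProgressListener, addProgressListener(), progress() and gcContent() are made-up names and not existing BioJava API; only process(), startProcess(), stopProcess(), complete() and getResults() come from the discussion itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch, not BioJava code: a cancellable task that reports progress. */
final class GcTask {

    /** Hypothetical listener interface; the names are assumptions for illustration. */
    public interface ProgressListener {
        void progress(int done, int total);
        void complete();
    }

    private final List<String> sequences;
    private final List<ProgressListener> listeners =
            new CopyOnWriteArrayList<ProgressListener>();
    private final List<Double> results = new ArrayList<Double>();
    private volatile boolean stopRequested = false;

    GcTask(List<String> sequences) {
        this.sequences = new ArrayList<String>(sequences);
    }

    void addProgressListener(ProgressListener listener) {
        listeners.add(listener);
    }

    /** Single threaded, stateless building block: GC fraction of one sequence. */
    static double gcContent(String seq) {
        if (seq.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / seq.length();
    }

    /** Blocking variant: runs in the caller's thread and returns the results. */
    List<Double> process() {
        List<Double> out = new ArrayList<Double>();
        for (int i = 0; i < sequences.size() && !stopRequested; i++) {
            out.add(gcContent(sequences.get(i)));
            for (ProgressListener l : listeners) {
                l.progress(i + 1, sequences.size());
            }
        }
        synchronized (results) {
            results.clear();
            results.addAll(out);
        }
        // complete() is fired after a stop as well, with partial results available.
        for (ProgressListener l : listeners) {
            l.complete();
        }
        return out;
    }

    /** Non-blocking variant: the caller waits for complete(), then calls getResults(). */
    Thread startProcess() {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                process();
            }
        });
        worker.start();
        return worker;
    }

    /** Asks the loop to stop after the sequence it is currently working on. */
    void stopProcess() {
        stopRequested = true;
    }

    List<Double> getResults() {
        synchronized (results) {
            return new ArrayList<Double>(results);
        }
    }
}
```

A GUI could call startProcess() and update a progress bar from progress(), while code running on a grid engine or app server could ignore the threading methods and call gcContent() or process() from threads that the container manages itself.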
From mark.schreiber at novartis.com Wed May 13 00:09:31 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 May 2009 08:09:31 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID:

A while back I gave Richard some code that uses JAXB to objectify (and deobjectify) BLAST XML output. This might be useful for parsing BLAST results from the webservices which normally use BLAST XML. I could probably dig it up again if needed (it was autogenerated anyway).

It would probably be a good object model for BLAST output if people want to parse other types of BLAST output (such as flatfile, but who would want to do that!). The BLAST XML seems to accommodate strange flavours of BLAST such as PSI-BLAST etc and also has been much more stable than the default flat file output.

- Mark

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From HWillis at scripps.edu Wed May 13 00:23:30 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 20:23:30 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu> <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C858@FLMAIL1.fl.ad.scripps.edu>

Andreas

A follow up point related to Mark's comment could be that parsing blast output would not be required or less important if we provide a clean BioJava API to make the web service call with BioJava data structure inputs and give back BioJava data structure outputs. This saves the step of the user doing the web query, file save, parse etc. It would be interesting to know how many users run their own BLAST server for privacy reasons.

Scooter

-----Original Message-----
From: Scooter Willis
Sent: Tue 5/12/2009 8:13 PM
To: Andreas Prlic
Cc: biojava-dev
Subject: RE: [Biojava-dev] Plans for next biojava release - modularization

Andreas

The goal for BioJava could be to provide a wrapper for the http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp java code so that inputs/outputs are BioJava. I think they are using Axis for the client web services code. If BioJava 3 is going to be Java 6 minimum then it is easier to use the Java 6 SOAP processing capabilities by pointing to the WSDL code and generating the Java code for the client side.
This cuts down on the additional third-party libraries that are required. I try to stay out of the legacy file parsing business whenever possible.

Scooter

-----Original Message-----
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Tue 5/12/2009 7:59 PM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release - modularization

Hi Scooter,

about your suggestion for the BLAST web service client code: in principle I like the idea, and we have had questions on the mailing list regarding this in the past. The only thing is that I think there is already some client code in Java available: http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp but I am not sure how good that Java client library is....

Besides this, there is a need for work on our BLAST parser library, and if you are interested in working on that you are welcome. As I mentioned, I think this should become its own module, due to the popularity of that code.

Andreas

On Tue, May 12, 2009 at 6:34 AM, Scooter Willis wrote:
> Mark
>
> It is a challenge knowing where to draw the line. Allowing both options
> is a reasonable approach. The implementation of the algorithm is key to
> allowing it to be multi-threaded or to run in parallel. One approach
> is to provide a standard interface in which process() waits for the
> result/return value and runs in the parent thread. To run the algorithm in a
> thread you can have a startProcess() where you can add yourself as a
> progress listener, and when the complete() method is called you can call
> getResults(). You can then also have the corresponding stopProcess(), which
> would set an internal value to cause all threads to quit. There are lots of
> ways to tackle the problem; the key is to start talking about it and at
> minimum take advantage of multiple cores, where the external code can set
> the number of cores to use.
> You can get a dual quad-core machine these days for < $1000,
> but most software implementations are not designed to take advantage of it.
>
> The real question is what exists today in the BioJava API that is considered
> long running in a normal use case and thus is a candidate to be run in
> parallel. It may not be an issue in existing BioJava code. When I first
> started using BioJava I went looking for BLAST code only to find a BLAST
> parser. I wanted to do a Multiple Sequence Alignment, and it turns out that
> the BioJava code calls CLUSTALW as an external processor under the covers. I
> also needed code to construct trees from an MSA and found the Summer of Code
> project that was only focused on representing the tree.
>
> It would be nice to have a BLAST implementation in Java optimized to run on
> a cluster, but who has time to rewrite BLAST in Java when you can do a BLAST
> search via the web and focus on parsing the results? BioJava needs a BLAST
> API that makes a web services call to an external service and returns
> structured results in core BioJava structures. It is probably not difficult
> to do a Java version of CLUSTALW, but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
> back in BioJava structures.
>
> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web
> service -> BioJava code. I haven't done the research, but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology computational services. If multiple external services
> are offering BLAST via web services where each picked a different
> implementation, then BioJava could provide an abstraction over the different
> services.
>
> Thanks
>
> Scooter
>
> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> Sent: Tuesday, May 12, 2009 1:27 AM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev
> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3.
> Generally I support the idea because almost all computers are now
> multi-core and, as you say, cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or
> concurrency. This should be done by the developer. This is because sometimes
> multithreading can be fast on a slow computer but slow on a fast computer
> (due to the overhead in spawning threads), so programs need to be tunable.
> Also, Java app servers and things like Sun Grid Engine, EC2 etc. don't like
> people attempting to control their own threads. What BioJava should do is
> expose granular and thread-safe operations that can be threaded or form
> discrete tasks on a utility grid or complete in SessionBeans on an app
> server. For example, it would be better if BioJava had a single-threaded
> method to calculate the GC of a single sequence rather than a multi-threaded
> method that calculates the GC of multiple sequences. This would let the
> developer make a multithreaded version if desired or distribute multiple
> tasks based on the single-threaded version to a compute cloud (and let the
> cloud manage all the tasks).
>
> Possibly the best situation would be to have the single-threaded,
> fine-grained operations that let developers or grid engines control
> threading, and then higher-level APIs that do it for you (or good cookbook
> examples that show you how to do it). Another idea that was discussed was
> the use of properties files to allow people to set how many CPUs they wanted
> to make available to the JVM or to name packages that can or cannot use
> threading.
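
Mark's GC example above might look roughly like the following. This is only a sketch under stated assumptions: GcContent, gc() and gcAll() are hypothetical names that do not exist in BioJava, and sequences are plain Strings rather than BioJava sequence objects. The single-sequence method is stateless and single-threaded; the fan-out over a thread pool lives in separate caller-side code, with the thread count chosen by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only; not part of the BioJava API.
final class GcContent {
    private GcContent() {}

    // Single-threaded and stateless, so it is safe to call from any thread.
    // Returns the fraction of G/C characters (0.0 for an empty sequence).
    static double gc(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gcCount = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char base = Character.toUpperCase(sequence.charAt(i));
            if (base == 'G' || base == 'C') {
                gcCount++;
            }
        }
        return (double) gcCount / sequence.length();
    }

    // Caller-side parallel version: the caller picks the number of threads.
    // Results are returned in the same order as the input sequences.
    static List<Double> gcAll(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (String sequence : sequences) {
                Callable<Double> task = () -> gc(sequence);
                pending.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for GC tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("GC task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A grid engine or app server could call gc() directly as one task per sequence and ignore gcAll() entirely, which is the tunability Mark is asking for.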
>
> Finally, there are lots of times when it is highly desirable to use Java
> beans because they play well with dozens of Java APIs; however, beans don't
> work well with threads because they have public setter methods. I would
> like to see a lot more bean use in a future BioJava because it would make
> life so much easier, but a lot of care would need to be taken to make sure
> thread safety is preserved. There are many patterns that can be used, such
> as synchronization locks etc., to make things thread safe, so I think this
> can be achieved as long as we are disciplined and consider that all methods
> may be used in a multi-threaded application (even if we write the method as
> a single thread). If there are code checkers that make suggestions on thread
> safety, it would be great to have these as part of the standard build
> process. Good documentation would go a long way as well. Are there unit
> test patterns that can catch these problems as well? Suggestions would be
> great.
>
> Progress Listener patterns are good, but it depends on the situation and
> might be better handled in high-level APIs or left to the developer. For
> example, in your NJ code a progress listener would be good if someone fed
> 1000 sequences into the method but not if they only put in 10. Also, code
> running on an old machine might need a progress listener, but the same
> problem on a new machine may complete almost instantly. Probably a
> pluggable listener would be the way to go. Also, it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener, a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely), it would be possible to observe the cycle of method calls from the
> stack.
> This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument. If the application
> runs in a Java app server you can also do this very easily with a method
> Interceptor.
>
> - Mark
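
The process() / startProcess() / stopProcess() / getResults() idea quoted above could be sketched along these lines. Only those four method names and complete() come from the thread; ProgressListener, LongRunningTask, CountingTask, fireProgress() and isCancelled() are hypothetical names invented for illustration, and this is not how any existing BioJava class works. The cancel flag is a volatile boolean polled by the worker loop, which is one simple way to realise "set an internal value to cause all threads to quit".

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener interface; not an existing BioJava type.
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

// Hypothetical base class offering both a blocking and a threaded entry point.
abstract class LongRunningTask<R> {
    private final List<ProgressListener> listeners = new CopyOnWriteArrayList<>();
    private volatile boolean cancelled = false;
    private volatile R result;

    void addProgressListener(ProgressListener listener) {
        listeners.add(listener);
    }

    // Blocking variant: runs in the calling thread and returns the result.
    R process() {
        R computed = compute();
        result = computed;
        for (ProgressListener listener : listeners) {
            listener.complete();
        }
        return computed;
    }

    // Threaded variant: returns immediately; listeners get complete() later.
    Thread startProcess() {
        Thread worker = new Thread(this::process);
        worker.start();
        return worker;
    }

    // Asks the running computation to stop at its next cancellation check.
    void stopProcess() {
        cancelled = true;
    }

    // Null until the computation has finished.
    R getResults() {
        return result;
    }

    protected boolean isCancelled() {
        return cancelled;
    }

    protected void fireProgress(int done, int total) {
        for (ProgressListener listener : listeners) {
            listener.progress(done, total);
        }
    }

    protected abstract R compute();
}

// Toy task standing in for a long-running algorithm such as tree construction.
final class CountingTask extends LongRunningTask<Integer> {
    private final int steps;

    CountingTask(int steps) {
        this.steps = steps;
    }

    @Override
    protected Integer compute() {
        int done = 0;
        while (done < steps && !isCancelled()) {
            done++;
            fireProgress(done, steps);
        }
        return done;
    }
}
```

In this sketch complete() is also fired after a cancellation, so a GUI listener would need to check getResults() or a separate flag to tell the two cases apart; whether that is the right contract is exactly the kind of design question the thread leaves open.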
From andreas at sdsc.edu Wed May 13 00:45:54 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 17:45:54 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To:
References: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>

The point about the auto-generated code actually raises another question for me: how shall we deal with auto-generated code?

I also have some code that is currently not part of BioJava, but it might be useful for other people: it allows parsing UniProt XML files and serializing / de-serializing the objects to a database using EJBs, Hibernate and the UniProt XML files.

How far should biojava go in supporting such auto-generated or semi-auto-generated code?

A

On Tue, May 12, 2009 at 5:09 PM, wrote:
>
> A while back I gave Richard some code that uses JAXB to objectify (and
> deobjectify) BLAST XML output. This might be useful for parsing BLAST
> results from the webservices, which normally use BLAST XML. I could probably
> dig it up again if needed (it was autogenerated anyway).
>
> It would probably be a good object model for BLAST output if people want to
> parse other types of BLAST output (such as flatfile, but who would want to
> do that!). The BLAST XML seems to accommodate strange flavours of BLAST
> such as PSI-BLAST etc. and also has been much more stable than the default
> flat file output.
>
> - Mark
From mark.schreiber at novartis.com Wed May 13 02:15:27 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 May 2009 10:15:27 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>
Message-ID:

Hi -

I think it depends on whether the code is going to be auto-generated at each build or only once.

I have autogenerated Entity classes for BioSQL tables. My recommendation would be that these be used for JPA mapping to BioSQL from BioJava. I think these only need be generated once (unless the BioSQL schema changes), especially as the autogeneration didn't quite catch some of the subtleties of the schema. They can also be in their own module, not the core.

Classes that map to XML or webservice clients can be autogenerated from XML schema, DTD or WSDL once or at every build (automatically from ANT and probably Maven). In these cases it may pay to do it with every build because these classes are completely boilerplate code and should never need to be manually modified. It also means the code for these utility classes will never be in the code base, and it will not be possible for someone to change it accidentally (and the code base will be smaller). Only the XSD or WSDL will be in subversion (and any higher-level code that makes use of the boilerplate client code). Improvements in the boilerplate code or changes that come with updates to JAXB and similar will automatically appear at the next build (when we change JAXB versions).

Conceptually, the BLAST XML parsing module may consist of only the BLAST XSD (or DTD) and a high-level biojava class like the following:

    import java.io.Serializable;
    import java.net.URL;

    public interface BlastParser {
        // Implementations call the boilerplate code...
        Serializable[] parseBlast(URL url);

        // Implementations call the boilerplate code...
        Serializable[] parseBlast(String blastXMLOutput);
    }

The code for the bit that does the JAXB marshalling etc. could be generated at build time. The Serializable array would be the objects that JAXB generates. Probably they would be a more specific stub that implements Serializable (e.g. BlastResult or similar, depending on the XSD).

I think it really comes down to a question of how much the generated code is boilerplate code that will never be changed. If it is not 'modifiable' then it can be generated at build. If the autogenerated code is an outline of a class where method bodies need to be filled in or customized, then it should not be autogenerated at build time. A good example would be JUnit classes that can be autogenerated to give you a template that will compile and run but probably will not perform a sensible test. The developer of the test could autogenerate the template but would then need to make the test sensible. At that point the test should be in the code base and should not be regenerated at build time.

- Mark
I haven't done the research but it seems that
> >> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> >> expose common biology computational services. If multiple external
> >> services
> >> are offering BLAST via web services where each picked a different
> >> implementation then BioJava could provide abstraction to different
> >> services.
> >>
> >>
> >>
> >> Thanks
> >>
> >> Scooter
> >>
> >>
> >>
> >> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> >> Sent: Tuesday, May 12, 2009 1:27 AM
> >> To: Scooter Willis
> >> Cc: Andreas Prlic; biojava-dev
> >> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
> >>
> >>
> >>
> >> Hi -
> >>
> >> This was one thing we discussed previously with respect to biojava 3.
> >> Generally I support the idea because almost all computers are now
> >> multi-core and as you say cloud or utility computing is already a reality.
> >>
> >> However, I tend to think that biojava should not control threading or
> >> concurrency. This should be done by the developer. This is because
> >> sometimes
> >> mutithreading can be fast on a slow computer but slow on a fast computer
> >> (due to the overhead in spawning threads) so programs need to be tunable.
> >> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like
> >> people attempting to control their own threads. What BioJava should do is
> >> expose granular and thread-safe operations that can be threaded or form
> >> discrete tasks on a utility grid or complete in SessionBeans on an App
> >> server. For example it would be better if BioJava had a single threaded
> >> method to calculate the GC of a single sequence rather than a
> >> multi-threaded
> >> method that calculates the GC of multiple sequences.
This would let the > >> developer make a multithreaded version if desired or distribute multiple > >> tasks based on the single threaded version to a compute cloud (and let the > >> cloud manage all the tasks). > >> > >> Possibly the best situation would be to have the single threaded fine > >> grain > >> operations that let developers or grid engines control threading and then > >> higher level APIs that do it for you (or good cookbook examples that show > >> you how to do it). Another idea that was discussed was the use of > >> properties files to allow people to set how many CPUs they wanted to make > >> available to the JVM or name packages that can or cannot use threading. > >> > >> Finally, there are lots of times when it is highly desirable to use Java > >> beans because they play well with dozens of Java api's however beans don't > >> work well with threads because they have public setter methods. I would > >> like to see a lot more bean use in a future BioJava because it would make > >> life so much easier but a lot of care would need to be taken to make sure > >> thread safety is preserved. There are many patterns that can be used such > >> as synchronization locks etc to make things thread safe so I think this > >> can > >> be achieved as long as we are disciplined and consider that all methods > >> may > >> be used in a multi-threaded application (even if we write the method as a > >> single thread). If there are code checkers that make suggestions on > >> thread > >> safety it would be great to have these as part of the standard build > >> process. Good documentation would go a long way as well. Are there unit > >> test patterns that can catch these problems as well? Suggestions would be > >> great. > >> > >> Progress Listener patterns are good but it depends on the situation and > >> might be better handled in high level APIs or left to the developer. 
For > >> example in your NJ code a progress listener would be good if someone fed > >> 1000 sequences into the method but not if they only put in 10. Also code > >> running on an old machine might need a progress listener but the same > >> problem on a new machine may complete almost instantly. Probably a > >> pluggable listener would be the way to go. Also it might be possible to > >> do > >> this using the new JDK APIs that let you take a peek at the stack trace. > >> Even if your NJ method didn't allow for a progress listener a developer > >> could still make one by looking at the method calls in the stack. As long > >> as > >> your NJ method called other methods internally for each sequence (quite > >> likely) it would be possible to observe the cycle of method calls from the > >> stack. This might make it possible to have a very general BioJava > >> progress > >> listener that can be told to count the number of times a method is called > >> in > >> the stack. The name of the method would be the argument. If the > >> application > >> runs in a Java App server you can also do this very easily with a method > >> Interceptor. > >> > >> - Mark > >> > >> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM: > >> > >>> Andreas > >>> > >>> Another theme that should be considered is providing a multi-thread > >>> version of any module with long run time. This would have a couple > >>> elements. A progress listener interface should be standard where core > >>> code would update progress messages to listeners that can be used by > >>> external code to display feedback to the user. I did this with the > >>> Neighbor Joining code for tree construction and it provides needed > >>> feedback in a GUI. If not the user gets frustrated because they don't > >>> know the code they are about to execute may take 10 minutes or 8 hours > >>> to complete and they think the software is not working. 
The reverse is > >>> also true for canceling an operation where you want to have core code > >>> stop processing a long running loop. Once the code has completed then > >>> the listener interface for process complete is called allowing the next > >>> step in the external code to continue. The developer would have the > >>> choice to call the "process" method or run it in a thread and wait for > >>> the callback complete method to be called. > >>> > >>> This is the first step in the ability to have the core/long running > >>> processes take advantage of multiple threads to complete the > >>> computational task faster. Not all code can be parallelized easily but > >>> if the algorithm can take advantage of running in parallel then it > >>> should. This then opens up a couple of cloud computing frameworks that > >>> extend the multi-threaded concepts in Java across a cluster > >>> http://www.terracotta.org/. If we put an emphasis on having code that > >>> runs well in a thread we are one step closer to an architecture that can > >>> run in a cloud. The computational problems are only going to get bigger > >>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches > >>> computational IO cycles are going to be cheap as long as the > >>> software/libraries can easily take advantage of it. > >>> > >>> Thanks > >>> > >>> Scooter > >>> > >>> -----Original Message----- > >>> From: biojava-dev-bounces at lists.open-bio.org > >>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas > >>> Prlic > >>> Sent: Monday, May 11, 2009 12:27 AM > >>> To: biojava-dev > >>> Subject: [Biojava-dev] Plans for next biojava release - modularization > >>> > >>> Hi biojava-devs, > >>> > >>> It is time to start working on the next biojava release. I would > >>> like to modularize the current code base and apply some of the ideas > >>> that have emerged around Richard's "biojava 3" code. 
In principle the > >>> idea is that all changes should be backwards compatible with the > >>> interfaces provided by the current biojava 1.7 release. Backwards > >>> compatibility shall only be broken if the functionality is being > >>> replaced with something that works better, and gets documented > >>> accordingly. For the build functionality I would suggest to stick with > >>> what Richard's biojava 3 code base already is providing. Since we will > >>> try to be backwards compatible all code development should be part of > >>> the biojava-trunk and the first step will be to move the ant-build > >>> scripts to a maven build process. Following this procedure will allow > >>> to use e.g. the code refactoring tools provided by Eclipse, which > >>> should come in handy. > >>> > >>> The modules I would like to see should provide self-contained > >>> functionality and cross dependencies should be restricted to a > >>> minimum. I would suggest to have the following modules: > >>> > >>> biojava-core: Contains everything that can not easily be modularized > >>> or nobody volunteers to become a module maintainer. > >>> biojava-phylogeny: Scooter expressed some interested to provide such a > >>> module and become package maintainer for it. > >>> biojava-structure: Everything protein structure related. I would be > >>> package maintainer. > >>> biojava-blast: Blast parsing is a frequently requested functionality > >>> and it would be good to have this code self-contained. A package > >>> maintainer for this still will need to be nominated at a later stage. > >>> Any suggestions for other modules? > >>> > >>> Let me know what you think about this. 
> >>> > >>> Andreas > >>> _______________________________________________ > >>> biojava-dev mailing list > >>> biojava-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >>> > >>> _______________________________________________ > >>> biojava-dev mailing list > >>> biojava-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > >> _________________________ > >> > >> CONFIDENTIALITY NOTICE > >> > >> The information contained in this e-mail message is intended only for the > >> exclusive use of the individual or entity named above and may contain > >> information that is privileged, confidential or exempt from disclosure > >> under > >> applicable law. If the reader of this message is not the intended > >> recipient, > >> or the employee or agent responsible for delivery of the message to the > >> intended recipient, you are hereby notified that any dissemination, > >> distribution or copying of this communication is strictly prohibited. If > >> you > >> have received this communication in error, please notify the sender > >> immediately by e-mail and delete the material from any computer. Thank > >> you. > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From msmoot at ucsd.edu Thu May 21 23:47:22 2009 From: msmoot at ucsd.edu (Mike Smoot) Date: Thu, 21 May 2009 16:47:22 -0700 Subject: [Biojava-dev] an outsider's take on Biojava 3 Message-ID: Hi Everyone, I thought I'd respond to Andreas' request for participation in the BioJava 3 design discussions that he made last week on the normal BioJava list. 
I'm the lead developer on the Cytoscape project (http://cytoscape.org), so I thought I'd provide some perspective on what a project using BioJava might look for in BioJava 3. Basically, I'd just like to voice my strong support for the "Basic Principles" listed here: http://biojava.org/wiki/BioJava3_Design. Finer granularity of jars, acyclic dependencies, and the separation of API and implementation are precisely the things we're doing in Cytoscape 3. The first two points will go a long way towards making it easier to use specific parts of the library without needing everything at once. The second point will allow alternative implementations of certain interfaces, which is one approach to dealing with issues like parallel vs. non-parallel versions of algorithms. Maven also sounds great. If I could add one bullet to the list, it would be to add OSGi metadata to the jars to allow easy integration with OSGi-based projects (such as Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven plugins to make this dead simple and it wouldn't impact anyone not using OSGi. Please take that with a large grain of salt, I just thought you might appreciate an outsider's perspective! thanks, Mike -- ____________________________________________________________ Michael Smoot, Ph.D. Bioengineering Department tel: 858-822-4756 University of California San Diego From markjschreiber at gmail.com Fri May 22 02:59:14 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Fri, 22 May 2009 10:59:14 +0800 Subject: [Biojava-dev] an outsider's take on Biojava 3 In-Reply-To: References: Message-ID: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com> Thanks for the comments. The OSGi system sounds interesting. I think we should consider it. 

I have also added two more recommendations for the Design Principles:


On Fri, May 22, 2009 at 7:47 AM, Mike Smoot wrote:
> Hi Everyone,
>
> I thought I'd respond to Andreas' request for participation in the BioJava 3
> design discussions that he made last week on the normal BioJava list. I'm
> the lead developer on the Cytoscape project (http://cytoscape.org), so I
> thought I'd provide some perspective on what a project using BioJava might
> look for in BioJava 3.
>
> Basically, I'd just like to voice my strong support for the "Basic
> Principles" listed here: http://biojava.org/wiki/BioJava3_Design. Finer
> granularity of jars, acyclic dependencies, and the separation of API and
> implementation are precisely the things we're doing in Cytoscape 3. The
> first two points will go a long way towards making it easier to use specific
> parts of the library without needing everything at once. The second point
> will allow alternative implementations of certain interfaces, which is one
> approach to dealing with issues like parallel vs. non-parallel versions of
> algorithms. Maven also sounds great.
>
> If I could add one bullet to the list, it would be to add OSGi metadata to
> the jars to allow easy integration with OSGi-based projects (such as
> Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven
> plugins to make this dead simple and it wouldn't impact anyone not using
> OSGi.
>
> Please take that with a large grain of salt, I just thought you might
> appreciate an outsider's perspective!
>
> thanks,
> Mike
>
> --
> ____________________________________________________________
> Michael Smoot, Ph.D.               Bioengineering Department
> tel: 858-822-4756         University of California San Diego
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From markjschreiber at gmail.com Fri May 22 03:01:57 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 22 May 2009 11:01:57 +0800
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
References: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
Message-ID: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>

Sorry, sent before I said what the new principles were.

1. Extensive use of the Logging API
2. (At the risk of having a fatwa declared against me) Most biojava
exceptions should derive from RuntimeException and be unchecked

See the wiki page for more details.

- Mark

On Fri, May 22, 2009 at 10:59 AM, Mark Schreiber wrote:
> Thanks for the comments. The OSGi system sounds interesting. I think
> we should consider it.
>
> I have also added two more recommendations for the Design Principles:
>
>
> On Fri, May 22, 2009 at 7:47 AM, Mike Smoot wrote:
>> Hi Everyone,
>>
>> I thought I'd respond to Andreas' request for participation in the BioJava 3
>> design discussions that he made last week on the normal BioJava list. I'm
>> the lead developer on the Cytoscape project (http://cytoscape.org), so I
>> thought I'd provide some perspective on what a project using BioJava might
>> look for in BioJava 3.
>>
>> Basically, I'd just like to voice my strong support for the "Basic
>> Principles" listed here: http://biojava.org/wiki/BioJava3_Design. Finer
>> granularity of jars, acyclic dependencies, and the separation of API and
>> implementation are precisely the things we're doing in Cytoscape 3. The
>> first two points will go a long way towards making it easier to use specific
>> parts of the library without needing everything at once. The second point
>> will allow alternative implementations of certain interfaces, which is one
>> approach to dealing with issues like parallel vs. non-parallel versions of
>> algorithms. Maven also sounds great.
>>
>> If I could add one bullet to the list, it would be to add OSGi metadata to
>> the jars to allow easy integration with OSGi-based projects (such as
>> Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven
>> plugins to make this dead simple and it wouldn't impact anyone not using
>> OSGi.
>>
>> Please take that with a large grain of salt, I just thought you might
>> appreciate an outsider's perspective!
>>
>> thanks,
>> Mike
>>
>> --
>> ____________________________________________________________
>> Michael Smoot, Ph.D.               Bioengineering Department
>> tel: 858-822-4756         University of California San Diego
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>

From holland at eaglegenomics.com Fri May 22 09:02:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 May 2009 10:02:43 +0100
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
References: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com> <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
Message-ID: <1242982963.10413.6.camel@buzzybee>

RuntimeException is good for things that can't be recovered from. If the
user has provided bad coordinates or invalid sequence, that's a
recoverable error (because there's a chance that they came from user
input via a user interface, which can be corrected and retried). Even
file parsing exceptions should be recoverable - the user can move on to
the next record without borking the entire file (we already see broken
records quite a lot in Genbank downloads).
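
The recoverable case described above could be sketched roughly as follows. This is only an illustration under stated assumptions: RecordFormatException, MisconfigurationException and the "id:SEQUENCE" record layout are hypothetical names invented for the example and are not part of BioJava. A checked exception marks a broken record so the caller can skip it and carry on with the next one, while an unchecked exception is kept for a setup error that the caller cannot sensibly fix at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoverableParsingSketch {

    /** Hypothetical checked exception: bad input data that the caller can skip or correct. */
    static class RecordFormatException extends Exception {
        RecordFormatException(String message) {
            super(message);
        }
    }

    /** Hypothetical unchecked exception: a programming or configuration error. */
    static class MisconfigurationException extends RuntimeException {
        MisconfigurationException(String message) {
            super(message);
        }
    }

    /** Parses one "id:SEQUENCE" record (made-up format) and rejects anything that is not plain DNA. */
    static String parseRecord(String record) throws RecordFormatException {
        int colon = record.indexOf(':');
        if (colon < 1) {
            throw new RecordFormatException("Missing identifier in record: " + record);
        }
        String sequence = record.substring(colon + 1).toUpperCase();
        if (!sequence.matches("[ACGT]+")) {
            throw new RecordFormatException("Invalid DNA in record: " + record);
        }
        return sequence;
    }

    /** Keeps the good records and collects one message per broken record. */
    static List<String> parseAll(List<String> records, List<String> problems) {
        if (records == null) {
            // Not a data problem: the calling code was set up wrongly.
            throw new MisconfigurationException("No record source configured");
        }
        List<String> sequences = new ArrayList<String>();
        for (String record : records) {
            try {
                sequences.add(parseRecord(record));
            } catch (RecordFormatException e) {
                problems.add(e.getMessage()); // recover: note the problem, move to the next record
            }
        }
        return sequences;
    }

    public static void main(String[] args) {
        List<String> problems = new ArrayList<String>();
        List<String> good = parseAll(Arrays.asList("a:ACGT", "b:AC?T", "c:ttga"), problems);
        System.out.println(good.size() + " parsed, " + problems.size() + " skipped");
    }
}
```

With the three sample records in main, the middle one is rejected and the other two are kept, so one broken record does not stop the rest of the file from being read.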
But, for things like being unable to call out to Blast, or being unable to convert DNA to Protein because of a misconfiguration internally somewhere, I agree that RuntimeExceptions are probably best. These are unrecoverable and indicate that changes need to be made to the programming code or BioJava itself. So in my mind then RuntimeExceptions are good for highlighting programming errors, but not good for errors relating to invalid input data. On Fri, 2009-05-22 at 11:01 +0800, Mark Schreiber wrote: > Sorry, sent before I said what the new principles were. > > 1. Extensive use of the Logging API > 2. (At the risk of having a fatwa declared against me) Most biojava > exceptions should derive from RuntimeException and be unchecked > > See the wiki page for more details. > > - Mark > > On Fri, May 22, 2009 at 10:59 AM, Mark Schreiber > wrote: > > Thanks for the comments. The OSGi system sounds interesting. I think > > we should consider it. > > > > I have also added two more recommendations for the Design Principles: > > > > > > On Fri, May 22, 2009 at 7:47 AM, Mike Smoot wrote: > >> Hi Everyone, > >> > >> I thought I'd respond to Andreas' request for participation in the BioJava 3 > >> design discussions that he made last week on the normal BioJava list. I'm > >> the lead developer on the Cytoscape project (http://cytoscape.org), so I > >> thought I'd provide some perspective on what a project using BioJava might > >> look for in BioJava 3. > >> > >> Basically, I'd just like to voice my strong support for the "Basic > >> Principles" listed here: http://biojava.org/wiki/BioJava3_Design. Finer > >> granularity of jars, acyclic dependencies, and the separation of API and > >> implementation are precisely the things we're doing in Cytoscape 3. The > >> first two points will go a long way towards making it easier to use specific > >> parts of the library without needing everything at once. 
The second point > >> will allow alternative implementations of certain interfaces, which is one > >> approach to dealing with issues like parallel vs. non-parallel versions of > >> algorithms. Maven also sounds great. > >> > >> If I could add one bullet to the list, it would be to add OSGi metadata to > >> the jars to allow easy integration with OSGi-based projects (such as > >> Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven > >> plugins to make this dead simple and it wouldn't impact anyone not using > >> OSGi. > >> > >> Please take that with a large grain of salt, I just thought you might > >> appreciate an outsider's perspective! > >> > >> thanks, > >> Mike > >> > >> -- > >> ____________________________________________________________ > >> Michael Smoot, Ph.D. Bioengineering Department > >> tel: 858-822-4756 University of California San Diego > >> _______________________________________________ > >> biojava-dev mailing list > >> biojava-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andreas at sdsc.edu Mon May 25 04:22:09 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 24 May 2009 21:22:09 -0700 Subject: [Biojava-dev] next steps Message-ID: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> Hi, While talking about design requirements, I think we also need to think pragmatically about how much time we will have to refactor code vs. re-writing modules from scratch. To get started with the next steps, I suggest the following procedure: First thing will be to move to Maven. Then components should be refactored into independent sub-modules. 
Then the submodules can be improved to follow the new design guidelines.
Once we have reached a certain stability with the re-organized code
base, we will make the next release.

Any comments? If there is general agreement about this, I would take
the next step and replace the ant build system with a maven based one.

Andreas

From andreas at sdsc.edu Mon May 25 15:14:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 25 May 2009 08:14:06 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905250814p2cfcc627h477e688637f50ccb@mail.gmail.com>

> build some sort of graph relationship tool. It is also easy enough to start
> dragging packages around to different projects in netbeans and resolve
> compiler errors.

Yeah, same for Eclipse. The Eclipse Maven plugin can auto-convert a
project to Maven (quite easy). I have played around with it and it was
quite easy to get a mavenized biojava with the dependencies correctly
converted. That's why I thought it might be the first step. You suggest
doing the modularization first and then the maven meta data. I still
have to figure out how to make independent submodules as part of Maven
in Eclipse... let me play around a bit more and see how it goes.

The package list sounds good and java 1.6 too.

Andreas

>
> The advantage of smaller tightly group functional jars is that it allows you
> to have more frequent minor releases with out updating and releasing the
> entire biojava package. It also allows individuals to own a smaller block of
> code for unit test, documentation and examples.
>
> With Maven this becomes less of an issue to worry about multiple parts and
> pieces and their relationships. I think we need to divide up into a
> reasonable approximation of the jars before doing the meta data for maven.
>
> Looking at the current package structure this is an attempt of grouping
> jars. I do not have enough code familiarity with all of biojava so this is
> strictly based on package names.
>
> biojava-core Any classes that organize data structures and would probably
> include org.biojava.bio.seq.*. Any utility classes that can be used by other
> packages org.biojava.utils.*
>
> biojava-structure org.biojava.bio.structure.*
>
> biojava-gui org.biojava.bio.gui
>
> biojava-phylo A package that has a few classes for viewing trees structures
> using the jgrapht-jdk package. I need to play with the code and see if it
> actually uses graph generated by jgrapht for anything special. I have code
> that will render a tree as a simple graphic. I have used jgrapht for other
> projects so it is not a bad "graphing" package for network diagrams. It
> could be refactored out.
>
> Not sure how to tackle the org.biojava.bio.program package as it seems to
> have lots of distinct functional code.
>
> biojava-ws-blast - A web service approach to doing blast. The api would hide
> the web services call
>
> biojava-blast - Blast parsing code. We could have one package for anything
> blast related
>
> biojava-ws-clustalw - A web services approach to doing clustalw multiple
> sequence alignment The api would hide the web services call
>
> biojava-alignment - Code for doing sequence alignment. We could have one
> package for anything alignment related
>
> Does anyone know if you can get usage statistics from maven as to what jar
> files are being downloaded? This would help provide good statistics on what
> code is being used which will help focus on improvements in documentation
> etc.
>
> I assume we are going to make Java 1.6 the minimum requirement moving
> forward? This simplifies some of the web services api requirements to
> minimize the number of external packages that are required.
>
>
> Scooter
>
>
>
>
>
>
>
> ________________________________
> From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic
> Sent: Mon 5/25/2009 12:22 AM
> To: biojava-dev at lists.open-bio.org
> Subject: [Biojava-dev] next steps
>
> Hi,
>
> While talking about design requirements, I think we also need to think
> pragmatically about how much time we will have to refactor code vs.
> re-writing modules from scratch. To get started with the next steps, I
> suggest the following procedure: First thing will be to move to
> Maven. Then components should be refactored into independent
> sub-modules. Then the submodules can get improved to follow the new
> design guidelines. Once we have reached a certain stability with the
> re-organized code base, we will make the next release.
>
> Any comments? If there is general agreement about this, I would take
> the next step and replace the ant build system with a maven based one.
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From HWillis at scripps.edu Mon May 25 14:48:50 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 25 May 2009 10:48:50 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>

Andreas

I was looking at the biojava code yesterday to see how easy it would be
to divide up into functionally grouped jars based on package hierarchy.
I tried to find some refactoring tools that would give a network graph
view of class relationships. It is simple enough to parse source for
import statements and build some sort of graph relationship tool.
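
The import-parsing idea could look something like the sketch below. It is a rough illustration, not an existing BioJava tool: the class name ImportGraphSketch, the dependencies method and the prefix filter are all assumptions made up for the example. It reads source text with two regular expressions and records, for each package, which other packages under a given prefix it imports. Static imports are ignored, and an import of a nested class is attributed to its enclosing class rather than to the package, so the result is only an approximation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportGraphSketch {

    // "package a.b.c;" captures a.b.c
    private static final Pattern PACKAGE =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    // "import a.b.c.Name;" or "import a.b.c.*;" captures a.b.c (static imports do not match)
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+([\\w.]+?)\\.[\\w*]+\\s*;", Pattern.MULTILINE);

    /** Maps each package to the set of other packages (under the given prefix) that its sources import. */
    static Map<String, Set<String>> dependencies(Iterable<String> sources, String prefix) {
        Map<String, Set<String>> graph = new LinkedHashMap<String, Set<String>>();
        for (String source : sources) {
            Matcher pkg = PACKAGE.matcher(source);
            if (!pkg.find()) {
                continue; // default package: nothing to attribute the imports to
            }
            String from = pkg.group(1);
            Set<String> targets = graph.get(from);
            if (targets == null) {
                targets = new TreeSet<String>();
                graph.put(from, targets);
            }
            Matcher imp = IMPORT.matcher(source);
            while (imp.find()) {
                String to = imp.group(1);
                if (to.startsWith(prefix) && !to.equals(from)) {
                    targets.add(to); // one edge: from -> to
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        // Two tiny in-memory "source files"; a real tool would read them from disk.
        String a = "package org.biojava.bio.structure;\n"
                + "import org.biojava.bio.seq.Sequence;\n"
                + "import java.util.List;\n"
                + "class A {}";
        String b = "package org.biojava.bio.seq;\n"
                + "import org.biojava.utils.*;\n"
                + "class B {}";
        Map<String, Set<String>> graph = dependencies(Arrays.asList(a, b), "org.biojava");
        for (Map.Entry<String, Set<String>> edge : graph.entrySet()) {
            System.out.println(edge.getKey() + " -> " + edge.getValue());
        }
    }
}
```

A map like this is enough to spot cycles or heavy cross dependencies between candidate jars before any packages are moved.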
It is also easy enough to start dragging packages around to different projects in NetBeans and resolve the compiler errors.

The advantage of smaller, tightly grouped functional jars is that they allow more frequent minor releases without updating and releasing the entire biojava package. They also allow individuals to own a smaller block of code for unit tests, documentation and examples. With Maven, keeping track of the multiple parts and pieces and their relationships becomes less of an issue. I think we need to divide up into a reasonable approximation of the jars before doing the meta data for maven.

Looking at the current package structure, this is an attempt at grouping jars. I do not have enough code familiarity with all of biojava, so this is strictly based on package names.

biojava-core Any classes that organize data structures; would probably include org.biojava.bio.seq.*. Any utility classes that can be used by other packages: org.biojava.utils.*

biojava-structure org.biojava.bio.structure.*

biojava-gui org.biojava.bio.gui

biojava-phylo A package that has a few classes for viewing tree structures using the jgrapht-jdk package. I need to play with the code and see if it actually uses the graph generated by jgrapht for anything special. I have code that will render a tree as a simple graphic. I have used jgrapht for other projects, so it is not a bad "graphing" package for network diagrams. It could be refactored out.

Not sure how to tackle the org.biojava.bio.program package, as it seems to have lots of distinct functional code.

biojava-ws-blast - A web service approach to doing blast. The api would hide the web services call

biojava-blast - Blast parsing code. We could have one package for anything blast related

biojava-ws-clustalw - A web services approach to doing clustalw multiple sequence alignment. The api would hide the web services call

biojava-alignment - Code for doing sequence alignment.
We could have one package for anything alignment related Does anyone know if you can get usage statistics from maven as to what jar files are being downloaded? This would help provide good statistics on what code is being used which will help focus on improvements in documentation etc. I assume we are going to make Java 1.6 the minimum requirement moving forward? This simplifies some of the web services api requirements to minimize the number of external packages that are required. Scooter ________________________________ From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic Sent: Mon 5/25/2009 12:22 AM To: biojava-dev at lists.open-bio.org Subject: [Biojava-dev] next steps Hi, While talking about design requirements, I think we also need to think pragmatically about how much time we will have to refactor code vs. re-writing modules from scratch. To get started with the next steps, I suggest the following procedure: First thing will be to move to Maven. Then components should be refactored into independent sub-modules. Then the submodules can get improved to follow the new design guidelines. Once we have reached a certain stability with the re-organized code base, we will make the next release. Any comments? If there is general agreement about this, I would take the next step and replace the ant build system with a maven based one. 
Andreas _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From msmoot at ucsd.edu Mon May 25 17:07:57 2009 From: msmoot at ucsd.edu (Mike Smoot) Date: Mon, 25 May 2009 10:07:57 -0700 Subject: [Biojava-dev] next steps In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> Message-ID: On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: > > I was looking at the biojava code yesterday to see how easy it would be to > divide up into functionally grouped jars based on package hierarchy. I tried > to find some refactoring tools that would give a network graph view of class > relationships. It is simple enough to parse source for import statements and > build some sort of graph relationship tool. It is also easy enough to start > dragging packages around to different projects in netbeans and resolve > compiler errors. > JDepend is a nice tool for evaluating package dependencies. http://www.clarkware.com/software/JDepend.html Mike -- ____________________________________________________________ Michael Smoot, Ph.D. Bioengineering Department tel: 858-822-4756 University of California San Diego From HWillis at scripps.edu Mon May 25 22:59:10 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Mon, 25 May 2009 18:59:10 -0400 Subject: [Biojava-dev] next steps References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> I attached the JDepend output for BioJava. This will help on the circular dependencies where core classes should not have dependencies on other packages and if they do it should be refactored into the core class. 
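To illustrate what hunting for circular dependencies amounts to, here is a rough sketch of a depth-first search over a package dependency map. It is not JDepend's implementation and the names (`PackageCycles`, `findCycle`) are invented for the example; JDepend's report gives the same kind of information directly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: finds one cycle in a package dependency graph, if any. */
class PackageCycles {

    /** Returns one cycle as a list of package names (first == last), or an empty list. */
    static List<String> findCycle(Map<String, Set<String>> graph) {
        Set<String> done = new HashSet<>();       // packages fully explored, known cycle-free
        Deque<String> path = new ArrayDeque<>();  // packages on the current DFS path
        for (String start : graph.keySet()) {
            List<String> cycle = visit(start, graph, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String node, Map<String, Set<String>> graph,
                                      Set<String> done, Deque<String> path) {
        if (done.contains(node)) {
            return Collections.emptyList();
        }
        if (path.contains(node)) {
            // Back edge: the cycle runs from the earlier occurrence of node to the path end.
            List<String> cycle = new ArrayList<>();
            boolean inCycle = false;
            for (String p : path) { // iterates oldest to newest
                if (p.equals(node)) {
                    inCycle = true;
                }
                if (inCycle) {
                    cycle.add(p);
                }
            }
            cycle.add(node);
            return cycle;
        }
        path.addLast(node);
        for (String next : graph.getOrDefault(node, Collections.emptySet())) {
            List<String> cycle = visit(next, graph, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.removeLast();
        done.add(node);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Made-up package names: "core" depends on "gui" and "gui" on "core".
        Map<String, Set<String>> graph = new TreeMap<>();
        graph.put("core", new TreeSet<>(Collections.singleton("gui")));
        graph.put("gui", new TreeSet<>(Collections.singleton("core")));
        System.out.println(findCycle(graph));
    }
}
```

Any cycle that passes through a core package is a candidate for the kind of refactoring described above.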
Scooter

________________________________
From: mike.smoot at gmail.com on behalf of Mike Smoot
Sent: Mon 5/25/2009 1:07 PM
To: Scooter Willis
Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] next steps

On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote:

I was looking at the biojava code yesterday to see how easy it would be to divide up into functionally grouped jars based on package hierarchy. I tried to find some refactoring tools that would give a network graph view of class relationships. It is simple enough to parse source for import statements and build some sort of graph relationship tool. It is also easy enough to start dragging packages around to different projects in netbeans and resolve compiler errors.

JDepend is a nice tool for evaluating package dependencies.

http://www.clarkware.com/software/JDepend.html

Mike

--
____________________________________________________________
Michael Smoot, Ph.D. Bioengineering Department
tel: 858-822-4756 University of California San Diego
-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.xml
Type: text/xml
Size: 567706 bytes
Desc: report.xml
URL:

From andreas at sdsc.edu Thu May 28 04:31:15 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 27 May 2009 21:31:15 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>

Hi Scooter,

quick update: There is also an Eclipse plugin for JDepend that provides a user interface to browse through the dependencies.

As I already mentioned earlier, I made some quick progress with the maven plugin to convert the project to maven and create a first pom.
At the moment I am testing how best to create the sub-projects that should contain the modules. The plugin does not seem to make it easy to create new modules, so I agree with your earlier suggestion that it is best to modularize first and then mavenize second... Should we create a branch in svn and play around with refactoring there, and once we are happy with it, switch that branch to become the trunk?

Andreas

On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote:
> I attached the JDepend output for BioJava. This will help on the circular
> dependencies where core classes should not have dependencies on other
> packages and if they do it should be refactored into the core class.
>
> Scooter
> ________________________________
> From: mike.smoot at gmail.com on behalf of Mike Smoot
> Sent: Mon 5/25/2009 1:07 PM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] next steps
>
>
>
> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote:
>>
>> I was looking at the biojava code yesterday to see how easy it would be to
>> divide up into functionally grouped jars based on package hierarchy. I tried
>> to find some refactoring tools that would give a network graph view of class
>> relationships. It is simple enough to parse source for import statements and
>> build some sort of graph relationship tool. It is also easy enough to start
>> dragging packages around to different projects in netbeans and resolve
>> compiler errors.
>
> JDepend is a nice tool for evaluating package dependencies.
>
> http://www.clarkware.com/software/JDepend.html
>
>
> Mike
>
> --
> ____________________________________________________________
> Michael Smoot, Ph.D. Bioengineering Department
> tel: 858-822-4756
University of California San Diego > From juberpatel at gmail.com Thu May 28 07:09:29 2009 From: juberpatel at gmail.com (juber patel) Date: Thu, 28 May 2009 12:39:29 +0530 Subject: [Biojava-dev] next steps In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> Message-ID: just a small observation: Maven may not be easy to use and switch to maven should be done after some consideration. I have personally not used it, but have seen people on the Mahout list struggling with maven. Its utility may not justify its complexity. juber On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic wrote: > Hi Scooter, > > quick update: There is also an eclipse plugin for JDepend, that > provides a user interface to browse thought the dependencies. > > As I already mentioned earlier, I had some quick progress with the > maven plugin to convert the project to maven and create a first pom. > At the moment I am testing how ?best to create ?sub-projects that > should contain the modules. ?The plugin does not seem to make it easy > to create new modules, so I agree with your earlier suggestion that it > is best to modularize first and the mavenize 2nd... Should we create a > branch in svn and play around with refactoring there and once we are > happy with it we can switch that branch to become the trunk? > > Andreas > > > > > On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote: >> I attached the JDepend output for BioJava. This will help on the circular >> dependencies where core classes should not have dependencies on other >> packages and if they do it should be refactored into the core class. 
>> >> Scooter >> ________________________________ >> From: mike.smoot at gmail.com on behalf of Mike Smoot >> Sent: Mon 5/25/2009 1:07 PM >> To: Scooter Willis >> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org >> Subject: Re: [Biojava-dev] next steps >> >> >> >> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: >>> >>> I was looking at the biojava code yesterday to see how easy it would be to >>> divide up into functionally grouped jars based on package hierarchy. I tried >>> to find some refactoring tools that would give a network graph view of class >>> relationships. It is simple enough to parse source for import statements and >>> build some sort of graph relationship tool. It is also easy enough to start >>> dragging packages around to different projects in netbeans and resolve >>> compiler errors. >> >> JDepend is a nice tool for evaluating package dependencies. >> >> http://www.clarkware.com/software/JDepend.html >> >> >> Mike >> >> -- >> ____________________________________________________________ >> Michael Smoot, Ph.D. ? ? ? ? ? ? ? Bioengineering Department >> tel: 858-822-4756 ? ? ? ? 
University of California San Diego >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Juber Patel http://juberpatel.googlepages.com From holland at eaglegenomics.com Thu May 28 06:55:28 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 28 May 2009 07:55:28 +0100 Subject: [Biojava-dev] next steps In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> Message-ID: <1243493728.5260.1.camel@buzzybee> I found when creating modules for the testbed biojava3 that it was easier to do it by hand. Only two things need to be done - first of all a list of modules needs to be added to the parent pom.xml of the project, then each module has its own pom.xml referencing the parent pom.xml. Once created this way it only takes a project refresh in Eclipse/NetBeans for the new module to show up. See the example pom.xmls under the old biojava3 branch for details. cheers, Richard On Wed, 2009-05-27 at 21:31 -0700, Andreas Prlic wrote: > Hi Scooter, > > quick update: There is also an eclipse plugin for JDepend, that > provides a user interface to browse thought the dependencies. > > As I already mentioned earlier, I had some quick progress with the > maven plugin to convert the project to maven and create a first pom. > At the moment I am testing how best to create sub-projects that > should contain the modules. The plugin does not seem to make it easy > to create new modules, so I agree with your earlier suggestion that it > is best to modularize first and the mavenize 2nd... 
Should we create a > branch in svn and play around with refactoring there and once we are > happy with it we can switch that branch to become the trunk? > > Andreas > > > > > On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote: > > I attached the JDepend output for BioJava. This will help on the circular > > dependencies where core classes should not have dependencies on other > > packages and if they do it should be refactored into the core class. > > > > Scooter > > ________________________________ > > From: mike.smoot at gmail.com on behalf of Mike Smoot > > Sent: Mon 5/25/2009 1:07 PM > > To: Scooter Willis > > Cc: Andreas Prlic; biojava-dev at lists.open-bio.org > > Subject: Re: [Biojava-dev] next steps > > > > > > > > On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: > >> > >> I was looking at the biojava code yesterday to see how easy it would be to > >> divide up into functionally grouped jars based on package hierarchy. I tried > >> to find some refactoring tools that would give a network graph view of class > >> relationships. It is simple enough to parse source for import statements and > >> build some sort of graph relationship tool. It is also easy enough to start > >> dragging packages around to different projects in netbeans and resolve > >> compiler errors. > > > > JDepend is a nice tool for evaluating package dependencies. > > > > http://www.clarkware.com/software/JDepend.html > > > > > > Mike > > > > -- > > ____________________________________________________________ > > Michael Smoot, Ph.D. 
Bioengineering Department
> > tel: 858-822-4756 University of California San Diego
> >
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk Thu May 28 08:16:05 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 28 May 2009 09:16:05 +0100
Subject: [Biojava-dev] next steps
In-Reply-To:
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <4A1E4845.8080906@ebi.ac.uk>

Maven's big plus points are its easy integration into just about any IDE and its transitive dependency management. On a project like BioJava (where people need to get set up and running quickly across a wide range of development environments) these two points make it one of the only viable choices I would use. This isn't to say the other build systems (rake, raven, gant, gradle, ant) are not as good or better, just that they do not fit the bill as well.

Andy

juber patel wrote:
> just a small observation:
>
> Maven may not be easy to use and switch to maven should be done after
> some consideration. I have personally not used it, but have seen
> people on the Mahout list struggling with maven. Its utility may not
> justify its complexity.
>
> juber
>
>
> On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic wrote:
>> Hi Scooter,
>>
>> quick update: There is also an eclipse plugin for JDepend, that
>> provides a user interface to browse thought the dependencies.
>> >> As I already mentioned earlier, I had some quick progress with the >> maven plugin to convert the project to maven and create a first pom. >> At the moment I am testing how best to create sub-projects that >> should contain the modules. The plugin does not seem to make it easy >> to create new modules, so I agree with your earlier suggestion that it >> is best to modularize first and the mavenize 2nd... Should we create a >> branch in svn and play around with refactoring there and once we are >> happy with it we can switch that branch to become the trunk? >> >> Andreas >> >> >> >> >> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote: >>> I attached the JDepend output for BioJava. This will help on the circular >>> dependencies where core classes should not have dependencies on other >>> packages and if they do it should be refactored into the core class. >>> >>> Scooter >>> ________________________________ >>> From: mike.smoot at gmail.com on behalf of Mike Smoot >>> Sent: Mon 5/25/2009 1:07 PM >>> To: Scooter Willis >>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org >>> Subject: Re: [Biojava-dev] next steps >>> >>> >>> >>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: >>>> I was looking at the biojava code yesterday to see how easy it would be to >>>> divide up into functionally grouped jars based on package hierarchy. I tried >>>> to find some refactoring tools that would give a network graph view of class >>>> relationships. It is simple enough to parse source for import statements and >>>> build some sort of graph relationship tool. It is also easy enough to start >>>> dragging packages around to different projects in netbeans and resolve >>>> compiler errors. >>> JDepend is a nice tool for evaluating package dependencies. >>> >>> http://www.clarkware.com/software/JDepend.html >>> >>> >>> Mike >>> >>> -- >>> ____________________________________________________________ >>> Michael Smoot, Ph.D. 
Bioengineering Department
>>> tel: 858-822-4756 University of California San Diego
>>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
>

From james at carmanconsulting.com Thu May 28 09:37:53 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 05:37:53 -0400
Subject: [Biojava-dev] next steps
In-Reply-To:
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID:

Maven really isn't that hard. I have no idea what the Mahout folks are having trouble with, but I'm sure it can be addressed. Maven's benefits greatly outweigh its complexity (which isn't that high, IMHO). If you folks want a hand "mavenizing" your project, I wouldn't mind helping.

On Thu, May 28, 2009 at 3:09 AM, juber patel wrote:
> just a small observation:
>
> Maven may not be easy to use and switch to maven should be done after
> some consideration. I have personally not used it, but have seen
> people on the Mahout list struggling with maven. Its utility may not
> justify its complexity.
>
> juber
>
>
> On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic wrote:
>> Hi Scooter,
>>
>> quick update: There is also an eclipse plugin for JDepend, that
>> provides a user interface to browse thought the dependencies.
>>
>> As I already mentioned earlier, I had some quick progress with the
>> maven plugin to convert the project to maven and create a first pom.
>> At the moment I am testing how best to create sub-projects that
>> should contain the modules. The plugin does not seem to make it easy
>> to create new modules, so I agree with your earlier suggestion that it
>> is best to modularize first and the mavenize 2nd...
Should we create a >> branch in svn and play around with refactoring there and once we are >> happy with it we can switch that branch to become the trunk? >> >> Andreas >> >> >> >> >> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote: >>> I attached the JDepend output for BioJava. This will help on the circular >>> dependencies where core classes should not have dependencies on other >>> packages and if they do it should be refactored into the core class. >>> >>> Scooter >>> ________________________________ >>> From: mike.smoot at gmail.com on behalf of Mike Smoot >>> Sent: Mon 5/25/2009 1:07 PM >>> To: Scooter Willis >>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org >>> Subject: Re: [Biojava-dev] next steps >>> >>> >>> >>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: >>>> >>>> I was looking at the biojava code yesterday to see how easy it would be to >>>> divide up into functionally grouped jars based on package hierarchy. I tried >>>> to find some refactoring tools that would give a network graph view of class >>>> relationships. It is simple enough to parse source for import statements and >>>> build some sort of graph relationship tool. It is also easy enough to start >>>> dragging packages around to different projects in netbeans and resolve >>>> compiler errors. >>> >>> JDepend is a nice tool for evaluating package dependencies. >>> >>> http://www.clarkware.com/software/JDepend.html >>> >>> >>> Mike >>> >>> -- >>> ____________________________________________________________ >>> Michael Smoot, Ph.D. ? ? ? ? ? ? ? Bioengineering Department >>> tel: 858-822-4756 ? ? ? ? University of California San Diego >>> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > > -- > Juber Patel ? ? ? 
?http://juberpatel.googlepages.com > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From HWillis at scripps.edu Thu May 28 13:10:43 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 28 May 2009 09:10:43 -0400 Subject: [Biojava-dev] next steps References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> Message-ID: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu> Maven should be viewed as an additional option for developers where once a version of BioJava is released the Maven repository is updated and we need to make sure we have all the meta-data/dependency information correct. This doesn't mean that BioJava development needs to be done in Maven but simply is another way to get the jars after they have been released. BioJava as a single Jar is not that hard to integrate into your project given that we have a handful of external jars files that we provide as part of the download. For other projects I have worked with where they only package the jar for that project and then give you web links to download 10 other external projects then that is a pain. You go to each website to figure out the download process and find that they are now all in different releases then Maven is a great solution because the developers of biojava took the time to get the exact version of jar files from external packages referenced properly and did not leave it to the "customer" to figure out. If we use apache commons as a model I personally would rather grab the package of interest say biojava-blast and add into my development environment. 
Maven is an Apache project yet when you go to http://commons.apache.org/ and grab the component of interest Maven isn't even listed as an option. This is probably because it is an overkill for a single jar. Doesn't mean that you can't get commons jar's via maven when you load a larger project. In our case we may have a couple components where it can get a little complicated by external jar dependencies. Using biojava-blast as an example where it has a web service client that is either using axis or the latest greatest sun JSR. The project I am importing biojava-blast via Maven into already uses axis but an older version because everything works and I haven't needed to do the upgrade. Maven may make the integration step easier but it doesn't solve the problem where I as the developer now need to do something to resolve the version conflicts. So I view Maven as a nice option for developers who are a big fan of Maven and makes them smile when they can grab the code they need from BioJava via Maven. We should plan on having an apache commons like page to download and publish the jars in maven as well. Scooter ________________________________ From: biojava-dev-bounces at lists.open-bio.org on behalf of James Carman Sent: Thu 5/28/2009 5:37 AM To: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] next steps Maven really isn't that hard. I have no idea what the Mahout folks are having troubles with, but I'm sure it can be addressed. Maven't benefits greatly outweigh its complexity (which isn't that high, IMHO). If you folks want a hand "mavenizing" your project, I wouldn't mind helping. On Thu, May 28, 2009 at 3:09 AM, juber patel wrote: > just a small observation: > > Maven may not be easy to use and switch to maven should be done after > some consideration. I have personally not used it, but have seen > people on the Mahout list struggling with maven. Its utility may not > justify its complexity. 
> > juber > > > On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic wrote: >> Hi Scooter, >> >> quick update: There is also an eclipse plugin for JDepend, that >> provides a user interface to browse thought the dependencies. >> >> As I already mentioned earlier, I had some quick progress with the >> maven plugin to convert the project to maven and create a first pom. >> At the moment I am testing how best to create sub-projects that >> should contain the modules. The plugin does not seem to make it easy >> to create new modules, so I agree with your earlier suggestion that it >> is best to modularize first and the mavenize 2nd... Should we create a >> branch in svn and play around with refactoring there and once we are >> happy with it we can switch that branch to become the trunk? >> >> Andreas >> >> >> >> >> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote: >>> I attached the JDepend output for BioJava. This will help on the circular >>> dependencies where core classes should not have dependencies on other >>> packages and if they do it should be refactored into the core class. >>> >>> Scooter >>> ________________________________ >>> From: mike.smoot at gmail.com on behalf of Mike Smoot >>> Sent: Mon 5/25/2009 1:07 PM >>> To: Scooter Willis >>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org >>> Subject: Re: [Biojava-dev] next steps >>> >>> >>> >>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote: >>>> >>>> I was looking at the biojava code yesterday to see how easy it would be to >>>> divide up into functionally grouped jars based on package hierarchy. I tried >>>> to find some refactoring tools that would give a network graph view of class >>>> relationships. It is simple enough to parse source for import statements and >>>> build some sort of graph relationship tool. It is also easy enough to start >>>> dragging packages around to different projects in netbeans and resolve >>>> compiler errors. 
>>> >>> JDepend is a nice tool for evaluating package dependencies. >>> >>> http://www.clarkware.com/software/JDepend.html >>> >>> >>> Mike >>> >>> -- >>> ____________________________________________________________ >>> Michael Smoot, Ph.D. Bioengineering Department >>> tel: 858-822-4756 University of California San Diego >>> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > > -- > Juber Patel http://juberpatel.googlepages.com > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From HWillis at scripps.edu Thu May 28 13:37:27 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 28 May 2009 09:37:27 -0400 Subject: [Biojava-dev] BioJava BLAST web services Message-ID: <061BFD133FA1584693D19C79A0072F5F76C863@FLMAIL1.fl.ad.scripps.edu> I am planning on doing some testing of a couple BLAST web services interfaces(assuming more than one exists) and see what they truly have in common and see how that would impact a BJ3 front end to multiple providers. My assumption is that they will be the same. I noticed on the NCBI Blast implementations the user was required to pass their email address as part of the web service call. They are concerned with abuse from external processes and they only allow one sequence per request. Same-Same but different is always fun! >From wikipedia the following are listed as BLAST resources where more than one may offer a web service interface. Should BioJava3 try and support more than one? 
Thanks

Scooter

Variations of BLAST

* WU-BLAST - the original gapping BLAST with statistics, developed and maintained by Warren Gish at Washington University in St. Louis
* EBI's BLAST Services - EBI's main blast services page.
* FSA-BLAST - a new, faster but still accurate version of NCBI BLAST based on recently published algorithmic improvements
* NBIC mpiBLAST - at the Netherlands Bioinformatics Centre
* Parallel BLAST - a dual scheduling BLAST tested on the Blue Gene/L
* mpiBLAST - open-source parallel BLAST
* A/G BLAST - implementation for PowerPC G4/G5 processors and Mac OS X, from Apple Computer's Advanced Computation Group and Genentech.
* STRAP - the protein workbench STRAP contains a comfortable BLAST front-end with a cache for BLAST results

Commercial versions

* ThermoBLAST by DNA Software Inc. - scans entire genomes quickly and accurately, combining the power of BLAST with the most advanced thermodynamics parameters
* PatternHunter - an alternative software which provides similar functionality to BLAST while claiming increased speed and sensitivity
* KoriBlast - a reliable graphical environment dedicated to sequence data mining. KoriBlast combines Blast searches with advanced data management capabilities and a state-of-the-art graphical user interface.
* microbial identification BLAST - a quality controlled database for in-vitro diagnostics. SepsiTest combines broad-range-PCR using ultra-pure reagents with Blast searches in a quality controlled environment.
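
To make the "one front end, several providers" question a bit more concrete, here is a rough sketch of what such an API could look like. Everything in it is hypothetical: `BlastService`, `StubBlastService` and `BlastFrontEnd` are names invented for this example, the stub makes no network call, and the parameter list simply assumes that the providers really do have the basics (program, database, sequence, email) in common. The front end sends one request per sequence, since at least one provider only allows one sequence per request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical provider-neutral view of a remote BLAST service. */
interface BlastService {
    String name();

    /** Submits exactly one sequence and returns a provider-specific job id. */
    String submit(String program, String database, String sequence, String email);
}

/** Stand-in provider for the sketch: no web service call, it only hands out job ids. */
class StubBlastService implements BlastService {
    private final String name;
    private int counter = 0;

    StubBlastService(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    public String submit(String program, String database, String sequence, String email) {
        counter++;
        return name + "-job-" + counter;
    }
}

/** Hypothetical front end that hides which provider is behind a request. */
class BlastFrontEnd {
    private final Map<String, BlastService> providers = new LinkedHashMap<>();

    void register(BlastService service) {
        providers.put(service.name(), service);
    }

    /** Submits each sequence as its own request and returns the job ids in order. */
    List<String> submitAll(String provider, String program, String database,
                           List<String> sequences, String email) {
        BlastService service = providers.get(provider);
        if (service == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        if (email == null || email.trim().isEmpty()) {
            // Some providers require an email address, so the front end always asks for one.
            throw new IllegalArgumentException("An email address is required");
        }
        List<String> jobIds = new ArrayList<>();
        for (String sequence : sequences) {
            jobIds.add(service.submit(program, database, sequence, email));
        }
        return jobIds;
    }
}

class BlastFrontEndDemo {
    public static void main(String[] args) {
        BlastFrontEnd frontEnd = new BlastFrontEnd();
        frontEnd.register(new StubBlastService("provider-a"));
        List<String> jobs = frontEnd.submitAll("provider-a", "blastp", "some-db",
                Arrays.asList("MKTAYIAKQR", "GSHMLEDPVE"), "user at example.org");
        System.out.println(jobs);
    }
}
```

If testing shows the real services differ in more than these basics, the differences would have to go behind each `BlastService` implementation rather than into the front end.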

From james at carmanconsulting.com Thu May 28 13:45:23 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 09:45:23 -0400
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

I would say that you should use the Apache Commons projects as a model (I'm an Apache Commons PMC member, so I'm a bit biased). The maven-generated site will include information on the dependencies (including whether they are optional and where you can get them, provided the other projects play nicely and include that information). And, yes, when you *do* use Maven, it will download all required transitive dependencies for you and add them to your classpath automagically. That's why it's so nice. Well, that's one of the MANY reasons it's so nice. The release plugin also saves a LOT of headaches, if you ask me (once you get it configured properly).

On Thu, May 28, 2009 at 9:10 AM, Scooter Willis wrote:
> Maven should be viewed as an additional option for developers where once a
> version of BioJava is released the Maven repository is updated and we need
> to make sure we have all the meta-data/dependency information correct. This
> doesn't mean that BioJava development needs to be done in Maven but simply
> is another way to get the jars after they have been released. BioJava as a
> single jar is not that hard to integrate into your project given that we
> have a handful of external jar files that we provide as part of the
> download. For other projects I have worked with, where they only package the
> jar for that project and then give you web links to download 10 other
> external projects, that is a pain. You go to each website to figure out
> the download process and find that they are now all in different releases.
> Then Maven is a great solution because the developers of biojava took the
> time to get the exact version of jar files from external packages referenced
> properly and did not leave it to the "customer" to figure out.
>
> If we use apache commons as a model I personally would rather grab the
> package of interest, say biojava-blast, and add it into my development
> environment. Maven is an Apache project, yet when you go to
> http://commons.apache.org/ and grab the component of interest Maven isn't
> even listed as an option. This is probably because it is overkill for a
> single jar. That doesn't mean that you can't get commons jars via maven when
> you load a larger project.
>
> In our case we may have a couple of components where it can get a little
> complicated by external jar dependencies. Take biojava-blast as an example,
> where it has a web service client that is either using axis or the latest
> greatest sun JSR. The project I am importing biojava-blast via Maven into
> already uses axis but an older version because everything works and I
> haven't needed to do the upgrade. Maven may make the integration step
> easier but it doesn't solve the problem where I as the developer now need to
> do something to resolve the version conflicts.
>
> So I view Maven as a nice option for developers who are big fans of Maven
> and it makes them smile when they can grab the code they need from BioJava via
> Maven. We should plan on having an apache commons like page to download and
> publish the jars in maven as well.
>
> Scooter
> ________________________________
> From: biojava-dev-bounces at lists.open-bio.org on behalf of James Carman
> Sent: Thu 5/28/2009 5:37 AM
> To: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] next steps
>
> Maven really isn't that hard. I have no idea what the Mahout folks
> are having troubles with, but I'm sure it can be addressed. Maven's
> benefits greatly outweigh its complexity (which isn't that high,
> IMHO). If you folks want a hand "mavenizing" your project, I wouldn't
> mind helping.
>
> On Thu, May 28, 2009 at 3:09 AM, juber patel wrote:
>> just a small observation:
>>
>> Maven may not be easy to use and a switch to maven should be done after
>> some consideration. I have personally not used it, but have seen
>> people on the Mahout list struggling with maven. Its utility may not
>> justify its complexity.
>>
>> juber
>>
>>
>> On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic wrote:
>>> Hi Scooter,
>>>
>>> quick update: There is also an eclipse plugin for JDepend, that
>>> provides a user interface to browse through the dependencies.
>>>
>>> As I already mentioned earlier, I had some quick progress with the
>>> maven plugin to convert the project to maven and create a first pom.
>>> At the moment I am testing how best to create sub-projects that
>>> should contain the modules. The plugin does not seem to make it easy
>>> to create new modules, so I agree with your earlier suggestion that it
>>> is best to modularize first and then mavenize 2nd... Should we create a
>>> branch in svn and play around with refactoring there and once we are
>>> happy with it we can switch that branch to become the trunk?
>>>
>>> Andreas
>>>
>>>
>>> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis
>>> wrote:
>>>> I attached the JDepend output for BioJava. This will help on the
>>>> circular
>>>> dependencies where core classes should not have dependencies on other
>>>> packages and if they do it should be refactored into the core class.
>>>>
>>>> Scooter
>>>> ________________________________
>>>> From: mike.smoot at gmail.com on behalf of Mike Smoot
>>>> Sent: Mon 5/25/2009 1:07 PM
>>>> To: Scooter Willis
>>>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
>>>> Subject: Re: [Biojava-dev] next steps
>>>>
>>>>
>>>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis
>>>> wrote:
>>>>>
>>>>> I was looking at the biojava code yesterday to see how easy it would be
>>>>> to
>>>>> divide up into functionally grouped jars based on package hierarchy. I
>>>>> tried
>>>>> to find some refactoring tools that would give a network graph view of
>>>>> class
>>>>> relationships. It is simple enough to parse source for import
>>>>> statements and
>>>>> build some sort of graph relationship tool. It is also easy enough to
>>>>> start
>>>>> dragging packages around to different projects in netbeans and resolve
>>>>> compiler errors.
>>>>
>>>> JDepend is a nice tool for evaluating package dependencies.
>>>>
>>>> http://www.clarkware.com/software/JDepend.html
>>>>
>>>>
>>>> Mike
>>>>
>>>> --
>>>> ____________________________________________________________
>>>> Michael Smoot, Ph.D.               Bioengineering Department
>>>> tel: 858-822-4756         University of California San Diego
>>>>
>>
>> --
>> Juber Patel       http://juberpatel.googlepages.com
>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From andreas at sdsc.edu Thu May 28 16:53:33 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 28 May 2009 09:53:33 -0700
Subject: [Biojava-dev] hierarchical vs flat module organisation
Message-ID: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>

Hi,

from the different posts it seems there are two types of suggestions for how to organize modules: hierarchical vs. flat. I wonder if the best way to organize this is to mix the designs. There could be a few top-level modules like core, webservices, phylo, structure. These would be equivalent to projects in the workspace. These can then contain submodules like

webservices-blast-ebi
webservices-blast-ncbi
webservices-whatever

or

structure-core
structure-viewers

The submodules would be sub-folders in the projects.

Any thoughts on that?

Andreas

From HWillis at scripps.edu Thu May 28 18:09:32 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 14:09:32 -0400
Subject: [Biojava-dev] hierarchical vs flat module organisation
References: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C867@FLMAIL1.fl.ad.scripps.edu>

Andreas

I think the organization should make the most sense to the user of BioJava and should be functionally grouped. I show up looking for specific biology algorithms/code: Blast, Sequence Alignment, Tree construction etc. In that module I would then find different features that I can explore to solve the problem. The question becomes whether I would pick a module based on how it solved the problem.
Given that BioJava does not have a native solution to do BLAST, and the developer does not want to deal with all the configuration, the BLAST web service call simply becomes the only choice. Parsing a BLAST output and making a BLAST web service call should produce the same structured result, which I would then use other BioJava APIs against. I think we should group by function, and that gives the developer a collection of tools to work with.

Scooter

From HWillis at scripps.edu Thu May 28 17:57:27 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 13:57:27 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>

Andreas

I think each jar probably needs its own svn trunk. This is how apache commons is set up. The advantage of this is that everything is modularized with nicely defined boundaries on dependencies. If you have one source tree that builds multiple jars then it becomes very easy to grab a class from another jar, forcing additional dependencies.

You also don't need to worry about a single user having access to the entire source tree. If you have a new developer who wants to get involved with a specific interest then it is easy to give him access to that package without worrying about breaking other packages.

Do you think we should call the functional grouping packages or modules or something else?

If you take a whack at the refactoring based on X number of modules then you could check each one into a different subversion trunk. Each module will probably have a dependency on biojava-core, which will also be a separate subversion trunk. In Netbeans I would set up a project for each and then I can add the biojava-core project as an external project dependency. This also allows each module to be released independently and more frequently. We probably need to come up with a versioning convention that is part of the jar name.
Not sure if any of the ant build tools automate the upticking of the major/minor version number when packaging jars.

A user of biojava would download a single jar for the module of interest, where the download contains all the external jars that are required, including biojava-core. For maven that would be done via the POM.

As part of the refactoring, now is the time to make any major namespace changes you want to make. I assume that eclipse refactoring makes this easy. Check all the code in and BioJava3 has begun!

Scooter

________________________________
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Thu 5/28/2009 12:31 AM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] next steps

Hi Scooter,

quick update: There is also an eclipse plugin for JDepend, that provides a user interface to browse through the dependencies.

As I already mentioned earlier, I had some quick progress with the maven plugin to convert the project to maven and create a first pom. At the moment I am testing how best to create sub-projects that should contain the modules. The plugin does not seem to make it easy to create new modules, so I agree with your earlier suggestion that it is best to modularize first and then mavenize 2nd... Should we create a branch in svn and play around with refactoring there and once we are happy with it we can switch that branch to become the trunk?

Andreas

On Mon, May 25, 2009 at 3:59 PM, Scooter Willis wrote:
> I attached the JDepend output for BioJava. This will help on the circular
> dependencies where core classes should not have dependencies on other
> packages and if they do it should be refactored into the core class.
>
> Scooter
> ________________________________
> From: mike.smoot at gmail.com on behalf of Mike Smoot
> Sent: Mon 5/25/2009 1:07 PM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] next steps
>
>
> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis wrote:
>>
>> I was looking at the biojava code yesterday to see how easy it would be to
>> divide up into functionally grouped jars based on package hierarchy. I tried
>> to find some refactoring tools that would give a network graph view of class
>> relationships. It is simple enough to parse source for import statements and
>> build some sort of graph relationship tool. It is also easy enough to start
>> dragging packages around to different projects in netbeans and resolve
>> compiler errors.
>
> JDepend is a nice tool for evaluating package dependencies.
>
> http://www.clarkware.com/software/JDepend.html
>
>
> Mike
>
> --
> ____________________________________________________________
> Michael Smoot, Ph.D.               Bioengineering Department
> tel: 858-822-4756         University of California San Diego
>

From andreas.prlic at gmail.com Fri May 29 04:53:22 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 28 May 2009 21:53:22 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu> <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com>

> I think each jar probably needs its own svn trunk. This is how apache
> commons is setup.
The advantage of this is that everything is modularized
> with nicely defined boundaries on dependencies. If you have one source tree
> that builds multiple jars then it becomes very easy to grab a class from
> another jar, forcing additional dependencies.

Sounds good. I guess it might also be good not to have too many .jar files in the end.

> You also don't need to worry about a single user having access to the entire
> source tree. If you have a new developer who wants to get involved with a
> specific interest then it is easy to give him access to that package without
> worrying about breaking other packages.

Might be useful in the future. For now I think it is good enough to give developers write access to all of biojava.

>
> Do you think we should call the functional grouping packages or modules or
> something else?

What about: we call a top-level project a package. A package can then consist of several modules. Not sure if we should have a jar per package or per module.

> If you take a whack at the refactoring based on X number of modules then you
> could check each one into a different subversion trunk. Each module will
> probably have a dependency on biojava-core which will also be a separate
> subversion trunk. In Netbeans I would set up a project for each and then I
> can add the biojava-core project as an external project dependency.

Sounds good, and you would do the same in eclipse.

> This also allows each module to be released independently and more
> frequently. We probably need to come up with a versioning convention that
> is part of the jar name.

I think we should stick to the maven naming conventions.
http://maven.apache.org/guides/mini/guide-naming-conventions.html
e.g.
groupId     org.biojava.phylo for the phylogenetic package
artifactId  biojava-phylo
version     3.0.0 or 3.0.0-SNAPSHOT if it is a nightly build

> Not sure if any of the ant build tools automate the upticking of the
> major/minor version number when packaging jars.

Not sure about ant, but maven has a built-in release plugin. If it is set up correctly you can just write
mvn release:prepare
and the release is prepared...

> As part of the refactoring now is the time to make any major namespace
> changes you want to make. I assume that eclipse refactoring makes this easy.

Namespace changes are tricky. In principle I don't want to break backwards compatibility with the existing code base. On the other hand, having package names starting with org.biojava.structure, rather than org.biojava.bio.structure, would be simpler. If in doubt I am for backwards compatibility. One case where I would like to see a change is the core blast parsing modules. org.biojava.bio.program.sax does not indicate at all that this has to do with blast.

Andreas

From heuermh at acm.org Fri May 29 16:29:04 2009
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 29 May 2009 12:29:04 -0400 (EDT)
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com>
Message-ID:

Andreas Prlic wrote:

> > I think each jar probably needs its own svn trunk. This is how apache
> > commons is setup. The advantage of this is that everything is modularized
> > with nicely defined boundaries on dependencies. If you have one source tree
> > that builds multiple jars then it becomes very easy to grab a class from
> > another jar, forcing additional dependencies.
>
> Sounds good. I guess it might also be good not to have too many .jar files
> in the end.
>
> > You also don't need to worry about a single user having access to the entire
> > source tree. If you have a new developer who wants to get involved with a
> > specific interest then it is easy to give him access to that package without
> > worrying about breaking other packages.
>
> Might be useful in the future. For now I think it is good enough to
> give developers write access to all of biojava.
>
> > Do you think we should call the functional grouping packages or modules or
> > something else?
>
> What about: we call a top-level project a package. A package can then
> consist of several modules. Not sure if we should have a jar per
> package or per module.
>
> > If you take a whack at the refactoring based on X number of modules then you
> > could check each one into a different subversion trunk. Each module will
> > probably have a dependency on biojava-core which will also be a separate
> > subversion trunk. In Netbeans I would set up a project for each and then I
> > can add the biojava-core project as an external project dependency.
>
> Sounds good, and you would do the same in eclipse.
>
> > This also allows each module to be released independently and more
> > frequently. We probably need to come up with a versioning convention that
> > is part of the jar name.
>
> I think we should stick to the maven naming conventions.
> http://maven.apache.org/guides/mini/guide-naming-conventions.html
> e.g.
> groupId     org.biojava.phylo for the phylogenetic package
> artifactId  biojava-phylo
> version     3.0.0 or 3.0.0-SNAPSHOT if it is a nightly build
>
> > Not sure if any of the ant build tools automate the upticking of the
> > major/minor version number when packaging jars.
>
> Not sure about ant, but maven has a built-in release plugin. If it is
> set up correctly you can just write
> mvn release:prepare
> and the release is prepared...
>
> > As part of the refactoring now is the time to make any major namespace
> > changes you want to make. I assume that eclipse refactoring makes this easy.
>
> Namespace changes are tricky. In principle I don't want to break
> backwards compatibility with the existing code base. On the other hand,
> having package names starting with org.biojava.structure, rather than
> org.biojava.bio.structure, would be simpler. If in doubt I am for
> backwards compatibility. One case where I would like to see a change
> is the core blast parsing modules. org.biojava.bio.program.sax does
> not indicate at all that this has to do with blast.