From harryzs1981 at gmail.com  Wed May  6 09:13:42 2009
From: harryzs1981 at gmail.com (sheng zhao)
Date: Wed, 6 May 2009 15:13:42 +0200
Subject: [Biojava-dev] Biojava-doc in chm format
Message-ID: <3d23b1eb0905060613m643adf87sdef55a05a083dd51@mail.gmail.com>

Hi

Where can I find the BioJava documentation in CHM format?

Thanks !

harry

From andreas at sdsc.edu  Mon May 11 00:26:58 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 10 May 2009 21:26:58 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
Message-ID: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>

Hi biojava-devs,

It is time to start working on the next biojava release. I would
like to modularize the current code base and apply some of the ideas
that have emerged around Richard's "biojava 3" code. In principle, all
changes should be backwards compatible with the interfaces provided by
the current biojava 1.7 release. Backwards compatibility should only
be broken if the functionality is being replaced with something that
works better, and the change is documented accordingly. For the build,
I would suggest sticking with what Richard's biojava 3 code base
already provides. Since we will try to be backwards compatible, all
development should happen in the biojava trunk, and the first step
will be to move from the ant build scripts to a maven build process.
Working this way will let us use, e.g., the code refactoring tools
provided by Eclipse, which should come in handy.

The modules I would like to see should provide self-contained
functionality, and cross dependencies should be kept to a
minimum. I would suggest the following modules:

biojava-core: Contains everything that cannot easily be modularized
or for which nobody volunteers to become module maintainer.
biojava-phylogeny: Scooter expressed some interest in providing such a
module and becoming package maintainer for it.
biojava-structure: Everything related to protein structure. I would be
package maintainer.
biojava-blast: Blast parsing is frequently requested functionality,
and it would be good to have this code self-contained. A package
maintainer for this will still need to be nominated at a later stage.

Any suggestions for other modules?

Let me know what you think about this.

Andreas

From HWillis at scripps.edu  Mon May 11 09:50:58 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 11 May 2009 09:50:58 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>

Andreas

Another theme that should be considered is providing a multi-threaded
version of any module with a long run time. This would have a couple
of elements. A progress listener interface should be standard: core
code would send progress messages to listeners, which external code
can use to display feedback to the user. I did this with the
Neighbor Joining code for tree construction and it provides needed
feedback in a GUI. Without it the user gets frustrated, because they
don't know whether the code they are about to execute will take 10
minutes or 8 hours to complete, and they think the software is not
working. The reverse direction matters too: to cancel an operation,
external code needs a way to tell core code to stop processing a
long-running loop. Once the code has completed, the listener's
"complete" method is called, allowing the next step in the external
code to continue. The developer would have the choice of calling the
"process" method directly or running it in a thread and waiting for
the "complete" callback.
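
A minimal sketch of the flow described above: progress callbacks, a cancel request, and a "complete" callback, with the choice of calling process() directly or running it in a thread. The names ProgressListener, LongRunningTask and ProgressDemo are hypothetical and only illustrate the idea; they are not an existing BioJava API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener interface; the names are illustrative only. */
interface ProgressListener {
    void progress(String state, int currentCount, int totalCount);
    void complete();
    void canceled();
}

/** A long-running job that can be called directly or handed to a Thread. */
class LongRunningTask implements Runnable {
    private final List<ProgressListener> listeners =
            new CopyOnWriteArrayList<ProgressListener>();
    private final int totalSteps;
    // volatile so that a cancel() issued from another thread is seen by the loop
    private volatile boolean cancelRequested = false;

    LongRunningTask(int totalSteps) {
        this.totalSteps = totalSteps;
    }

    void addProgressListener(ProgressListener listener) {
        if (listener != null) {
            listeners.add(listener);
        }
    }

    /** Asks the loop in process() to stop at its next iteration. */
    void cancel() {
        cancelRequested = true;
    }

    /** Blocking call: does the work in the caller's thread. */
    void process() {
        for (int step = 1; step <= totalSteps; step++) {
            if (cancelRequested) {
                for (ProgressListener l : listeners) l.canceled();
                return;
            }
            // ... one unit of real work would go here ...
            for (ProgressListener l : listeners) l.progress("working", step, totalSteps);
        }
        for (ProgressListener l : listeners) l.complete();
    }

    /** Lets the same job run in a background thread. */
    public void run() {
        process();
    }
}

class ProgressDemo {
    public static void main(String[] args) throws InterruptedException {
        LongRunningTask task = new LongRunningTask(5);
        task.addProgressListener(new ProgressListener() {
            public void progress(String state, int current, int total) {
                System.out.println(state + " " + current + "/" + total);
            }
            public void complete() { System.out.println("complete"); }
            public void canceled() { System.out.println("canceled"); }
        });
        Thread worker = new Thread(task);
        worker.start();
        worker.join();
    }
}
```

The cancel flag is only checked between steps, so how quickly a cancel takes effect depends on how fine-grained the loop is.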

This is the first step towards having the core long-running
processes take advantage of multiple threads to complete the
computational task faster. Not all code can be parallelized easily, but
if an algorithm can take advantage of running in parallel then it
should. This also opens up cloud computing frameworks that
extend Java's multi-threading concepts across a cluster, such as
http://www.terracotta.org/. If we put an emphasis on having code that
runs well in a thread, we are one step closer to an architecture that can
run in a cloud. The computational problems are only going to get bigger,
and with Amazon EC2 and approaches like http://www.eucalyptus.com/,
computational and IO cycles are going to be cheap as long as the
software/libraries can easily take advantage of them.

Thanks

Scooter


From andreas at sdsc.edu  Mon May 11 18:53:14 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 11 May 2009 15:53:14 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>

Hi Scooter,

I like the idea of supporting multiple threads and parallelizing code
where possible. Is there a reference implementation that you would
recommend for how progress listeners should be implemented?  I suppose
the neighbor joining code you mention below is not part of biojava...

Andreas


On Mon, May 11, 2009 at 6:50 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Andreas
>
> Another theme that should be considered is providing a multi-thread
> version of any module with long run time. This would have a couple
> elements. A progress listener interface should be standard where core
> code would update progress messages to listeners that can be used by
> external code to display feedback to the user. I did this with the
> Neighbor Joining code for tree construction and it provides needed
> feedback in a GUI. If not the user gets frustrated because they don't
> know the code they are about to execute may take 10 minutes or 8 hours
> to complete and they think the software is not working. The reverse is
> also true for canceling an operation where you want to have core code
> stop processing a long running loop. Once the code has completed then
> the listener interface for process complete is called allowing the next
> step in the external code to continue. The developer would have the
> choice to call the "process" method or run it in a thread and wait for
> the callback complete method to be called.
>
> This is the first step in the ability to have the core/long running
> processes take advantage of multiple threads to complete the
> computational task faster. Not all code can be parallelized easily but
> if the algorithm can take advantage of running in parallel then it
> should. This then opens up a couple of cloud computing frameworks that
> extend the multi-threaded concepts in Java across a cluster
> http://www.terracotta.org/. If we put an emphasis on having code that
> runs well in a thread we are one step closer to an architecture that can
> run in a cloud. The computational problems are only going to get bigger
> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
> computational IO cycles are going to be cheap as long as the
> software/libraries can easily take advantage of it.
>
> Thanks
>
> Scooter


From HWillis at scripps.edu  Mon May 11 20:34:11 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 11 May 2009 20:34:11 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C84F@FLMAIL1.fl.ad.scripps.edu>


Andreas

This is what I put together for the tree code as the interface. In the loop code of the algorithm you simply call the appropriate progress method. It could be cleaned up to have a single progress method that takes a float for the percentage complete. Passing the NJTree instance was required in this specific case because all the work is done when the NJTree class is instantiated. It really should be cleaned up so that it has a process method and is runnable in a thread if needed. The progress listener could be generic for all long-running classes.

I have wrapped the NJTree code in a TreeConstructor class, which bridges to the biojava framework and allows the NJTree code to be replaced by something compatible with the BioJava open source license if needed. I am still playing around with performance optimizations and need to see whether Jalview would contribute the NJTree code to BioJava. If not, I would write my own implementation, as the algorithm is not difficult.

I was also thinking that we could have Java code that provides functionality such as Blast by calling an external, publicly supported web service. Instead of parsing flat files of Blast results, you can call an external service such as http://www.ebi.ac.uk/Tools/webservices/services/wublast and get well-structured results.

Scooter 


package org.biojavax.phylo;

import org.biojavax.phylo.jalview.NJTree;

/**
 *
 * @author willishf
 */
public interface NJTreeProgressListener {
    public void progress(NJTree njtree,String state, int percentageComplete);
    public void progress(NJTree njtree,String state, int currentCount,int totalCount);
    public void complete(NJTree njtree);
    public void canceled(NJTree njtree);
}

**********************************************************************************************
This code could be abstracted out into a base class or simply added into a class that needs to 
notify external listeners
**********************************************************************************************
    Vector<NJTreeProgressListener> progessListenerVector = new Vector<NJTreeProgressListener>();

    public void addProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.add(treeProgessListener);
        }
    }

    public void removeProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.remove(treeProgessListener);
        }
    }

    public void broadcastComplete() {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.complete(this);
        }
    }

    public void updateProgress(String state, int percentage) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this,state, percentage);
        }
    }

    public void updateProgress(String state, int currentCount, int totalCount) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this,state, currentCount, totalCount);
        }
    }

***************************************************************************************
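
A possible shape for the cleanup mentioned above: one progress method taking a float, a listener that is generic rather than tied to NJTree, and the listener bookkeeping moved into a base class. This is only a sketch; TaskProgressListener, ProgressSupport and CountingTask are hypothetical names, and CopyOnWriteArrayList is swapped in for Vector so that listeners can be added or removed while a broadcast is in progress.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical generic listener: S is the type of the object doing the work. */
interface TaskProgressListener<S> {
    void progress(S source, String state, float fractionComplete);
    void complete(S source);
    void canceled(S source);
}

/** Hypothetical base class that holds the listener bookkeeping. */
abstract class ProgressSupport<S> {
    private final List<TaskProgressListener<S>> listeners =
            new CopyOnWriteArrayList<TaskProgressListener<S>>();

    /** Subclasses return themselves so listeners receive the concrete type. */
    protected abstract S self();

    public void addProgressListener(TaskProgressListener<S> listener) {
        if (listener != null) {
            listeners.add(listener);
        }
    }

    public void removeProgressListener(TaskProgressListener<S> listener) {
        listeners.remove(listener);
    }

    /** Converts a count into a fraction between 0 and 1 and notifies listeners. */
    protected void updateProgress(String state, int currentCount, int totalCount) {
        float fraction = totalCount == 0 ? 1.0f : (float) currentCount / totalCount;
        for (TaskProgressListener<S> l : listeners) {
            l.progress(self(), state, fraction);
        }
    }

    protected void broadcastComplete() {
        for (TaskProgressListener<S> l : listeners) l.complete(self());
    }

    protected void broadcastCanceled() {
        for (TaskProgressListener<S> l : listeners) l.canceled(self());
    }
}

/** Toy subclass standing in for a real long-running class. */
class CountingTask extends ProgressSupport<CountingTask> {
    protected CountingTask self() {
        return this;
    }

    void process(int steps) {
        for (int i = 1; i <= steps; i++) {
            updateProgress("counting", i, steps);
        }
        broadcastComplete();
    }
}
```

With this arrangement a long-running class only has to extend the base class and call updateProgress from its loop.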


package org.biojavax.phylo;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Vector;
import org.biojava.bio.BioException;
import org.biojavax.phylo.jalview.NJTreeNew;
import org.biojavax.phylo.jalview.TreeConstructionAlgorithm;
import org.biojavax.phylo.jalview.TreeType;

import org.biojava.bio.seq.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojavax.phylo.jalview.NJSequence;
import org.biojavax.phylo.jalview.NJTree;

/**
 *
 * @author willishf
 */
public class TreeConstructor extends Thread {

   
    NJTree njtree = null;
    NJSequence[] sequences = null;
    TreeType treeType;
    TreeConstructionAlgorithm treeConstructionAlgorithm;
    NJTreeProgressListener treeProgessListener;

    public TreeConstructor(SequenceIterator iter, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        ArrayList<NJSequence> sequenceArray = new ArrayList<NJSequence>();
        while (iter.hasNext()) {
            try {
                Sequence seq = iter.nextSequence();
                NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
                sequenceArray.add(njsequence);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        sequences = new NJSequence[sequenceArray.size()];
        sequenceArray.toArray(sequences);
    }

    public TreeConstructor(Vector<RichSequence> sequenceVector, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        sequences = new NJSequence[sequenceVector.size()];
        int index = 0;
        for (RichSequence seq : sequenceVector) {

            NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
            sequences[index] = njsequence;
            index++;
        }

    }

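    // Note: njtree is only assigned once the NJTree constructor returns, and
    // (as described above) that constructor does all the work, so this
    // cancel() has no effect while the tree is still being built.
    public void cancel(){
        if(njtree != null)
            njtree.cancel();
    }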

    public void process() throws Exception {
        njtree = new NJTree(sequences, treeType, treeConstructionAlgorithm, treeProgessListener);
    }

    @Override
    public void run() {
        try {
            process();
        } catch (Exception e) {
            e.printStackTrace();

        }
    }

    public String getNewickString() {
        if (njtree != null) {
            return njtree.toString();
        } 
        return "";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            args = new String[3];
            args[0] = "C:\\MutualInformation\\project\\hiv\\hiv-genes-genome.fasta";


        }
        try {
            //prepare a BufferedReader for file io
            BufferedReader br = new BufferedReader(new FileReader(args[0]));
            SimpleNamespace ns = new SimpleNamespace("biojava");

            // You can use any of the convenience methods found in the BioJava 1.6 API
            RichSequenceIterator rsi = RichSequence.IOTools.readFastaProtein(br, ns);

            long readTime = System.currentTimeMillis();
            TreeConstructor treeConstructor = new TreeConstructor(rsi, TreeType.NJ, TreeConstructionAlgorithm.PID, new ProgessListenerStub());
            treeConstructor.process();
            long treeTime = System.currentTimeMillis();
            String newick = treeConstructor.getNewickString();


            System.out.println("Tree time " + (treeTime - readTime));
            System.out.println(newick);

        } catch (FileNotFoundException ex) {
            //can't find file specified by args[0]
            ex.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}



From mark.schreiber at novartis.com  Tue May 12 01:26:33 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 12 May 2009 13:26:33 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>

Hi -

This was one thing we discussed previously with respect to biojava 3. 
Generally I support the idea because almost all computers are now 
multi-core and as you say cloud or utility computing is already a reality.

However, I tend to think that biojava should not control threading or
concurrency. This should be done by the developer. This is because
sometimes multithreading can be fast on a slow computer but slow on a fast
computer (due to the overhead of spawning threads), so programs need to be
tunable. Also, Java app servers and things like Sun Grid Engine, EC2, etc.
don't like people attempting to control their own threads. What BioJava
should do is expose granular, thread-safe operations that can be
threaded, form discrete tasks on a utility grid, or complete in
SessionBeans on an app server. For example, it would be better if BioJava
had a single-threaded method to calculate the GC of a single sequence
rather than a multi-threaded method that calculates the GC of multiple
sequences. This would let the developer make a multithreaded version if
desired, or distribute multiple tasks based on the single-threaded version
to a compute cloud (and let the cloud manage all the tasks).
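
To illustrate the GC example, here is a small sketch in which the library-style method is single-threaded and stateless, and the calling code decides whether and how to parallelize it. GcContent and GcCaller are hypothetical names, and sequences are plain Strings rather than BioJava sequence objects to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Library side: one sequence, one thread, no shared state. */
class GcContent {
    /** Returns the fraction of G and C characters in the sequence (0.0 if empty). */
    static double gcFraction(String sequence) {
        if (sequence.length() == 0) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }
}

/** Caller side: the developer, not the library, chooses the thread count. */
class GcCaller {
    static List<Double> gcOfAll(List<String> sequences, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<Future<Double>>();
            for (final String seq : sequences) {
                futures.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        return GcContent.gcFraction(seq);
                    }
                }));
            }
            // Collect results in submission order
            List<Double> results = new ArrayList<Double>();
            for (Future<Double> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(gcOfAll(Arrays.asList("GGCC", "ATAT", "ATGC"), 2));
        // prints [1.0, 0.0, 0.5]
    }
}
```

The same gcFraction method could just as well be submitted as individual tasks to a grid or app server, since it holds no state between calls.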

Possibly the best situation would be to have single-threaded, fine-grained
operations that let developers or grid engines control threading, and
then higher-level APIs that do it for you (or good cookbook examples that
show you how to do it). Another idea that was discussed was the use of
properties files to let people set how many CPUs they want to make
available to the JVM, or to name packages that can or cannot use threading.
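
The properties-file idea could look roughly like the sketch below. The key name biojava.threads.max and the class ThreadSettings are made up for illustration; no such setting exists in BioJava. The sketch also assumes that a missing or unparsable value should fall back to the number of processors the JVM reports, and that requests are clamped to that number.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class ThreadSettings {
    /** Hypothetical property key; not an existing BioJava setting. */
    static final String MAX_THREADS_KEY = "biojava.threads.max";

    /**
     * Returns the number of threads to use: the configured value, clamped
     * to the range 1..availableProcessors, or availableProcessors if the
     * property is missing or not a number.
     */
    static int maxThreads(Properties props) {
        int available = Runtime.getRuntime().availableProcessors();
        String value = props.getProperty(MAX_THREADS_KEY);
        if (value == null) {
            return available;
        }
        try {
            int requested = Integer.parseInt(value.trim());
            return Math.max(1, Math.min(requested, available));
        } catch (NumberFormatException e) {
            return available;
        }
    }

    public static void main(String[] args) throws IOException {
        // In practice this would be loaded from a file; a string keeps the demo self-contained
        Properties props = new Properties();
        props.load(new StringReader("biojava.threads.max=2\n"));
        System.out.println("threads: " + maxThreads(props));
    }
}
```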

Finally, there are lots of times when it is highly desirable to use Java
beans because they play well with dozens of Java APIs. However, beans don't
work well with threads because they have public setter methods. I would
like to see a lot more bean use in a future BioJava because it would make
life so much easier, but a lot of care would need to be taken to make sure
thread safety is preserved. There are many patterns that can be used, such
as synchronization locks, to make things thread safe, so I think this
can be achieved as long as we are disciplined and consider that all
methods may be used in a multi-threaded application (even if we write the
method as a single thread). If there are code checkers that make
suggestions on thread safety, it would be great to have these as part of
the standard build process. Good documentation would go a long way as
well. Are there unit test patterns that can catch these problems as well?
Suggestions would be great.
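
One of the simplest patterns alluded to here is a bean whose accessors all synchronize on the same lock. The sketch below uses a hypothetical SequenceBean; it also shows why per-setter locking alone is not always enough, since two related properties need a combined update and a combined read if other threads must never see them out of step.

```java
/** Hypothetical bean; every accessor locks on the bean instance itself. */
class SequenceBean {
    private String name;
    private String sequence;

    public SequenceBean() {
        // no-argument constructor, as expected of a Java bean
    }

    public synchronized String getName() {
        return name;
    }

    public synchronized void setName(String name) {
        this.name = name;
    }

    public synchronized String getSequence() {
        return sequence;
    }

    public synchronized void setSequence(String sequence) {
        this.sequence = sequence;
    }

    /**
     * Updates both properties under one lock, so a reader using snapshot()
     * never sees the new name paired with the old sequence.
     */
    public synchronized void setNameAndSequence(String name, String sequence) {
        this.name = name;
        this.sequence = sequence;
    }

    /** Returns {name, sequence} as they were at a single point in time. */
    public synchronized String[] snapshot() {
        return new String[] { name, sequence };
    }
}
```

Calling getName() and then getSequence() separately is still two independent reads, so code that needs a consistent pair should use the combined methods.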

Progress Listener patterns are good but it depends on the situation and 
might be better handled in high level APIs or left to the developer.  For 
example in your NJ code a progress listener would be good if someone fed 
1000 sequences into the method but not if they only put in 10. Also code 
running on an old machine might need a progress listener but the same 
problem on a new machine may complete almost instantly.  Probably a 
pluggable listener would be the way to go.  Also it might be possible to 
do this using the new JDK APIs that let you take a peek at the stack 
trace. Even if your NJ method didn't allow for a progress listener a 
developer could still make one by looking at the method calls in the 
stack. As long as your NJ method called other methods internally for each 
sequence (quite likely) it would be possible to observe the cycle of 
method calls from the stack.  This might make it possible to have a very 
general BioJava progress listener that can be told to count the number of 
times a method is called in the stack. The name of the method would be the 
argument.  If the application runs in a Java App server you can also do 
this very easily with a method Interceptor.
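
The stack-peeking idea can be tried with Thread.getStackTrace(), which has been available since JDK 5. The sketch below only checks whether a worker thread is currently inside a method with a given name; StackPeek and StackPeekDemo are hypothetical names. Note that sampling a stack from a monitor thread shows where the worker is at that instant, so counting calls this way would be approximate at best.

```java
import java.util.concurrent.CountDownLatch;

class StackPeek {
    /** True if the thread is currently executing inside a method with this name. */
    static boolean isInMethod(Thread thread, String methodName) {
        // getStackTrace() returns an empty array for threads that have finished
        for (StackTraceElement frame : thread.getStackTrace()) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }
}

class StackPeekDemo {
    /** Stand-in for a per-sequence step inside a long-running method. */
    static void processSequence(CountDownLatch entered, CountDownLatch release)
            throws InterruptedException {
        entered.countDown(); // signal that we are inside the method
        release.await();     // pretend to work until released
    }

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch entered = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    processSequence(entered, release);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();
        entered.await(); // wait until the worker is inside processSequence
        System.out.println("during: " + StackPeek.isInMethod(worker, "processSequence"));
        release.countDown();
        worker.join();
        System.out.println("after: " + StackPeek.isInMethod(worker, "processSequence"));
    }
}
```

Matching on the method name alone is crude; a real monitor would probably also compare the class name from StackTraceElement.getClassName().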

- Mark

_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the 
exclusive use of the individual or entity named above and may contain 
information that is privileged, confidential or exempt from disclosure 
under applicable law. If the reader of this message is not the intended 
recipient, or the employee or agent responsible for delivery of the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please 
notify the sender immediately by e-mail and delete the material from any 
computer.  Thank you.

From ayates at ebi.ac.uk  Tue May 12 04:27:52 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 12 May 2009 09:27:52 +0100
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
References: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
Message-ID: <4A093308.4030409@ebi.ac.uk>

I agree with Mark.

Later versions of the Java environment will make concurrent programming
easier, not to mention the languages already available on the JVM (Scala
and Clojure) that make it very easy indeed. Our goal in biojava must be
to write code which will behave well in one of these environments.

I don't want us to fall into the trap of earlier biojava, where we wrote
our own implementations of things like database connection pooling data
sources (sorry, I don't mean to pick on any one part of the code, but it
highlights very well what we should avoid). We're
bioinformaticians/engineers; let's do what we do best and work well
within our chosen field. Let other people like Doug Lea deal with the
pain that is concurrent programming and the like :)
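
To make the division of labour concrete, here is a minimal sketch of
Mark's GC example: the library side is a plain single-threaded method
with no shared state, and the caller side hands it to
java.util.concurrent. The class and method names (GcSketch, gcContent,
gcContentAll) are made up for illustration and are not part of any
BioJava API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration only; not a BioJava class. */
final class GcSketch {

    private GcSketch() {}

    /** Library side: single-threaded and stateless, so safe to call from any thread. */
    static double gcContent(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char base = Character.toUpperCase(sequence.charAt(i));
            if (base == 'G' || base == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    /** Caller side: the caller, not the library, decides how many threads to use. */
    static List<Double> gcContentAll(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
            for (final String sequence : sequences) {
                tasks.add(new Callable<Double>() {
                    public Double call() {
                        return gcContent(sequence);
                    }
                });
            }
            List<Double> results = new ArrayList<Double>();
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<Double> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for GC tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("GC task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Only gcContent() would need to live in the library; gcContentAll() is
the kind of thing an application, a cookbook page, or a grid scheduler
could supply instead.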

Andy

mark.schreiber at novartis.com wrote:
> Hi -
> 
> This was one thing we discussed previously with respect to biojava 3. 
> Generally I support the idea because almost all computers are now 
> multi-core and as you say cloud or utility computing is already a reality.
> 
> However, I tend to think that biojava should not control threading or 
> concurrency. This should be done by the developer. This is because 
> sometimes multithreading can be fast on a slow computer but slow on a fast 
> computer (due to the overhead in spawning threads) so programs need to be 
> tunable. Also Java app servers and things like Sun Grid Engine, EC2 etc 
> don't like people attempting to control their own threads.  What BioJava 
> should do is expose granular and thread-safe operations that can be 
> threaded or form discrete tasks on a utility grid or complete in 
> SessionBeans on an App server.  For example it would be better if BioJava 
> had a single threaded method to calculate the GC of a single sequence 
> rather than a multi-threaded method that calculates the GC of multiple 
> sequences.  This would let the developer make a multithreaded version if 
> desired or distribute multiple tasks based on the single threaded version 
> to a compute cloud (and let the cloud manage all the tasks).
> 
> Possibly the best situation would be to have the single threaded fine 
> grain operations that let developers or grid engines control threading and 
> then higher level APIs that do it for you (or good cookbook examples that 
> show you how to do it).  Another idea that was discussed was the use of 
> properties files to allow people to set how many CPUs they wanted to make 
> available to the JVM or name packages that can or cannot use threading.
> 
> Finally, there are lots of times when it is highly desirable to use Java 
> beans because they play well with dozens of Java APIs; however, beans don't 
> work well with threads because they have public setter methods.  I would 
> like to see a lot more bean use in a future BioJava because it would make 
> life so much easier but a lot of care would need to be taken to make sure 
> thread safety is preserved.  There are many patterns that can be used such 
> as synchronization locks etc to make things thread safe so I think this 
> can be achieved as long as we are disciplined and consider that all 
> methods may be used in a multi-threaded application (even if we write the 
> method as a single thread).  If there are code checkers that make 
> suggestions on thread safety it would be great to have these as part of 
> the standard build process.  Good documentation would go a long way as 
> well.  Are there unit test patterns that can catch these problems as well? 
>  Suggestions would be great.
> 
> Progress Listener patterns are good but it depends on the situation and 
> might be better handled in high level APIs or left to the developer.  For 
> example in your NJ code a progress listener would be good if someone fed 
> 1000 sequences into the method but not if they only put in 10. Also code 
> running on an old machine might need a progress listener but the same 
> problem on a new machine may complete almost instantly.  Probably a 
> pluggable listener would be the way to go.  Also it might be possible to 
> do this using the new JDK APIs that let you take a peek at the stack 
> trace. Even if your NJ method didn't allow for a progress listener a 
> developer could still make one by looking at the method calls in the 
> stack. As long as your NJ method called other methods internally for each 
> sequence (quite likely) it would be possible to observe the cycle of 
> method calls from the stack.  This might make it possible to have a very 
> general BioJava progress listener that can be told to count the number of 
> times a method is called in the stack. The name of the method would be the 
> argument.  If the application runs in a Java App server you can also do 
> this very easily with a method Interceptor.
> 
> - Mark
> 

From holland at eaglegenomics.com  Tue May 12 04:26:26 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 12 May 2009 09:26:26 +0100
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
Message-ID: <1242116786.7101.7.camel@buzzybee>

The BJ3 code contains only as much code as is needed to represent
sequences and to parse/write simple FASTA. It should be viewed as a
concept. In particular the file parsing mechanism is quite flexible (if
a little complex) but easily wrapped with simple one-liner utility
methods to provide end-users with easier-to-use APIs.

Sequence representation in BJ3 is done via the Collections API. It's set
up in such a way that you can write something yourself that implements
the List API and behaves like a List but internally uses a more compact
or even offline storage mechanism to represent the sequence. This allows
you to reuse sequences wherever Lists can be used, e.g. in Iterators or
foreach-loops.
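
As an illustration of the idea (not the actual BJ3 classes, which are
not shown in this thread), a sequence that behaves like a List but
stores its bases more compactly could look roughly like the sketch
below. PackedDnaSequence is a hypothetical name, and the 2-bits-per-base
packing is an assumption chosen to keep the example short.

```java
import java.util.AbstractList;

/** Hypothetical read-only DNA sequence stored at 2 bits per base. */
final class PackedDnaSequence extends AbstractList<Character> {

    private static final String ALPHABET = "ACGT";

    private final byte[] packed; // four bases per byte
    private final int length;

    PackedDnaSequence(String bases) {
        length = bases.length();
        packed = new byte[(length + 3) / 4];
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(bases.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException(
                        "Unsupported base at position " + i + ": " + bases.charAt(i));
            }
            // Store the 2-bit code in its slot within the byte.
            packed[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
    }

    /** Unpacks a single base on demand. */
    @Override
    public Character get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("Index: " + index + ", size: " + length);
        }
        int code = (packed[index / 4] >> ((index % 4) * 2)) & 0x3;
        return ALPHABET.charAt(code);
    }

    @Override
    public int size() {
        return length;
    }
}
```

Because AbstractList supplies iterator(), indexOf(), subList() and so on
from get() and size(), an instance can be used directly in a
foreach-loop, e.g. for (char base : new PackedDnaSequence("GATTACA")).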

Everything written so far has been documented here:

  http://biojava.org/wiki/BioJava3:HowTo

cheers,
Richard


On Sun, 2009-05-10 at 21:26 -0700, Andreas Prlic wrote:
> Hi biojava-devs,
> 
> It is time to start working on the next biojava release.  I  would
> like to modularize the current code base and apply some of the ideas
> that have emerged around Richard's "biojava 3" code. In principle the
> idea is that all changes should be backwards compatible with the
> interfaces provided by the current biojava 1.7 release.  Backwards
> compatibility shall only be broken if the functionality is being
> replaced with something that works better, and gets documented
> accordingly. For the build functionality I would suggest to stick with
> what Richard's biojava 3 code base already is providing. Since we will
> try to be backwards compatible all code development should be part of
> the biojava-trunk and the first step will be to move the ant-build
> scripts to a maven build process. Following this procedure will allow
> to use e.g. the code refactoring tools provided by Eclipse, which
> should come in handy.
> 
> The modules I would like to see should provide self-contained
> functionality and cross dependencies should be restricted to a
> minimum. I would suggest to have the following modules:
> 
> biojava-core: Contains everything that can not easily be modularized
> or nobody volunteers to become a module maintainer.
> biojava-phylogeny: Scooter expressed some interested to provide such a
> module and become package maintainer for it.
> biojava-structure: Everything protein structure related. I would be
> package maintainer.
> biojava-blast: Blast parsing is a frequently requested functionality
> and it would be good to have this code self-contained. A package
> maintainer for this still will need to be nominated at a later stage.
> Any suggestions for other modules?
> 
> Let me know what you think about this.
> 
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From HWillis at scripps.edu  Tue May 12 09:34:51 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 09:34:51 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
	<OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>

Mark

 
It is a challenge to know where to draw the line. Allowing both
options is a reasonable approach. The implementation of the algorithm is
key to whether it can be multi-threaded or run in parallel.
One approach is to provide a standard interface in which process()
waits for the result/return value and runs in the parent thread. To run
the algorithm in a thread you can have a startProcess(), where you can
add yourself as a progress listener, and when the complete() method is
called you can call getResults(). You can then also have the
corresponding stopProcess(), which would set an internal value to cause
all threads to quit.  There are lots of ways to tackle the problem; the
key is to start talking about it and at minimum take advantage of
multiple cores, where the external code can set the number of cores to
use. You can get a dual quad-core machine these days for < $1000, but
most software implementations are not designed to take advantage of it.
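
A rough sketch of what such a standard interface might look like is
below. Only the method names process(), startProcess(), stopProcess(),
complete() and getResults() come from the description above; the
ProgressListener and LongRunningTask types, the progress() callback
signature, and the SummingTask demo are assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener interface. */
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

/** Hypothetical base class for a long-running, cancellable computation. */
abstract class LongRunningTask<R> {

    private final List<ProgressListener> listeners =
            new CopyOnWriteArrayList<ProgressListener>();
    private volatile boolean stopRequested;
    private volatile R results;

    void addProgressListener(ProgressListener listener) {
        listeners.add(listener);
    }

    /** Subclasses do the work here and poll isStopRequested() in their loops. */
    protected abstract R compute();

    protected boolean isStopRequested() {
        return stopRequested;
    }

    protected void fireProgress(int done, int total) {
        for (ProgressListener listener : listeners) {
            listener.progress(done, total);
        }
    }

    /** Blocking variant: runs in the caller's thread and returns the result. */
    R process() {
        results = compute();
        for (ProgressListener listener : listeners) {
            listener.complete();
        }
        return results;
    }

    /** Non-blocking variant: runs process() in a new thread; listeners get complete(). */
    void startProcess() {
        new Thread(new Runnable() {
            public void run() {
                process();
            }
        }).start();
    }

    /** Sets the flag that compute() polls, so a running loop ends early. */
    void stopProcess() {
        stopRequested = true;
    }

    R getResults() {
        return results;
    }
}

/** Toy task for demonstration: sums 1..n and reports progress after each step. */
final class SummingTask extends LongRunningTask<Long> {

    private final int n;

    SummingTask(int n) {
        this.n = n;
    }

    @Override
    protected Long compute() {
        long sum = 0;
        for (int i = 1; i <= n && !isStopRequested(); i++) {
            sum += i;
            fireProgress(i, n);
        }
        return sum;
    }
}
```

Note that startProcess() creates its own thread, which is exactly the
part Mark would rather leave to the caller; process() on its own can be
submitted to whatever executor or grid the application already uses.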

 
The real question is what exists today in the BioJava API that is
considered long running in a normal use case and thus is a candidate to
be run in parallel. It may not be an issue in existing BioJava code.
When I first started using BioJava I went looking for BLAST code only to
find a BLAST parser. I wanted to do a Multiple Sequence Alignment, and
it turns out that the BioJava code calls CLUSTALW as an external process
under the covers.  I also needed code to construct trees from an MSA and
found that the Summer of Code project was focused only on representing
the tree.

 
It would be nice to have a BLAST implementation in Java optimized to run
on a cluster, but who has time to rewrite BLAST in Java when you can do
a BLAST search via the web and focus on parsing the results? BioJava
needs a BLAST API that makes a web services call to an external service
and returns structured results in core BioJava structures. It is
probably not difficult to do a Java version of CLUSTALW, but again we
can push the work out to
http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the
results back in BioJava structures.

 
I can sign up to write the BLAST web service -> BioJava and CLUSTALW web
service -> BioJava code. I haven't done the research, but it seems that
http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work
to expose common computational biology services. If multiple external
services offer BLAST via web services, each with a different
implementation, then BioJava could provide an abstraction over the
different services.
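
As a sketch of what that abstraction could look like, client code would
depend only on a provider-neutral interface, and each external service
would get its own implementation behind it. Everything below
(BlastService, BlastHit, CannedBlastService, BestHitFinder) is
hypothetical; no real web service is called, and a real module would
return BioJava structures rather than this placeholder hit type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical placeholder for a hit; a real module would use BioJava structures. */
final class BlastHit {
    final String subjectId;
    final double eValue;

    BlastHit(String subjectId, double eValue) {
        this.subjectId = subjectId;
        this.eValue = eValue;
    }
}

/** Hypothetical provider-neutral interface: one implementation per external service. */
interface BlastService {
    List<BlastHit> search(String querySequence, String database);
}

/** Stand-in that returns canned hits, so calling code can be written and tested offline. */
final class CannedBlastService implements BlastService {

    private final List<BlastHit> hits;

    CannedBlastService(List<BlastHit> hits) {
        this.hits = new ArrayList<BlastHit>(hits);
    }

    public List<BlastHit> search(String querySequence, String database) {
        return Collections.unmodifiableList(hits);
    }
}

/** Client code sees only BlastService, so providers can be swapped freely. */
final class BestHitFinder {

    private BestHitFinder() {}

    /** Returns the hit with the lowest e-value, or null if there are no hits. */
    static BlastHit bestHit(BlastService service, String query, String database) {
        BlastHit best = null;
        for (BlastHit hit : service.search(query, database)) {
            if (best == null || hit.eValue < best.eValue) {
                best = hit;
            }
        }
        return best;
    }
}
```

An implementation that talks to a particular provider would do its
submit/poll/parse work inside search(), and nothing in BestHitFinder
would need to change.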

 
Thanks


Scooter

 
From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com] 
Sent: Tuesday, May 12, 2009 1:27 AM
To: Scooter Willis
Cc: Andreas Prlic; biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release -
modularization

 
Hi - 

This was one thing we discussed previously with respect to biojava 3.
Generally I support the idea because almost all computers are now
multi-core and as you say cloud or utility computing is already a
reality. 

However, I tend to think that biojava should not control threading or
concurrency. This should be done by the developer. This is because
sometimes multithreading can be fast on a slow computer but slow on a
fast computer (due to the overhead in spawning threads) so programs need
to be tunable. Also Java app servers and things like Sun Grid Engine,
EC2 etc don't like people attempting to control their own threads.  What
BioJava should do is expose granular and thread-safe operations that can
be threaded or form discrete tasks on a utility grid or complete in
SessionBeans on an App server.  For example it would be better if
BioJava had a single threaded method to calculate the GC of a single
sequence rather than a multi-threaded method that calculates the GC of
multiple sequences.  This would let the developer make a multithreaded
version if desired or distribute multiple tasks based on the single
threaded version to a compute cloud (and let the cloud manage all the
tasks). 

Possibly the best situation would be to have the single threaded fine
grain operations that let developers or grid engines control threading
and then higher level APIs that do it for you (or good cookbook examples
that show you how to do it).  Another idea that was discussed was the
use of properties files to allow people to set how many CPUs they wanted
to make available to the JVM or name packages that can or cannot use
threading. 

Finally, there are lots of times when it is highly desirable to use Java
beans because they play well with dozens of Java APIs; however, beans
don't work well with threads because they have public setter methods.  I
would like to see a lot more bean use in a future BioJava because it
would make life so much easier but a lot of care would need to be taken
to make sure thread safety is preserved.  There are many patterns that
can be used such as synchronization locks etc to make things thread safe
so I think this can be achieved as long as we are disciplined and
consider that all methods may be used in a multi-threaded application
(even if we write the method as a single thread).  If there are code
checkers that make suggestions on thread safety it would be great to
have these as part of the standard build process.  Good documentation
would go a long way as well.  Are there unit test patterns that can
catch these problems as well?  Suggestions would be great. 

Progress Listener patterns are good but it depends on the situation and
might be better handled in high level APIs or left to the developer.
For example in your NJ code a progress listener would be good if someone
fed 1000 sequences into the method but not if they only put in 10. Also
code running on an old machine might need a progress listener but the
same problem on a new machine may complete almost instantly.  Probably a
pluggable listener would be the way to go.  Also it might be possible to
do this using the new JDK APIs that let you take a peek at the stack
trace. Even if your NJ method didn't allow for a progress listener a
developer could still make one by looking at the method calls in the
stack. As long as your NJ method called other methods internally for
each sequence (quite likely) it would be possible to observe the cycle
of method calls from the stack.  This might make it possible to have a
very general BioJava progress listener that can be told to count the
number of times a method is called in the stack. The name of the method
would be the argument.  If the application runs in a Java App server you
can also do this very easily with a method Interceptor. 

- Mark 



From andreas at sdsc.edu  Tue May 12 19:52:51 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 16:52:51 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <1242116786.7101.7.camel@buzzybee>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
	<1242116786.7101.7.camel@buzzybee>
Message-ID: <59a41c430905121652s7c548985xd9261734b42a4182@mail.gmail.com>

Hi Richard,

Do you think the BJ3 code could form the beginning of a new
biojava-sequence module and become part of the next release?

Andreas

On Tue, May 12, 2009 at 1:26 AM, Richard Holland
<holland at eaglegenomics.com> wrote:
> The BJ3 code contains only as much code as is needed to represent
> sequences and to parse/write simple FASTA. It should be viewed as a
> concept. In particular the file parsing mechanism is quite flexible (if
> a little complex) but easily wrapped with simple one-liner utility
> methods to provide end-users with easier-to-use APIs.
>
> Sequence representation in BJ3 is done via the Collections API. It's set
> up in such a way that you can write something yourself that implements
> the List API and behaves like a List but internally uses a more compact
> or even offline storage mechanism to represent the sequence. This allows
> you to reuse sequences wherever Lists can be used, e.g. in Iterators or
> foreach-loops.
>
> Everything written so far has been documented here:
>
>   http://biojava.org/wiki/BioJava3:HowTo
>
> cheers,
> Richard
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
>


From andreas at sdsc.edu  Tue May 12 19:59:11 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 16:59:11 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
	<OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
	<061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>

Hi Scooter,

About your suggestion for the BLAST web service client code: in
principle I like the idea, and we have had questions about this on the
mailing list in the past. The only thing is that I think some Java
client code is already available:
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
but I am not sure how good that Java client library is.

Besides this, our BLAST parser library needs work, and if you are
interested in working on that you are welcome. As I mentioned, I think
this should become its own module, given the popularity of that code.

Andreas


On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Mark
>
>
>
> It is a challenge on knowing where to draw the line. Allowing both options
> is a reasonable approach. The implementation of the algorithm is key to
> allow it to be multi-threaded or being able to run in parallel. One approach
> is to provide a standard interface such as process() would wait for the
> result/return value and run in the parent thread. To run the algorithm in a
> thread you can have a startProcess() where you can add yourself as a
> progress listener and when complete() method is called you can call
> getResults(). You can then also have the corresponding stopProcess() which
> would set an internal value to cause all threads to quit. Lots of ways to
> tackle the problem; the key is to start talking about it and at minimum take
> advantage of multiple-cores where the external code can set the number of
> cores to use. You can get a dual quad core machine these days for < $1000
> but most software implementations are not designed to take advantage of it.
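
A minimal Java sketch of the process() / startProcess() / stopProcess()
pattern described above. All names here (ProgressListener,
LongRunningTask, and the toy workload of summing 1..total) are
hypothetical and are not part of the BioJava API; cancellation is
cooperative, meaning the loop checks a flag on each iteration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical listener interface; not an existing BioJava type. */
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

/** Hypothetical long-running task: sums the integers 1..total step by step. */
class LongRunningTask {
    private final int total;
    private final List<ProgressListener> listeners = new CopyOnWriteArrayList<>();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile long result;

    LongRunningTask(int total) { this.total = total; }

    void addProgressListener(ProgressListener listener) { listeners.add(listener); }

    /** Blocking variant: runs in the caller's thread and returns the result. */
    long process() {
        long sum = 0;
        for (int i = 1; i <= total; i++) {
            if (stopRequested.get()) break;   // cooperative cancellation
            sum += i;
            for (ProgressListener l : listeners) l.progress(i, total);
        }
        result = sum;                         // partial sum if the task was stopped
        for (ProgressListener l : listeners) l.complete();
        return sum;
    }

    /** Non-blocking variant: runs process() in a new thread and returns that thread. */
    Thread startProcess() {
        Thread worker = new Thread(this::process, "long-running-task");
        worker.start();
        return worker;
    }

    /** Asks the running loop to stop at its next iteration. */
    void stopProcess() { stopRequested.set(true); }

    /** Meaningful once complete() has been delivered to the listeners. */
    long getResults() { return result; }

    public static void main(String[] args) {
        LongRunningTask task = new LongRunningTask(100);
        task.addProgressListener(new ProgressListener() {
            public void progress(int done, int total) {
                if (done % 25 == 0) System.out.println(done + "/" + total);
            }
            public void complete() { System.out.println("complete"); }
        });
        System.out.println("sum = " + task.process());
    }
}
```

The caller decides whether to block on process() or to call
startProcess() and wait for complete(); the task itself never decides
how many threads exist.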
>
>
>
> The real question is what exists today in the BioJava API that is considered
> long running in normal use case and thus is a candidate to be run in
> parallel. It may not be an issue in existing BioJava code. When I first
> started using BioJava I went looking for BLAST code only to find a BLAST
> parser. I wanted to do a Multiple Sequence Alignment and it turns out that
> Biojava code calls CLUSTALW as an external process under the covers. I
> also needed code to construct trees from an MSA and found the summer of code
> project that was only focused on representing the tree.
>
>
>
> It would be nice to have a BLAST implementation in Java optimized to run on
> a cluster but who has time to rewrite BLAST in Java when you can do BLAST
> search via the web and focus on parsing the results. BioJava needs a BLAST
> API that makes a web services call to an external service and returns
> structured results in core BioJava structures. Probably not difficult to do
> a Java version of CLUSTALW but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
> back returned in BioJava structures.
>
>
>
> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web
> service -> BioJava code. I haven't done the research but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology computational services. If multiple external services
> are offering BLAST via web services where each picked a different
> implementation then BioJava could provide abstraction to different services.
>
>
>
> Thanks
>
> Scooter
>
>
>
> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> Sent: Tuesday, May 12, 2009 1:27 AM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev
> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>
>
>
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3.
> Generally I support the idea because almost all computers are now
> multi-core and as you say cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or
> concurrency. This should be done by the developer. This is because sometimes
> multithreading can be fast on a slow computer but slow on a fast computer
> (due to the overhead in spawning threads) so programs need to be tunable.
> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like
> people attempting to control their own threads. What BioJava should do is
> expose granular and thread-safe operations that can be threaded or form
> discrete tasks on a utility grid or complete in SessionBeans on an App
> server. For example it would be better if BioJava had a single threaded
> method to calculate the GC of a single sequence rather than a multi-threaded
> method that calculates the GC of multiple sequences. This would let the
> developer make a multithreaded version if desired or distribute multiple
> tasks based on the single threaded version to a compute cloud (and let the
> cloud manage all the tasks).
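
The GC example above can be sketched as follows, assuming sequences are
plain Java strings rather than BioJava sequence objects. The class and
method names (GcContent, gcFraction, gcFractions) are hypothetical. The
library-style method is single-threaded and stateless; the parallel
wrapper is the kind of code a developer could write on top of it, with
the thread count chosen by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class GcContent {
    /** Single-threaded and stateless: fraction of G/C characters in one sequence. */
    static double gcFraction(String sequence) {
        if (sequence.isEmpty()) return 0.0;
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') gc++;
        }
        return (double) gc / sequence.length();
    }

    /** Caller-side parallelism: one task per sequence, pool size set by the caller. */
    static List<Double> gcFractions(List<String> sequences, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : sequences) {
                Callable<Double> task = () -> gcFraction(s);
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) results.add(f.get()); // input order preserved
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> seqs = new ArrayList<>();
        seqs.add("GGCC");
        seqs.add("ATGC");
        System.out.println(gcFractions(seqs, 2));
    }
}
```

Because gcFraction touches no shared state, the same method could just
as well be dispatched as separate jobs on a grid, which is the point of
keeping the fine-grained operation free of threading decisions.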
>
> Possibly the best situation would be to have the single threaded fine grain
> operations that let developers or grid engines control threading and then
> higher level APIs that do it for you (or good cookbook examples that show
> you how to do it). Another idea that was discussed was the use of
> properties files to allow people to set how many CPUs they wanted to make
> available to the JVM or name packages that can or cannot use threading.
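
One possible reading of the properties-file idea, as a hedged sketch:
the key name "biojava.threads" and the class ThreadingConfig are
invented for illustration, and no such setting is known to exist in
BioJava. The requested value is clamped to the number of processors the
JVM reports, and anything missing or unparsable falls back to a single
thread.

```java
import java.util.Properties;

class ThreadingConfig {
    /** Hypothetical property key; BioJava does not define it. */
    static final String KEY = "biojava.threads";

    /** Returns a thread count between 1 and the number of available processors. */
    static int threadCount(Properties props) {
        int available = Runtime.getRuntime().availableProcessors();
        String raw = props.getProperty(KEY);
        if (raw == null) return 1;                 // default: stay single-threaded
        try {
            int requested = Integer.parseInt(raw.trim());
            return Math.max(1, Math.min(requested, available));
        } catch (NumberFormatException e) {
            return 1;                              // unparsable value: fall back to 1
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "4");
        System.out.println("threads = " + threadCount(props));
    }
}
```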
>
> Finally, there are lots of times when it is highly desirable to use Java
> beans because they play well with dozens of Java api's however beans don't
> work well with threads because they have public setter methods. I would
> like to see a lot more bean use in a future BioJava because it would make
> life so much easier but a lot of care would need to be taken to make sure
> thread safety is preserved. There are many patterns that can be used such
> as synchronization locks etc to make things thread safe so I think this can
> be achieved as long as we are disciplined and consider that all methods may
> be used in a multi-threaded application (even if we write the method as a
> single thread). If there are code checkers that make suggestions on thread
> safety it would be great to have these as part of the standard build
> process. Good documentation would go a long way as well. Are there unit
> test patterns that can catch these problems as well? Suggestions would be
> great.
>
> Progress Listener patterns are good but it depends on the situation and
> might be better handled in high level APIs or left to the developer. For
> example in your NJ code a progress listener would be good if someone fed
> 1000 sequences into the method but not if they only put in 10. Also code
> running on an old machine might need a progress listener but the same
> problem on a new machine may complete almost instantly. Probably a
> pluggable listener would be the way to go. Also it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely) it would be possible to observe the cycle of method calls from the
> stack. This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument. If the application
> runs in a Java App server you can also do this very easily with a method
> Interceptor.
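
The stack-peeking idea can be illustrated with Thread.getStackTrace(),
which has been in the JDK since Java 5. This is only a sketch: StackPeek
and its methods are hypothetical names, and a stack snapshot shows only
the frames that are active at the moment it is taken. Counting how many
times a method has been called over time would need repeated sampling
and would be approximate.

```java
class StackPeek {
    /** Counts frames on the given thread's current stack whose method name matches. */
    static int framesNamed(Thread thread, String methodName) {
        int count = 0;
        for (StackTraceElement frame : thread.getStackTrace()) {
            if (frame.getMethodName().equals(methodName)) count++;
        }
        return count;
    }

    /** Demo target: recurses to the requested depth, then inspects its own stack. */
    static int recurse(int depth) {
        if (depth <= 1) return framesNamed(Thread.currentThread(), "recurse");
        return recurse(depth - 1);
    }

    public static void main(String[] args) {
        System.out.println("frames named recurse: " + recurse(5));
    }
}
```

An external progress monitor would call framesNamed on the worker
thread from a separate sampling thread, passing the method name as the
argument, without the worker code having to know about listeners.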
>
> - Mark
>
> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
>
>> Andreas
>>
>> Another theme that should be considered is providing a multi-thread
>> version of any module with long run time. This would have a couple
>> elements. A progress listener interface should be standard where core
>> code would update progress messages to listeners that can be used by
>> external code to display feedback to the user. I did this with the
>> Neighbor Joining code for tree construction and it provides needed
>> feedback in a GUI. If not the user gets frustrated because they don't
>> know the code they are about to execute may take 10 minutes or 8 hours
>> to complete and they think the software is not working. The reverse is
>> also true for canceling an operation where you want to have core code
>> stop processing a long running loop. Once the code has completed then
>> the listener interface for process complete is called allowing the next
>> step in the external code to continue. The developer would have the
>> choice to call the "process" method or run it in a thread and wait for
>> the callback complete method to be called.
>>
>> This is the first step in the ability to have the core/long running
>> processes take advantage of multiple threads to complete the
>> computational task faster. Not all code can be parallelized easily but
>> if the algorithm can take advantage of running in parallel then it
>> should. This then opens up a couple of cloud computing frameworks that
>> extend the multi-threaded concepts in Java across a cluster
>> http://www.terracotta.org/. If we put an emphasis on having code that
>> runs well in a thread we are one step closer to an architecture that can
>> run in a cloud. The computational problems are only going to get bigger
>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
>> computational IO cycles are going to be cheap as long as the
>> software/libraries can easily take advantage of it.
>>
>> Thanks
>>
>> Scooter
>>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org
>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
>> Prlic
>> Sent: Monday, May 11, 2009 12:27 AM
>> To: biojava-dev
>> Subject: [Biojava-dev] Plans for next biojava release - modularization
>>
>> Hi biojava-devs,
>>
>> It is time to start working on the next biojava release. I would
>> like to modularize the current code base and apply some of the ideas
>> that have emerged around Richard's "biojava 3" code. In principle the
>> idea is that all changes should be backwards compatible with the
>> interfaces provided by the current biojava 1.7 release. Backwards
>> compatibility shall only be broken if the functionality is being
>> replaced with something that works better, and gets documented
>> accordingly. For the build functionality I would suggest to stick with
>> what Richard's biojava 3 code base already is providing. Since we will
>> try to be backwards compatible all code development should be part of
>> the biojava-trunk and the first step will be to move the ant-build
>> scripts to a maven build process. Following this procedure will allow
>> to use e.g. the code refactoring tools provided by Eclipse, which
>> should come in handy.
>>
>> The modules I would like to see should provide self-contained
>> functionality and cross dependencies should be restricted to a
>> minimum. I would suggest to have the following modules:
>>
>> biojava-core: Contains everything that can not easily be modularized
>> or nobody volunteers to become a module maintainer.
>> biojava-phylogeny: Scooter expressed some interest in providing such a
>> module and become package maintainer for it.
>> biojava-structure: Everything protein structure related. I would be
>> package maintainer.
>> biojava-blast: Blast parsing is a frequently requested functionality
>> and it would be good to have this code self-contained. A package
>> maintainer for this still will need to be nominated at a later stage.
>> Any suggestions for other modules?
>>
>> Let me know what you think about this.
>>
>> Andreas
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> _________________________
>
> CONFIDENTIALITY NOTICE
>
> The information contained in this e-mail message is intended only for the
> exclusive use of the individual or entity named above and may contain
> information that is privileged, confidential or exempt from disclosure under
> applicable law. If the reader of this message is not the intended recipient,
> or the employee or agent responsible for delivery of the message to the
> intended recipient, you are hereby notified that any dissemination,
> distribution or copying of this communication is strictly prohibited. If you
> have received this communication in error, please notify the sender
> immediately by e-mail and delete the material from any computer. Thank you.


From HWillis at scripps.edu  Tue May 12 20:13:45 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 20:13:45 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu>

Andreas

The goal for BioJava could be to provide a wrapper for the http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp Java code so that inputs and outputs are BioJava objects. I think they are using Axis for the client web services code. If BioJava 3 is going to require Java 6 as a minimum, then it is easier to use the Java 6 SOAP processing capabilities by pointing at the WSDL and generating the client-side Java code. This cuts down on the number of external third-party libraries required.

I try to stay out of the legacy file parsing business whenever possible. 

Scooter 



From mark.schreiber at novartis.com  Tue May 12 20:09:31 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 May 2009 08:09:31 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID: <OF8495A026.AC43734D-ON482575B5.000057FD-482575B5.0000DF4C@ah.novartis.com>

A while back I gave Richard some code that uses JAXB to objectify (and 
deobjectify) BLAST XML output. This might be useful for parsing BLAST 
results from the webservices which normally use BLAST XML. I could 
probably dig it up again if needed (it was autogenerated anyway).

It would probably be a good object model for BLAST output if people want 
to parse other types of BLAST output (such as flatfile, but who would want 
to do that!).  The BLAST XML seems to accommodate strange flavours of 
BLAST such as PSI-BLAST etc and also has been much more stable than the 
default flat file output.

- Mark
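
A rough sketch of what an object model over BLAST XML could look like.
JAXB is no longer bundled with recent JDKs, so this example swaps in the
JDK's built-in DOM parser instead of the JAXB approach mentioned above.
The element names (Hit, Hit_id, Hit_def, Hsp_evalue) come from the NCBI
BLAST XML format; the classes BlastHit and BlastXmlReader are
hypothetical. Real BLAST XML files carry a DOCTYPE that refers to an
external DTD, which may need separate handling and is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical value object for one hit; not a BioJava class. */
class BlastHit {
    final String id;
    final String definition;
    final double firstHspEvalue;

    BlastHit(String id, String definition, double firstHspEvalue) {
        this.id = id;
        this.definition = definition;
        this.firstHspEvalue = firstHspEvalue;
    }
}

class BlastXmlReader {
    /** Reads every Hit element and keeps its id, definition and first HSP e-value. */
    static List<BlastHit> readHits(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<BlastHit> hits = new ArrayList<>();
        NodeList hitNodes = doc.getElementsByTagName("Hit");
        for (int i = 0; i < hitNodes.getLength(); i++) {
            Element hit = (Element) hitNodes.item(i);
            hits.add(new BlastHit(
                    text(hit, "Hit_id"),
                    text(hit, "Hit_def"),
                    Double.parseDouble(text(hit, "Hsp_evalue"))));
        }
        return hits;
    }

    /** Text of the first descendant with the given tag; assumes the element is present. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }
}
```

Other BLAST output flavours could populate the same BlastHit objects
from a different reader, which is the sense in which the XML schema can
serve as the shared object model.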


Andreas Prlic <andreas at sdsc.edu> 
Sent by: biojava-dev-bounces at lists.open-bio.org
05/13/2009 08:02 AM

To
Scooter Willis <HWillis at scripps.edu>
cc
biojava-dev <biojava-dev at lists.open-bio.org>
Subject
Re: [Biojava-dev] Plans for next biojava release - modularization


Hi Scooter,

about your suggestion for the blast webservice client code: In
principle I like the idea and we have had questions on the mailing
list regarding this in the past. Only thing is I think there is
already some client code in java available:
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
but I am not sure how good that Java client library is....

Besides this, there is the need for work on our blast parser library
and if you are interested in working on that you are welcome. As I
mentioned, I think this should become its own module, due to the
popularity of that code.

Andreas


On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> 
wrote:
> Mark
>
>
>
> It is a challenge on knowing where to draw the line. Allowing both 
options
> is a reasonable approach. The implementation of the algorithm is key to
> allow it to be multi-threaded or being able to run in parallel. One 
approach
> is to provide a standard interface such as process() would wait for the
> result/return value and run in the parent thread. To run the algorithm 
in a
> thread you can have a startProcess() where you can add yourself as a
> progress listener and when complete() method is called you can call
> getResults(). You can then also have the corresponding stopProcess() 
which
> would set an internal value to cause all threads to quit.  Lots of ways 
to
> tackle the problem the key is to start talking about it and at minimum 
take
> advantage of multiple-cores where the external code can set the number 
of
> cores to use. You can get a dual quad core machine these days for < 
$1000
> but most software implementations are not designed to take advantage of 
it.
>
>
>
> The real question is what exists today in the BioJava API that is 
considered
> long running in normal use case and thus is a candidate to be run in
> parallel. It may not be an issue in existing BioJava code. When I first
> started using BioJava I went looking for BLAST code only to find a BLAST
> parser. I wanted to do a Multiple Sequence Alignment and turns out that
> Biojava code calls CLUSTALW as an external processor under the covers. 
 I
> also needed code to construct trees from an MSA and found the summer of 
code
> project that was only focused on representing the tree.
>
>
>
> It would be nice to have a BLAST implementation in Java optimized to run 
on
> a cluster but who has time to rewrite BLAST in Java when you can do 
BLAST
> search via the web and focus on parsing the results. BioJava needs a 
BLAST
> API that makes a web services call to an external service and gets 
returns
> structured results in core BioJava structures. Probably not difficult to 
do
> a Java version of CLUSTALW but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the 
results
> back returned in BioJava structures.
>
>
>
> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web
> service -> BioJava code. I haven't done the research but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology computational services. If multiple external services
> are offering BLAST via web services where each picked a different
> implementation then BioJava could provide abstraction to different services.
>
>
>
> Thanks
>
> Scooter
>
>
>
> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> Sent: Tuesday, May 12, 2009 1:27 AM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev
> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>
>
>
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3.
> Generally I support the idea because almost all computers are now
> multi-core and as you say cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or
> concurrency. This should be done by the developer. This is because sometimes
> multithreading can be fast on a slow computer but slow on a fast computer
> (due to the overhead in spawning threads) so programs need to be tunable.
> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like
> people attempting to control their own threads. What BioJava should do is
> expose granular and thread-safe operations that can be threaded or form
> discrete tasks on a utility grid or complete in SessionBeans on an App
> server. For example it would be better if BioJava had a single threaded
> method to calculate the GC of a single sequence rather than a multi-threaded
> method that calculates the GC of multiple sequences. This would let the
> developer make a multithreaded version if desired or distribute multiple
> tasks based on the single threaded version to a compute cloud (and let the
> cloud manage all the tasks).
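The GC example above can be sketched in Java as follows. The single-sequence method is stateless and therefore safe to call from any thread; the parallel wrapper is the part the paragraph leaves to the developer, who picks the thread count. Class and method names are assumptions for illustration, and sequences are plain strings here rather than BioJava sequence objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class GcExample {

    /** Library side: single-threaded, no shared state, one sequence. */
    static double gcContent(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    /** Caller side: the developer decides whether and how to parallelize. */
    static List<Double> gcContentParallel(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<Future<Double>>();
            for (final String s : sequences) {
                Callable<Double> job = () -> gcContent(s);
                futures.add(pool.submit(job));
            }
            // Collect in submission order so results line up with the input.
            List<Double> results = new ArrayList<Double>();
            for (Future<Double> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

On a grid or app server the wrapper would be dropped and each gcContent call submitted as its own task, which is the point of keeping the library method single-threaded.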
>
> Possibly the best situation would be to have the single threaded fine grain
> operations that let developers or grid engines control threading and then
> higher level APIs that do it for you (or good cookbook examples that show
> you how to do it). Another idea that was discussed was the use of
> properties files to allow people to set how many CPUs they wanted to make
> available to the JVM or name packages that can or cannot use threading.
>
> Finally, there are lots of times when it is highly desirable to use Java
> beans because they play well with dozens of Java api's however beans don't
> work well with threads because they have public setter methods. I would
> like to see a lot more bean use in a future BioJava because it would make
> life so much easier but a lot of care would need to be taken to make sure
> thread safety is preserved. There are many patterns that can be used such
> as synchronization locks etc to make things thread safe so I think this can
> be achieved as long as we are disciplined and consider that all methods may
> be used in a multi-threaded application (even if we write the method as a
> single thread). If there are code checkers that make suggestions on thread
> safety it would be great to have these as part of the standard build
> process. Good documentation would go a long way as well. Are there unit
> test patterns that can catch these problems as well? Suggestions would be
> great.
>
> Progress Listener patterns are good but it depends on the situation and
> might be better handled in high level APIs or left to the developer. For
> example in your NJ code a progress listener would be good if someone fed
> 1000 sequences into the method but not if they only put in 10. Also code
> running on an old machine might need a progress listener but the same
> problem on a new machine may complete almost instantly. Probably a
> pluggable listener would be the way to go. Also it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely) it would be possible to observe the cycle of method calls from the
> stack. This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument. If the application
> runs in a Java App server you can also do this very easily with a method
> Interceptor.
>
> - Mark
>
> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
>
>> Andreas
>>
>> Another theme that should be considered is providing a multi-thread
>> version of any module with long run time. This would have a couple
>> elements. A progress listener interface should be standard where core
>> code would update progress messages to listeners that can be used by
>> external code to display feedback to the user. I did this with the
>> Neighbor Joining code for tree construction and it provides needed
>> feedback in a GUI. If not the user gets frustrated because they don't
>> know the code they are about to execute may take 10 minutes or 8 hours
>> to complete and they think the software is not working. The reverse is
>> also true for canceling an operation where you want to have core code
>> stop processing a long running loop. Once the code has completed then
>> the listener interface for process complete is called allowing the next
>> step in the external code to continue. The developer would have the
>> choice to call the "process" method or run it in a thread and wait for
>> the callback complete method to be called.
>>
>> This is the first step in the ability to have the core/long running
>> processes take advantage of multiple threads to complete the
>> computational task faster. Not all code can be parallelized easily but
>> if the algorithm can take advantage of running in parallel then it
>> should. This then opens up a couple of cloud computing frameworks that
>> extend the multi-threaded concepts in Java across a cluster
>> http://www.terracotta.org/. If we put an emphasis on having code that
>> runs well in a thread we are one step closer to an architecture that can
>> run in a cloud. The computational problems are only going to get bigger
>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
>> computational IO cycles are going to be cheap as long as the
>> software/libraries can easily take advantage of it.
>>
>> Thanks
>>
>> Scooter
>>
>

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev


From HWillis at scripps.edu  Tue May 12 20:23:30 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 20:23:30 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C858@FLMAIL1.fl.ad.scripps.edu>


Andreas

A follow-up point related to Mark's comment: parsing BLAST output would not be required, or would at least be less important, if we provide a clean BioJava API that makes the web service call with BioJava data structures as inputs and gives back BioJava data structures as outputs. This saves the user the steps of doing the web query, saving the file, parsing it, etc. It would be interesting to know how many users run their own BLAST server for privacy reasons.
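One way to picture such an API is an interface that hides which provider is called, so caller code only ever sees structured hits. The Java sketch below is an assumption-laden illustration: BlastHit, BlastService, CannedBlastService and the accession strings are all hypothetical, plain strings stand in for BioJava sequence objects, and the canned implementation replaces a real web service call so the example runs without a network.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hit record; a real module would return BioJava objects. */
final class BlastHit {
    final String accession;
    final double eValue;

    BlastHit(String accession, double eValue) {
        this.accession = accession;
        this.eValue = eValue;
    }
}

/** Hypothetical abstraction over different remote BLAST providers. */
interface BlastService {
    List<BlastHit> search(String querySequence);
}

/** Canned stand-in for a provider; returns made-up hits, no network access. */
class CannedBlastService implements BlastService {
    public List<BlastHit> search(String querySequence) {
        List<BlastHit> hits = new ArrayList<BlastHit>();
        if (querySequence == null || querySequence.isEmpty()) {
            return hits;
        }
        hits.add(new BlastHit("HYPOTHETICAL_1", 3e-4));
        hits.add(new BlastHit("HYPOTHETICAL_2", 1e-20));
        return hits;
    }
}

class BlastClientSketch {
    /** Caller code depends on the interface only, not on the provider. */
    static BlastHit bestHit(BlastService service, String querySequence) {
        BlastHit best = null;
        for (BlastHit hit : service.search(querySequence)) {
            if (best == null || hit.eValue < best.eValue) {
                best = hit; // lower e-value means a more significant hit
            }
        }
        return best;
    }
}
```

A user running a private BLAST server would supply their own BlastService implementation and keep the rest of their code unchanged.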

Scooter

-----Original Message-----
From: Scooter Willis
Sent: Tue 5/12/2009 8:13 PM
To: Andreas Prlic
Cc: biojava-dev
Subject: RE: [Biojava-dev] Plans for next biojava release - modularization
 
Andreas

The goal for BioJava could be to provide a wrapper for the http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp java code so that inputs/outputs are BioJava. I think they are using Axis for the client web services code. If BioJava 3 is going to be Java 6 minimum then it is easier to use the Java 6 SOAP processing capabilities by pointing to the WSDL code and generating the Java code for the client side. This cuts down on the additional external 3rd parties that are required.

I try to stay out of the legacy file parsing business whenever possible. 

Scooter 

-----Original Message-----
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Tue 5/12/2009 7:59 PM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
 
Hi Scooter,

about your suggestion for the blast webservice client code: In
principle I like the idea and we have had questions on the mailing
list regarding this in the past. Only thing is I think there is
already some client code in java available:
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
but I am not sure how good that Java client library is....

Besides this, there is the need for work on our blast parser library
and if you are interested in working on that you are welcome. As I
mentioned, I think this should become its own module, due to the
popularity of that code.

Andreas




From andreas at sdsc.edu  Tue May 12 20:45:54 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 17:45:54 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <OF8495A026.AC43734D-ON482575B5.000057FD-482575B5.0000DF4C@ah.novartis.com>
References: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
	<OF8495A026.AC43734D-ON482575B5.000057FD-482575B5.0000DF4C@ah.novartis.com>
Message-ID: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>

The point about the auto-generated code actually raises another
question for me: How shall we deal with auto-generated code?

I also have some code that is currently not part of BioJava, but it
might be useful for other people: It parses uniprot XML files and
serializes / de-serializes the resulting objects to a database using
EJBs and hibernate.

How far should biojava go in supporting such auto generated or
semi-auto generated code?
A


On Tue, May 12, 2009 at 5:09 PM,  <mark.schreiber at novartis.com> wrote:
>
> A while back I gave Richard some code that uses JAXB to objectify (and
> deobjectify) BLAST XML output. This might be useful for parsing BLAST
> results from the webservices which normally use BLAST XML. I could probably
> dig it up again if needed (it was autogenerated anyway).
>
> It would probably be a good object model for BLAST output if people want to
> parse other types of BLAST output (such as flatfile, but who would want to
> do that!). The BLAST XML seems to accommodate strange flavours of BLAST
> such as PSI-BLAST etc and also has been much more stable than the default
> flat file output.
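To illustrate reading BLAST XML into simple objects, here is a small Java sketch. It deliberately uses the JDK's DOM parser instead of the JAXB-generated classes Mark mentions, because that code is not shown in this thread. The element names (Hit, Hit_accession, Hsp_evalue) follow the NCBI BLAST XML layout; the sample document, its accessions and the class name are made up for illustration, and the sample omits the DOCTYPE line that real BLAST XML output carries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class BlastXmlSketch {

    /** Made-up, heavily trimmed sample in the shape of BLAST XML output. */
    static final String SAMPLE =
        "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
      + "<Hit><Hit_num>1</Hit_num><Hit_accession>ACC0001</Hit_accession>"
      + "<Hit_hsps><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps></Hit>"
      + "<Hit><Hit_num>2</Hit_num><Hit_accession>ACC0002</Hit_accession>"
      + "<Hit_hsps><Hsp><Hsp_evalue>0.003</Hsp_evalue></Hsp></Hit_hsps></Hit>"
      + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";

    /** Returns one "accession e=evalue" line per Hit, using its first Hsp. */
    static List<String> hitSummaries(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = doc.getElementsByTagName("Hit");
            List<String> summaries = new ArrayList<String>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                summaries.add(firstText(hit, "Hit_accession")
                        + " e=" + firstText(hit, "Hsp_evalue"));
            }
            return summaries;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse BLAST XML", e);
        }
    }

    /** Text of the first descendant element with this tag, or "" if absent. */
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

A generated object model, as Mark suggests, would replace the string summaries with typed Hit and Hsp classes; the traversal stays the same.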
>
> - Mark
>
>
>
> Andreas Prlic <andreas at sdsc.edu>
> Sent by: biojava-dev-bounces at lists.open-bio.org
>
> 05/13/2009 08:02 AM
>
> To
> Scooter Willis <HWillis at scripps.edu>
> cc
> biojava-dev <biojava-dev at lists.open-bio.org>
> Subject
> Re: [Biojava-dev] Plans for next biojava release - modularization
>
>
>
>
> Hi Scooter,
>
> about your suggestion for the blast webservice client code: In
> principle I like the idea and we have had questions on the mailing
> list regarding this in the past. Only thing is I think there is
> already some client code in java available:
> http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
> but I am not sure how good that Java client library is....
>
> Besides this, there is the need for work on our blast parser library
> and if you are interested in working on that you are welcome. As I
> mentioned, I think this should become its own module, due to the
> popularity of that code.
>
> Andreas
>
>
>
>
> On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>> Mark
>>
>>
>>
>> It is a challenge on knowing where to draw the line. Allowing both options
>> is a reasonable approach. The implementation of the algorithm is key to
>> allow it to be multi-threaded or being able to run in parallel. One approach
>> is to provide a standard interface such as process() would wait for the
>> result/return value and run in the parent thread. To run the algorithm in a
>> thread you can have a startProcess() where you can add yourself as a
>> progress listener and when complete() method is called you can call
>> getResults(). You can then also have the corresponding stopProcess() which
>> would set an internal value to cause all threads to quit. Lots of ways to
>> tackle the problem the key is to start talking about it and at minimum take
>> advantage of multiple-cores where the external code can set the number of
>> cores to use. You can get a dual quad core machine these days for < $1000
>> but most software implementations are not designed to take advantage of it.
>>
>>
>>
>> The real question is what exists today in the BioJava API that is considered
>> long running in normal use case and thus is a candidate to be run in
>> parallel. It may not be an issue in existing BioJava code. When I first
>> started using BioJava I went looking for BLAST code only to find a BLAST
>> parser. I wanted to do a Multiple Sequence Alignment and turns out that
>> Biojava code calls CLUSTALW as an external processor under the covers. I
>> also needed code to construct trees from an MSA and found the summer of code
>> project that was only focused on representing the tree.
>>
>>
>>
>> It would be nice to have a BLAST implementation in Java optimized to run on
>> a cluster but who has time to rewrite BLAST in Java when you can do BLAST
>> search via the web and focus on parsing the results. BioJava needs a BLAST
>> API that makes a web services call to an external service and returns
>> structured results in core BioJava structures. Probably not difficult to do
>> a Java version of CLUSTALW but again we can push the work out to
>> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
>> back in BioJava structures.
>>
>>
>>
>> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web
>> service -> BioJava code. I haven't done the research but it seems that
>> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
>> expose common biology computational services. If multiple external services
>> are offering BLAST via web services where each picked a different
>> implementation then BioJava could provide abstraction to different services.
>>
>>
>>
>> Thanks
>>
>> Scooter
>>
>>
>>
>> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
>> Sent: Tuesday, May 12, 2009 1:27 AM
>> To: Scooter Willis
>> Cc: Andreas Prlic; biojava-dev
>> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>>
>>
>>
>> Hi -
>>
>> This was one thing we discussed previously with respect to biojava 3.
>> Generally I support the idea because almost all computers are now
>> multi-core and as you say cloud or utility computing is already a reality.
>>
>> However, I tend to think that biojava should not control threading or
>> concurrency. This should be done by the developer. This is because sometimes
>> multithreading can be fast on a slow computer but slow on a fast computer
>> (due to the overhead in spawning threads) so programs need to be tunable.
>> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like
>> people attempting to control their own threads. What BioJava should do is
>> expose granular and thread-safe operations that can be threaded or form
>> discrete tasks on a utility grid or complete in SessionBeans on an App
>> server. For example it would be better if BioJava had a single threaded
>> method to calculate the GC of a single sequence rather than a multi-threaded
>> method that calculates the GC of multiple sequences. This would let the
>> developer make a multithreaded version if desired or distribute multiple
>> tasks based on the single threaded version to a compute cloud (and let the
>> cloud manage all the tasks).
>>
>> Possibly the best situation would be to have the single threaded fine grain
>> operations that let developers or grid engines control threading and then
>> higher level APIs that do it for you (or good cookbook examples that show
>> you how to do it). Another idea that was discussed was the use of
>> properties files to allow people to set how many CPUs they wanted to make
>> available to the JVM or name packages that can or cannot use threading.
>>
>> Finally, there are lots of times when it is highly desirable to use Java
>> beans because they play well with dozens of Java api's however beans don't
>> work well with threads because they have public setter methods. I would
>> like to see a lot more bean use in a future BioJava because it would make
>> life so much easier but a lot of care would need to be taken to make sure
>> thread safety is preserved. There are many patterns that can be used such
>> as synchronization locks etc to make things thread safe so I think this can
>> be achieved as long as we are disciplined and consider that all methods may
>> be used in a multi-threaded application (even if we write the method as a
>> single thread). If there are code checkers that make suggestions on thread
>> safety it would be great to have these as part of the standard build
>> process. Good documentation would go a long way as well. Are there unit
>> test patterns that can catch these problems as well? Suggestions would be
>> great.
>>
>> Progress Listener patterns are good but it depends on the situation and
>> might be better handled in high level APIs or left to the developer. For
>> example in your NJ code a progress listener would be good if someone fed
>> 1000 sequences into the method but not if they only put in 10. Also code
>> running on an old machine might need a progress listener but the same
>> problem on a new machine may complete almost instantly. Probably a
>> pluggable listener would be the way to go. Also it might be possible to do
>> this using the new JDK APIs that let you take a peek at the stack trace.
>> Even if your NJ method didn't allow for a progress listener a developer
>> could still make one by looking at the method calls in the stack. As long
>> as your NJ method called other methods internally for each sequence (quite
>> likely) it would be possible to observe the cycle of method calls from the
>> stack. This might make it possible to have a very general BioJava progress
>> listener that can be told to count the number of times a method is called
>> in the stack. The name of the method would be the argument. If the
>> application runs in a Java App server you can also do this very easily
>> with a method Interceptor.
>>
>> - Mark
>>
>> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
>>
>>> Andreas
>>>
>>> Another theme that should be considered is providing a multi-thread
>>> version of any module with long run time. This would have a couple
>>> elements. A progress listener interface should be standard where core
>>> code would update progress messages to listeners that can be used by
>>> external code to display feedback to the user. I did this with the
>>> Neighbor Joining code for tree construction and it provides needed
>>> feedback in a GUI. If not the user gets frustrated because they don't
>>> know the code they are about to execute may take 10 minutes or 8 hours
>>> to complete and they think the software is not working. The reverse is
>>> also true for canceling an operation where you want to have core code
>>> stop processing a long running loop. Once the code has completed then
>>> the listener interface for process complete is called allowing the next
>>> step in the external code to continue. The developer would have the
>>> choice to call the "process" method or run it in a thread and wait for
>>> the callback complete method to be called.
>>>
>>> This is the first step in the ability to have the core/long running
>>> processes take advantage of multiple threads to complete the
>>> computational task faster. Not all code can be parallelized easily but
>>> if the algorithm can take advantage of running in parallel then it
>>> should. This then opens up a couple of cloud computing frameworks that
>>> extend the multi-threaded concepts in Java across a cluster
>>> http://www.terracotta.org/. If we put an emphasis on having code that
>>> runs well in a thread we are one step closer to an architecture that can
>>> run in a cloud. The computational problems are only going to get bigger
>>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
>>> computational IO cycles are going to be cheap as long as the
>>> software/libraries can easily take advantage of it.
>>>
>>> Thanks
>>>
>>> Scooter
>>>
>>> -----Original Message-----
>>> From: biojava-dev-bounces at lists.open-bio.org
>>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
>>> Prlic
>>> Sent: Monday, May 11, 2009 12:27 AM
>>> To: biojava-dev
>>> Subject: [Biojava-dev] Plans for next biojava release - modularization
>>>
>>> Hi biojava-devs,
>>>
>>> It is time to start working on the next biojava release. I would
>>> like to modularize the current code base and apply some of the ideas
>>> that have emerged around Richard's "biojava 3" code. In principle the
>>> idea is that all changes should be backwards compatible with the
>>> interfaces provided by the current biojava 1.7 release. Backwards
>>> compatibility shall only be broken if the functionality is being
>>> replaced with something that works better, and gets documented
>>> accordingly. For the build functionality I would suggest to stick with
>>> what Richard's biojava 3 code base already is providing. Since we will
>>> try to be backwards compatible all code development should be part of
>>> the biojava-trunk and the first step will be to move the ant-build
>>> scripts to a maven build process. Following this procedure will allow
>>> to use e.g. the code refactoring tools provided by Eclipse, which
>>> should come in handy.
>>>
>>> The modules I would like to see should provide self-contained
>>> functionality and cross dependencies should be restricted to a
>>> minimum. I would suggest to have the following modules:
>>>
>>> biojava-core: Contains everything that can not easily be modularized
>>> or nobody volunteers to become a module maintainer.
>>> biojava-phylogeny: Scooter expressed some interest in providing such a
>>> module and becoming package maintainer for it.
>>> biojava-structure: Everything protein structure related. I would be
>>> package maintainer.
>>> biojava-blast: Blast parsing is a frequently requested functionality
>>> and it would be good to have this code self-contained. A package
>>> maintainer for this still will need to be nominated at a later stage.
>>> Any suggestions for other modules?
>>>
>>> Let me know what you think about this.
>>>
>>> Andreas
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>> _________________________
>>
>> CONFIDENTIALITY NOTICE
>>
>> The information contained in this e-mail message is intended only for the
>> exclusive use of the individual or entity named above and may contain
>> information that is privileged, confidential or exempt from disclosure
>> under
>> applicable law. If the reader of this message is not the intended
>> recipient,
>> or the employee or agent responsible for delivery of the message to the
>> intended recipient, you are hereby notified that any dissemination,
>> distribution or copying of this communication is strictly prohibited. If
>> you
>> have received this communication in error, please notify the sender
>> immediately by e-mail and delete the material from any computer. Thank
>> you.
>
>
>


From mark.schreiber at novartis.com  Tue May 12 22:15:27 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 May 2009 10:15:27 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>
Message-ID: <OF3FD186AB.FA0D8059-ON482575B5.000A55FA-482575B5.000C66CB@ah.novartis.com>

Hi -

I think it depends if the code is going to be auto-generated at each build 
or only once.  I have autogenerated Entity classes for BioSQL tables. My 
recommendation would be that these be used for JPA mapping to BioSQL from 
BioJava.  I think these only need be generated once (unless the BioSQL 
schema changes), especially as the autogeneration didn't quite catch some 
of the subtleties of the schema.  They can also be in their own module, 
not the core.

Classes that map to XML or webservice clients can be autogenerated from 
XML schema, DTD or WSDL once or at every build (automatically from ANT and 
probably Maven).  In these cases it may pay to do it with every build 
because these classes are completely boilerplate code and should never 
need to be manually modified.  Also it means the code for these utility 
classes will never be in the code base and it will not be possible for 
someone to change it accidentally (and the code base will be smaller). 
Only the XSD or WSDL will be in subversion (and any higher level code that 
makes use of the boilerplate client code).  Improvements in the 
boilerplate code or changes that come with updates to JAXB and similar 
will automatically appear at the next build (when we change JAXB 
versions).

Conceptually the BLAST XML parsing module may consist of only the BLAST 
XSD (or DTD) and a high-level biojava class like the following:

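import java.io.Serializable;
import java.net.URL;

public interface BlastParser {
        // Implementations call the generated boilerplate (e.g. JAXB) code.
        Serializable[] parseBlast(URL url);

        // Same, for BLAST XML output that is already held in memory.
        Serializable[] parseBlast(String blastXMLOutput);
}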

The code for the bit that does the JAXB marshalling etc could be generated 
at build time.  The Serializable array would be the objects that JAXB 
generates. Probably they would be a more specific stub that implements 
Serializable (e.g. BlastResult or similar depending on the XSD).
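
A rough sketch of the facade idea above, under loud assumptions: it uses 
the JDK's DOM parser as a stand-in for JAXB-generated classes, and 
BlastHit and DomBlastParser are hypothetical names, not BioJava or JAXB 
classes. The element names Hit, Hit_id and Hit_len are those used in NCBI 
BLAST XML output. The point is only that callers see a small parseBlast 
method while the XML plumbing stays hidden behind it.

```java
import java.io.Serializable;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical value object; with JAXB this class would be generated from the XSD.
class BlastHit implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final int length;

    BlastHit(String id, int length) {
        this.id = id;
        this.length = length;
    }
}

// Hypothetical facade: callers see only parseBlast, not the XML plumbing.
class DomBlastParser {
    static BlastHit[] parseBlast(String blastXmlOutput) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc =
                    builder.parse(new InputSource(new StringReader(blastXmlOutput)));
            NodeList hits = doc.getElementsByTagName("Hit");
            List<BlastHit> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                int length = Integer.parseInt(text(hit, "Hit_len"));
                result.add(new BlastHit(text(hit, "Hit_id"), length));
            }
            return result.toArray(new BlastHit[0]);
        } catch (Exception e) {
            // Wrapped so that callers are not forced to handle checked exceptions.
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }

    // Returns the trimmed text of the first descendant element with the given name.
    private static String text(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent().trim();
    }
}
```

Real BLAST XML output starts with a DOCTYPE declaration, which this sketch 
does not treat specially. A JAXB-based version would swap the body of 
parseBlast for a call into the generated code.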

I think it really comes down to a question of how much the generated code 
is boilerplate code that will never be changed. If it is not 'modifiable' 
then it can be generated at build. If the autogenerated code is an outline 
of a class where method bodies need to be filled in or customized then 
they should not be autogenerated at build time.  A good example would be 
JUnit classes that can be autogenerated to give you a template that will 
compile and run but probably will not perform a sensible test.  The 
developer of the test could autogenerate the template but would then need 
to make the test sensible. At that point the test should be in the code 
base and should not be regenerated at build time.

- Mark

biojava-dev-bounces at lists.open-bio.org wrote on 05/13/2009 08:45:54 AM:

> The point with the auto-generated code raises actually another
> question to me: How shall we deal with auto-generated code?
> 
> I also have some code that is currently not part of BioJava, but it
> might be useful for other people: It allows parsing uniprot XML files
> and serializing / de-serializing the objects to a database using EJBs,
> hibernate and the uniprot XML files.
> 
> How far should biojava go in supporting such auto generated or
> semi-auto generated code?
> A
> 
> 
> On Tue, May 12, 2009 at 5:09 PM,  <mark.schreiber at novartis.com> wrote:
> >
> > A while back I gave Richard some code that uses JAXB to objectify (and
> > deobjectify) BLAST XML output. This might be useful for parsing BLAST
> > results from the webservices which normally use BLAST XML. I could
> > probably dig it up again if needed (it was autogenerated anyway).
> >
> > It would probably be a good object model for BLAST output if people want
> > to parse other types of BLAST output (such as flatfile, but who would
> > want to do that!). The BLAST XML seems to accommodate strange flavours of
> > BLAST such as PSI-BLAST etc and also has been much more stable than the
> > default flat file output.
> >
> > - Mark
> >
> >
> >
> > Andreas Prlic <andreas at sdsc.edu>
> > Sent by: biojava-dev-bounces at lists.open-bio.org
> >
> > 05/13/2009 08:02 AM
> >
> > To
> > Scooter Willis <HWillis at scripps.edu>
> > cc
> > biojava-dev <biojava-dev at lists.open-bio.org>
> > Subject
> > Re: [Biojava-dev] Plans for next biojava release - modularization
> >
> >
> >
> >
> > Hi Scooter,
> >
> > about your suggestion for the blast webservice client code: In
> > principle I like the idea and we have had questions on the mailing
> > list regarding this in the past. Only thing is I think there is
> > already some client code in java available:
> > http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
> > but I am not sure how good that Java client library is....
> >
> > Besides this, there is the need for work on our blast parser library
> > and if you are interested in working on that you are welcome. As I
> > mentioned, I think this should become its own module, due to the
> > popularity of that code.
> >
> > Andreas
> >
> >
> >
> >
> > On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> >> Mark
> >>
> >>
> >>
> >> It is a challenge to know where to draw the line. Allowing both options
> >> is a reasonable approach. The implementation of the algorithm is key to
> >> allowing it to be multi-threaded or run in parallel. One approach is to
> >> provide a standard interface where process() waits for the result/return
> >> value and runs in the parent thread. To run the algorithm in a thread
> >> you can have a startProcess() where you can add yourself as a progress
> >> listener, and when the complete() method is called you can call
> >> getResults(). You can then also have the corresponding stopProcess()
> >> which would set an internal value to cause all threads to quit. There
> >> are lots of ways to tackle the problem; the key is to start talking
> >> about it and at minimum take advantage of multiple cores where the
> >> external code can set the number of cores to use. You can get a dual
> >> quad core machine these days for < $1000 but most software
> >> implementations are not designed to take advantage of it.
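
The process() / startProcess() / stopProcess() proposal quoted above could 
look roughly like the following. This is only a sketch under stated 
assumptions: ProgressListener and LongRunningTask are hypothetical names, 
the "work" is a placeholder sum, and join() is a convenience added here 
that the quoted text does not mention.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical listener: receives progress updates and a completion callback.
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

// Hypothetical task offering both a blocking and a threaded entry point.
class LongRunningTask {
    private final int steps;
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile ProgressListener listener;
    private volatile int result;
    private Thread worker;

    LongRunningTask(int steps) {
        this.steps = steps;
    }

    void setListener(ProgressListener listener) {
        this.listener = listener;
    }

    // Blocking variant: runs in the caller's thread and returns the result.
    int process() {
        int sum = 0;
        for (int i = 1; i <= steps && !stopRequested.get(); i++) {
            sum += i; // placeholder for one unit of real work
            ProgressListener l = listener;
            if (l != null) l.progress(i, steps);
        }
        result = sum;
        ProgressListener l = listener;
        if (l != null) l.complete();
        return sum;
    }

    // Threaded variant: returns immediately; complete() signals the end.
    void startProcess() {
        worker = new Thread(() -> { process(); });
        worker.start();
    }

    // Sets the flag that the work loop checks on every iteration.
    void stopProcess() {
        stopRequested.set(true);
    }

    int getResults() {
        return result;
    }

    // Blocks until the worker started by startProcess() has finished.
    void join() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A GUI would typically call setListener and startProcess, update a progress 
bar from progress(), and read getResults() once complete() fires. A batch 
job can call process() directly and ignore listeners altogether.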
> >>
> >>
> >>
> >> The real question is what exists today in the BioJava API that is
> >> considered long running in normal use case and thus is a candidate to be
> >> run in parallel. It may not be an issue in existing BioJava code. When I
> >> first started using BioJava I went looking for BLAST code only to find a
> >> BLAST parser. I wanted to do a Multiple Sequence Alignment and turns out
> >> that Biojava code calls CLUSTALW as an external processor under the
> >> covers. I also needed code to construct trees from an MSA and found the
> >> summer of code project that was only focused on representing the tree.
> >>
> >>
> >>
> >> It would be nice to have a BLAST implementation in Java optimized to run
> >> on a cluster but who has time to rewrite BLAST in Java when you can do a
> >> BLAST search via the web and focus on parsing the results. BioJava needs
> >> a BLAST API that makes a web services call to an external service and
> >> returns structured results in core BioJava structures. It is probably
> >> not difficult to do a Java version of CLUSTALW but again we can push the
> >> work out to http://www.ebi.ac.uk/Tools/webservices/services/clustalw and
> >> get the results back returned in BioJava structures.
> >>
> >>
> >>
> >> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW
> >> web service -> BioJava code. I haven't done the research but it seems
> >> that http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of
> >> work to expose common biology computational services. If multiple
> >> external services are offering BLAST via web services where each picked
> >> a different implementation then BioJava could provide abstraction to
> >> different services.
> >>
> >>
> >>
> >> Thanks
> >>
> >> Scooter


From msmoot at ucsd.edu  Thu May 21 19:47:22 2009
From: msmoot at ucsd.edu (Mike Smoot)
Date: Thu, 21 May 2009 16:47:22 -0700
Subject: [Biojava-dev] an outsider's take on Biojava 3
Message-ID: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>

Hi Everyone,

I thought I'd respond to Andreas' request for participation in the BioJava 3
design discussions that he made last week on the normal BioJava list.  I'm
the lead developer on the Cytoscape project (http://cytoscape.org), so I
thought I'd provide some perspective on what a project using BioJava might
look for in BioJava 3.

Basically, I'd just like to voice my strong support for the "Basic
Principles" listed here: http://biojava.org/wiki/BioJava3_Design.  Finer
granularity of jars, acyclic dependencies, and the separation of API and
implementation are precisely the things we're doing in Cytoscape 3.  The
first two points will go a long way towards making it easier to use specific
parts of the library without needing everything at once.  The third point
will allow alternative implementations of certain interfaces, which is one
approach to dealing with issues like parallel vs. non-parallel versions of
algorithms.  Maven also sounds great.
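
To make the API/implementation point concrete, here is a minimal sketch 
with one interface and two interchangeable implementations, one sequential 
and one parallel. All names (GcCalculator, Gc, SequentialGcCalculator, 
ParallelGcCalculator) are hypothetical and not taken from BioJava or 
Cytoscape; the GC fraction calculation is just a convenient small task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// API: client code depends only on this interface.
interface GcCalculator {
    double[] gcContent(List<String> sequences);
}

// Single-threaded building block shared by both implementations.
final class Gc {
    // Fraction of G and C characters in one sequence (0.0 for an empty one).
    static double of(String seq) {
        if (seq.isEmpty()) return 0.0;
        int gc = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == 'G' || c == 'C') gc++;
        }
        return (double) gc / seq.length();
    }
}

// Implementation 1: plain loop in the caller's thread.
class SequentialGcCalculator implements GcCalculator {
    public double[] gcContent(List<String> sequences) {
        double[] out = new double[sequences.size()];
        for (int i = 0; i < out.length; i++) out[i] = Gc.of(sequences.get(i));
        return out;
    }
}

// Implementation 2: one task per sequence on a fixed-size thread pool.
class ParallelGcCalculator implements GcCalculator {
    private final int threads;

    ParallelGcCalculator(int threads) {
        this.threads = threads;
    }

    public double[] gcContent(List<String> sequences) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : sequences) {
                Callable<Double> task = () -> Gc.of(s);
                futures.add(pool.submit(task));
            }
            double[] out = new double[futures.size()];
            for (int i = 0; i < out.length; i++) out[i] = futures.get(i).get();
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("GC calculation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Code written against GcCalculator does not change when one implementation 
is swapped for the other, and the number of threads stays under the 
caller's control.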

If I could add one bullet to the list, it would be to add OSGi metadata to
the jars to allow easy integration with OSGi-based projects (such as
Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven
plugins to make this dead simple and it wouldn't impact anyone not using
OSGi.

Please take that with a large grain of salt, I just thought you might
appreciate an outsider's perspective!

thanks,
Mike

-- 
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego

From markjschreiber at gmail.com  Thu May 21 22:59:14 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 22 May 2009 10:59:14 +0800
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>
References: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>
Message-ID: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>

Thanks for the comments. The OSGi system sounds interesting. I think
we should consider it.

I have also added two more recommendations for the Design Principles:




From markjschreiber at gmail.com  Thu May 21 23:01:57 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 22 May 2009 11:01:57 +0800
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
References: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com> 
	<93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
Message-ID: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>

Sorry, sent before I said what the new principles were.

1. Extensive use of the Logging API
2. (At the risk of having a fatwa declared against me) Most biojava
exceptions should derive from RuntimeException and be unchecked

See the wiki page for more details.
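
A small sketch of how the two principles above might combine, using only 
java.util.logging from the JDK. BioRuntimeException and SymbolChecker are 
hypothetical names chosen for illustration; they are not existing BioJava 
classes.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical base class for unchecked exceptions.
class BioRuntimeException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    BioRuntimeException(String message) {
        super(message);
    }

    BioRuntimeException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SymbolChecker {
    private static final Logger LOG =
            Logger.getLogger(SymbolChecker.class.getName());

    // Validates a DNA string. No "throws" clause is needed because the
    // exception is unchecked; callers may catch it but are not forced to.
    static String requireDna(String seq) {
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if ("ACGT".indexOf(c) < 0) {
                LOG.log(Level.WARNING, "Illegal symbol ''{0}'' at position {1}",
                        new Object[] {c, i});
                throw new BioRuntimeException(
                        "Illegal symbol '" + c + "' at position " + i);
            }
        }
        LOG.fine("Validated sequence of length " + seq.length());
        return seq;
    }
}
```

Callers that care can still catch BioRuntimeException, while simple 
scripts are not cluttered with try/catch blocks, and the log level can be 
tuned without touching the code.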

- Mark

On Fri, May 22, 2009 at 10:59 AM, Mark Schreiber
<markjschreiber at gmail.com> wrote:
> Thanks for the comments. The OSGi system sounds interesting. I think
> we should consider it.
>
> I have also added two more recommendations for the Design Principles:
>
>
> On Fri, May 22, 2009 at 7:47 AM, Mike Smoot <msmoot at ucsd.edu> wrote:
>> Hi Everyone,
>>
>> I thought I'd respond to Andreas' request for participation in the BioJava 3
>> design discussions that he made last week on the normal BioJava list.  I'm
>> the lead developer on the Cytoscape project (http://cytoscape.org), so I
>> thought I'd provide some perspective on what a project using BioJava might
>> look for in BioJava 3.
>>
>> Basically, I'd just like to voice my strong support for the "Basic
>> Principles" listed here: http://biojava.org/wiki/BioJava3_Design.  Finer
>> granularity of jars, acyclic dependencies, and the separation of API and
>> implementation are precisely the things we're doing in Cytoscape 3.  The
>> first two points will go a long way towards making it easier to use specific
>> parts of the library without needing everything at once.  The second point
>> will allow alternative implementations of certain interfaces, which is one
>> approach to dealing with issues like parallel vs. non-parallel versions of
>> algorithms.  Maven also sounds great.
>>
>> If I could add one bullet to the list, it would be to add OSGi metadata to
>> the jars to allow easy integration with OSGi-based projects (such as
>> Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven
>> plugins to make this dead simple and it wouldn't impact anyone not using
>> OSGi.
>>
>> Please take that with a large grain of salt, I just thought you might
>> appreciate an outsider's perspective!
>>
>> thanks,
>> Mike
>>
>> --
>> ____________________________________________________________
>> Michael Smoot, Ph.D.               Bioengineering Department
>> tel: 858-822-4756         University of California San Diego
>>
>


From holland at eaglegenomics.com  Fri May 22 05:02:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 May 2009 10:02:43 +0100
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
References: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>
	<93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
	<93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
Message-ID: <1242982963.10413.6.camel@buzzybee>

RuntimeException is good for things that can't be recovered from. If the
user has provided bad coordinates or invalid sequence, that's a
recoverable error (because there's a chance that they came from user
input via a user interface, which can be corrected and retried). Even
file parsing exceptions should be recoverable - the user can move on to
the next record without borking the entire file (we already see broken
records quite a lot in Genbank downloads).

But, for things like being unable to call out to Blast, or being unable
to convert DNA to Protein because of a misconfiguration internally
somewhere, I agree that RuntimeExceptions are probably best. These are
unrecoverable and indicate that changes need to be made to the
programming code or BioJava itself.

So in my mind then RuntimeExceptions are good for highlighting
programming errors, but not good for errors relating to invalid input
data.
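
A minimal Java sketch of the split described above, under stated
assumptions: every class and method name here (InvalidSequenceException,
TranslationConfigException, RecordLoader) is invented for illustration
and is not part of BioJava. Bad input data raises a checked exception
that the caller can handle record by record, while an internal
misconfiguration raises an unchecked one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical checked exception: bad input data that a caller can recover from.
class InvalidSequenceException extends Exception {
    InvalidSequenceException(String message) {
        super(message);
    }
}

// Hypothetical unchecked exception: a misconfiguration that needs a code change.
class TranslationConfigException extends RuntimeException {
    TranslationConfigException(String message) {
        super(message);
    }
}

class RecordLoader {
    // Validates one DNA record; invalid input is reported as a checked exception.
    static String parseDna(String record) throws InvalidSequenceException {
        String seq = record.trim().toUpperCase();
        if (seq.isEmpty() || !seq.matches("[ACGTN]+")) {
            throw new InvalidSequenceException("Not a DNA sequence: '" + record + "'");
        }
        return seq;
    }

    // Skips broken records and keeps going instead of abandoning the whole input.
    static List<String> parseAll(List<String> records, List<String> errors) {
        List<String> good = new ArrayList<String>();
        for (String record : records) {
            try {
                good.add(parseDna(record));
            } catch (InvalidSequenceException e) {
                errors.add(e.getMessage());
            }
        }
        return good;
    }

    // A missing codon table is a configuration error, so it is unchecked.
    static void requireCodonTable(Map<String, Character> table) {
        if (table == null || table.isEmpty()) {
            throw new TranslationConfigException("No codon table configured");
        }
    }
}
```

The compiler forces callers of parseDna to decide what to do with a bad
record, which is the point of keeping input errors checked.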


On Fri, 2009-05-22 at 11:01 +0800, Mark Schreiber wrote:
> Sorry, sent before I said what the new principles were.
> 
> 1. Extensive use of the Logging API
> 2. (At the risk of having a fatwa declared against me) Most biojava
> exceptions should derive from RuntimeException and be unchecked
> 
> See the wiki page for more details.
> 
> - Mark
> 
> On Fri, May 22, 2009 at 10:59 AM, Mark Schreiber
> <markjschreiber at gmail.com> wrote:
> > Thanks for the comments. The OSGi system sounds interesting. I think
> > we should consider it.
> >
> > I have also added two more recommendations for the Design Principles:
> >
> >
> > On Fri, May 22, 2009 at 7:47 AM, Mike Smoot <msmoot at ucsd.edu> wrote:
> >> Hi Everyone,
> >>
> >> I thought I'd respond to Andreas' request for participation in the BioJava 3
> >> design discussions that he made last week on the normal BioJava list.  I'm
> >> the lead developer on the Cytoscape project (http://cytoscape.org), so I
> >> thought I'd provide some perspective on what a project using BioJava might
> >> look for in BioJava 3.
> >>
> >> Basically, I'd just like to voice my strong support for the "Basic
> >> Principles" listed here: http://biojava.org/wiki/BioJava3_Design.  Finer
> >> granularity of jars, acyclic dependencies, and the separation of API and
> >> implementation are precisely the things we're doing in Cytoscape 3.  The
> >> first two points will go a long way towards making it easier to use specific
> >> parts of the library without needing everything at once.  The second point
> >> will allow alternative implementations of certain interfaces, which is one
> >> approach to dealing with issues like parallel vs. non-parallel versions of
> >> algorithms.  Maven also sounds great.
> >>
> >> If I could add one bullet to the list, it would be to add OSGi metadata to
> >> the jars to allow easy integration with OSGi-based projects (such as
> >> Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven
> >> plugins to make this dead simple and it wouldn't impact anyone not using
> >> OSGi.
> >>
> >> Please take that with a large grain of salt, I just thought you might
> >> appreciate an outsider's perspective!
> >>
> >> thanks,
> >> Mike
> >>
> >> --
> >> ____________________________________________________________
> >> Michael Smoot, Ph.D.               Bioengineering Department
> >> tel: 858-822-4756         University of California San Diego
> >>
> >
> 
-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andreas at sdsc.edu  Mon May 25 00:22:09 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 24 May 2009 21:22:09 -0700
Subject: [Biojava-dev] next steps
Message-ID: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>

Hi,

While talking about design requirements, I think we also need to think
pragmatically about how much time we will have to refactor code vs.
re-writing modules from scratch. To get started with the next steps, I
 suggest the following procedure: First thing will be to move to
Maven. Then components should be refactored into independent
sub-modules. Then the submodules can get improved to follow the new
design guidelines. Once we have reached a certain stability with the
re-organized code base, we will make the next release.

Any comments? If there is general agreement about this, I would take
the next step and replace the ant build system with a maven based one.

Andreas

From andreas at sdsc.edu  Mon May 25 11:14:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 25 May 2009 08:14:06 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905250814p2cfcc627h477e688637f50ccb@mail.gmail.com>

> build some sort of graph relationship tool. It is also easy enough to start
> dragging packages around to different projects in netbeans and resolve
> compiler errors.

Yes, same for Eclipse. The Eclipse Maven plugin can auto-convert a
project to Maven. I have played around with it and it was quite easy
to get a mavenized biojava with the dependencies correctly converted.
That's why I thought it might be the first step. You suggest doing
the modularization first and the Maven metadata second. I still have
to figure out how to make independent submodules as part of Maven in
Eclipse... let me play around a bit more and see how it goes.

The package list sounds good and java 1.6 too.

Andreas


>
> The advantage of smaller tightly group functional jars is that it allows you
> to have more frequent minor releases with out updating and releasing the
> entire biojava package. It also allows individuals to own a smaller block of
> code for unit test, documentation and examples.
>
> With Maven this becomes less of an issue to worry about multiple parts and
> pieces and their relationships. I think we need to divide up into a
> reasonable approximation of the jars before doing the meta data for maven.
>
> Looking at the current package structure this is an attempt of grouping
> jars. I do not have enough code familiarity with all of biojava so this is
> strictly based on package names.
>
> biojava-core Any classes that organize data structures and would probably
> include org.biojava.bio.seq.*. Any utility classes that can be used by other
> packages org.biojava.utils.*
>
> biojava-structure org.biojava.bio.structure.*
>
> biojava-gui org.biojava.bio.gui
>
> biojava-phylo A package that has a few classes for viewing trees structures
> using the jgrapht-jdk package. I need to play with the code and see if it
> actually uses graph generated by jgrapht for anything special. I have code
> that will render a tree as a simple graphic. I have used jgrapht for other
> projects so it is not a bad "graphing" package for network diagrams. It
> could be refactored out.
>
> Not sure how to tackle the org.biojava.bio.program package as it seems to
> have lots of distinct functional code.
>
> biojava-ws-blast - A web service approach to doing blast. The api would hide
> the web services call
>
> biojava-blast - Blast parsing code. We could have one package for anything
> blast related
>
> biojava-ws-clustalw - A web services approach to doing clustalw multiple
> sequence alignment The api would hide the web services call
>
> biojava-alignment - Code for doing sequence alignment. We could have one
> package for anything alignment related
>
> Does anyone know if you can get usage statistics from maven as to what jar
> files are being downloaded? This would help provide good statistics on what
> code is being used which will help focus on improvements in documentation
> etc.
>
> I assume we are going to make Java 1.6 the minimum requirement moving
> forward? This simplifies some of the web services api requirements to
> minimize the number of external packages that are required.
>
>
> Scooter
>
>
>
>
>
>
>
> ________________________________
> From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic
> Sent: Mon 5/25/2009 12:22 AM
> To: biojava-dev at lists.open-bio.org
> Subject: [Biojava-dev] next steps
>
> Hi,
>
> While talking about design requirements, I think we also need to think
> pragmatically about how much time we will have to refactor code vs.
> re-writing modules from scratch. To get started with the next steps, I
> suggest the following procedure: First thing will be to move to
> Maven. Then components should be refactored into independent
> sub-modules. Then the submodules can get improved to follow the new
> design guidelines. Once we have reached a certain stability with the
> re-organized code base, we will make the next release.
>
> Any comments? If there is general agreement about this, I would take
> the next step and replace the ant build system with a maven based one.
>
> Andreas
>


From HWillis at scripps.edu  Mon May 25 10:48:50 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 25 May 2009 10:48:50 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>

Andreas
 
I was looking at the biojava code yesterday to see how easy it would be to divide up into functionally grouped jars based on package hierarchy. I tried to find some refactoring tools that would give a network graph view of class relationships. It is simple enough to parse source for import statements and build some sort of graph relationship tool. It is also easy enough to start dragging packages around to different projects in netbeans and resolve compiler errors.
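
The import-parsing idea mentioned above could look roughly like this in
Java. It is only a sketch: ImportGraph is a made-up class, and cutting
off the last name segment only approximates the package name, since
static imports and nested-class imports will yield a class name instead.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a package-level dependency map from source text.
class ImportGraph {
    private static final Pattern PACKAGE =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // Captures everything before the last segment (class name or '*').
    private static final Pattern IMPORT = Pattern.compile(
            "^\\s*import\\s+(?:static\\s+)?([\\w.]+?)\\.[\\w*]+\\s*;", Pattern.MULTILINE);

    // package name -> packages it imports from
    private final Map<String, Set<String>> edges = new TreeMap<String, Set<String>>();

    // Records the imports of one source file, given its full text.
    void addSource(String source) {
        Matcher pkg = PACKAGE.matcher(source);
        String from = pkg.find() ? pkg.group(1) : "(default)";
        Set<String> targets = edges.get(from);
        if (targets == null) {
            targets = new TreeSet<String>();
            edges.put(from, targets);
        }
        Matcher imp = IMPORT.matcher(source);
        while (imp.find()) {
            String to = imp.group(1);
            if (!to.equals(from)) {
                targets.add(to);
            }
        }
    }

    Map<String, Set<String>> edges() {
        return edges;
    }
}
```

Feeding every .java file under the source tree through addSource gives a
map that can be printed or handed to a graph library.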
 
The advantage of smaller, tightly grouped functional jars is that they allow more frequent minor releases without updating and releasing the entire biojava package. They also allow individuals to own a smaller block of code for unit tests, documentation and examples.
 
With Maven, managing multiple parts and pieces and their relationships becomes less of an issue. I think we need to divide the code into a reasonable approximation of the jars before doing the metadata for Maven.
 
Looking at the current package structure, here is an attempt at grouping jars. I do not have enough familiarity with all of the biojava code, so this is strictly based on package names.
 
biojava-core Any classes that organize data structures and would probably include org.biojava.bio.seq.*. Any utility classes that can be used by other packages org.biojava.utils.*
 
biojava-structure org.biojava.bio.structure.*
 
biojava-gui org.biojava.bio.gui
 
biojava-phylo A package that has a few classes for viewing tree structures using the jgrapht-jdk package. I need to play with the code and see if it actually uses the graph generated by jgrapht for anything special. I have code that will render a tree as a simple graphic. I have used jgrapht for other projects so it is not a bad "graphing" package for network diagrams. It could be refactored out.
 
Not sure how to tackle the org.biojava.bio.program package as it seems to have lots of distinct functional code.
 
biojava-ws-blast - A web service approach to doing blast. The api would hide the web services call 
 
biojava-blast - Blast parsing code. We could have one package for anything blast related
 
biojava-ws-clustalw - A web services approach to doing clustalw multiple sequence alignment. The api would hide the web services call
 
biojava-alignment - Code for doing sequence alignment. We could have one package for anything alignment related
 
Does anyone know if you can get usage statistics from maven as to what jar files are being downloaded? This would help provide good statistics on what code is being used which will help focus on improvements in documentation etc.
 
I assume we are going to make Java 1.6 the minimum requirement moving forward? This simplifies some of the web services api requirements to minimize the number of external packages that are required. 
 
 
Scooter
 
 
________________________________

From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic
Sent: Mon 5/25/2009 12:22 AM
To: biojava-dev at lists.open-bio.org
Subject: [Biojava-dev] next steps


Hi,

While talking about design requirements, I think we also need to think
pragmatically about how much time we will have to refactor code vs.
re-writing modules from scratch. To get started with the next steps, I
 suggest the following procedure: First thing will be to move to
Maven. Then components should be refactored into independent
sub-modules. Then the submodules can get improved to follow the new
design guidelines. Once we have reached a certain stability with the
re-organized code base, we will make the next release.

Any comments? If there is general agreement about this, I would take
the next step and replace the ant build system with a maven based one.

Andreas


From msmoot at ucsd.edu  Mon May 25 13:07:57 2009
From: msmoot at ucsd.edu (Mike Smoot)
Date: Mon, 25 May 2009 10:07:57 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>

On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:

>
> I was looking at the biojava code yesterday to see how easy it would be to
> divide up into functionally grouped jars based on package hierarchy. I tried
> to find some refactoring tools that would give a network graph view of class
> relationships. It is simple enough to parse source for import statements and
> build some sort of graph relationship tool. It is also easy enough to start
> dragging packages around to different projects in netbeans and resolve
> compiler errors.
>

JDepend is a nice tool for evaluating package dependencies.

http://www.clarkware.com/software/JDepend.html


Mike

-- 
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego

From HWillis at scripps.edu  Mon May 25 18:59:10 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 25 May 2009 18:59:10 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>

I attached the JDepend output for BioJava. This will help with the circular dependencies: core classes should not depend on other packages, and if they do, that code should be refactored into the core.
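
For readers without the attachment, the kind of check involved can be
sketched in a few lines of Java. This is not how JDepend works
internally; it is a plain depth-first search over a package dependency
map, and CycleFinder is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: finds one cycle in a package -> dependencies map.
class CycleFinder {
    // Returns one cycle as a list of packages (first == last), or an empty list.
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<String>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, new ArrayList<String>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String node, Map<String, Set<String>> deps,
                                      Set<String> done, List<String> path) {
        int seenAt = path.indexOf(node);
        if (seenAt >= 0) {
            // The node is already on the current path, so the path loops back to it.
            List<String> cycle = new ArrayList<String>(path.subList(seenAt, path.size()));
            cycle.add(node);
            return cycle;
        }
        if (done.contains(node)) {
            return Collections.emptyList(); // fully explored earlier, no cycle through it
        }
        path.add(node);
        Set<String> targets = deps.get(node);
        if (targets != null) {
            for (String next : targets) {
                List<String> cycle = visit(next, deps, done, path);
                if (!cycle.isEmpty()) {
                    return cycle;
                }
            }
        }
        path.remove(path.size() - 1);
        done.add(node);
        return Collections.emptyList();
    }
}
```

An empty result for the whole map means the package graph is acyclic,
which is what the design principles ask for.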
 
Scooter

________________________________

From: mike.smoot at gmail.com on behalf of Mike Smoot
Sent: Mon 5/25/2009 1:07 PM
To: Scooter Willis
Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] next steps


On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:


	I was looking at the biojava code yesterday to see how easy it would be to divide up into functionally grouped jars based on package hierarchy. I tried to find some refactoring tools that would give a network graph view of class relationships. It is simple enough to parse source for import statements and build some sort of graph relationship tool. It is also easy enough to start dragging packages around to different projects in netbeans and resolve compiler errors.
	

JDepend is a nice tool for evaluating package dependencies.

http://www.clarkware.com/software/JDepend.html


Mike

-- 
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego

-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.xml
Type: text/xml
Size: 567706 bytes
Desc: report.xml
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20090525/489118b8/attachment-0001.xml>

From andreas at sdsc.edu  Thu May 28 00:31:15 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 27 May 2009 21:31:15 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>

Hi Scooter,

quick update: There is also an Eclipse plugin for JDepend that
provides a user interface to browse through the dependencies.

As I already mentioned earlier, I made some quick progress with the
Maven plugin to convert the project to Maven and create a first pom.
At the moment I am testing how best to create sub-projects that
should contain the modules. The plugin does not seem to make it easy
to create new modules, so I agree with your earlier suggestion that it
is best to modularize first and then mavenize second... Should we
create a branch in svn and play around with refactoring there, and once
we are happy with it we can switch that branch to become the trunk?

Andreas


On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
> I attached the JDepend output for BioJava. This will help on the circular
> dependencies where core classes should not have dependencies on other
> packages and if they do it should be refactored into the core class.
>
> Scooter
> ________________________________
> From: mike.smoot at gmail.com on behalf of Mike Smoot
> Sent: Mon 5/25/2009 1:07 PM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] next steps
>
>
>
> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>
>> I was looking at the biojava code yesterday to see how easy it would be to
>> divide up into functionally grouped jars based on package hierarchy. I tried
>> to find some refactoring tools that would give a network graph view of class
>> relationships. It is simple enough to parse source for import statements and
>> build some sort of graph relationship tool. It is also easy enough to start
>> dragging packages around to different projects in netbeans and resolve
>> compiler errors.
>
> JDepend is a nice tool for evaluating package dependencies.
>
> http://www.clarkware.com/software/JDepend.html
>
>
> Mike
>
> --
> ____________________________________________________________
> Michael Smoot, Ph.D.               Bioengineering Department
> tel: 858-822-4756         University of California San Diego
>


From juberpatel at gmail.com  Thu May 28 03:09:29 2009
From: juberpatel at gmail.com (juber patel)
Date: Thu, 28 May 2009 12:39:29 +0530
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>

just a small observation:

Maven may not be easy to use, and the switch to Maven should be made
only after some consideration. I have personally not used it, but have
seen people on the Mahout list struggling with Maven. Its utility may
not justify its complexity.

juber


On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
> Hi Scooter,
>
> quick update: There is also an eclipse plugin for JDepend, that
> provides a user interface to browse thought the dependencies.
>
> As I already mentioned earlier, I had some quick progress with the
> maven plugin to convert the project to maven and create a first pom.
> At the moment I am testing how best to create sub-projects that
> should contain the modules.  The plugin does not seem to make it easy
> to create new modules, so I agree with your earlier suggestion that it
> is best to modularize first and the mavenize 2nd... Should we create a
> branch in svn and play around with refactoring there and once we are
> happy with it we can switch that branch to become the trunk?
>
> Andreas
>
>
>
>
> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
>> I attached the JDepend output for BioJava. This will help on the circular
>> dependencies where core classes should not have dependencies on other
>> packages and if they do it should be refactored into the core class.
>>
>> Scooter
>> ________________________________
>> From: mike.smoot at gmail.com on behalf of Mike Smoot
>> Sent: Mon 5/25/2009 1:07 PM
>> To: Scooter Willis
>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
>> Subject: Re: [Biojava-dev] next steps
>>
>>
>>
>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>
>>> I was looking at the biojava code yesterday to see how easy it would be to
>>> divide up into functionally grouped jars based on package hierarchy. I tried
>>> to find some refactoring tools that would give a network graph view of class
>>> relationships. It is simple enough to parse source for import statements and
>>> build some sort of graph relationship tool. It is also easy enough to start
>>> dragging packages around to different projects in netbeans and resolve
>>> compiler errors.
>>
>> JDepend is a nice tool for evaluating package dependencies.
>>
>> http://www.clarkware.com/software/JDepend.html
>>
>>
>> Mike
>>
>> --
>> ____________________________________________________________
>> Michael Smoot, Ph.D.               Bioengineering Department
>> tel: 858-822-4756         University of California San Diego
>>
>
>


-- 
Juber Patel        http://juberpatel.googlepages.com


From holland at eaglegenomics.com  Thu May 28 02:55:28 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 28 May 2009 07:55:28 +0100
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <1243493728.5260.1.camel@buzzybee>

I found when creating modules for the testbed biojava3 that it was
easier to do it by hand.

Only two things need to be done - first of all a list of modules needs
to be added to the parent pom.xml of the project, then each module has
its own pom.xml referencing the parent pom.xml.
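
The two steps above might look roughly like the fragment below, which
shows two separate files in one listing. The group id, artifact ids and
version are placeholders chosen for illustration, not the coordinates
used in the biojava3 branch.

```xml
<!-- parent pom.xml (coordinates are hypothetical) -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.biojava</groupId>
  <artifactId>biojava-parent</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>pom</packaging>
  <!-- step 1: list every module directory here -->
  <modules>
    <module>biojava-core</module>
    <module>biojava-structure</module>
  </modules>
</project>

<!-- biojava-core/pom.xml -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- step 2: each module points back at the parent -->
  <parent>
    <groupId>org.biojava</groupId>
    <artifactId>biojava-parent</artifactId>
    <version>0.0.1-SNAPSHOT</version>
  </parent>
  <artifactId>biojava-core</artifactId>
</project>
```

The module inherits its group id and version from the parent, so only
the artifact id has to be given.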

Once created this way it only takes a project refresh in
Eclipse/NetBeans for the new module to show up.

See the example pom.xmls under the old biojava3 branch for details.

cheers,
Richard

On Wed, 2009-05-27 at 21:31 -0700, Andreas Prlic wrote:
> Hi Scooter,
> 
> quick update: There is also an eclipse plugin for JDepend, that
> provides a user interface to browse thought the dependencies.
> 
> As I already mentioned earlier, I had some quick progress with the
> maven plugin to convert the project to maven and create a first pom.
> At the moment I am testing how  best to create  sub-projects that
> should contain the modules.  The plugin does not seem to make it easy
> to create new modules, so I agree with your earlier suggestion that it
> is best to modularize first and the mavenize 2nd... Should we create a
> branch in svn and play around with refactoring there and once we are
> happy with it we can switch that branch to become the trunk?
> 
> Andreas
> 
> 
> 
> 
> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
> > I attached the JDepend output for BioJava. This will help on the circular
> > dependencies where core classes should not have dependencies on other
> > packages and if they do it should be refactored into the core class.
> >
> > Scooter
> > ________________________________
> > From: mike.smoot at gmail.com on behalf of Mike Smoot
> > Sent: Mon 5/25/2009 1:07 PM
> > To: Scooter Willis
> > Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
> > Subject: Re: [Biojava-dev] next steps
> >
> >
> >
> > On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> >>
> >> I was looking at the biojava code yesterday to see how easy it would be to
> >> divide up into functionally grouped jars based on package hierarchy. I tried
> >> to find some refactoring tools that would give a network graph view of class
> >> relationships. It is simple enough to parse source for import statements and
> >> build some sort of graph relationship tool. It is also easy enough to start
> >> dragging packages around to different projects in netbeans and resolve
> >> compiler errors.
> >
> > JDepend is a nice tool for evaluating package dependencies.
> >
> > http://www.clarkware.com/software/JDepend.html
> >
> >
> > Mike
> >
> > --
> > ____________________________________________________________
> > Michael Smoot, Ph.D.               Bioengineering Department
> > tel: 858-822-4756         University of California San Diego
> >
> 
-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ayates at ebi.ac.uk  Thu May 28 04:16:05 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 28 May 2009 09:16:05 +0100
Subject: [Biojava-dev] next steps
In-Reply-To: <f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
Message-ID: <4A1E4845.8080906@ebi.ac.uk>

Maven's big plus points are easy integration into just about any IDE and
its transitive dependency management capability. On a project like
BioJava (which needs people to get set up and running quickly over a
wide range of development environments) these two points really make it
one of the only viable choices I would use. This isn't to say the other
build systems (rake, raven, gant, gradle, ant) are not as good or
better, just that they do not fit the bill as well.

Andy

juber patel wrote:
> just a small observation:
> 
> Maven may not be easy to use and switch to maven should be done after
> some consideration. I have personally not used it, but have seen
> people on the Mahout list struggling with maven. Its utility may not
> justify its complexity.
> 
> juber
> 
> 
> On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> Hi Scooter,
>>
>> quick update: There is also an eclipse plugin for JDepend, that
>> provides a user interface to browse thought the dependencies.
>>
>> As I already mentioned earlier, I had some quick progress with the
>> maven plugin to convert the project to maven and create a first pom.
>> At the moment I am testing how  best to create  sub-projects that
>> should contain the modules.  The plugin does not seem to make it easy
>> to create new modules, so I agree with your earlier suggestion that it
>> is best to modularize first and the mavenize 2nd... Should we create a
>> branch in svn and play around with refactoring there and once we are
>> happy with it we can switch that branch to become the trunk?
>>
>> Andreas
>>
>>
>>
>>
>> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
>>> I attached the JDepend output for BioJava. This will help on the circular
>>> dependencies where core classes should not have dependencies on other
>>> packages and if they do it should be refactored into the core class.
>>>
>>> Scooter
>>> ________________________________
>>> From: mike.smoot at gmail.com on behalf of Mike Smoot
>>> Sent: Mon 5/25/2009 1:07 PM
>>> To: Scooter Willis
>>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
>>> Subject: Re: [Biojava-dev] next steps
>>>
>>>
>>>
>>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>> I was looking at the biojava code yesterday to see how easy it would be to
>>>> divide up into functionally grouped jars based on package hierarchy. I tried
>>>> to find some refactoring tools that would give a network graph view of class
>>>> relationships. It is simple enough to parse source for import statements and
>>>> build some sort of graph relationship tool. It is also easy enough to start
>>>> dragging packages around to different projects in netbeans and resolve
>>>> compiler errors.
>>> JDepend is a nice tool for evaluating package dependencies.
>>>
>>> http://www.clarkware.com/software/JDepend.html
>>>
>>>
>>> Mike
>>>
>>> --
>>> ____________________________________________________________
>>> Michael Smoot, Ph.D.               Bioengineering Department
>>> tel: 858-822-4756         University of California San Diego
>>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
> 
> 
> 

From james at carmanconsulting.com  Thu May 28 05:37:53 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 05:37:53 -0400
Subject: [Biojava-dev] next steps
In-Reply-To: <f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> 
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> 
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> 
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
Message-ID: <f2e8eedf0905280237nb4a4940ydbee0e143b22a0ae@mail.gmail.com>

Maven really isn't that hard.  I have no idea what the Mahout folks
are having troubles with, but I'm sure it can be addressed.  Maven's
benefits greatly outweigh its complexity (which isn't that high,
IMHO).  If you folks want a hand "mavenizing" your project, I wouldn't
mind helping.

On Thu, May 28, 2009 at 3:09 AM, juber patel <juberpatel at gmail.com> wrote:
> just a small observation:
>
> Maven may not be easy to use and switch to maven should be done after
> some consideration. I have personally not used it, but have seen
> people on the Mahout list struggling with maven. Its utility may not
> justify its complexity.
>
> juber
>
>
> On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> Hi Scooter,
>>
>> quick update: There is also an eclipse plugin for JDepend, that
>> provides a user interface to browse through the dependencies.
>>
>> As I already mentioned earlier, I had some quick progress with the
>> maven plugin to convert the project to maven and create a first pom.
>> At the moment I am testing how best to create sub-projects that
>> should contain the modules. The plugin does not seem to make it easy
>> to create new modules, so I agree with your earlier suggestion that it
>> is best to modularize first and then mavenize 2nd... Should we create a
>> branch in svn and play around with refactoring there and once we are
>> happy with it we can switch that branch to become the trunk?
>>
>> Andreas
>>
>>
>>
>>
>> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
>>> I attached the JDepend output for BioJava. This will help on the circular
>>> dependencies where core classes should not have dependencies on other
>>> packages and if they do it should be refactored into the core class.
>>>
>>> Scooter
>>> ________________________________
>>> From: mike.smoot at gmail.com on behalf of Mike Smoot
>>> Sent: Mon 5/25/2009 1:07 PM
>>> To: Scooter Willis
>>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
>>> Subject: Re: [Biojava-dev] next steps
>>>
>>>
>>>
>>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>>
>>>> I was looking at the biojava code yesterday to see how easy it would be to
>>>> divide up into functionally grouped jars based on package hierarchy. I tried
>>>> to find some refactoring tools that would give a network graph view of class
>>>> relationships. It is simple enough to parse source for import statements and
>>>> build some sort of graph relationship tool. It is also easy enough to start
>>>> dragging packages around to different projects in netbeans and resolve
>>>> compiler errors.
>>>
>>> JDepend is a nice tool for evaluating package dependencies.
>>>
>>> http://www.clarkware.com/software/JDepend.html
>>>
>>>
>>> Mike
>>>
>>> --
>>> ____________________________________________________________
>>> Michael Smoot, Ph.D.               Bioengineering Department
>>> tel: 858-822-4756         University of California San Diego
>>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
>
> --
> Juber Patel        http://juberpatel.googlepages.com
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From HWillis at scripps.edu  Thu May 28 09:10:43 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 09:10:43 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
	<f2e8eedf0905280237nb4a4940ydbee0e143b22a0ae@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>

Maven should be viewed as an additional option for developers: once a version of BioJava is released, the Maven repository is updated, and we need to make sure all the meta-data/dependency information is correct. This doesn't mean that BioJava development needs to be done in Maven; it is simply another way to get the jars after they have been released. BioJava as a single jar is not that hard to integrate into your project, given that we provide the handful of external jar files it needs as part of the download. Other projects I have worked with package only their own jar and then give you web links to download 10 other external projects, and that is a pain: you go to each website to figure out the download process and find that they have all moved on to different releases. In that situation Maven is a great solution, because the developers of biojava took the time to reference the exact versions of the external jar files properly and did not leave it to the "customer" to figure out.
 
If we use Apache Commons as a model, I personally would rather grab the package of interest, say biojava-blast, and add it to my development environment. Maven is an Apache project, yet when you go to http://commons.apache.org/ and grab the component of interest, Maven isn't even listed as an option. This is probably because it is overkill for a single jar. That doesn't mean you can't get Commons jars via Maven when you load a larger project.
 
In our case we may have a couple of components where it can get a little complicated because of external jar dependencies. Take biojava-blast as an example, with a web service client that uses either axis or the latest and greatest Sun JSR. The project I am importing biojava-blast into via Maven already uses axis, but an older version, because everything works and I haven't needed to upgrade. Maven may make the integration step easier, but it doesn't solve the problem that I as the developer now need to do something to resolve the version conflicts.
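
To make the axis conflict concrete, here is a rough sketch of the "nearest wins" rule Maven uses when two paths in the dependency tree ask for different versions of the same artifact. The class name, artifact names and version numbers are made up for illustration, and this is only a simulation of the rule, not Maven's own code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of "nearest wins" version mediation. All artifact names and
// version numbers below are hypothetical.
final class NearestWins {

    // One node of a dependency tree: an artifact, the requested version,
    // and the dependencies it drags in.
    record Dep(String artifact, String version, List<Dep> children) {
        static Dep of(String artifact, String version, Dep... children) {
            return new Dep(artifact, version, List.of(children));
        }
    }

    // Breadth-first walk: the first version seen for an artifact is the one
    // closest to the root, so it is kept and deeper requests are ignored.
    static Map<String, String> resolve(Dep root) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(root.children());
        while (!queue.isEmpty()) {
            Dep dep = queue.removeFirst();
            if (chosen.putIfAbsent(dep.artifact(), dep.version()) == null) {
                queue.addAll(dep.children());
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // The application pins an old axis; biojava-blast asks for a newer one.
        Dep app = Dep.of("my-app", "1.0",
                Dep.of("axis", "1.2"),
                Dep.of("biojava-blast", "3.0",
                        Dep.of("axis", "1.4")));
        System.out.println(resolve(app));
    }
}
```

In this made-up tree the application's own, older axis is the one that ends up on the classpath, so it is still up to the developer to check whether biojava-blast works with it.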
 
So I view Maven as a nice option for developers who are big fans of Maven and smile when they can grab the code they need from BioJava via Maven. We should plan on having an Apache Commons-like download page and publish the jars in Maven as well.
 
Scooter

________________________________

From: biojava-dev-bounces at lists.open-bio.org on behalf of James Carman
Sent: Thu 5/28/2009 5:37 AM
To: biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] next steps


Maven really isn't that hard.  I have no idea what the Mahout folks
are having troubles with, but I'm sure it can be addressed.  Maven's
benefits greatly outweigh its complexity (which isn't that high,
IMHO).  If you folks want a hand "mavenizing" your project, I wouldn't
mind helping.

On Thu, May 28, 2009 at 3:09 AM, juber patel <juberpatel at gmail.com> wrote:
> just a small observation:
>
> Maven may not be easy to use and switch to maven should be done after
> some consideration. I have personally not used it, but have seen
> people on the Mahout list struggling with maven. Its utility may not
> justify its complexity.
>
> juber
>
>
> On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> Hi Scooter,
>>
>> quick update: There is also an eclipse plugin for JDepend, that
>> provides a user interface to browse through the dependencies.
>>
>> As I already mentioned earlier, I had some quick progress with the
>> maven plugin to convert the project to maven and create a first pom.
>> At the moment I am testing how  best to create  sub-projects that
>> should contain the modules.  The plugin does not seem to make it easy
>> to create new modules, so I agree with your earlier suggestion that it
>> is best to modularize first and then mavenize 2nd... Should we create a
>> branch in svn and play around with refactoring there and once we are
>> happy with it we can switch that branch to become the trunk?
>>
>> Andreas
>>
>>
>>
>>
>> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
>>> I attached the JDepend output for BioJava. This will help on the circular
>>> dependencies where core classes should not have dependencies on other
>>> packages and if they do it should be refactored into the core class.
>>>
>>> Scooter
>>> ________________________________
>>> From: mike.smoot at gmail.com on behalf of Mike Smoot
>>> Sent: Mon 5/25/2009 1:07 PM
>>> To: Scooter Willis
>>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
>>> Subject: Re: [Biojava-dev] next steps
>>>
>>>
>>>
>>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>>
>>>> I was looking at the biojava code yesterday to see how easy it would be to
>>>> divide up into functionally grouped jars based on package hierarchy. I tried
>>>> to find some refactoring tools that would give a network graph view of class
>>>> relationships. It is simple enough to parse source for import statements and
>>>> build some sort of graph relationship tool. It is also easy enough to start
>>>> dragging packages around to different projects in netbeans and resolve
>>>> compiler errors.
>>>
>>> JDepend is a nice tool for evaluating package dependencies.
>>>
>>> http://www.clarkware.com/software/JDepend.html
>>>
>>>
>>> Mike
>>>
>>> --
>>> ____________________________________________________________
>>> Michael Smoot, Ph.D.               Bioengineering Department
>>> tel: 858-822-4756         University of California San Diego
>>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
>
> --
> Juber Patel        http://juberpatel.googlepages.com
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>



From HWillis at scripps.edu  Thu May 28 09:37:27 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 09:37:27 -0400
Subject: [Biojava-dev] BioJava BLAST web services
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C863@FLMAIL1.fl.ad.scripps.edu>


I am planning on testing a couple of BLAST web service interfaces (assuming more than one exists) to see what they truly have in common and how that would impact a BJ3 front end to multiple providers. My assumption is that they will be the same. I noticed that with the NCBI BLAST implementation the user is required to pass their email address as part of the web service call. They are concerned with abuse from external processes, and they only allow one sequence per request. Same-same but different is always fun!

From Wikipedia, the following are listed as BLAST resources, more than one of which may offer a web service interface. Should BioJava3 try to support more than one?
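
As a starting point for discussion, a front end to multiple providers could look roughly like the sketch below. None of these types exist in BioJava; the names (BlastProvider, BlastRequest, Hit) are hypothetical, and the stub provider returns a canned hit instead of calling a real service. The request carries a single sequence and a mandatory contact email, mirroring the two restrictions mentioned above.

```java
import java.util.List;

// Hypothetical sketch: one front end over several BLAST web service providers.
// None of these types exist in BioJava; real clients would make remote calls.
final class BlastFrontEnd {

    // Common request: a single sequence plus the contact email that some
    // providers require.
    record BlastRequest(String program, String database, String sequence, String email) {
        BlastRequest {
            if (email == null || email.isBlank()) {
                throw new IllegalArgumentException("provider requires a contact email");
            }
        }
    }

    // Provider-independent result record.
    record Hit(String accession, double eValue) {}

    interface BlastProvider {
        String name();
        List<Hit> search(BlastRequest request);
    }

    // Stand-in provider that returns a canned hit instead of calling a service.
    static BlastProvider stub(String name) {
        return new BlastProvider() {
            public String name() { return name; }
            public List<Hit> search(BlastRequest request) {
                return List.of(new Hit("hit-from-" + name, 1e-5));
            }
        };
    }

    public static void main(String[] args) {
        BlastRequest request =
                new BlastRequest("blastp", "nr", "MKVLAAGIVG", "user@example.org");
        for (BlastProvider provider : List.of(stub("provider-a"), stub("provider-b"))) {
            System.out.println(provider.name() + " -> " + provider.search(request));
        }
    }
}
```

If the real services turn out to differ in more than the transport details, the differences would have to live behind the provider interface so that calling code only ever sees the common request and result types.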

Thanks

Scooter


Variations of BLAST


*	WU-BLAST <http://blast.wustl.edu/>  - the original gapping BLAST with statistics, developed and maintained by Warren Gish at Washington University in St. Louis <http://en.wikipedia.org/wiki/Washington_University_in_St._Louis> 
*	EBI's BLAST Services <http://www.ebi.ac.uk/Tools/blast>  - EBI's <http://en.wikipedia.org/wiki/European_Bioinformatics_Institute>  main blast services page.
*	FSA-BLAST <http://www.fsa-blast.org/>  - a new, faster but still accurate version of NCBI BLAST based on recently published algorithmic improvements
*	NBIC mpiBLAST <http://services.nbic.nl:4080/bb/cgi-bin/bb_login.cgi>  - at the Netherlands Bioinformatics Centre
*	Parallel BLAST <http://www-users.cs.umn.edu/~rangwala/final_bglBLAST.pdf>  - a dual scheduling BLAST tested on the Blue Gene/L
*	mpiBLAST <http://www.mpiblast.org/>  - open-source parallel BLAST
*	A/G BLAST <http://developer.apple.com/darwin/projects/blast/>  - implementation for PowerPC G4/G5 processors and Mac OS X, from Apple Computer <http://en.wikipedia.org/wiki/Apple_Computer> 's Advanced Computation Group <http://en.wikipedia.org/wiki/Advanced_Computation_Group>  and Genentech <http://en.wikipedia.org/wiki/Genentech> .
*	STRAP <http://3d-alignment.eu/>  - the protein workbench STRAP <http://www.charite.de/bioinf/strap/>  contains a comfortable BLAST front-end with a cache for BLAST results


Commercial versions


*	ThermoBLAST by DNA Software Inc. <http://dnasoftware.com/ThermoBLAST/tabid/110/Default.aspx>  - scans entire genomes quickly and accurately combining the power of BLAST with the most advanced thermodynamics parameters
*	PatternHunter <http://www.bioinformaticssolutions.com/products/ph/index.php>  - an alternative software which provides similar functionality to BLAST while claiming increased speed and sensitivity
*	KoriBlast <http://www.korilog.com/products>  - a reliable graphical environment dedicated to sequence data mining. KoriBlast combines Blast searches with advanced data management capabilities and a state-of-the-art graphical user interface.
*	microbial identification BLAST <http://www.sepsitest-blast.de/>  - a quality controlled database for in-vitro diagnostics. SepsiTest combines broad-range-PCR using ultra-pure reagents with Blast searches in a quality controlled environment.


From james at carmanconsulting.com  Thu May 28 09:45:23 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 09:45:23 -0400
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> 
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> 
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> 
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com> 
	<f2e8eedf0905280237nb4a4940ydbee0e143b22a0ae@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <f2e8eedf0905280645u480a5500xad575a84fcf54caf@mail.gmail.com>

I would say that you should use the Apache Commons projects as a model
(I'm an Apache Commons PMC member, so I'm a bit biased).  The
maven-generated site will include information on the dependencies
(including whether they are optional and where you can get them
provided the other projects play nicely and include that information).
And, yes, when you *do* use Maven, it will download all required
transitive dependencies for you and add them to your classpath
automagically.  That's why it's so nice.  Well, that's one of the MANY
reasons it's so nice.  The release plugin also saves a LOT of
headaches, if you ask me (once you get it configured properly).

On Thu, May 28, 2009 at 9:10 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Maven should be viewed as an additional option for developers where once a
> version of BioJava is released the Maven repository is updated and we need
> to make sure we have all the meta-data/dependency information correct. This
> doesn't mean that BioJava development needs to be done in Maven but simply
> is another way to get the jars after they have been released. BioJava as a
> single Jar is not that hard to integrate into your project given that we
> have a handful of external jars files that we provide as part of the
> download. For other projects I have worked with where they only package the
> jar for that project and then give you web links to download 10 other
> external projects then that is a pain. You go to each website to figure out
> the download process and find that they are now all in different releases
> then Maven is a great solution because the developers of biojava took the
> time to get the exact version of jar files from external packages referenced
> properly and did not leave it to the "customer" to figure out.
>
> If we use apache commons as a model I personally would rather grab the
> package of interest say biojava-blast and add into my development
> environment. Maven is an Apache project yet when you go to
> http://commons.apache.org/ and grab the component of interest Maven isn't
> even listed as an option. This is probably because it is an overkill for a
> single jar. Doesn't mean that you can't get commons jar's via maven when you
> load a larger project.
>
> In our case we may have a couple components where it can get a little
> complicated by external jar dependencies. Using biojava-blast as an example
> where it has a web service client that is either using axis or the latest
> greatest sun JSR. The project I am importing biojava-blast via Maven into
> already uses axis but an older version because everything works and I
> haven't needed to do the upgrade. Maven may make the integration step
> easier but it doesn't solve the problem where I as the developer now need to
> do something to resolve the version conflicts.
>
> So I view Maven as a nice option for developers who are a big fan of Maven
> and makes them smile when they can grab the code they need from BioJava via
> Maven. We should plan on having an apache commons like page to download and
> publish the jars in maven as well.
>
> Scooter
> ________________________________
> From: biojava-dev-bounces at lists.open-bio.org on behalf of James Carman
> Sent: Thu 5/28/2009 5:37 AM
> To: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] next steps
>
> Maven really isn't that hard. I have no idea what the Mahout folks
> are having troubles with, but I'm sure it can be addressed. Maven's
> benefits greatly outweigh its complexity (which isn't that high,
> IMHO). If you folks want a hand "mavenizing" your project, I wouldn't
> mind helping.
>
> On Thu, May 28, 2009 at 3:09 AM, juber patel <juberpatel at gmail.com> wrote:
>> just a small observation:
>>
>> Maven may not be easy to use and switch to maven should be done after
>> some consideration. I have personally not used it, but have seen
>> people on the Mahout list struggling with maven. Its utility may not
>> justify its complexity.
>>
>> juber
>>
>>
>> On Thu, May 28, 2009 at 10:01 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>> Hi Scooter,
>>>
>>> quick update: There is also an eclipse plugin for JDepend, that
>>> provides a user interface to browse through the dependencies.
>>>
>>> As I already mentioned earlier, I had some quick progress with the
>>> maven plugin to convert the project to maven and create a first pom.
>>> At the moment I am testing how best to create sub-projects that
>>> should contain the modules. The plugin does not seem to make it easy
>>> to create new modules, so I agree with your earlier suggestion that it
>>> is best to modularize first and then mavenize 2nd... Should we create a
>>> branch in svn and play around with refactoring there and once we are
>>> happy with it we can switch that branch to become the trunk?
>>>
>>> Andreas
>>>
>>>
>>>
>>>
>>> On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu>
>>> wrote:
>>>> I attached the JDepend output for BioJava. This will help on the
>>>> circular
>>>> dependencies where core classes should not have dependencies on other
>>>> packages and if they do it should be refactored into the core class.
>>>>
>>>> Scooter
>>>> ________________________________
>>>> From: mike.smoot at gmail.com on behalf of Mike Smoot
>>>> Sent: Mon 5/25/2009 1:07 PM
>>>> To: Scooter Willis
>>>> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
>>>> Subject: Re: [Biojava-dev] next steps
>>>>
>>>>
>>>>
>>>> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu>
>>>> wrote:
>>>>>
>>>>> I was looking at the biojava code yesterday to see how easy it would be
>>>>> to
>>>>> divide up into functionally grouped jars based on package hierarchy. I
>>>>> tried
>>>>> to find some refactoring tools that would give a network graph view of
>>>>> class
>>>>> relationships. It is simple enough to parse source for import
>>>>> statements and
>>>>> build some sort of graph relationship tool. It is also easy enough to
>>>>> start
>>>>> dragging packages around to different projects in netbeans and resolve
>>>>> compiler errors.
>>>>
>>>> JDepend is a nice tool for evaluating package dependencies.
>>>>
>>>> http://www.clarkware.com/software/JDepend.html
>>>>
>>>>
>>>> Mike
>>>>
>>>> --
>>>> ____________________________________________________________
>>>> Michael Smoot, Ph.D.               Bioengineering Department
>>>> tel: 858-822-4756         University of California San Diego
>>>>
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>
>>
>>
>> --
>> Juber Patel        http://juberpatel.googlepages.com
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From andreas at sdsc.edu  Thu May 28 12:53:33 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 28 May 2009 09:53:33 -0700
Subject: [Biojava-dev] hierarchical vs flat module organisation
Message-ID: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>

Hi,

from the different posts it seems there are two types of suggestions
for how to organize modules: hierarchical vs. flat.

I wonder if the best way to organize this is to mix the designs. There
could be few top-level modules like core, webservices, phylo,
structure. These would be equivalent to projects in the workspace.
These can then contain-submodules like

webservices-blast-ebi
webservices-blast-ncbi
webservices-whatever

or
structure-core
structure-viewers

The submodules would be sub-folders in the projects.
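
To illustrate the mixed layout, here is a small sketch that maps top-level projects to the sub-folders of their submodules. The "project/submodule" path scheme and the class name are assumptions for illustration only; the module names are the ones listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the mixed layout: a few top-level projects, each holding
// submodules as sub-folders. The path scheme is an assumption.
final class ModuleLayout {

    // Turns "project -> submodules" into the relative folders one would
    // expect to see in the workspace.
    static List<String> folders(Map<String, List<String>> projects) {
        List<String> paths = new ArrayList<>();
        projects.forEach((project, submodules) -> {
            if (submodules.isEmpty()) {
                paths.add(project);              // flat project, no submodules
            }
            for (String submodule : submodules) {
                paths.add(project + "/" + submodule);
            }
        });
        return paths;
    }

    public static void main(String[] args) {
        Map<String, List<String>> projects = new LinkedHashMap<>();
        projects.put("core", List.of());
        projects.put("webservices",
                List.of("webservices-blast-ebi", "webservices-blast-ncbi"));
        projects.put("structure", List.of("structure-core", "structure-viewers"));
        folders(projects).forEach(System.out::println);
    }
}
```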

Any thoughts on that?

Andreas

From HWillis at scripps.edu  Thu May 28 14:09:32 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 14:09:32 -0400
Subject: [Biojava-dev] hierarchical vs flat module organisation
References: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C867@FLMAIL1.fl.ad.scripps.edu>

Andreas
 
I think the organization should make the most sense to the user of BioJava and should be functionally grouped. I show up looking for specific biology algorithms/code: BLAST, sequence alignment, tree construction, etc. In that module I would then find different features that I can explore to solve the problem. The question becomes whether I would pick a module based on how it solves the problem. Given that BioJava does not have a native way to do BLAST, and the developer does not want to deal with all the configuration, the BLAST web service call simply becomes the only choice. Parsing a BLAST output and making a BLAST web service call should produce the same structured result, against which I would then use other BioJava APIs.
 
I think we should group by function, and that gives the developer a collection of tools to work with.
 
Scooter

________________________________

From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic
Sent: Thu 5/28/2009 12:53 PM
To: biojava-dev
Subject: [Biojava-dev] hierarchical vs flat module organisation


Hi,

from the different posts it seems there are two types of suggestions
for how to organize modules: hierarchical vs. flat.

I wonder if the best way to organize this is to mix the designs. There
could be few top-level modules like core, webservices, phylo,
structure. These would be equivalent to projects in the workspace.
These can then contain-submodules like

webservices-blast-ebi
webservices-blast-ncbi
webservices-whatever

or
structure-core
structure-viewers

The submodules would be sub-folders in the projects.

Any thoughts on that?

Andreas


From HWillis at scripps.edu  Thu May 28 13:57:27 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 13:57:27 -0400
Subject: [Biojava-dev]  next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu><f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>


Andreas
 
I think each jar probably needs its own svn trunk. This is how Apache Commons is set up. The advantage is that everything is modularized, with nicely defined boundaries on dependencies. If you have one source tree that builds multiple jars, it becomes very easy to grab a class from another jar and force additional dependencies.
 
You also don't need to worry about a single user having access to the entire source tree. If a new developer wants to get involved with a specific interest, it is easy to give him access to that package without worrying about breaking other packages.
 
Do you think we should call the functional grouping packages or modules or something else?
 
If you take a whack at the refactoring based on X number of modules, then you could check each one into a different subversion trunk. Each module will probably have a dependency on biojava-core, which will also be a separate subversion trunk. In Netbeans I would set up a project for each and then add the biojava-core project as an external project dependency. This also allows each module to be released independently and more frequently. We probably need to come up with a versioning convention that is part of the jar name. I'm not sure whether any of the ant build tools automate incrementing the major/minor version number when packaging jars.
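
For the versioning convention, something along these lines might be enough to start with. This is only a sketch: the "<module>-<major>.<minor>.jar" pattern and the class name are assumptions, not an agreed convention.

```java
// Sketch of a major.minor versioning convention embedded in the jar name.
// The "<module>-<major>.<minor>.jar" pattern is an assumption for illustration.
final class ModuleVersion {

    record Version(int major, int minor) {
        // Parses text such as "1.7" into its two numeric parts.
        static Version parse(String text) {
            String[] parts = text.split("\\.");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected major.minor, got " + text);
            }
            return new Version(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }

        // A minor release keeps the major number and bumps the minor one.
        Version nextMinor() { return new Version(major, minor + 1); }

        // A major release bumps the major number and resets the minor one.
        Version nextMajor() { return new Version(major + 1, 0); }

        @Override public String toString() { return major + "." + minor; }
    }

    static String jarName(String module, Version version) {
        return module + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        Version current = Version.parse("1.7");
        System.out.println(jarName("biojava-core", current.nextMinor()));
        System.out.println(jarName("biojava-core", current.nextMajor()));
    }
}
```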
 
Users of biojava would get a single download for the module of interest, containing all the required external jars, including biojava-core. For Maven that would be done via the POM.
 
As part of the refactoring now is the time to make any major namespace changes you want to make. I assume that eclipse refactoring makes this easy. Check all the code in and BioJava3 has begun!
 
Scooter

________________________________

From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Thu 5/28/2009 12:31 AM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] next steps


Hi Scooter,

quick update: There is also an eclipse plugin for JDepend, that
provides a user interface to browse through the dependencies.

As I already mentioned earlier, I had some quick progress with the
maven plugin to convert the project to maven and create a first pom.
At the moment I am testing how  best to create  sub-projects that
should contain the modules.  The plugin does not seem to make it easy
to create new modules, so I agree with your earlier suggestion that it
is best to modularize first and then mavenize 2nd... Should we create a
branch in svn and play around with refactoring there and once we are
happy with it we can switch that branch to become the trunk?

Andreas


On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
> I attached the JDepend output for BioJava. This will help on the circular
> dependencies where core classes should not have dependencies on other
> packages and if they do it should be refactored into the core class.
>
> Scooter
> ________________________________
> From: mike.smoot at gmail.com on behalf of Mike Smoot
> Sent: Mon 5/25/2009 1:07 PM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] next steps
>
>
>
> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>
>> I was looking at the biojava code yesterday to see how easy it would be to
>> divide up into functionally grouped jars based on package hierarchy. I tried
>> to find some refactoring tools that would give a network graph view of class
>> relationships. It is simple enough to parse source for import statements and
>> build some sort of graph relationship tool. It is also easy enough to start
>> dragging packages around to different projects in netbeans and resolve
>> compiler errors.
>
> JDepend is a nice tool for evaluating package dependencies.
>
> http://www.clarkware.com/software/JDepend.html
>
>
> Mike
>
> --
> ____________________________________________________________
> Michael Smoot, Ph.D.               Bioengineering Department
> tel: 858-822-4756         University of California San Diego
>
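
The import-parsing idea quoted above could be sketched roughly as follows. It is a rough illustration, not a replacement for JDepend: it reads package and import statements with regular expressions, builds a package-level graph, and reports whether the graph contains a cycle. The class name and the source snippets in main are made up, and static imports and the default package are ignored.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: derive a package-level dependency graph from import statements
// and report whether it contains a cycle. The sources in main are made up.
final class ImportGraph {

    private static final Pattern PACKAGE =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // Captures the package part of "import a.b.C;" or "import a.b.*;".
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+([\\w.]+)\\.[\\w*]+\\s*;", Pattern.MULTILINE);

    // package name -> packages it imports from
    private final Map<String, Set<String>> edges = new TreeMap<>();

    void addSource(String source) {
        Matcher pkg = PACKAGE.matcher(source);
        if (!pkg.find()) return;                 // default package: skipped here
        String from = pkg.group(1);
        Set<String> targets = edges.computeIfAbsent(from, k -> new TreeSet<>());
        Matcher imp = IMPORT.matcher(source);
        while (imp.find()) {
            if (!imp.group(1).equals(from)) targets.add(imp.group(1));
        }
    }

    Set<String> dependenciesOf(String pkg) {
        return edges.getOrDefault(pkg, Set.of());
    }

    // Depth-first search; meeting a package that is still on the current
    // path means the packages depend on each other in a circle.
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String start : edges.keySet()) {
            if (visit(start, done, onPath)) return true;
        }
        return false;
    }

    private boolean visit(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;  // back edge: cycle found
        if (!done.add(node)) return false;       // already fully explored
        onPath.add(node);
        for (String next : dependenciesOf(node)) {
            if (visit(next, done, onPath)) return true;
        }
        onPath.remove(node);
        return false;
    }

    public static void main(String[] args) {
        ImportGraph graph = new ImportGraph();
        graph.addSource("package org.example.core;\nimport org.example.io.Reader;\nclass A {}");
        graph.addSource("package org.example.io;\nimport org.example.core.Sequence;\nclass B {}");
        System.out.println("core depends on: " + graph.dependenciesOf("org.example.core"));
        System.out.println("circular: " + graph.hasCycle());
    }
}
```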


From andreas.prlic at gmail.com  Fri May 29 00:53:22 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 28 May 2009 21:53:22 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com>

> I think each jar probably needs its own svn trunk. This is how apache
> commons is setup. The advantage of this is that everything is modularized
> with nicely defined boundaries on dependencies. If you have one source tree
> that builds multiple jars then it becomes very easy to grab a class from
> another jar, forcing additional dependencies.

Sounds good. It might also be good not to end up with too many .jar
files.

> You also don't need to worry about a single user having access to the entire
> source tree. If you have a new developer who wants to get involved with a
> specific interest then easy to give him access to that package without
> worrying about breaking other packages.

Might be useful in the future. For now I think it is good enough to
give developers write access to all of biojava.


>
> Do you think we should call the functional grouping packages or modules or
> something else?

What about this: we call a top-level project a package. A package can
then consist of several modules. Not sure if we should have a jar per
package or per module.


> If you take a whack at the refactoring based on X number of modules then you
> could check each one in a different subversion trunk. Each module will
> probably have a dependency on biojava-core which will also be a separate
> subversion trunk. In Netbeans I would setup a project for each and then I
> can add the biojava-core project as an external project dependency.

Sounds good and you would do the same in eclipse.

> This
> also allows each module to be released independently and more frequently. We
> probably need to come up with a versioning convention that is part of the
> jar name.

I think we should stick to the maven naming conventions.
http://maven.apache.org/guides/mini/guide-naming-conventions.html
e.g.
groupId org.biojava.phylo for the phylogenetic package
artifactId biojava-phylo
version 3.0.0  or 3.0.0-SNAPSHOT if it is a nightly build
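
As a rough illustration of the naming scheme above: by default maven names
the built jar artifactId-version.jar, and a version ending in -SNAPSHOT
marks a development build. The sketch below only mirrors that mapping; the
class MavenCoordinates is made up for this example and is not part of maven
or biojava.

```java
// Hypothetical helper, not part of maven or biojava: it only illustrates
// how groupId / artifactId / version map onto a jar name.
final class MavenCoordinates {
    final String groupId;
    final String artifactId;
    final String version;

    MavenCoordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    /** Maven's default name for the built jar is artifactId-version.jar. */
    String jarName() {
        return artifactId + "-" + version + ".jar";
    }

    /** Versions ending in -SNAPSHOT denote unreleased development builds. */
    boolean isSnapshot() {
        return version.endsWith("-SNAPSHOT");
    }

    public static void main(String[] args) {
        MavenCoordinates release =
                new MavenCoordinates("org.biojava.phylo", "biojava-phylo", "3.0.0");
        MavenCoordinates nightly =
                new MavenCoordinates("org.biojava.phylo", "biojava-phylo", "3.0.0-SNAPSHOT");
        System.out.println(release.jarName() + " snapshot=" + release.isSnapshot());
        System.out.println(nightly.jarName() + " snapshot=" + nightly.isSnapshot());
    }
}
```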


> Not sure if any of the ant build tools automate the upticking of
> major/minor version number when packaging jars.

Not sure about ant, but maven has a built-in release plugin. If it is
set up correctly you can just run
mvn release:prepare
and the release gets prepared...


> As part of the refactoring now is the time to make any major namespace
> changes you want to make. I assume that eclipse refactoring makes this easy.

Namespace changes are tricky. In principle I don't want to break
backwards compatibility with the existing code base. On the other hand,
having package names starting with org.biojava.structure, rather than
org.biojava.bio.structure, would be simpler. If in doubt I am for
backwards compatibility. One case where I would like to see a change
is the core blast parsing modules: org.biojava.bio.program.sax does
not indicate at all that this has to do with blast.

Andreas

From heuermh at acm.org  Fri May 29 12:29:04 2009
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 29 May 2009 12:29:04 -0400 (EDT)
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0905291225190.13945-100000@shell3.shore.net>

Andreas Prlic wrote:

> > I think each jar probably needs its own svn trunk. This is how apache
> > commons is setup. The advantage of this is that everything is modularized
> > with nicely defined boundaries on dependencies. If you have one source tree
> > that builds multiple jars then it becomes very easy to grab a class from
> > another jar and forcing additional dependencies.
>
> sounds good.  Guess it might be good not  to have too many .jar files
> in the end as well.
>
> > You also don't need to worry about a single user having access to the entire
> > source tree. If you have a new developer who wants to get involved with a
> > specific interest then easy to give him access to that package without
> > worrying about breaking other packages.
>
> might be useful in the future. For now I think it is good enough to
> give developers write  access to all of biojava.
>
>
> >
> > Do you think we should call the functional grouping packages or modules or
> > something else?
>
> What about: we call a toplevel project, a package. A package can then
> consist of several modules. Not sure if we should have a jar per
> package or per module.
>
>
> > If you take a wack at the refactoring based on X number of modules then you
> > could check each one in a different subversion trunk. Each module will
> > probably have a dependency on biojava-core which will also be a separate
> > subversion trunk. In Netbeans I would setup a project for each and then I
> > can add the biojava-core project as an external project dependency.
>
> Sounds good and you would do the same in eclipse.
>
> > This
> > also allows each module to be released independently and more frequently. We
> > probably need to come up with a versioning convention that is part of the
> > jar name.
>
> I think we should stick to the  maven naming conventions.
> http://maven.apache.org/guides/mini/guide-naming-conventions.html
> e.g.
> groupId org.biojava.phylo for the phylogenetic package
> artifactId biojava-phylo
> version 3.0.0  or 3.0.0-SNAPSHOT if it is a nightly build
>
>
> > Not sure if any of the ant build tools automate the upticking of
> > major/minor version number when packaging jars.
>
> Not sure about ant, but maven has a built in release plugin.  if it is
> set up correctly you can just write
> mvn release:prepare
> and the release is being prepared...
>
>
> > As part of the refactoring now is the time to make any major namespace
> > changes you want to make. I assume that eclipse refactoring makes this easy.
>
> Namespace changes are tricky. In principle I don't want to break
> backwards compatibility with the existing code base. On the other side
> having package names starting with org.biojava.structure, rather than
> org.biojava.bio.structure would be simpler. If in doubt I am for
> backwards compatibility. One case where I would like to see a change
> is the core blast parsing modules. org.biojava.bio.program.sax does
> not indicate at all that this has to do with blast.


From xuxiang at sibs.ac.cn  Sun May 31 21:54:46 2009
From: xuxiang at sibs.ac.cn (xuxiang)
Date: Mon, 1 Jun 2009 09:54:46 +0800
Subject: [Biojava-dev] Next Generation Sequencing
Message-ID: <200906010954385937117@sibs.ac.cn>

Hi all,

I am working with sequencing data from the Illumina Genome Analyzer (Next Generation Sequencing). Are there any tools in BioJava for analyzing Next Generation Sequencing data?

2009-06-01 


xuxiang 

From HWillis at scripps.edu  Mon May 11 13:50:58 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 11 May 2009 09:50:58 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>

Andreas

Another theme that should be considered is providing a multi-threaded
version of any module with a long run time. This would have a couple of
elements. A progress listener interface should be standard, where core
code sends progress messages to listeners that external code can use
to display feedback to the user. I did this with the Neighbor Joining
code for tree construction and it provides needed feedback in a GUI.
Without it the user gets frustrated: they don't know whether the code
they are about to execute will take 10 minutes or 8 hours to complete,
and they think the software is not working. The reverse direction
matters as well: to cancel an operation you want the core code to
stop processing a long-running loop. Once the code has completed, the
listener's process-complete method is called, allowing the next step
in the external code to continue. The developer would have the choice
to call the "process" method directly or run it in a thread and wait
for the complete callback to be called.
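
A minimal sketch of the pattern described above, under stated assumptions:
ProgressListener, LongRunningTask and ProgressDemo are hypothetical names
chosen for illustration, not existing biojava classes. The task reports
progress from its loop, checks a cancel flag on every iteration, and can be
run either by calling process() directly or by handing it to a Thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical names throughout; this is an illustration, not biojava API.
interface ProgressListener {
    void progress(String state, int percentComplete);
    void complete();
    void canceled();
}

class LongRunningTask implements Runnable {
    private final int steps;
    private final ProgressListener listener;
    // Set from another thread to ask the loop to stop.
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

    LongRunningTask(int steps, ProgressListener listener) {
        this.steps = steps;
        this.listener = listener;
    }

    /** Ask the loop in process() to stop before its next iteration. */
    void cancel() {
        cancelRequested.set(true);
    }

    /** Does the work in the calling thread. */
    void process() {
        for (int i = 1; i <= steps; i++) {
            if (cancelRequested.get()) {
                listener.canceled();
                return;
            }
            // ... one unit of real work would go here ...
            listener.progress("working", 100 * i / steps);
        }
        listener.complete();
    }

    /** Lets the same task be handed to a Thread. */
    @Override
    public void run() {
        process();
    }
}

class ProgressDemo {
    public static void main(String[] args) throws InterruptedException {
        ProgressListener printer = new ProgressListener() {
            public void progress(String state, int pct) {
                System.out.println(state + " " + pct + "%");
            }
            public void complete() { System.out.println("done"); }
            public void canceled() { System.out.println("canceled"); }
        };
        // Option 1: blocking call in the current thread.
        new LongRunningTask(4, printer).process();
        // Option 2: background thread; complete() acts as the callback.
        Thread worker = new Thread(new LongRunningTask(4, printer));
        worker.start();
        worker.join();
    }
}
```

In a GUI the listener would typically forward these calls to a progress bar
and a cancel button rather than print them.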

This is the first step towards letting the core long-running
processes take advantage of multiple threads to complete the
computational task faster. Not all code can be parallelized easily, but
if an algorithm can take advantage of running in parallel then it
should. This then opens the door to cloud computing frameworks that
extend the multi-threaded concepts in Java across a cluster, such as
http://www.terracotta.org/. If we put an emphasis on having code that
runs well in a thread we are one step closer to an architecture that can
run in a cloud. The computational problems are only going to get bigger,
and with approaches like Amazon EC2 and http://www.eucalyptus.com/
computational IO cycles are going to be cheap, as long as the
software/libraries can easily take advantage of them.

Thanks

Scooter

-----Original Message-----
From: biojava-dev-bounces at lists.open-bio.org
[mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
Prlic
Sent: Monday, May 11, 2009 12:27 AM
To: biojava-dev
Subject: [Biojava-dev] Plans for next biojava release - modularization

Hi biojava-devs,

It is time to start working on the next biojava release.  I  would
like to modularize the current code base and apply some of the ideas
that have emerged around Richard's "biojava 3" code. In principle the
idea is that all changes should be backwards compatible with the
interfaces provided by the current biojava 1.7 release.  Backwards
compatibility shall only be broken if the functionality is being
replaced with something that works better, and gets documented
accordingly. For the build functionality I would suggest to stick with
what Richard's biojava 3 code base already is providing. Since we will
try to be backwards compatible all code development should be part of
the biojava-trunk and the first step will be to move the ant-build
scripts to a maven build process. Following this procedure will allow
to use e.g. the code refactoring tools provided by Eclipse, which
should come in handy.

The modules I would like to see should provide self-contained
functionality and cross dependencies should be restricted to a
minimum. I would suggest to have the following modules:

biojava-core: Contains everything that can not easily be modularized
or nobody volunteers to become a module maintainer.
biojava-phylogeny: Scooter expressed some interest in providing such a
module and becoming package maintainer for it.
biojava-structure: Everything protein structure related. I would be
package maintainer.
biojava-blast: Blast parsing is a frequently requested functionality
and it would be good to have this code self-contained. A package
maintainer for this still will need to be nominated at a later stage.
Any suggestions for other modules?

Let me know what you think about this.

Andreas
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev


From andreas at sdsc.edu  Mon May 11 22:53:14 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 11 May 2009 15:53:14 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>

Hi Scooter,

I like the idea of supporting multiple threads and parallelizing code
where possible. Is there a reference implementation that you would
recommend for how progress listeners should be implemented?  I suppose
the neighbor joining code you mention below is not part of biojava...

Andreas


On Mon, May 11, 2009 at 6:50 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Andreas
>
> Another theme that should be considered is providing a multi-thread
> version of any module with long run time. This would have a couple
> elements. A progress listener interface should be standard where core
> code would update progress messages to listeners that can be used by
> external code to display feedback to the user. I did this with the
> Neighbor Joining code for tree construction and it provides needed
> feedback in a GUI. If not the user gets frustrated because they don't
> know the code they are about to execute may take 10 minutes or 8 hours
> to complete and they think the software is not working. The reverse is
> also true for canceling an operation where you want to have core code
> stop processing a long running loop. Once the code has completed then
> the listener interface for process complete is called allowing the next
> step in the external code to continue. The developer would have the
> choice to call the "process" method or run it in a thread and wait for
> the callback complete method to be called.
>
> This is the first step in the ability to have the core/long running
> processes take advantage of multiple threads to complete the
> computational task faster. Not all code can be parallelized easily but
> if the algorithm can take advantage of running in parallel then it
> should. This then opens up a couple of cloud computing frameworks that
> extend the multi-threaded concepts in Java across a cluster
> http://www.terracotta.org/. If we put an emphasis on having code that
> runs well in a thread we are one step closer to an architecture that can
> run in a cloud. The computational problems are only going to get bigger
> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
> computational IO cycles are going to be cheap as long as the
> software/libraries can easily take advantage of it.
>
> Thanks
>
> Scooter
>
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org
> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
> Prlic
> Sent: Monday, May 11, 2009 12:27 AM
> To: biojava-dev
> Subject: [Biojava-dev] Plans for next biojava release - modularization
>
> Hi biojava-devs,
>
> It is time to start working on the next biojava release. I would
> like to modularize the current code base and apply some of the ideas
> that have emerged around Richard's "biojava 3" code. In principle the
> idea is that all changes should be backwards compatible with the
> interfaces provided by the current biojava 1.7 release. Backwards
> compatibility shall only be broken if the functionality is being
> replaced with something that works better, and gets documented
> accordingly. For the build functionality I would suggest to stick with
> what Richard's biojava 3 code base already is providing. Since we will
> try to be backwards compatible all code development should be part of
> the biojava-trunk and the first step will be to move the ant-build
> scripts to a maven build process. Following this procedure will allow
> to use e.g. the code refactoring tools provided by Eclipse, which
> should come in handy.
>
> The modules I would like to see should provide self-contained
> functionality and cross dependencies should be restricted to a
> minimum. I would suggest to have the following modules:
>
> biojava-core: Contains everything that can not easily be modularized
> or nobody volunteers to become a module maintainer.
> biojava-phylogeny: Scooter expressed some interest in providing such a
> module and becoming package maintainer for it.
> biojava-structure: Everything protein structure related. I would be
> package maintainer.
> biojava-blast: Blast parsing is a frequently requested functionality
> and it would be good to have this code self-contained. A package
> maintainer for this still will need to be nominated at a later stage.
> Any suggestions for other modules?
>
> Let me know what you think about this.
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From HWillis at scripps.edu  Tue May 12 00:34:11 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 11 May 2009 20:34:11 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905111553n743dbcb3hbb21ec59294cb723@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C84F@FLMAIL1.fl.ad.scripps.edu>


Andreas

This is what I put together as the listener interface for the tree code. In the loop code of the algorithm you simply call the appropriate progress method; it could be cleaned up to have a single progress method taking a float for percentage complete. Passing the instance of NJTree was required in this specific case because all the work is done when the NJTree class is instantiated. It really should be cleaned up so that it has a process method and is runnable in a thread if needed. The progress listener could be generic for all long-running classes. I have wrapped the NJTree code in a TreeConstructor class, which bridges to the biojava framework and allows the NJTree code to be replaced by something compatible with the BioJava open source license if needed. I am still playing around with performance optimizations and need to see whether Jalview would contribute the NJTree code to BioJava. If not, I would do my own implementation, as the algorithm is not difficult.

I was also thinking that we could have Java code that provides functionality such as Blast by making a web service call to an external, publicly supported service. Instead of parsing Blast result flat files, you can call an external service such as http://www.ebi.ac.uk/Tools/webservices/services/wublast via web services and get well-structured results.

Scooter 


package org.biojavax.phylo;

import org.biojavax.phylo.jalview.NJTree;

/**
 *
 * @author willishf
 */
public interface NJTreeProgressListener {
    public void progress(NJTree njtree,String state, int percentageComplete);
    public void progress(NJTree njtree,String state, int currentCount,int totalCount);
    public void complete(NJTree njtree);
    public void canceled(NJTree njtree);
}

**********************************************************************************************
This code could be abstracted out into a base class or simply added into a class that needs to 
notify external listeners
**********************************************************************************************
    Vector<NJTreeProgressListener> progessListenerVector = new Vector<NJTreeProgressListener>();

    public void addProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.add(treeProgessListener);
        }
    }

    public void removeProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.remove(treeProgessListener);
        }
    }

    public void broadcastComplete() {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.complete(this);
        }
    }

    public void updateProgress(String state, int percentage) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this,state, percentage);
        }
    }

    public void updateProgress(String state, int currentCount, int totalCount) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this,state, currentCount, totalCount);
        }
    }

***************************************************************************************
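
A possible shape for the cleanup mentioned above (one progress method with a
float, a listener that is generic over the class doing the work, and the
bookkeeping pulled into a base class). This is only a sketch: the names
TaskProgressListener, ProgressSupport and CountingTask are hypothetical, and
CopyOnWriteArrayList is swapped in for Vector so that listeners can be added
or removed while another thread is broadcasting.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical generic versions of the listener and the bookkeeping shown
// above; none of these names exist in biojava.
interface TaskProgressListener<T> {
    void progress(T source, String state, float fractionComplete);
    void complete(T source);
    void canceled(T source);
}

abstract class ProgressSupport<T> {
    // Safe to iterate while listeners are added or removed concurrently.
    private final List<TaskProgressListener<T>> listeners =
            new CopyOnWriteArrayList<TaskProgressListener<T>>();

    /** The object handed to listeners as the event source. */
    protected abstract T source();

    public void addProgressListener(TaskProgressListener<T> listener) {
        if (listener != null) {
            listeners.add(listener);
        }
    }

    public void removeProgressListener(TaskProgressListener<T> listener) {
        if (listener != null) {
            listeners.remove(listener);
        }
    }

    /** Converts a count pair into a fraction between 0 and 1. */
    protected void broadcastProgress(String state, int currentCount, int totalCount) {
        float fraction = totalCount == 0 ? 1f : (float) currentCount / totalCount;
        for (TaskProgressListener<T> listener : listeners) {
            listener.progress(source(), state, fraction);
        }
    }

    protected void broadcastComplete() {
        for (TaskProgressListener<T> listener : listeners) {
            listener.complete(source());
        }
    }

    protected void broadcastCanceled() {
        for (TaskProgressListener<T> listener : listeners) {
            listener.canceled(source());
        }
    }
}

// Toy long-running class that reuses the base class.
class CountingTask extends ProgressSupport<CountingTask> {
    @Override
    protected CountingTask source() {
        return this;
    }

    void process(int total) {
        for (int i = 1; i <= total; i++) {
            broadcastProgress("counting", i, total);
        }
        broadcastComplete();
    }
}
```

Any long-running class could extend the base class in the same way as
CountingTask and call broadcastProgress from inside its loop.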


package org.biojavax.phylo;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Vector;
import org.biojava.bio.BioException;
import org.biojavax.phylo.jalview.NJTreeNew;
import org.biojavax.phylo.jalview.TreeConstructionAlgorithm;
import org.biojavax.phylo.jalview.TreeType;

import org.biojava.bio.seq.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojavax.phylo.jalview.NJSequence;
import org.biojavax.phylo.jalview.NJTree;

/**
 *
 * @author willishf
 */
public class TreeConstructor extends Thread {

   
    NJTree njtree = null;
    NJSequence[] sequences = null;
    TreeType treeType;
    TreeConstructionAlgorithm treeConstructionAlgorithm;
    NJTreeProgressListener treeProgessListener;

    public TreeConstructor(SequenceIterator iter, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        ArrayList<NJSequence> sequenceArray = new ArrayList<NJSequence>();
        while (iter.hasNext()) {
            try {
                Sequence seq = iter.nextSequence();
                NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
                sequenceArray.add(njsequence);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        sequences = new NJSequence[sequenceArray.size()];
        sequenceArray.toArray(sequences);
    }

    public TreeConstructor(Vector<RichSequence> sequenceVector, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        sequences = new NJSequence[sequenceVector.size()];
        int index = 0;
        for (RichSequence seq : sequenceVector) {

            NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
            sequences[index] = njsequence;
            index++;
        }

    }

    public void cancel(){
        if(njtree != null)
            njtree.cancel();
    }

    public void process() throws Exception {
        njtree = new NJTree(sequences, treeType, treeConstructionAlgorithm, treeProgessListener);
    }

    @Override
    public void run() {
        try {
            process();
        } catch (Exception e) {
            e.printStackTrace();

        }
    }

    public String getNewickString() {
        if (njtree != null) {
            return njtree.toString();
        } 
        return "";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            // No file given on the command line: fall back to a local example file.
            args = new String[] {"C:\\MutualInformation\\project\\hiv\\hiv-genes-genome.fasta"};
        }
        try {
            //prepare a BufferedReader for file io
            BufferedReader br = new BufferedReader(new FileReader(args[0]));
            SimpleNamespace ns = new SimpleNamespace("biojava");

            // You can use any of the convenience methods found in the BioJava 1.6 API
            RichSequenceIterator rsi = RichSequence.IOTools.readFastaProtein(br, ns);

            long readTime = System.currentTimeMillis();
            TreeConstructor treeConstructor = new TreeConstructor(rsi, TreeType.NJ, TreeConstructionAlgorithm.PID, new ProgessListenerStub());
            treeConstructor.process();
            long treeTime = System.currentTimeMillis();
            String newick = treeConstructor.getNewickString();


            System.out.println("Tree time " + (treeTime - readTime));
            System.out.println(newick);

        } catch (FileNotFoundException ex) {
            //can't find file specified by args[0]
            ex.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}


-----Original Message-----
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Mon 5/11/2009 6:53 PM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
 
Hi Scooter,

I like the idea of supporting multiple threads and parallelizing code
where possible. Is there a reference implementation that you would
recommend for how progress listeners should be implemented?  I suppose
the neighbor joining code you mention below is not part of biojava...

Andreas


On Mon, May 11, 2009 at 6:50 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Andreas
>
> Another theme that should be considered is providing a multi-thread
> version of any module with long run time. This would have a couple
> elements. A progress listener interface should be standard where core
> code would update progress messages to listeners that can be used by
> external code to display feedback to the user. I did this with the
> Neighbor Joining code for tree construction and it provides needed
> feedback in a GUI. If not the user gets frustrated because they don't
> know the code they are about to execute may take 10 minutes or 8 hours
> to complete and they think the software is not working. The reverse is
> also true for canceling an operation where you want to have core code
> stop processing a long running loop. Once the code has completed then
> the listener interface for process complete is called allowing the next
> step in the external code to continue. The developer would have the
> choice to call the "process" method or run it in a thread and wait for
> the callback complete method to be called.
>
> This is the first step in the ability to have the core/long running
> processes take advantage of multiple threads to complete the
> computational task faster. Not all code can be parallelized easily but
> if the algorithm can take advantage of running in parallel then it
> should. This then opens up a couple of cloud computing frameworks that
> extend the multi-threaded concepts in Java across a cluster
> http://www.terracotta.org/. If we put an emphasis on having code that
> runs well in a thread we are one step closer to an architecture that can
> run in a cloud. The computational problems are only going to get bigger
> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
> computational IO cycles are going to be cheap as long as the
> software/libraries can easily take advantage of it.
>
> Thanks
>
> Scooter
>
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org
> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
> Prlic
> Sent: Monday, May 11, 2009 12:27 AM
> To: biojava-dev
> Subject: [Biojava-dev] Plans for next biojava release - modularization
>
> Hi biojava-devs,
>
> It is time to start working on the next biojava release. I would
> like to modularize the current code base and apply some of the ideas
> that have emerged around Richard's "biojava 3" code. In principle the
> idea is that all changes should be backwards compatible with the
> interfaces provided by the current biojava 1.7 release. Backwards
> compatibility shall only be broken if the functionality is being
> replaced with something that works better, and gets documented
> accordingly. For the build functionality I would suggest to stick with
> what Richard's biojava 3 code base already is providing. Since we will
> try to be backwards compatible all code development should be part of
> the biojava-trunk and the first step will be to move the ant-build
> scripts to a maven build process. Following this procedure will allow
> to use e.g. the code refactoring tools provided by Eclipse, which
> should come in handy.
>
> The modules I would like to see should provide self-contained
> functionality and cross dependencies should be restricted to a
> minimum. I would suggest to have the following modules:
>
> biojava-core: Contains everything that can not easily be modularized
> or nobody volunteers to become a module maintainer.
> biojava-phylogeny: Scooter expressed some interest in providing such a
> module and becoming package maintainer for it.
> biojava-structure: Everything protein structure related. I would be
> package maintainer.
> biojava-blast: Blast parsing is a frequently requested functionality
> and it would be good to have this code self-contained. A package
> maintainer for this still will need to be nominated at a later stage.
> Any suggestions for other modules?
>
> Let me know what you think about this.
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From mark.schreiber at novartis.com  Tue May 12 05:26:33 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 12 May 2009 13:26:33 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>

Hi -

This was one thing we discussed previously with respect to biojava 3. 
Generally I support the idea because almost all computers are now 
multi-core and as you say cloud or utility computing is already a reality.

However, I tend to think that biojava should not control threading or
concurrency. This should be done by the developer. This is because
sometimes multithreading can be fast on a slow computer but slow on a
fast computer (due to the overhead in spawning threads), so programs
need to be tunable. Also, Java app servers and things like Sun Grid
Engine, EC2 etc. don't like people attempting to control their own
threads.  What BioJava should do is expose granular and thread-safe
operations that can be threaded, form discrete tasks on a utility grid,
or complete in SessionBeans on an app server.  For example, it would be
better if BioJava had a single-threaded method to calculate the GC
content of a single sequence rather than a multi-threaded method that
calculates the GC content of multiple sequences.  This would let the
developer make a multithreaded version if desired, or distribute
multiple tasks based on the single-threaded version to a compute cloud
(and let the cloud manage all the tasks).
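
The split described above can be sketched as follows. This is only an
illustration under stated assumptions: GcContent, of() and ofAll() are
hypothetical names, not part of any BioJava release, and plain Strings
stand in for BioJava sequence objects. The library-side method is
single-threaded and stateless; the caller decides whether and how to
parallelize it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of the BioJava API.
final class GcContent {
    private GcContent() {}

    // Library side: single-threaded and stateless, so safe to call from any thread.
    static double of(String sequence) {
        if (sequence.isEmpty()) return 0.0;
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') gc++;
        }
        return (double) gc / sequence.length();
    }

    // Caller side: the application, not the library, chooses the thread count.
    static List<Double> ofAll(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<Future<Double>>();
            for (final String s : sequences) {
                Callable<Double> task = new Callable<Double>() {
                    public Double call() { return of(s); }
                };
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<Double>();
            for (Future<Double> f : futures) results.add(f.get());
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> seqs = new ArrayList<String>();
        seqs.add("ATGC");
        seqs.add("GGCC");
        System.out.println(ofAll(seqs, 2));
    }
}
```

A grid engine or app server could equally wrap GcContent.of() in its own
task type; nothing in the single-sequence method assumes a thread pool.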

Possibly the best situation would be to have single-threaded,
fine-grained operations that let developers or grid engines control
threading, and then higher-level APIs that do it for you (or good
cookbook examples that show you how to do it).  Another idea that was
discussed was the use of properties files to allow people to set how
many CPUs they wanted to make available to the JVM, or to name packages
that can or cannot use threading.

Finally, there are lots of times when it is highly desirable to use
Java beans because they play well with dozens of Java APIs; however,
beans don't work well with threads because they have public setter
methods.  I would like to see a lot more bean use in a future BioJava
because it would make life so much easier, but a lot of care would need
to be taken to make sure thread safety is preserved.  There are many
patterns that can be used, such as synchronization locks, to make
things thread safe, so I think this can be achieved as long as we are
disciplined and consider that all methods may be used in a
multi-threaded application (even if we write the method as a single
thread).  If there are code checkers that make suggestions on thread
safety it would be great to have these as part of the standard build
process.  Good documentation would go a long way as well.  Are there
unit test patterns that can catch these problems as well?  Suggestions
would be great.
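
One possible shape for a lock-protected bean and a simple stress check
is sketched below. All names here (SequenceBean, BeanStressCheck) are
hypothetical and chosen for illustration only. The bean guards every
accessor with the same intrinsic lock and offers a compound update, so a
reader never sees the name from one update paired with the length from
another. A passing stress check does not prove thread safety; it only
makes some races more likely to show up.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical bean for illustration; not a BioJava class.
final class SequenceBean {
    private String name = "";
    private int length;

    public synchronized String getName() { return name; }
    public synchronized void setName(String name) { this.name = name; }
    public synchronized int getLength() { return length; }
    public synchronized void setLength(int length) { this.length = length; }

    // Compound update and read under the same lock as the setters.
    public synchronized void update(String name, int length) {
        this.name = name;
        this.length = length;
    }
    public synchronized String describe() { return name + ":" + length; }
}

// Hypothetical stress check: writers race while one reader verifies consistency.
final class BeanStressCheck {
    static boolean run(int writers, final int iterations) {
        final SequenceBean bean = new SequenceBean();
        bean.update("seq0", 0);
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final int id = w;
            threads[w] = new Thread(new Runnable() {
                public void run() {
                    try {
                        start.await(); // release all writers at once
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < iterations; i++) bean.update("seq" + id, id);
                }
            });
            threads[w].start();
        }
        start.countDown();
        boolean consistent = true;
        for (int i = 0; i < iterations; i++) {
            String[] parts = bean.describe().split(":");
            consistent &= parts[0].equals("seq" + parts[1]);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return consistent;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10000));
    }
}
```

Removing the synchronized keyword from update() and describe() is a
quick way to see whether the check can catch the resulting race on a
given machine.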

Progress listener patterns are good, but it depends on the situation,
and they might be better handled in high-level APIs or left to the
developer.  For example, in your NJ code a progress listener would be
good if someone fed 1000 sequences into the method but not if they only
put in 10. Also, code running on an old machine might need a progress
listener, but the same problem on a new machine may complete almost
instantly.  Probably a pluggable listener would be the way to go.  It
might also be possible to do this using the new JDK APIs that let you
take a peek at the stack trace. Even if your NJ method didn't allow for
a progress listener, a developer could still make one by looking at the
method calls in the stack. As long as your NJ method called other
methods internally for each sequence (quite likely), it would be
possible to observe the cycle of method calls from the stack.  This
might make it possible to have a very general BioJava progress listener
that can be told to count the number of times a method is called in the
stack. The name of the method would be the argument.  If the
application runs in a Java app server you can also do this very easily
with a method Interceptor.
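
The stack-peeking idea can be sketched with Thread.getStackTrace(),
which has been in the JDK since Java 5. StackPeek and joinNeighbors()
are hypothetical names used only for this illustration; the method name
to look for is passed in as the argument, as suggested above.

```java
// Hypothetical helper for illustration; not part of BioJava.
final class StackPeek {
    private StackPeek() {}

    // True if any frame in the given stack trace belongs to a method with this name.
    static boolean contains(StackTraceElement[] stack, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getMethodName().equals(methodName)) return true;
        }
        return false;
    }

    // Stand-in for a per-sequence step inside a long-running method.
    static boolean joinNeighbors() {
        return contains(Thread.currentThread().getStackTrace(), "joinNeighbors");
    }

    public static void main(String[] args) {
        System.out.println(joinNeighbors());
    }
}
```

An observer thread would call getStackTrace() on the worker thread at
intervals and pass the result to contains(). Note that this samples the
stack, so it gives an estimate of activity rather than an exact call
count.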

- Mark

biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:

> Andreas
> 
> Another theme that should be considered is providing a multi-thread
> version of any module with long run time. This would have a couple
> elements. A progress listener interface should be standard where core
> code would update progress messages to listeners that can be used by
> external code to display feedback to the user. I did this with the
> Neighbor Joining code for tree construction and it provides needed
> feedback in a GUI. If not the user gets frustrated because they don't
> know the code they are about to execute may take 10 minutes or 8 hours
> to complete and they think the software is not working. The reverse is
> also true for canceling an operation where you want to have core code
> stop processing a long running loop. Once the code has completed then
> the listener interface for process complete is called allowing the next
> step in the external code to continue. The developer would have the
> choice to call the "process" method or run it in a thread and wait for
> the callback complete method to be called. 
> 
> This is the first step in the ability to have the core/long running
> processes take advantage of multiple threads to complete the
> computational task faster. Not all code can be parallelized easily but
> if the algorithm can take advantage of running in parallel then it
> should. This then opens up a couple of cloud computing frameworks that
> extend the multi-threaded concepts in Java across a cluster
> http://www.terracotta.org/. If we put an emphasis on having code that
> runs well in a thread we are one step closer to an architecture that can
> run in a cloud. The computational problems are only going to get bigger
> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
> computational IO cycles are going to be cheap as long as the
> software/libraries can easily take advantage of it.
> 
> Thanks
> 
> Scooter
> 

_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the 
exclusive use of the individual or entity named above and may contain 
information that is privileged, confidential or exempt from disclosure 
under applicable law. If the reader of this message is not the intended 
recipient, or the employee or agent responsible for delivery of the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please 
notify the sender immediately by e-mail and delete the material from any 
computer.  Thank you.


From ayates at ebi.ac.uk  Tue May 12 08:27:52 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 12 May 2009 09:27:52 +0100
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
References: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
Message-ID: <4A093308.4030409@ebi.ac.uk>

I agree with Mark.

Later versions of the Java environment will make concurrent programming
easier, not to mention languages already available on the VM (Scala &
Clojure) that make it very easy indeed. Our goal in biojava must be to
write code which will behave well in one of these environments.

I don't want us to fall into the trap of earlier biojava, where we
wrote things like our own implementations of database connection
pooling data sources (sorry, I don't mean to pick on any one part of
the code, but it highlights very well what we should avoid). We're
bioinformaticians/engineers; let's do what we do best and work well
within our chosen field. Let other people like Doug Lea deal with the
pain that is concurrent programming & the like :)

Andy



From holland at eaglegenomics.com  Tue May 12 08:26:26 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 12 May 2009 09:26:26 +0100
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
Message-ID: <1242116786.7101.7.camel@buzzybee>

The BJ3 code contains only as much code as is needed to represent
sequences and to parse/write simple FASTA. It should be viewed as a
concept. In particular the file parsing mechanism is quite flexible (if
a little complex) but easily wrapped with simple one-liner utility
methods to provide end-users with easier-to-use APIs.

Sequence representation in BJ3 is done via the Collections API. It's set
up in such a way that you can write something yourself that implements
the List API and behaves like a List but internally uses a more compact
or even offline storage mechanism to represent the sequence. This allows
you to reuse sequences wherever Lists can be used, e.g. in Iterators or
foreach-loops.
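
As a rough illustration of the idea (this is not the BJ3 code itself),
the sketch below stores DNA in two bits per base behind the List
interface. PackedDnaList is a hypothetical name, and the four-letter
alphabet is an assumption made to keep the example short.

```java
import java.util.AbstractList;

// Hypothetical class for illustration; it does not reproduce the BJ3 code.
final class PackedDnaList extends AbstractList<Character> {
    private static final String ALPHABET = "ACGT";
    private final byte[] packed; // four bases per byte, two bits each
    private final int size;

    PackedDnaList(String dna) {
        size = dna.length();
        packed = new byte[(size + 3) / 4];
        for (int i = 0; i < size; i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Unsupported symbol: " + dna.charAt(i));
            }
            packed[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
    }

    @Override
    public Character get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        int code = (packed[index / 4] >> ((index % 4) * 2)) & 0x3;
        return ALPHABET.charAt(code);
    }

    @Override
    public int size() {
        return size;
    }

    public static void main(String[] args) {
        // Behaves like any other List, e.g. in a foreach loop.
        for (char base : new PackedDnaList("ACGTTGCA")) {
            System.out.print(base);
        }
        System.out.println();
    }
}
```

Because AbstractList supplies iterator(), subList(), equals() and the
rest, only get() and size() need to know about the packed storage.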

Everything written so far has been documented here:

  http://biojava.org/wiki/BioJava3:HowTo

cheers,
Richard


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From HWillis at scripps.edu  Tue May 12 13:34:51 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 09:34:51 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
	<OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>

Mark

 
It is a challenge to know where to draw the line. Allowing both
options is a reasonable approach. The implementation of the algorithm
is key to allowing it to be multi-threaded or to run in parallel. One
approach is to provide a standard interface in which process() waits
for the result/return value and runs in the parent thread. To run the
algorithm in a thread you can have a startProcess() where you can add
yourself as a progress listener, and when the complete() method is
called you can call getResults(). You can then also have the
corresponding stopProcess(), which would set an internal value to cause
all threads to quit.  There are lots of ways to tackle the problem; the
key is to start talking about it and at minimum take advantage of
multiple cores, where the external code can set the number of cores to
use. You can get a dual quad-core machine these days for < $1000, but
most software implementations are not designed to take advantage of it.
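
A minimal sketch of such an interface follows. The method names
process(), startProcess(), stopProcess(), complete() and getResults()
come from the description above; ProgressListener, SummingTask and
everything else are assumptions made for illustration, with summing an
array standing in for a real long-running algorithm.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener interface; not an existing BioJava type.
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

// Hypothetical long-running task (summing an array stands in for real work).
final class SummingTask {
    private final int[] data;
    private final List<ProgressListener> listeners =
            new CopyOnWriteArrayList<ProgressListener>();
    private volatile boolean stopRequested;
    private volatile long result;

    SummingTask(int[] data) {
        this.data = data.clone();
    }

    void addProgressListener(ProgressListener listener) {
        listeners.add(listener);
    }

    // Blocking variant: runs in the caller's thread and returns the result.
    long process() {
        long sum = 0;
        for (int i = 0; i < data.length && !stopRequested; i++) {
            sum += data[i];
            for (ProgressListener l : listeners) l.progress(i + 1, data.length);
        }
        result = sum;
        // Listeners are notified even after a stop request; the result is then partial.
        for (ProgressListener l : listeners) l.complete();
        return sum;
    }

    // Non-blocking variant: the caller chooses to spend a thread on the task.
    void startProcess() {
        new Thread(new Runnable() {
            public void run() { process(); }
        }).start();
    }

    // Sets the flag that the processing loop checks on every iteration.
    void stopProcess() {
        stopRequested = true;
    }

    long getResults() {
        return result;
    }

    public static void main(String[] args) {
        final SummingTask task = new SummingTask(new int[] {1, 2, 3, 4});
        task.addProgressListener(new ProgressListener() {
            public void progress(int done, int total) {
                System.out.println(done + "/" + total);
            }
            public void complete() {
                System.out.println("sum = " + task.getResults());
            }
        });
        task.startProcess();
    }
}
```

The threading decision stays with the caller: process() never spawns a
thread, and startProcess() is only a convenience wrapper around it.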

 
The real question is what exists today in the BioJava API that is
considered long-running in the normal use case and thus is a candidate
to be run in parallel. It may not be an issue in existing BioJava code.
When I first started using BioJava I went looking for BLAST code, only
to find a BLAST parser. I wanted to do a Multiple Sequence Alignment,
and it turns out that the BioJava code calls CLUSTALW as an external
process under the covers.  I also needed code to construct trees from
an MSA and found the Summer of Code project, which was only focused on
representing the tree.

 
It would be nice to have a BLAST implementation in Java optimized to
run on a cluster, but who has time to rewrite BLAST in Java when you can
do a BLAST search via the web and focus on parsing the results? BioJava
needs a BLAST API that makes a web services call to an external service
and returns structured results in core BioJava structures. It is
probably not difficult to do a Java version of CLUSTALW, but again we
can push the work out to
http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the
results back in BioJava structures.

 
I can sign up to write BLAST web service -> BioJava and CLUSTALW web
service -> BioJava code. I haven't done the research, but it seems that
http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work
to expose common biology computational services. If multiple external
services offer BLAST via web services, each with a different
implementation, then BioJava could provide an abstraction over the
different services.

 
Thanks


Scooter

 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for
the exclusive use of the individual or entity named above and may
contain information that is privileged, confidential or exempt from
disclosure under applicable law. If the reader of this message is not
the intended recipient, or the employee or agent responsible for
delivery of the message to the intended recipient, you are hereby
notified that any dissemination, distribution or copying of this
communication is strictly prohibited. If you have received this
communication in error, please notify the sender immediately by e-mail
and delete the material from any computer.  Thank you.


From andreas at sdsc.edu  Tue May 12 23:52:51 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 16:52:51 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <1242116786.7101.7.camel@buzzybee>
References: <59a41c430905102126i4c3eb30erabbebb760b51e793@mail.gmail.com>
	<1242116786.7101.7.camel@buzzybee>
Message-ID: <59a41c430905121652s7c548985xd9261734b42a4182@mail.gmail.com>

Hi Richard,

Do you think the BJ3 code could form the beginning of a new
biojava-sequence module and can become part of the next release?

Andreas

On Tue, May 12, 2009 at 1:26 AM, Richard Holland
<holland at eaglegenomics.com> wrote:
> The BJ3 code contains only as much code as is needed to represent
> sequences and to parse/write simple FASTA. It should be viewed as a
> concept. In particular the file parsing mechanism is quite flexible (if
> a little complex) but easily wrapped with simple one-liner utility
> methods to provide end-users with easier-to-use APIs.
>
> Sequence representation in BJ3 is done via the Collections API. It's set
> up in such a way that you can write something yourself that implements
> the List API and behaves like a List but internally uses a more compact
> or even offline storage mechanism to represent the sequence. This allows
> you to reuse sequences wherever Lists can be used, e.g. in Iterators or
> foreach-loops.
>
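
As a rough illustration of the List-based idea above, here is a minimal sketch. The class name PackedDnaSequence and the 2-bit packing are assumptions made for this example; they are not taken from the BJ3 code. It only shows that a compact store can still behave like a read-only List.

```java
import java.util.AbstractList;

/** Hypothetical read-only DNA sequence that stores four bases per byte. */
final class PackedDnaSequence extends AbstractList<Character> {
    private static final String ALPHABET = "ACGT"; // 2-bit codes 0..3
    private final byte[] packed;
    private final int length;

    PackedDnaSequence(String bases) {
        length = bases.length();
        packed = new byte[(length + 3) / 4];
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(bases.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not a DNA base: " + bases.charAt(i));
            }
            // Place the 2-bit code at its slot inside the byte.
            packed[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
    }

    @Override
    public Character get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        int code = (packed[index / 4] >> ((index % 4) * 2)) & 0x3;
        return ALPHABET.charAt(code);
    }

    @Override
    public int size() {
        return length;
    }
}
```

Because it extends AbstractList, an instance can be used in a foreach loop or passed to any method that accepts a List<Character>; an offline store could follow the same pattern by changing only get() and size().
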
> Everything written so far has been documented here:
>
> http://biojava.org/wiki/BioJava3:HowTo
>
> cheers,
> Richard
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
>


From andreas at sdsc.edu  Tue May 12 23:59:11 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 16:59:11 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu>
	<OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com>
	<061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>

Hi Scooter,

About your suggestion for the BLAST web service client code: in
principle I like the idea, and we have had questions about this on the
mailing list in the past. The only thing is, I think some Java client
code is already available:
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
but I am not sure how good that Java client library is.

Besides this, there is the need for work on our blast parser library
and if you are interested in working on that you are welcome. As I
mentioned, I think this should become its own module, due to the
popularity of that code.

Andreas


On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Mark
>
>
>
> It is a challenge to know where to draw the line. Allowing both options
> is a reasonable approach. The implementation of the algorithm is key to
> allowing it to be multi-threaded or to run in parallel. One approach is
> to provide a standard interface in which process() waits for the
> result/return value and runs in the parent thread. To run the algorithm
> in a thread you can have a startProcess() where you add yourself as a
> progress listener, and when the complete() method is called you can call
> getResults(). You can then also have the corresponding stopProcess(),
> which would set an internal value to cause all threads to quit. There
> are lots of ways to tackle the problem; the key is to start talking
> about it and at minimum take advantage of multiple cores, where the
> external code can set the number of cores to use. You can get a dual
> quad-core machine these days for < $1000, but most software
> implementations are not designed to take advantage of it.
>
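
The process()/startProcess()/stopProcess() idea described above could look roughly like the following. This is only a sketch: the interface ProgressListener, the class SquareSumTask and its toy workload are hypothetical names invented for this example, not existing BioJava API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback interface for long-running work. */
interface ProgressListener {
    void progress(int done, int total);
    void complete();
}

/** Toy long-running task: sums the squares of 1..n, one step at a time. */
final class SquareSumTask {
    private final int n;
    private final List<ProgressListener> listeners =
            new CopyOnWriteArrayList<ProgressListener>();
    private volatile boolean stopRequested = false;
    private volatile Long result = null;

    SquareSumTask(int n) {
        this.n = n;
    }

    void addProgressListener(ProgressListener listener) {
        listeners.add(listener);
    }

    /** Blocking variant: runs in the caller's thread; returns null if stopped early. */
    Long process() {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            if (stopRequested) {
                return null; // cancelled: complete() is deliberately not fired
            }
            sum += (long) i * i;
            for (ProgressListener l : listeners) {
                l.progress(i, n);
            }
        }
        result = sum;
        for (ProgressListener l : listeners) {
            l.complete();
        }
        return result;
    }

    /** Non-blocking variant: runs process() in a new thread; listeners get the callbacks. */
    void startProcess() {
        new Thread(() -> process()).start();
    }

    /** Sets the flag that the loop checks on every step. */
    void stopProcess() {
        stopRequested = true;
    }

    /** Result of the last completed run, or null if none has completed. */
    Long getResults() {
        return result;
    }
}
```

A GUI would call startProcess(), update a progress bar from progress(), and read getResults() once complete() fires; a batch program could simply call process(). Whether a cancelled run should also notify listeners is a design choice this sketch leaves open.
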
>
>
> The real question is what exists today in the BioJava API that is considered
> long running in the normal use case and thus is a candidate to be run in
> parallel. It may not be an issue in existing BioJava code. When I first
> started using BioJava I went looking for BLAST code only to find a BLAST
> parser. I wanted to do a Multiple Sequence Alignment, and it turns out that
> the BioJava code calls CLUSTALW as an external process under the covers. I
> also needed code to construct trees from an MSA and found the Summer of Code
> project, which was only focused on representing the tree.
>
>
>
> It would be nice to have a BLAST implementation in Java optimized to run on
> a cluster, but who has time to rewrite BLAST in Java when you can do a BLAST
> search via the web and focus on parsing the results? BioJava needs a BLAST
> API that makes a web services call to an external service and returns
> structured results in core BioJava structures. It is probably not difficult
> to do a Java version of CLUSTALW, but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
> back in BioJava structures.
>
>
>
> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web
> service -> BioJava code. I haven't done the research, but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology computational services. If multiple external services
> offer BLAST via web services, each with a different implementation, then
> BioJava could provide an abstraction over the different services.
>
>
>
> Thanks
>
> Scooter
>
>
>
> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> Sent: Tuesday, May 12, 2009 1:27 AM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev
> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>
>
>
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3.
> Generally I support the idea because almost all computers are now
> multi-core and, as you say, cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or
> concurrency. This should be done by the developer. This is because sometimes
> multithreading can be fast on a slow computer but slow on a fast computer
> (due to the overhead in spawning threads), so programs need to be tunable.
> Also, Java app servers and things like Sun Grid Engine, EC2 etc. don't like
> people attempting to control their own threads. What BioJava should do is
> expose granular and thread-safe operations that can be threaded or form
> discrete tasks on a utility grid or complete in SessionBeans on an app
> server. For example, it would be better if BioJava had a single-threaded
> method to calculate the GC of a single sequence rather than a multi-threaded
> method that calculates the GC of multiple sequences. This would let the
> developer make a multithreaded version if desired or distribute multiple
> tasks based on the single-threaded version to a compute cloud (and let the
> cloud manage all the tasks).
>
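
To make the GC example above concrete, here is a minimal sketch. GcContent and GcCaller are hypothetical names chosen for this illustration; the point is only that the library-side method is single-threaded and stateless, while the caller decides whether and how to parallelize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Library side (hypothetical): one sequence in, one number out, no shared state. */
final class GcContent {
    private GcContent() {
    }

    /** Fraction of G and C characters in a single sequence; 0.0 for an empty one. */
    static double gcFraction(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }
}

/** Caller side (hypothetical): the application picks the thread count. */
final class GcCaller {
    static List<Double> gcOfAll(List<String> sequences, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<Future<Double>>();
            for (String s : sequences) {
                Callable<Double> job = () -> GcContent.gcFraction(s);
                futures.add(pool.submit(job));
            }
            List<Double> results = new ArrayList<Double>();
            for (Future<Double> f : futures) {
                results.add(f.get()); // same order as the input list
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

On a grid or in an app server, GcCaller would be replaced by whatever task distribution the environment provides, and GcContent.gcFraction would stay unchanged.
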
> Possibly the best situation would be to have the single-threaded fine-grained
> operations that let developers or grid engines control threading and then
> higher-level APIs that do it for you (or good cookbook examples that show
> you how to do it). Another idea that was discussed was the use of
> properties files to allow people to set how many CPUs they wanted to make
> available to the JVM or name packages that can or cannot use threading.
>
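
One possible reading of the properties-file idea above, as a small sketch. The property name "biojava.threads" and the class ThreadSettings are made up for this example, and clamping to the number of available processors is an assumption rather than something agreed in this thread.

```java
import java.util.Properties;

/** Hypothetical helper that turns a user setting into a usable thread count. */
final class ThreadSettings {
    /** Assumed property name; not an existing BioJava setting. */
    static final String KEY = "biojava.threads";

    private ThreadSettings() {
    }

    /**
     * Returns the requested thread count, limited to the range
     * 1..availableProcessors. Falls back to all available processors
     * when the property is missing or not a number.
     */
    static int threadCount(Properties props) {
        int available = Runtime.getRuntime().availableProcessors();
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return available;
        }
        try {
            int requested = Integer.parseInt(raw.trim());
            return Math.max(1, Math.min(requested, available));
        } catch (NumberFormatException e) {
            return available;
        }
    }
}
```

The Properties object could be loaded from a file with Properties.load(), and the returned value passed to whatever thread pool the application creates.
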
> Finally, there are lots of times when it is highly desirable to use Java
> beans because they play well with dozens of Java APIs; however, beans don't
> work well with threads because they have public setter methods. I would
> like to see a lot more bean use in a future BioJava because it would make
> life so much easier, but a lot of care would need to be taken to make sure
> thread safety is preserved. There are many patterns that can be used, such
> as synchronization locks etc., to make things thread safe, so I think this can
> be achieved as long as we are disciplined and consider that all methods may
> be used in a multi-threaded application (even if we write the method as a
> single thread). If there are code checkers that make suggestions on thread
> safety it would be great to have these as part of the standard build
> process. Good documentation would go a long way as well. Are there unit
> test patterns that can catch these problems as well? Suggestions would be
> great.
>
> Progress Listener patterns are good, but it depends on the situation, and
> they might be better handled in high-level APIs or left to the developer. For
> example, in your NJ code a progress listener would be good if someone fed
> 1000 sequences into the method but not if they only put in 10. Also, code
> running on an old machine might need a progress listener, but the same
> problem on a new machine may complete almost instantly. Probably a
> pluggable listener would be the way to go. Also, it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener, a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely), it would be possible to observe the cycle of method calls from the
> stack. This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument. If the application
> runs in a Java app server you can also do this very easily with a method
> Interceptor.
>
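
The stack-inspection idea above presumably refers to Thread.getStackTrace(), available since Java 5. The sketch below uses the hypothetical class StackProbe to count how many frames in a stack snapshot belong to a named method. Note that a snapshot only shows calls that are active at that instant, so a real progress monitor would have to sample a worker thread repeatedly and could only estimate progress.

```java
/** Hypothetical helper for inspecting stack snapshots. */
final class StackProbe {
    private StackProbe() {
    }

    /** Counts the frames in a stack snapshot whose method name equals methodName. */
    static int countFrames(StackTraceElement[] stack, String methodName) {
        int count = 0;
        for (StackTraceElement frame : stack) {
            if (frame.getMethodName().equals(methodName)) {
                count++;
            }
        }
        return count;
    }

    /** Demo: recurses depth more levels, then inspects the current thread's own stack. */
    static int recurse(int depth) {
        if (depth == 0) {
            return countFrames(Thread.currentThread().getStackTrace(), "recurse");
        }
        return recurse(depth - 1);
    }
}
```

A monitor thread could pass worker.getStackTrace() to countFrames() instead, with the method name supplied as the argument, as suggested above.
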
> - Mark
>


From HWillis at scripps.edu  Wed May 13 00:13:45 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 20:13:45 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu>

Andreas

The goal for BioJava could be to provide a wrapper for the
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp Java code so that
inputs/outputs are BioJava objects. I think they are using Axis for the
client web services code. If BioJava 3 is going to be Java 6 minimum, then
it is easier to use the Java 6 SOAP processing capabilities by pointing at
the WSDL and generating the Java code for the client side. This cuts down
on the additional third-party dependencies that are required.

I try to stay out of the legacy file parsing business whenever possible. 

Scooter 

-----Original Message-----
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Tue 5/12/2009 7:59 PM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
 
Hi Scooter,

about your suggestion for the blast webservice client code: In
principle I like the idea and we have had questions on the mailing
list regarding this in the past. Only thing is I think there is
already some client code in java available:
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
but I am not sure how good that Java client library is....

Besides this, there is the need for work on our blast parser library
and if you are interested in working on that you are welcome. As I
mentioned, I think this should become its own module, due to the
popularity of that code.

Andreas


On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Mark
>
>
>
> It is a challenge on knowing where to draw the line. Allowing both options
> is a reasonable approach. The implementation of the algorithm is key to
> allow it to be multi-threaded or being able to run in parallel. One approach
> is to provide a standard interface such as process() would wait for the
> result/return value and run in the parent thread. To run the algorithm in a
> thread you can have a startProcess() where you can add yourself as a
> progress listener and when complete() method is called you can call
> getResults(). You can then also have the corresponding stopProcess() which
> would set an internal value to cause all threads to quit. ?Lots of ways to
> tackle the problem the key is to start talking about it and at minimum take
> advantage of multiple-cores where the external code can set the number of
> cores to use. You can get a dual quad core machine these days for < $1000
> but most software implementations are not designed to take advantage of it.
>
>
>
> The real question is what exists today in the BioJava API that is considered
> long running in normal use case and thus is a candidate to be run in
> parallel. It may not be an issue in existing BioJava code. When I first
> started using BioJava I went looking for BLAST code only to find a BLAST
> parser. I wanted to do a Multiple Sequence Alignment and turns out that
> Biojava code calls CLUSTALW as an external processor under the covers. ?I
> also needed code to construct trees from an MSA and found the summer of code
> project that was only focused on representing the tree.
>
>
>
> It would be nice to have a BLAST implementation in Java optimized to run on
> a cluster but who has time to rewrite BLAST in Java when you can do BLAST
> search via the web and focus on parsing the results. BioJava needs a BLAST
> API that makes a web services call to an external service and gets returns
> structured results in core BioJava structures. Probably not difficult to do
> a Java version of CLUSTALW but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
> back returned in BioJava structures.
>
>
>
> I can signup for doing a BLAST web service -> BioJava and a CLUSTALW web
> service -> BioJava code. I haven't done the research but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology ?computational services. If multiple external services
> are offering BLAST via web services where each picked a different
> implementation then BioJava could provide abstraction to different services.
>
>
>
> Thanks
>
> Scooter
>
>
>
> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> Sent: Tuesday, May 12, 2009 1:27 AM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev
> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>
>
>
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3.
> ?Generally I support the idea because almost all computers are now
> multi-core and as you say cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or
> concurrency. This should be done by the developer. This is because sometimes
> mutithreading can be fast on a slow computer but slow on a fast computer
> (due to the overhead in spawning threads) so programs need to be tunable.
> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like
> people attempting to control their own threads. ?What BioJava should do is
> expose granular and thread-safe operations that can be threaded or form
> discrete tasks on a utility grid or complete in SessionBeans on an App
> server. ?For example it would be better if BioJava had a single threaded
> method to calculate the GC of a single sequence rather than a multi-threaded
> method that calculates the GC of multiple sequences. ?This would let the
> developer make a multithreaded version if desired or distribute multiple
> tasks based on the single threaded version to a compute cloud (and let the
> cloud manage all the tasks).
>
> Possibly the best situation would be to have the single threaded fine grain
> operations that let developers or grid engines control threading and then
> higher level APIs that do it for you (or good cookbook examples that show
> you how to do it). ?Another idea that was discussed was the use of
> properties files to allow people to set how many CPUs they wanted to make
> available to the JVM or name packages that can or cannot use threading.
>
> Finally, there are lots of times when it is highly desirable to use Java
> beans because they play well with dozens of Java api's however beans don't
> work well with threads because they have public setter methods. ?I would
> like to see a lot more bean use in a future BioJava because it would make
> life so much easier but a lot of care would need to be taken to make sure
> thread safety is preserved. ?There are many patterns that can be used such
> as synchronization locks etc to make things thread safe so I think this can
> be achieved as long as we are disciplined and consider that all methods may
> be used in a multi-threaded application (even if we write the method as a
> single thread). ?If there are code checkers that make suggestions on thread
> safety it would be great to have these as part of the standard build
> process. ?Good documentation would go a long way as well. ?Are there unit
> test patterns that can catch these problems as well? ?Suggestions would be
> great.
>
> Progress Listener patterns are good but it depends on the situation and
> might be better handled in high level APIs or left to the developer. ?For
> example in your NJ code a progress listener would be good if someone fed
> 1000 sequences into the method but not if they only put in 10. Also code
> running on an old machine might need a progress listener but the same
> problem on a new machine may complete almost instantly. ?Probably a
> pluggable listener would be the way to go. ?Also it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely) it would be possible to observe the cycle of method calls from the
> stack. ?This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument. ?If the application
> runs in a Java App server you can also do this very easily with a method
> Interceptor.
>
> - Mark
>
> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
>
>> Andreas
>>
>> Another theme that should be considered is providing a multi-thread
>> version of any module with long run time. This would have a couple
>> elements. A progress listener interface should be standard where core
>> code would update progress messages to listeners that can be used by
>> external code to display feedback to the user. I did this with the
>> Neighbor Joining code for tree construction and it provides needed
>> feedback in a GUI. If not the user gets frustrated because they don't
>> know the code they are about to execute may take 10 minutes or 8 hours
>> to complete and they think the software is not working. The reverse is
>> also true for canceling an operation where you want to have core code
>> stop processing a long running loop. Once the code has completed then
>> the listener interface for process complete is called allowing the next
>> step in the external code to continue. The developer would have the
>> choice to call the "process" method or run it in a thread and wait for
>> the callback complete method to be called.
>>
>> This is the first step in the ability to have the core/long running
>> processes take advantage of multiple threads to complete the
>> computational task faster. Not all code can be parallelized easily but
>> if the algorithm can take advantage of running in parallel then it
>> should. This then opens up a couple of cloud computing frameworks that
>> extend the multi-threaded concepts in Java across a cluster
>> http://www.terracotta.org/. If we put an emphasis on having code that
>> runs well in a thread we are one step closer to an architecture that can
>> run in a cloud. The computational problems are only going to get bigger
>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
>> computational IO cycles are going to be cheap as long as the
>> software/libraries can easily take advantage of it.
>>
>> Thanks
>>
>> Scooter
>>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org
>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
>> Prlic
>> Sent: Monday, May 11, 2009 12:27 AM
>> To: biojava-dev
>> Subject: [Biojava-dev] Plans for next biojava release - modularization
>>
>> Hi biojava-devs,
>>
>> It is time to start working on the next biojava release. ?I ?would
>> like to modularize the current code base and apply some of the ideas
>> that have emerged around Richard's "biojava 3" code. In principle the
>> idea is that all changes should be backwards compatible with the
>> interfaces provided by the current biojava 1.7 release. ?Backwards
>> compatibility shall only be broken if the functionality is being
>> replaced with something that works better, and gets documented
>> accordingly. For the build functionality I would suggest to stick with
>> what Richard's biojava 3 code base already is providing. Since we will
>> try to be backwards compatible all code development should be part of
>> the biojava-trunk and the first step will be to move the ant-build
>> scripts to a maven build process. Following this procedure will allow
>> to use e.g. the code refactoring tools provided by Eclipse, which
>> should come in handy.
>>
>> The modules I would like to see should provide self-contained
>> functionality and cross dependencies should be restricted to a
>> minimum. I would suggest to have the following modules:
>>
>> biojava-core: Contains everything that can not easily be modularized
>> or nobody volunteers to become a module maintainer.
>> biojava-phylogeny: Scooter expressed some interested to provide such a
>> module and become package maintainer for it.
>> biojava-structure: Everything protein structure related. I would be
>> package maintainer.
>> biojava-blast: Blast parsing is a frequently requested functionality
>> and it would be good to have this code self-contained. A package
>> maintainer for this still will need to be nominated at a later stage.
>> Any suggestions for other modules?
>>
>> Let me know what you think about this.
>>
>> Andreas
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> _________________________
>
> CONFIDENTIALITY NOTICE
>
> The information contained in this e-mail message is intended only for the
> exclusive use of the individual or entity named above and may contain
> information that is privileged, confidential or exempt from disclosure under
> applicable law. If the reader of this message is not the intended recipient,
> or the employee or agent responsible for delivery of the message to the
> intended recipient, you are hereby notified that any dissemination,
> distribution or copying of this communication is strictly prohibited. If you
> have received this communication in error, please notify the sender
> immediately by e-mail and delete the material from any computer. Thank you.


From mark.schreiber at novartis.com  Wed May 13 00:09:31 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 May 2009 08:09:31 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
Message-ID: <OF8495A026.AC43734D-ON482575B5.000057FD-482575B5.0000DF4C@ah.novartis.com>

A while back I gave Richard some code that uses JAXB to objectify (and 
deobjectify) BLAST XML output. This might be useful for parsing BLAST 
results from the webservices which normally use BLAST XML. I could 
probably dig it up again if needed (it was autogenerated anyway).

It would probably be a good object model for BLAST output if people want 
to parse other types of BLAST output (such as flatfile, but who would want 
to do that!).  The BLAST XML seems to accommodate strange flavours of 
BLAST such as PSI-BLAST etc and also has been much more stable than the 
default flat file output.
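
As a rough illustration of why the XML output is pleasant to work with, 
here is a minimal sketch that pulls hit ids and e-values out of a BLAST 
XML fragment. It uses the JDK's built-in DOM parser rather than the 
JAXB-generated classes mentioned above (those are not shown in this 
thread). The class name BlastXmlSketch and the sample document are made up 
for the example; the element names (Hit, Hit_id, Hsp_evalue) follow the 
NCBI BLAST XML format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of BioJava.
class BlastXmlSketch {

    /** Returns "hitId<TAB>evalue" for the first HSP of every Hit element. */
    static List<String> hitSummaries(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            NodeList hits = doc.getElementsByTagName("Hit");
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                String id = hit.getElementsByTagName("Hit_id").item(0).getTextContent();
                String evalue = hit.getElementsByTagName("Hsp_evalue").item(0).getTextContent();
                out.add(id + "\t" + evalue);
            }
            return out;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample; real output also carries a DOCTYPE and many more fields.
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
                + "<Hit><Hit_id>sp|P12345|EXAMPLE</Hit_id><Hit_hsps><Hsp>"
                + "<Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        for (String line : hitSummaries(xml)) {
            System.out.println(line);
        }
    }
}
```

A JAXB binding would replace the string lookups with typed getters, but 
the shape of the data is the same.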

- Mark


Andreas Prlic <andreas at sdsc.edu> 
Sent by: biojava-dev-bounces at lists.open-bio.org
05/13/2009 08:02 AM

To
Scooter Willis <HWillis at scripps.edu>
cc
biojava-dev <biojava-dev at lists.open-bio.org>
Subject
Re: [Biojava-dev] Plans for next biojava release - modularization


Hi Scooter,

about your suggestion for the blast webservice client code: In
principle I like the idea and we have had questions on the mailing
list regarding this in the past. Only thing is I think there is
already some client code in java available:
http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
but I am not sure how good that Java client library is....

Besides this, there is the need for work on our blast parser library
and if you are interested in working on that you are welcome. As I
mentioned, I think this should become its own module, due to the
popularity of that code.

Andreas


On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Mark
>
>
>
> It is a challenge to know where to draw the line. Allowing both options
> is a reasonable approach. The implementation of the algorithm is key to
> allowing it to be multi-threaded or to run in parallel. One approach is
> to provide a standard interface in which process() waits for the
> result/return value and runs in the parent thread. To run the algorithm in
> a thread you can have a startProcess() where you can add yourself as a
> progress listener, and when the complete() method is called you can call
> getResults(). You can then also have the corresponding stopProcess(), which
> would set an internal value to cause all threads to quit. There are lots of
> ways to tackle the problem; the key is to start talking about it and at
> minimum take advantage of multiple cores, where the external code can set
> the number of cores to use. You can get a dual quad-core machine these days
> for < $1000 but most software implementations are not designed to take
> advantage of it.
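
The interface described in the paragraph above could be sketched roughly 
as follows. The method names process(), startProcess(), stopProcess(), 
complete() and getResults() come from the message; everything else 
(ProgressListener, SumTask, the toy summation standing in for a long 
running algorithm) is a hypothetical placeholder and not existing BioJava 
API.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical listener; progress() is an assumed name, complete() is from the thread. */
interface ProgressListener {
    void progress(int percent);
    void complete();
}

/** Hypothetical task that sums 1..n, standing in for a long-running algorithm. */
class SumTask {
    private final int n;
    private volatile boolean stopRequested = false;
    private volatile long result;

    SumTask(int n) { this.n = n; }

    /** Blocking variant: runs in the caller's thread and returns the result. */
    long process() {
        return run(null);
    }

    /** Non-blocking variant: runs in a new thread and notifies the listener. */
    void startProcess(ProgressListener listener) {
        new Thread(() -> run(listener)).start();
    }

    /** Sets a flag that the loop checks, so the work stops cooperatively. */
    void stopProcess() { stopRequested = true; }

    long getResults() { return result; }

    private long run(ProgressListener listener) {
        long sum = 0;
        int step = Math.max(1, n / 10);
        for (int i = 1; i <= n && !stopRequested; i++) {
            sum += i;
            if (listener != null && i % step == 0) {
                listener.progress((int) (100L * i / n));
            }
        }
        result = sum;  // published before complete() is signalled
        if (listener != null) listener.complete();
        return sum;
    }
}

class SumTaskDemo {
    public static void main(String[] args) throws InterruptedException {
        System.out.println(new SumTask(100).process());

        SumTask async = new SumTask(100);
        CountDownLatch done = new CountDownLatch(1);
        async.startProcess(new ProgressListener() {
            public void progress(int percent) { System.out.println(percent + "%"); }
            public void complete() { done.countDown(); }
        });
        done.await();
        System.out.println(async.getResults());
    }
}
```

The stop flag is only checked between iterations, so cancellation is 
cooperative: the algorithm has to be written to look at it.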
>
>
>
> The real question is what exists today in the BioJava API that is considered
> long running in the normal use case and thus is a candidate to be run in
> parallel. It may not be an issue in existing BioJava code. When I first
> started using BioJava I went looking for BLAST code only to find a BLAST
> parser. I wanted to do a Multiple Sequence Alignment and it turns out that
> the BioJava code calls CLUSTALW as an external process under the covers. I
> also needed code to construct trees from an MSA and found the Summer of Code
> project that was only focused on representing the tree.
>
>
>
> It would be nice to have a BLAST implementation in Java optimized to run on
> a cluster, but who has time to rewrite BLAST in Java when you can do a BLAST
> search via the web and focus on parsing the results? BioJava needs a BLAST
> API that makes a web services call to an external service and returns
> structured results in core BioJava structures. It is probably not difficult
> to do a Java version of CLUSTALW, but again we can push the work out to
> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the results
> back in BioJava structures.
>
>
>
> I can sign up for doing a BLAST web service -> BioJava and a CLUSTALW web
> service -> BioJava code. I haven't done the research but it seems that
> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
> expose common biology computational services. If multiple external services
> are offering BLAST via web services where each picked a different
> implementation then BioJava could provide abstraction to different services.
>
>
>
> Thanks
>
> Scooter
>
>
>
> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
> Sent: Tuesday, May 12, 2009 1:27 AM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev
> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>
>
>
> Hi -
>
> This was one thing we discussed previously with respect to biojava 3.
> Generally I support the idea because almost all computers are now
> multi-core and as you say cloud or utility computing is already a reality.
>
> However, I tend to think that biojava should not control threading or
> concurrency. This should be done by the developer. This is because sometimes
> multithreading can be fast on a slow computer but slow on a fast computer
> (due to the overhead in spawning threads) so programs need to be tunable.
> Also Java app servers and things like Sun Grid Engine, EC2 etc don't like
> people attempting to control their own threads.  What BioJava should do is
> expose granular and thread-safe operations that can be threaded or form
> discrete tasks on a utility grid or complete in SessionBeans on an App
> server.  For example it would be better if BioJava had a single threaded
> method to calculate the GC of a single sequence rather than a multi-threaded
> method that calculates the GC of multiple sequences.  This would let the
> developer make a multithreaded version if desired or distribute multiple
> tasks based on the single threaded version to a compute cloud (and let the
> cloud manage all the tasks).
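
To make the GC example concrete, here is a minimal sketch of that 
division of labour. gcFraction() plays the role of the small, stateless 
library method; gcOfAll() is what a developer might write on top of it 
using the standard java.util.concurrent executor classes. Both names, 
and the use of plain Strings instead of BioJava sequence objects, are 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class GcSketch {

    /** Library side: stateless and single threaded, so safe to call from any thread. */
    static double gcFraction(String seq) {
        if (seq.isEmpty()) return 0.0;
        int gc = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == 'G' || c == 'C') gc++;
        }
        return (double) gc / seq.length();
    }

    /** Caller side: the developer, not the library, picks the thread count. */
    static List<Double> gcOfAll(List<String> seqs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : seqs) {
                Callable<Double> task = () -> gcFraction(s);
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get());  // results come back in input order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(gcOfAll(Arrays.asList("ATGC", "GGCC", "ATAT"), 2));
    }
}
```

The same gcFraction() call could equally be wrapped in a grid task or a 
session bean, which is the point of keeping it single threaded.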
>
> Possibly the best situation would be to have the single threaded fine grain
> operations that let developers or grid engines control threading and then
> higher level APIs that do it for you (or good cookbook examples that show
> you how to do it).  Another idea that was discussed was the use of
> properties files to allow people to set how many CPUs they wanted to make
> available to the JVM or name packages that can or cannot use threading.
>
> Finally, there are lots of times when it is highly desirable to use Java
> beans because they play well with dozens of Java APIs; however, beans don't
> work well with threads because they have public setter methods.  I would
> like to see a lot more bean use in a future BioJava because it would make
> life so much easier but a lot of care would need to be taken to make sure
> thread safety is preserved.  There are many patterns that can be used such
> as synchronization locks etc to make things thread safe so I think this can
> be achieved as long as we are disciplined and consider that all methods may
> be used in a multi-threaded application (even if we write the method as a
> single thread).  If there are code checkers that make suggestions on thread
> safety it would be great to have these as part of the standard build
> process.  Good documentation would go a long way as well.  Are there unit
> test patterns that can catch these problems as well?  Suggestions would be
> great.
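
One of the simpler patterns alluded to above is to synchronize every 
accessor on the bean itself. The sketch below is only an illustration 
under that assumption: SequenceBean and its properties are invented for 
the example and do not correspond to any existing BioJava class.

```java
/** Hypothetical bean; the class and property names are illustrative only. */
class SequenceBean {
    private String name = "";
    private String sequence = "";

    // Every accessor locks on the bean, so writes made by one thread are
    // visible to any other thread that later reads through a getter.
    public synchronized String getName() { return name; }
    public synchronized void setName(String name) { this.name = name; }
    public synchronized String getSequence() { return sequence; }
    public synchronized void setSequence(String sequence) { this.sequence = sequence; }

    /** Updates both properties under a single lock acquisition. */
    public synchronized void setNameAndSequence(String name, String sequence) {
        this.name = name;
        this.sequence = sequence;
    }

    /** Returns a consistent {name, sequence} pair. */
    public synchronized String[] snapshot() {
        return new String[] { name, sequence };
    }
}

class SequenceBeanDemo {
    public static void main(String[] args) throws InterruptedException {
        SequenceBean bean = new SequenceBean();
        bean.setNameAndSequence("a", "AAAA");
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                if (i % 2 == 0) bean.setNameAndSequence("c", "CCCC");
                else bean.setNameAndSequence("a", "AAAA");
            }
        });
        writer.start();
        boolean consistent = true;
        for (int i = 0; i < 100000; i++) {
            String[] s = bean.snapshot();
            // Name "a" must always travel with "AAAA", and "c" with "CCCC".
            consistent &= s[1].startsWith(s[0].toUpperCase());
        }
        writer.join();
        System.out.println("consistent = " + consistent);
    }
}
```

Note that calling setName() and then setSequence() separately is still 
two steps as far as other threads are concerned, which is why the 
combined setter and snapshot() exist in the sketch.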
>
> Progress Listener patterns are good but it depends on the situation and
> might be better handled in high level APIs or left to the developer.  For
> example in your NJ code a progress listener would be good if someone fed
> 1000 sequences into the method but not if they only put in 10. Also code
> running on an old machine might need a progress listener but the same
> problem on a new machine may complete almost instantly.  Probably a
> pluggable listener would be the way to go.  Also it might be possible to do
> this using the new JDK APIs that let you take a peek at the stack trace.
> Even if your NJ method didn't allow for a progress listener a developer
> could still make one by looking at the method calls in the stack. As long as
> your NJ method called other methods internally for each sequence (quite
> likely) it would be possible to observe the cycle of method calls from the
> stack.  This might make it possible to have a very general BioJava progress
> listener that can be told to count the number of times a method is called in
> the stack. The name of the method would be the argument.  If the application
> runs in a Java App server you can also do this very easily with a method
> Interceptor.
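
A small sketch of the stack-peeking idea, assuming the JDK API meant here 
is Thread.getStackTrace() (available since Java 5). framesNamed() takes 
the method name as its argument, as suggested above; the StackPeek class 
and the recursive toy method are made up for the example. An external 
monitor would call getStackTrace() on the worker thread at intervals 
instead of on the current thread, so it would only see samples rather 
than every call.

```java
// Hypothetical helper, not part of BioJava.
class StackPeek {

    /** Counts the frames in a stack trace whose method name matches. */
    static int framesNamed(StackTraceElement[] stack, String methodName) {
        int count = 0;
        for (StackTraceElement frame : stack) {
            if (frame.getMethodName().equals(methodName)) count++;
        }
        return count;
    }

    /** Toy stand-in for an algorithm that calls the same method repeatedly. */
    static int recurse(int depth) {
        if (depth == 0) {
            // Inspect our own stack at the deepest point of the recursion.
            return framesNamed(Thread.currentThread().getStackTrace(), "recurse");
        }
        return recurse(depth - 1);
    }

    public static void main(String[] args) {
        // recurse(3) has four active "recurse" frames when the stack is read.
        System.out.println(recurse(3));
    }
}
```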
>
> - Mark
>
> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
>
>> Andreas
>>
>> Another theme that should be considered is providing a multi-thread
>> version of any module with long run time. This would have a couple
>> elements. A progress listener interface should be standard where core
>> code would update progress messages to listeners that can be used by
>> external code to display feedback to the user. I did this with the
>> Neighbor Joining code for tree construction and it provides needed
>> feedback in a GUI. If not the user gets frustrated because they don't
>> know the code they are about to execute may take 10 minutes or 8 hours
>> to complete and they think the software is not working. The reverse is
>> also true for canceling an operation where you want to have core code
>> stop processing a long running loop. Once the code has completed then
>> the listener interface for process complete is called allowing the next
>> step in the external code to continue. The developer would have the
>> choice to call the "process" method or run it in a thread and wait for
>> the callback complete method to be called.
>>
>> This is the first step in the ability to have the core/long running
>> processes take advantage of multiple threads to complete the
>> computational task faster. Not all code can be parallelized easily but
>> if the algorithm can take advantage of running in parallel then it
>> should. This then opens up a couple of cloud computing frameworks that
>> extend the multi-threaded concepts in Java across a cluster
>> http://www.terracotta.org/. If we put an emphasis on having code that
>> runs well in a thread we are one step closer to an architecture that can
>> run in a cloud. The computational problems are only going to get bigger
>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
>> computational IO cycles are going to be cheap as long as the
>> software/libraries can easily take advantage of it.
>>
>> Thanks
>>
>> Scooter


From HWillis at scripps.edu  Wed May 13 00:23:30 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 12 May 2009 20:23:30 -0400
Subject: [Biojava-dev] Plans for next biojava release - modularization
References: <061BFD133FA1584693D19C79A0072F5F8DD582@FLMAIL1.fl.ad.scripps.edu><OFFAAE41BE.0F70B29C-ON482575B4.001419C7-482575B4.001DE5F5@ah.novartis.com><061BFD133FA1584693D19C79A0072F5F8DD67A@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C855@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C858@FLMAIL1.fl.ad.scripps.edu>


Andreas

A follow-up point related to Mark's comment: parsing BLAST output would not be required, or would be less important, if we provide a clean BioJava API that makes the web service call with BioJava data structures as inputs and gives back BioJava data structures as outputs. This saves the user the steps of doing the web query, saving the file, parsing it, etc. It would be interesting to know how many users run their own BLAST server for privacy reasons.

Scooter

-----Original Message-----
From: Scooter Willis
Sent: Tue 5/12/2009 8:13 PM
To: Andreas Prlic
Cc: biojava-dev
Subject: RE: [Biojava-dev] Plans for next biojava release - modularization
 
Andreas

The goal for BioJava could be to provide a wrapper for the http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp java code so that inputs/outputs are BioJava. I think they are using Axis for the client web services code. If BioJava 3 is going to be Java 6 minimum then it is easier to use the Java 6 SOAP processing capabilities by pointing to the WSDL code and generating the Java code for the client side. This cuts down on the additional external 3rd parties that are required.

I try to stay out of the legacy file parsing business whenever possible. 

Scooter 



From andreas at sdsc.edu  Wed May 13 00:45:54 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 12 May 2009 17:45:54 -0700
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <OF8495A026.AC43734D-ON482575B5.000057FD-482575B5.0000DF4C@ah.novartis.com>
References: <59a41c430905121659q75601cbie13f4c499ba8b679@mail.gmail.com>
	<OF8495A026.AC43734D-ON482575B5.000057FD-482575B5.0000DF4C@ah.novartis.com>
Message-ID: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>

The point about the auto-generated code actually raises another
question for me: How shall we deal with auto-generated code?

I also have some code that is currently not part of BioJava, but it
might be useful for other people: It allows parsing uniprot XML files
and serializing / de-serializing the objects to a database using EJBs,
hibernate and the uniprot XML files.

How far should biojava go in supporting such auto generated or
semi-auto generated code?
A


On Tue, May 12, 2009 at 5:09 PM,  <mark.schreiber at novartis.com> wrote:
>
> A while back I gave Richard some code that uses JAXB to objectify (and
> deobjectify) BLAST XML output. This might be useful for parsing BLAST
> results from the webservices which normally use BLAST XML. I could probably
> dig it up again if needed (it was autogenerated anyway).
>
> It would probably be a good object model for BLAST output if people want to
> parse other types of BLAST output (such as flatfile, but who would want to
> do that!). The BLAST XML seems to accommodate strange flavours of BLAST
> such as PSI-BLAST etc and also has been much more stable than the default
> flat file output.
>
> - Mark
>
>
>
> Andreas Prlic <andreas at sdsc.edu>
> Sent by: biojava-dev-bounces at lists.open-bio.org
>
> 05/13/2009 08:02 AM
>
> To
> Scooter Willis <HWillis at scripps.edu>
> cc
> biojava-dev <biojava-dev at lists.open-bio.org>
> Subject
> Re: [Biojava-dev] Plans for next biojava release - modularization
>
>
>
>
> Hi Scooter,
>
> about your suggestion for the blast webservice client code: In
> principle I like the idea and we have had questions on the mailing
> list regarding this in the past. Only thing is I think there is
> already some client code in java available:
> http://www.ebi.ac.uk/Tools/webservices/clients/blastpgp
> but I am not sure how good that Java client library is....
>
> Besides this, there is the need for work on our blast parser library
> and if you are interested in working on that you are welcome. As I
> mentioned, I think this should become its own module, due to the
> popularity of that code.
>
> Andreas
>
>
>
>
> On Tue, May 12, 2009 at 6:34 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>> Mark
>>
>>
>>
>> It is a challenge to know where to draw the line. Allowing both options
>> is a reasonable approach. The implementation of the algorithm is key to
>> allowing it to be multi-threaded or to run in parallel. One approach is
>> to provide a standard interface in which process() waits for the
>> result/return value and runs in the parent thread. To run the algorithm
>> in a thread you can have a startProcess() where you add yourself as a
>> progress listener, and when the complete() method is called you can call
>> getResults(). You can then also have the corresponding stopProcess(),
>> which would set an internal value to cause all threads to quit. There
>> are lots of ways to tackle the problem; the key is to start talking
>> about it and at minimum take advantage of multiple cores, where the
>> external code can set the number of cores to use. You can get a dual
>> quad-core machine these days for < $1000, but most software
>> implementations are not designed to take advantage of it.
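
The process() / startProcess() / stopProcess() idea quoted above could
look roughly like the sketch below. It is only an illustration under
stated assumptions: ProgressListener, LengthSumTask and every method
signature here are hypothetical names chosen for the example, not part
of any existing BioJava API, and the "algorithm" is a trivial stand-in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class ProcessSketch {

    /** Hypothetical listener; names follow the discussion, not a BioJava API. */
    public interface ProgressListener {
        void progress(int done, int total);
        void complete();
    }

    /** Hypothetical long-running task: sums the lengths of the input sequences. */
    public static class LengthSumTask {
        private final List<String> sequences;
        private final List<ProgressListener> listeners = new CopyOnWriteArrayList<>();
        private volatile boolean stopRequested = false;
        private volatile long result = 0;

        public LengthSumTask(List<String> sequences) {
            this.sequences = sequences;
        }

        public void addProgressListener(ProgressListener listener) {
            listeners.add(listener);
        }

        /** Blocking variant: runs in the caller's thread and returns the result. */
        public long process() {
            long sum = 0;
            int done = 0;
            for (String s : sequences) {
                if (stopRequested) {
                    break; // cooperative cancellation requested via stopProcess()
                }
                sum += s.length();
                done++;
                for (ProgressListener l : listeners) {
                    l.progress(done, sequences.size());
                }
            }
            result = sum; // publish the result before signalling completion
            for (ProgressListener l : listeners) {
                l.complete();
            }
            return sum;
        }

        /** Non-blocking variant: runs process() in a new thread. */
        public Thread startProcess() {
            Thread worker = new Thread(this::process, "length-sum-task");
            worker.start();
            return worker;
        }

        /** Asks the running loop to stop at its next iteration. */
        public void stopProcess() {
            stopRequested = true;
        }

        /** Meaningful once complete() has been delivered to the listeners. */
        public long getResults() {
            return result;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LengthSumTask task = new LengthSumTask(Arrays.asList("ACGT", "GGCC", "AT"));
        CountDownLatch finished = new CountDownLatch(1);
        task.addProgressListener(new ProgressListener() {
            public void progress(int done, int total) {
                System.out.println(done + "/" + total);
            }
            public void complete() {
                finished.countDown();
            }
        });
        task.startProcess();
        finished.await(); // wait for the complete() callback
        System.out.println("total length = " + task.getResults());
    }
}
```

The caller decides whether to block (process()) or to run in the
background (startProcess()); cancellation is cooperative, so the loop
only stops at the next iteration after stopProcess() is called.
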
>>
>>
>>
>> The real question is what exists today in the BioJava API that is
>> considered long running in a normal use case and thus is a candidate to
>> be run in parallel. It may not be an issue in existing BioJava code.
>> When I first started using BioJava I went looking for BLAST code only to
>> find a BLAST parser. I wanted to do a Multiple Sequence Alignment, and it
>> turns out that the BioJava code calls CLUSTALW as an external process
>> under the covers. I also needed code to construct trees from an MSA and
>> found the summer of code project, which was only focused on representing
>> the tree.
>>
>>
>>
>> It would be nice to have a BLAST implementation in Java optimized to run
>> on a cluster, but who has time to rewrite BLAST in Java when you can do a
>> BLAST search via the web and focus on parsing the results? BioJava needs
>> a BLAST API that makes a web services call to an external service and
>> returns structured results in core BioJava structures. It is probably
>> not difficult to do a Java version of CLUSTALW, but again we can push
>> the work out to
>> http://www.ebi.ac.uk/Tools/webservices/services/clustalw and get the
>> results back in BioJava structures.
>>
>>
>>
>> I can sign up for doing BLAST web service -> BioJava and CLUSTALW web
>> service -> BioJava code. I haven't done the research, but it seems that
>> http://www.ebi.ac.uk/Tools/webservices/ has done a fair amount of work to
>> expose common biology computational services. If multiple external
>> services are offering BLAST via web services, each with a different
>> implementation, then BioJava could provide an abstraction over the
>> different services.
>>
>>
>>
>> Thanks
>>
>> Scooter
>>
>>
>>
>> From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
>> Sent: Tuesday, May 12, 2009 1:27 AM
>> To: Scooter Willis
>> Cc: Andreas Prlic; biojava-dev
>> Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
>>
>>
>>
>> Hi -
>>
>> This was one thing we discussed previously with respect to biojava 3.
>> Generally I support the idea because almost all computers are now
>> multi-core and as you say cloud or utility computing is already a reality.
>>
>> However, I tend to think that biojava should not control threading or
>> concurrency. This should be done by the developer. This is because
>> sometimes multithreading can be fast on a slow computer but slow on a
>> fast computer (due to the overhead in spawning threads), so programs need
>> to be tunable. Also, Java app servers and things like Sun Grid Engine,
>> EC2 etc. don't like people attempting to control their own threads. What
>> BioJava should do is expose granular and thread-safe operations that can
>> be threaded or form discrete tasks on a utility grid or complete in
>> SessionBeans on an app server. For example, it would be better if BioJava
>> had a single-threaded method to calculate the GC of a single sequence
>> rather than a multi-threaded method that calculates the GC of multiple
>> sequences. This would let the developer make a multithreaded version if
>> desired or distribute multiple tasks based on the single-threaded version
>> to a compute cloud (and let the cloud manage all the tasks).
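
The GC example quoted above can be sketched as follows: a stateless,
single-threaded operation on one sequence, with the threading left to
the caller. This is a minimal illustration only; gcContent() and
gcContentAll() are hypothetical names, sequences are plain strings
rather than BioJava sequence objects, and the fixed-size pool is just
one of several ways a caller might parallelize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GcSketch {

    /**
     * Hypothetical fine-grained operation: fraction of G/C symbols in one
     * sequence. It keeps no shared state, so any thread may call it.
     */
    public static double gcContent(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    /** Caller-side threading: the application, not the library, picks the pool size. */
    public static List<Double> gcContentAll(List<String> sequences, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : sequences) {
                futures.add(pool.submit(() -> gcContent(s)));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get()); // results keep the input order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // prints [1.0, 0.0, 0.5]
        System.out.println(gcContentAll(Arrays.asList("GGCC", "ATAT", "ACGT"), 2));
    }
}
```

Because gcContent() has no side effects, the same method could equally
be submitted as one task per sequence to a grid engine or called from
a session bean, with the container managing the threads.
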
>>
>> Possibly the best situation would be to have the single-threaded
>> fine-grained operations that let developers or grid engines control
>> threading and then higher-level APIs that do it for you (or good cookbook
>> examples that show you how to do it). Another idea that was discussed was
>> the use of properties files to allow people to set how many CPUs they
>> wanted to make available to the JVM or to name packages that can or
>> cannot use threading.
>>
>> Finally, there are lots of times when it is highly desirable to use Java
>> beans because they play well with dozens of Java APIs; however, beans
>> don't work well with threads because they have public setter methods. I
>> would like to see a lot more bean use in a future BioJava because it
>> would make life so much easier, but a lot of care would need to be taken
>> to make sure thread safety is preserved. There are many patterns that can
>> be used, such as synchronization locks, to make things thread safe, so I
>> think this can be achieved as long as we are disciplined and consider
>> that all methods may be used in a multi-threaded application (even if we
>> write the method as a single thread). If there are code checkers that
>> make suggestions on thread safety it would be great to have these as part
>> of the standard build process. Good documentation would go a long way as
>> well. Are there unit test patterns that can catch these problems as well?
>> Suggestions would be great.
>>
>> Progress Listener patterns are good but it depends on the situation and
>> might be better handled in high-level APIs or left to the developer. For
>> example, in your NJ code a progress listener would be good if someone fed
>> 1000 sequences into the method but not if they only put in 10. Also, code
>> running on an old machine might need a progress listener but the same
>> problem on a new machine may complete almost instantly. Probably a
>> pluggable listener would be the way to go. Also, it might be possible to
>> do this using the new JDK APIs that let you take a peek at the stack
>> trace. Even if your NJ method didn't allow for a progress listener, a
>> developer could still make one by looking at the method calls in the
>> stack. As long as your NJ method called other methods internally for each
>> sequence (quite likely) it would be possible to observe the cycle of
>> method calls from the stack. This might make it possible to have a very
>> general BioJava progress listener that can be told to count the number of
>> times a method is called in the stack. The name of the method would be
>> the argument. If the application runs in a Java app server you can also
>> do this very easily with a method Interceptor.
>>
>> - Mark
>>
>> biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
>>
>>> Andreas
>>>
>>> Another theme that should be considered is providing a multi-thread
>>> version of any module with long run time. This would have a couple
>>> elements. A progress listener interface should be standard where core
>>> code would update progress messages to listeners that can be used by
>>> external code to display feedback to the user. I did this with the
>>> Neighbor Joining code for tree construction and it provides needed
>>> feedback in a GUI. If not the user gets frustrated because they don't
>>> know the code they are about to execute may take 10 minutes or 8 hours
>>> to complete and they think the software is not working. The reverse is
>>> also true for canceling an operation where you want to have core code
>>> stop processing a long running loop. Once the code has completed then
>>> the listener interface for process complete is called allowing the next
>>> step in the external code to continue. The developer would have the
>>> choice to call the "process" method or run it in a thread and wait for
>>> the callback complete method to be called.
>>>
>>> This is the first step in the ability to have the core/long running
>>> processes take advantage of multiple threads to complete the
>>> computational task faster. Not all code can be parallelized easily but
>>> if the algorithm can take advantage of running in parallel then it
>>> should. This then opens up a couple of cloud computing frameworks that
>>> extend the multi-threaded concepts in Java across a cluster
>>> http://www.terracotta.org/. If we put an emphasis on having code that
>>> runs well in a thread we are one step closer to an architecture that can
>>> run in a cloud. The computational problems are only going to get bigger
>>> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
>>> computational IO cycles are going to be cheap as long as the
>>> software/libraries can easily take advantage of it.
>>>
>>> Thanks
>>>
>>> Scooter
>>>
>>> -----Original Message-----
>>> From: biojava-dev-bounces at lists.open-bio.org
>>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
>>> Prlic
>>> Sent: Monday, May 11, 2009 12:27 AM
>>> To: biojava-dev
>>> Subject: [Biojava-dev] Plans for next biojava release - modularization
>>>
>>> Hi biojava-devs,
>>>
>>> It is time to start working on the next biojava release.  I would
>>> like to modularize the current code base and apply some of the ideas
>>> that have emerged around Richard's "biojava 3" code. In principle the
>>> idea is that all changes should be backwards compatible with the
>>> interfaces provided by the current biojava 1.7 release.  Backwards
>>> compatibility shall only be broken if the functionality is being
>>> replaced with something that works better, and gets documented
>>> accordingly. For the build functionality I would suggest to stick with
>>> what Richard's biojava 3 code base already is providing. Since we will
>>> try to be backwards compatible all code development should be part of
>>> the biojava-trunk and the first step will be to move the ant-build
>>> scripts to a maven build process. Following this procedure will allow
>>> to use e.g. the code refactoring tools provided by Eclipse, which
>>> should come in handy.
>>>
>>> The modules I would like to see should provide self-contained
>>> functionality and cross dependencies should be restricted to a
>>> minimum. I would suggest to have the following modules:
>>>
>>> biojava-core: Contains everything that can not easily be modularized
>>> or nobody volunteers to become a module maintainer.
>>> biojava-phylogeny: Scooter expressed some interest in providing such a
>>> module and becoming package maintainer for it.
>>> biojava-structure: Everything protein structure related. I would be
>>> package maintainer.
>>> biojava-blast: Blast parsing is a frequently requested functionality
>>> and it would be good to have this code self-contained. A package
>>> maintainer for this still will need to be nominated at a later stage.
>>> Any suggestions for other modules?
>>>
>>> Let me know what you think about this.
>>>
>>> Andreas
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>


From mark.schreiber at novartis.com  Wed May 13 02:15:27 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 May 2009 10:15:27 +0800
Subject: [Biojava-dev] Plans for next biojava release - modularization
In-Reply-To: <59a41c430905121745p7325d69dgf7e4d916746bf14d@mail.gmail.com>
Message-ID: <OF3FD186AB.FA0D8059-ON482575B5.000A55FA-482575B5.000C66CB@ah.novartis.com>

Hi -

I think it depends if the code is going to be auto-generated at each build 
or only once.  I have autogenerated Entity classes for BioSQL tables. My 
recommendation would be that these be used for JPA mapping to BioSQL from 
BioJava.  I think these only need be generated once (unless the BioSQL 
schema changes), especially as the autogeneration didn't quite catch some 
of the subtleties of the schema.  They can also be in their own module, 
not the core.

Classes that map to XML or webservice clients can be autogenerated from
XML schema, DTD or WSDL once or at every build (automatically from ANT and
probably Maven).  In these cases it may pay to do it with every build
because these classes are completely boilerplate code and should never
need to be manually modified.  Also it means the code for these utility
classes will never be in the code base and it will not be possible for
someone to change it accidentally (and the code base will be smaller).
Only the XSD or WSDL will be in subversion (and any higher level code that
makes use of the boilerplate client code).  Improvements in the
boilerplate code or changes that come with updates to JAXB and similar
will automatically appear at the next build (when we change JAXB
versions).

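Conceptually the BLAST XML parsing module may consist of only the BLAST
XSD (or DTD) and a high-level biojava interface like the following:

import java.io.Serializable;
import java.net.URL;

public interface BlastParser {
        // Parses BLAST XML read from a URL by calling the generated
        // boilerplate (JAXB) code.
        Serializable[] parseBlast(URL url);

        // Parses BLAST XML passed in as a string by calling the generated
        // boilerplate (JAXB) code.
        Serializable[] parseBlast(String blastXMLOutput);
}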

The code for the bit that does the JAXB marshalling etc. could be generated
at build time.  The Serializable array would hold the objects that JAXB
generates. Probably they would be a more specific stub that implements
Serializable (e.g. BlastResult or similar, depending on the XSD).
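
To make the shape of such a high-level class concrete, here is a small
sketch of a parser that returns Serializable result stubs. Several
things in it are assumptions made for illustration: BlastHit and
BlastParserSketch are hypothetical names, the JDK's DOM parser stands
in for the generated JAXB code (so the example runs without extra
libraries), and Hit, Hit_accession and Hit_def are taken to be the
element names used in NCBI BLAST XML output.

```java
import java.io.ByteArrayInputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BlastParserSketch {

    /** Hypothetical result stub; a real one would be generated from the XSD or DTD. */
    public static class BlastHit implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String accession;
        public final String definition;

        BlastHit(String accession, String definition) {
            this.accession = accession;
            this.definition = definition;
        }
    }

    /** Stand-in for the generated unmarshalling code: reads each Hit element via DOM. */
    public static BlastHit[] parseBlast(String blastXMLOutput) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            blastXMLOutput.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = doc.getElementsByTagName("Hit");
            BlastHit[] result = new BlastHit[hits.getLength()];
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                result[i] = new BlastHit(text(hit, "Hit_accession"), text(hit, "Hit_def"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }

    /** Text of the first descendant element with the given tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
                + "<Hit><Hit_accession>P12345</Hit_accession>"
                + "<Hit_def>example protein</Hit_def></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        for (BlastHit hit : parseBlast(xml)) {
            System.out.println(hit.accession + "\t" + hit.definition);
        }
    }
}
```

With generated JAXB classes the body of parseBlast() would shrink to an
unmarshal call, and the hand-written class would only adapt the
generated objects to whatever BioJava wants to expose. Note that real
BLAST XML files usually carry a DOCTYPE pointing at the NCBI DTD, which
a default parser may try to fetch; the sample string above has none.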

I think it really comes down to a question of how much the generated code 
is boilerplate code that will never be changed. If it is not 'modifiable' 
then it can be generated at build. If the autogenerated code is an outline 
of a class where method bodies need to be filled in or customized then 
they should not be autogenerated at build time.  A good example would be 
JUnit classes that can be autogenerated to give you a template that will 
compile and run but probably will not perform a sensible test.  The 
developer of the test could autogenerate the template but would then need 
to make the test sensible. At that point the test should be in the code 
base and should not be regenerated at build time.

- Mark



From msmoot at ucsd.edu  Thu May 21 23:47:22 2009
From: msmoot at ucsd.edu (Mike Smoot)
Date: Thu, 21 May 2009 16:47:22 -0700
Subject: [Biojava-dev] an outsider's take on Biojava 3
Message-ID: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>

Hi Everyone,

I thought I'd respond to Andreas' request for participation in the BioJava 3
design discussions that he made last week on the normal BioJava list.  I'm
the lead developer on the Cytoscape project (http://cytoscape.org), so I
thought I'd provide some perspective on what a project using BioJava might
look for in BioJava 3.

Basically, I'd just like to voice my strong support for the "Basic
Principles" listed here: http://biojava.org/wiki/BioJava3_Design.  Finer
granularity of jars, acyclic dependencies, and the separation of API and
implementation are precisely the things we're doing in Cytoscape 3.  The
first two points will go a long way towards making it easier to use specific
parts of the library without needing everything at once.  The third point
will allow alternative implementations of certain interfaces, which is one
approach to dealing with issues like parallel vs. non-parallel versions of
algorithms.  Maven also sounds great.
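As a rough illustration of that last point, here is a minimal Java sketch of one interface with a sequential and a parallel implementation. All names (GcCounter, SequentialGcCounter, ParallelGcCounter, GcCounts) are hypothetical and do not come from BioJava or Cytoscape, and the parallel variant assumes Java 8 streams are available.

```java
import java.util.List;

// Hypothetical API: callers depend only on this interface.
interface GcCounter {
    long countGc(List<String> sequences);
}

// Plain single-threaded implementation.
class SequentialGcCounter implements GcCounter {
    @Override
    public long countGc(List<String> sequences) {
        long total = 0;
        for (String s : sequences) {
            total += GcCounts.inOne(s);
        }
        return total;
    }
}

// Alternative implementation of the same interface using a parallel stream.
class ParallelGcCounter implements GcCounter {
    @Override
    public long countGc(List<String> sequences) {
        return sequences.parallelStream().mapToLong(GcCounts::inOne).sum();
    }
}

// Shared helper: counts G and C characters in a single sequence.
final class GcCounts {
    private GcCounts() {}

    static long inOne(String sequence) {
        long n = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (c == 'G' || c == 'C') {
                n++;
            }
        }
        return n;
    }
}
```

Code written against GcCounter can switch between the two implementations without changing, which is the benefit of keeping the API jar separate from the implementation jars.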

If I could add one bullet to the list, it would be to add OSGi metadata to
the jars to allow easy integration with OSGi-based projects (such as
Cytoscape 3 and (as I'm told) the next version of Taverna). There are maven
plugins to make this dead simple and it wouldn't impact anyone not using
OSGi.

Please take that with a large grain of salt; I just thought you might
appreciate an outsider's perspective!

thanks,
Mike

-- 
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego


From markjschreiber at gmail.com  Fri May 22 02:59:14 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 22 May 2009 10:59:14 +0800
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>
References: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>
Message-ID: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>

Thanks for the comments. The OSGi system sounds interesting. I think
we should consider it.

I have also added two more recommendations for the Design Principles:




From markjschreiber at gmail.com  Fri May 22 03:01:57 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 22 May 2009 11:01:57 +0800
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
References: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com> 
	<93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
Message-ID: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>

Sorry, sent before I said what the new principles were.

1. Extensive use of the Logging API
2. (At the risk of having a fatwa declared against me) Most biojava
exceptions should derive from RuntimeException and be unchecked

See the wiki page for more details.

- Mark



From holland at eaglegenomics.com  Fri May 22 09:02:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 May 2009 10:02:43 +0100
Subject: [Biojava-dev] an outsider's take on Biojava 3
In-Reply-To: <93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
References: <f9ac1d730905211647i7e80aaa2xcaa77d43ff8ea4c3@mail.gmail.com>
	<93b45ca50905211959r2c440034r72ca73306a8a3925@mail.gmail.com>
	<93b45ca50905212001v70067680mafb8f0bc36f6c497@mail.gmail.com>
Message-ID: <1242982963.10413.6.camel@buzzybee>

RuntimeException is good for things that can't be recovered from. If the
user has provided bad coordinates or invalid sequence, that's a
recoverable error (because there's a chance that they came from user
input via a user interface, which can be corrected and retried). Even
file parsing exceptions should be recoverable - the user can move on to
the next record without borking the entire file (we already see broken
records quite a lot in Genbank downloads).

But, for things like being unable to call out to Blast, or being unable
to convert DNA to Protein because of a misconfiguration internally
somewhere, I agree that RuntimeExceptions are probably best. These are
unrecoverable and indicate that changes need to be made to the
programming code or BioJava itself.

So in my mind then RuntimeExceptions are good for highlighting
programming errors, but not good for errors relating to invalid input
data.
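
To make the distinction concrete, here is a minimal Java sketch. Everything in it is an assumption for illustration: the class names (RecordParseException, TranslationConfigError, RecordReader) are hypothetical and not part of BioJava, and the "id:sequence" record format is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checked exception: bad input data the caller can recover from.
class RecordParseException extends Exception {
    RecordParseException(String message) {
        super(message);
    }
}

// Hypothetical unchecked exception: a programming or configuration error.
class TranslationConfigError extends RuntimeException {
    TranslationConfigError(String message) {
        super(message);
    }
}

class RecordReader {
    // Parses one invented "id:sequence" record; rejects anything but ACGT.
    static String parseRecord(String line) throws RecordParseException {
        String[] parts = line.split(":", 2);
        if (parts.length != 2 || !parts[1].matches("[ACGT]+")) {
            throw new RecordParseException("Broken record: " + line);
        }
        return parts[1];
    }

    // Skips broken records instead of abandoning the whole file.
    static List<String> parseAll(List<String> lines) {
        List<String> sequences = new ArrayList<>();
        for (String line : lines) {
            try {
                sequences.add(parseRecord(line));
            } catch (RecordParseException e) {
                System.err.println("Skipping: " + e.getMessage());
            }
        }
        return sequences;
    }

    // A missing translation table is a setup bug, so fail fast and unchecked.
    static void requireTable(Object translationTable) {
        if (translationTable == null) {
            throw new TranslationConfigError("No translation table configured");
        }
    }
}
```

The compiler forces callers of parseRecord to decide what to do with a broken record, while requireTable only surfaces when something is wrong with the program itself.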


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andreas at sdsc.edu  Mon May 25 04:22:09 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 24 May 2009 21:22:09 -0700
Subject: [Biojava-dev] next steps
Message-ID: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>

Hi,

While talking about design requirements, I think we also need to think
pragmatically about how much time we will have to refactor code vs.
re-writing modules from scratch. To get started with the next steps, I
suggest the following procedure: First thing will be to move to
Maven. Then components should be refactored into independent
sub-modules. Then the submodules can get improved to follow the new
design guidelines. Once we have reached a certain stability with the
re-organized code base, we will make the next release.

Any comments? If there is general agreement about this, I would take
the next step and replace the ant build system with a maven based one.

Andreas


From andreas at sdsc.edu  Mon May 25 15:14:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 25 May 2009 08:14:06 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905250814p2cfcc627h477e688637f50ccb@mail.gmail.com>

> build some sort of graph relationship tool. It is also easy enough to start
> dragging packages around to different projects in netbeans and resolve
> compiler errors.

yea, same for Eclipse. The Eclipse Maven plugin can auto-convert a
project to Maven.  I have played around with it and it was quite easy
to get a mavenized biojava with the dependencies correctly converted.
That's why I thought it might be the first step. You suggest doing the
modularization first and the maven metadata afterwards.  I still have
to figure out how to make independent submodules as part of Maven in
Eclipse... let me play around a bit more and see how it goes...

The package list sounds good and java 1.6 too.

Andreas




From HWillis at scripps.edu  Mon May 25 14:48:50 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 25 May 2009 10:48:50 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>

Andreas
 
I was looking at the biojava code yesterday to see how easy it would be to divide up into functionally grouped jars based on package hierarchy. I tried to find some refactoring tools that would give a network graph view of class relationships. It is simple enough to parse source for import statements and build some sort of graph relationship tool. It is also easy enough to start dragging packages around to different projects in netbeans and resolve compiler errors.
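
A minimal sketch of the import-parsing idea, in Java. The class name ImportGraph is hypothetical, and the approach is only approximate: it assumes one package declaration per source file, ignores static imports and fully qualified names used inline, and treats an import of a nested class as if the outer class were a package.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportGraph {
    // Matches "package a.b.c;" at the start of a line.
    private static final Pattern PACKAGE =
        Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // Matches "import a.b.C;" or "import a.b.*;" and captures "a.b".
    private static final Pattern IMPORT =
        Pattern.compile("^\\s*import\\s+([\\w.]+)\\.[\\w*]+\\s*;", Pattern.MULTILINE);

    // Package name -> set of packages it imports from.
    private final Map<String, Set<String>> edges = new TreeMap<>();

    // Adds the package-level dependencies found in one source file.
    void addSource(String source) {
        Matcher pkg = PACKAGE.matcher(source);
        String from = pkg.find() ? pkg.group(1) : "(default)";
        Set<String> targets = edges.computeIfAbsent(from, k -> new TreeSet<>());
        Matcher imp = IMPORT.matcher(source);
        while (imp.find()) {
            String to = imp.group(1);
            if (!to.equals(from)) {
                targets.add(to);
            }
        }
    }

    Map<String, Set<String>> edges() {
        return edges;
    }
}
```

Feeding every .java file under the source tree into addSource gives a package-to-package map that can be printed or handed to a graph viewer.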
 
The advantage of smaller, tightly grouped functional jars is that they allow more frequent minor releases without updating and releasing the entire biojava package. They also allow individuals to own a smaller block of code for unit tests, documentation and examples.
 
With Maven, keeping track of multiple parts and pieces and their relationships becomes less of an issue. I think we need to divide the code into a reasonable approximation of the jars before writing the metadata for Maven.
 
Looking at the current package structure, this is an attempt at grouping jars. I am not familiar enough with all of the biojava code, so this is strictly based on package names.
 
biojava-core - Any classes that organize data structures, probably including org.biojava.bio.seq.*, plus any utility classes that can be used by other packages (org.biojava.utils.*).

biojava-structure - org.biojava.bio.structure.*

biojava-gui - org.biojava.bio.gui

biojava-phylo - A package with a few classes for viewing tree structures using the jgrapht-jdk package. I need to play with the code and see whether it actually uses the graph generated by jgrapht for anything special. I have code that will render a tree as a simple graphic. I have used jgrapht for other projects, so it is not a bad "graphing" package for network diagrams. It could be refactored out.

Not sure how to tackle the org.biojava.bio.program package, as it seems to contain lots of functionally distinct code.

biojava-ws-blast - A web service approach to running blast. The API would hide the web services call.

biojava-blast - Blast parsing code. We could have one package for anything blast related.

biojava-ws-clustalw - A web services approach to doing clustalw multiple sequence alignment. The API would hide the web services call.

biojava-alignment - Code for doing sequence alignment. We could have one package for anything alignment related.
 
Does anyone know if you can get usage statistics from maven as to which jar files are being downloaded? This would provide good statistics on what code is being used, which would help focus improvements in documentation etc.

I assume we are going to make Java 1.6 the minimum requirement moving forward? This would simplify the web services API requirements and reduce the number of external packages that are required.
 
 
Scooter
 
 


From msmoot at ucsd.edu  Mon May 25 17:07:57 2009
From: msmoot at ucsd.edu (Mike Smoot)
Date: Mon, 25 May 2009 10:07:57 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>

On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:

>
> I was looking at the biojava code yesterday to see how easy it would be to
> divide up into functionally grouped jars based on package hierarchy. I tried
> to find some refactoring tools that would give a network graph view of class
> relationships. It is simple enough to parse source for import statements and
> build some sort of graph relationship tool. It is also easy enough to start
> dragging packages around to different projects in netbeans and resolve
> compiler errors.
>

JDepend is a nice tool for evaluating package dependencies.

http://www.clarkware.com/software/JDepend.html


Mike

-- 
____________________________________________________________
Michael Smoot, Ph.D.               Bioengineering Department
tel: 858-822-4756         University of California San Diego


From HWillis at scripps.edu  Mon May 25 22:59:10 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 25 May 2009 18:59:10 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>

I attached the JDepend output for BioJava. This will help with the circular dependencies: core classes should not depend on other packages, and where they do, the code they depend on should be refactored into the core.
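
For anyone who wants to check a dependency map by hand, here is a minimal Java sketch of cycle detection using a depth-first search. It is not how JDepend works internally; the class name CycleFinder is hypothetical, and the input is assumed to be a map from each package (or module) name to the names it depends on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CycleFinder {
    // Returns one dependency cycle as a path (first == last), or an empty list.
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();
        List<String> path = new ArrayList<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String node, Map<String, Set<String>> deps,
                                      Set<String> done, List<String> path) {
        // A node already on the current path closes a cycle.
        int seenAt = path.indexOf(node);
        if (seenAt >= 0) {
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(node);
            return cycle;
        }
        // Fully explored nodes cannot lead to a new cycle.
        if (done.contains(node)) {
            return Collections.emptyList();
        }
        path.add(node);
        for (String next : deps.getOrDefault(node, Collections.emptySet())) {
            List<String> cycle = visit(next, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(node);
        return Collections.emptyList();
    }
}
```

Any cycle it reports that passes through the proposed core module points at code that would have to move before the jars can be split.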
 
Scooter


-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.xml
Type: text/xml
Size: 567706 bytes
Desc: report.xml
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20090525/489118b8/attachment-0002.xml>

From andreas at sdsc.edu  Thu May 28 04:31:15 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 27 May 2009 21:31:15 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>

Hi Scooter,

quick update: There is also an Eclipse plugin for JDepend that
provides a user interface to browse through the dependencies.

As I already mentioned earlier, I made some quick progress with the
maven plugin to convert the project to maven and create a first pom.
At the moment I am testing how best to create sub-projects that
should contain the modules.  The plugin does not seem to make it easy
to create new modules, so I agree with your earlier suggestion that it
is best to modularize first and then mavenize second... Should we
create a branch in svn and play around with refactoring there, and once
we are happy with it switch that branch to become the trunk?

Andreas




From juberpatel at gmail.com  Thu May 28 07:09:29 2009
From: juberpatel at gmail.com (juber patel)
Date: Thu, 28 May 2009 12:39:29 +0530
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>

just a small observation:

Maven may not be easy to use, and the switch to Maven should be made
only after some consideration. I have not used it personally, but I
have seen people on the Mahout list struggling with Maven. Its utility
may not justify its complexity.

juber




-- 
Juber Patel        http://juberpatel.googlepages.com


From holland at eaglegenomics.com  Thu May 28 06:55:28 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 28 May 2009 07:55:28 +0100
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
Message-ID: <1243493728.5260.1.camel@buzzybee>

I found when creating modules for the testbed biojava3 that it was
easier to do it by hand.

Only two things need to be done - first of all a list of modules needs
to be added to the parent pom.xml of the project, then each module has
its own pom.xml referencing the parent pom.xml.

Once created this way it only takes a project refresh in
Eclipse/NetBeans for the new module to show up.

See the example pom.xml files under the old biojava3 branch for details.

cheers,
Richard

On Wed, 2009-05-27 at 21:31 -0700, Andreas Prlic wrote:
> Hi Scooter,
> 
> quick update: There is also an eclipse plugin for JDepend, that
> provides a user interface to browse through the dependencies.
> 
> As I already mentioned earlier, I had some quick progress with the
> maven plugin to convert the project to maven and create a first pom.
> At the moment I am testing how  best to create  sub-projects that
> should contain the modules.  The plugin does not seem to make it easy
> to create new modules, so I agree with your earlier suggestion that it
> is best to modularize first and then mavenize 2nd... Should we create a
> branch in svn and play around with refactoring there and once we are
> happy with it we can switch that branch to become the trunk?
> 
> Andreas
-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ayates at ebi.ac.uk  Thu May 28 08:16:05 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 28 May 2009 09:16:05 +0100
Subject: [Biojava-dev] next steps
In-Reply-To: <f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
Message-ID: <4A1E4845.8080906@ebi.ac.uk>

Maven's big plus points are easy integration into just about any IDE &
its transitive dependency management. On a project like BioJava (where
people need to get set up & running quickly across a wide range of
development environments) these two points really make it one of the
only viable choices I would use. This isn't to say the other build
systems (rake, raven, gant, gradle, ant) are not as good or better,
just that they do not fit the bill as well.
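
For anyone new to the term, "transitive dependency management" boils
down to walking the dependency graph so that users do not have to. The
toy sketch below only illustrates that idea; it is not Maven's actual
resolver (which also deals with versions, scopes and conflicts), and
the class name TransitiveDeps and the module names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of transitive dependency resolution; not Maven's algorithm. */
final class TransitiveDeps {

    /** Returns every artifact reachable from root, excluding root itself. */
    static Set<String> resolve(String root, Map<String, List<String>> direct) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            // Artifacts with no entry in the map are treated as leaves.
            for (String dep : direct.getOrDefault(current, List.of())) {
                // seen.add() returns false for repeats, so cycles terminate.
                if (!dep.equals(root) && seen.add(dep)) {
                    queue.add(dep);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical artifact names, for illustration only.
        Map<String, List<String>> direct = Map.of(
            "my-app", List.of("biojava-structure"),
            "biojava-structure", List.of("biojava-core"));
        System.out.println(resolve("my-app", direct));
    }
}
```

The application declares only its direct dependency; the walk picks up
whatever that dependency needs in turn.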

Andy

juber patel wrote:
> just a small observation:
> 
> Maven may not be easy to use and switch to maven should be done after
> some consideration. I have personally not used it, but have seen
> people on the Mahout list struggling with maven. Its utility may not
> justify its complexity.
> 
> juber


From james at carmanconsulting.com  Thu May 28 09:37:53 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 05:37:53 -0400
Subject: [Biojava-dev] next steps
In-Reply-To: <f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> 
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> 
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> 
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
Message-ID: <f2e8eedf0905280237nb4a4940ydbee0e143b22a0ae@mail.gmail.com>

Maven really isn't that hard.  I have no idea what the Mahout folks
are having trouble with, but I'm sure it can be addressed.  Maven's
benefits greatly outweigh its complexity (which isn't that high,
IMHO).  If you folks want a hand "mavenizing" your project, I wouldn't
mind helping.

On Thu, May 28, 2009 at 3:09 AM, juber patel <juberpatel at gmail.com> wrote:
> just a small observation:
>
> Maven may not be easy to use and switch to maven should be done after
> some consideration. I have personally not used it, but have seen
> people on the Mahout list struggling with maven. Its utility may not
> justify its complexity.
>
> juber


From HWillis at scripps.edu  Thu May 28 13:10:43 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 09:10:43 -0400
Subject: [Biojava-dev] next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com>
	<f2e8eedf0905280237nb4a4940ydbee0e143b22a0ae@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>

Maven should be viewed as an additional option for developers: once a version of BioJava is released, the Maven repository is updated, and we need to make sure all the metadata and dependency information is correct. This doesn't mean that BioJava development needs to be done in Maven; it is simply another way to get the jars after they have been released. BioJava as a single jar is not that hard to integrate into your project, given that we provide the handful of external jar files it needs as part of the download. Other projects I have worked with package only their own jar and then give you web links to download 10 other external projects, and that is a pain: you go to each website to figure out the download process and find that they are all on different releases by now. In that situation Maven is a great solution, because the developers of biojava took the time to reference the exact versions of the external jar files properly and did not leave it to the "customer" to figure out.
 
If we use Apache Commons as a model, I personally would rather grab the package of interest, say biojava-blast, and add it to my development environment. Maven is an Apache project, yet when you go to http://commons.apache.org/ and grab the component of interest, Maven isn't even listed as an option. This is probably because it is overkill for a single jar. That doesn't mean you can't get Commons jars via Maven when you load a larger project.
 
In our case we may have a couple of components where external jar dependencies make things a little complicated. Take biojava-blast as an example, with a web service client that uses either Axis or the latest and greatest Sun JSR. Suppose the project I am importing biojava-blast into via Maven already uses Axis, but an older version, because everything works and I haven't needed to upgrade. Maven may make the integration step easier, but it doesn't solve the problem that I, as the developer, now need to do something to resolve the version conflict.
 
So I view Maven as a nice option for developers who are big fans of Maven and smile when they can grab the code they need from BioJava that way. We should plan on having an Apache Commons-like download page and publish the jars in Maven as well.
 
Scooter

________________________________

From: biojava-dev-bounces at lists.open-bio.org on behalf of James Carman
Sent: Thu 5/28/2009 5:37 AM
To: biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] next steps


Maven really isn't that hard.  I have no idea what the Mahout folks
are having trouble with, but I'm sure it can be addressed.  Maven's
benefits greatly outweigh its complexity (which isn't that high,
IMHO).  If you folks want a hand "mavenizing" your project, I wouldn't
mind helping.



From HWillis at scripps.edu  Thu May 28 13:37:27 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 09:37:27 -0400
Subject: [Biojava-dev] BioJava BLAST web services
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C863@FLMAIL1.fl.ad.scripps.edu>


I am planning to test a couple of BLAST web service interfaces (assuming more than one exists) to see what they truly have in common and how that would affect a BJ3 front end to multiple providers. My assumption is that they will be the same. I noticed that the NCBI BLAST implementations require the user to pass an email address as part of the web service call. They are concerned about abuse from external processes, and they only allow one sequence per request. Same-same but different is always fun!

From Wikipedia, the following are listed as BLAST resources, more than one of which may offer a web service interface. Should BioJava3 try to support more than one?
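
To make the question concrete, a provider-neutral front end might look
roughly like the sketch below. Every name in it (BlastRequest,
BlastService, FakeBlastService) is hypothetical and is not an existing
BioJava or NCBI API. The only facts taken from the observations above
are the required email address and the one-sequence-per-request limit.
The fake provider just records requests, so nothing here talks to a
real service.

```java
import java.util.ArrayList;
import java.util.List;

/** One query for any provider: a contact email plus exactly one sequence. */
final class BlastRequest {
    final String email;
    final String sequence;

    BlastRequest(String email, String sequence) {
        // Deliberately crude checks; real validation would be stricter.
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("a contact email address is required");
        }
        if (sequence == null || sequence.isEmpty()) {
            throw new IllegalArgumentException("one non-empty sequence is required");
        }
        this.email = email;
        this.sequence = sequence;
    }
}

/** Hypothetical front end; each real service would get its own implementation. */
interface BlastService {
    String providerName();

    /** Submits one request and returns a provider-specific job identifier. */
    String submit(BlastRequest request);
}

/** Stand-in provider that records requests instead of calling a web service. */
final class FakeBlastService implements BlastService {
    private final String name;
    private final List<BlastRequest> submitted = new ArrayList<>();

    FakeBlastService(String name) {
        this.name = name;
    }

    @Override
    public String providerName() {
        return name;
    }

    @Override
    public String submit(BlastRequest request) {
        submitted.add(request);
        return name + "-job-" + submitted.size();
    }
}
```

Supporting a second provider would then mean adding another
BlastService implementation, provided the services really do have
enough in common for one request type to cover them.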

Thanks

Scooter


Variations of BLAST


*	WU-BLAST <http://blast.wustl.edu/>  - the original gapping BLAST with statistics, developed and maintained by Warren Gish at Washington University in St. Louis <http://en.wikipedia.org/wiki/Washington_University_in_St._Louis> 
*	EBI's BLAST Services <http://www.ebi.ac.uk/Tools/blast>  - EBI's <http://en.wikipedia.org/wiki/European_Bioinformatics_Institute>  main blast services page.
*	FSA-BLAST <http://www.fsa-blast.org/>  - a new, faster but still accurate version of NCBI BLAST based on recently published algorithmic improvements
*	NBIC mpiBLAST <http://services.nbic.nl:4080/bb/cgi-bin/bb_login.cgi>  - at the Netherlands Bioinformatics Centre
*	Parallel BLAST <http://www-users.cs.umn.edu/~rangwala/final_bglBLAST.pdf>  - a dual scheduling BLAST tested on the Blue Gene/L
*	mpiBLAST <http://www.mpiblast.org/>  - open-source parallel BLAST
*	A/G BLAST <http://developer.apple.com/darwin/projects/blast/>  - implementation for PowerPC G4/G5 processors and Mac OS X, from Apple Computer's <http://en.wikipedia.org/wiki/Apple_Computer> Advanced Computation Group <http://en.wikipedia.org/wiki/Advanced_Computation_Group> and Genentech <http://en.wikipedia.org/wiki/Genentech>.
*	STRAP <http://3d-alignment.eu/>  - the protein workbench STRAP <http://www.charite.de/bioinf/strap/>  contains a comfortable BLAST front-end with a cache for BLAST results


Commercial versions


*	ThermoBLAST by DNA Software Inc. <http://dnasoftware.com/ThermoBLAST/tabid/110/Default.aspx>  - scans entire genomes quickly and accurately combining the power of BLAST with the most advanced thermodynamics parameters
*	PatternHunter <http://www.bioinformaticssolutions.com/products/ph/index.php>  - an alternative software which provides similar functionality to BLAST while claiming increased speed and sensitivity
*	KoriBlast <http://www.korilog.com/products>  - a reliable graphical environment dedicated to sequence data mining. KoriBlast combines Blast searches with advanced data management capabilities and a state-of-the-art graphical user interface.
*	microbial identification BLAST <http://www.sepsitest-blast.de/>  - a quality controlled database for in-vitro diagnostics. SepsiTest combines broad-range-PCR using ultra-pure reagents with Blast searches in a quality controlled environment.


From james at carmanconsulting.com  Thu May 28 13:45:23 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 28 May 2009 09:45:23 -0400
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu> 
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu> 
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com> 
	<f8e28e170905280009i310e83d6se952d26684fef763@mail.gmail.com> 
	<f2e8eedf0905280237nb4a4940ydbee0e143b22a0ae@mail.gmail.com> 
	<061BFD133FA1584693D19C79A0072F5F76C861@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <f2e8eedf0905280645u480a5500xad575a84fcf54caf@mail.gmail.com>

I would say that you should use the Apache Commons projects as a model
(I'm an Apache Commons PMC member, so I'm a bit biased).  The
Maven-generated site will include information on the dependencies
(including whether they are optional and where you can get them,
provided the other projects play nicely and include that information).
And, yes, when you *do* use Maven, it will download all required
transitive dependencies for you and add them to your classpath
automagically.  That's why it's so nice.  Well, that's one of the MANY
reasons it's so nice.  The release plugin also saves a LOT of
headaches, if you ask me (once you get it configured properly).

On Thu, May 28, 2009 at 9:10 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Maven should be viewed as an additional option for developers where once a
> version of BioJava is released the Maven repository is updated and we need
> to make sure we have all the meta-data/dependency information correct. This
> doesn't mean that BioJava development needs to be done in Maven but simply
> is another way to get the jars after they have been released. BioJava as a
> single Jar is not that hard to integrate into your project given that we
> have a handful of external jars files that we provide as part of the
> download. For other projects I have worked with where they only package the
> jar for that project and then give you web links to download 10 other
> external projects then that is a pain. You go to each website to figure out
> the download process and find that they are now all in different releases
> then Maven is a great solution because the developers of biojava took the
> time to get the exact version of jar files from external packages referenced
> properly and did not leave it to the "customer" to figure out.
>
> If we use apache commons as a model I personally would rather grab the
> package of interest say biojava-blast and add into my development
> environment. Maven is an Apache project yet when you go to
> http://commons.apache.org/ and grab the component of interest Maven isn't
> even listed as an option. This is probably because it is an overkill for a
> single jar. Doesn't mean that you can't get commons jar's via maven when you
> load a larger project.
>
> In our case we may have a couple components where it can get a little
> complicated by external jar dependencies. Using biojava-blast as an example
> where it has a web service client that is either using axis or the latest
> greatest sun JSR. The project I am importing biojava-blast via Maven into
> already uses axis but an older version because everything works and I
> haven't needed to do the upgrade. Maven may make the integration step
> easier but it doesn't solve the problem where I as the developer now need to
> do something to resolve the version conflicts.
>
> So I view Maven as a nice option for developers who are a big fan of Maven
> and makes them smile when they can grab the code they need from BioJava via
> Maven. We should plan on having an apache commons like page to download and
> publish the jars in maven as well.
>
> Scooter


From andreas at sdsc.edu  Thu May 28 16:53:33 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 28 May 2009 09:53:33 -0700
Subject: [Biojava-dev] hierarchical vs flat module organisation
Message-ID: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>

Hi,

from the different posts it seems there are two types of suggestions
for how to organize modules: hierarchical vs. flat.

I wonder if the best way to organize this is to mix the designs. There
could be a few top-level modules like core, webservices, phylo,
structure. These would be equivalent to projects in the workspace.
These can then contain submodules like

webservices-blast-ebi
webservices-blast-ncbi
webservices-whatever

or
structure-core
structure-viewers

The submodules would be sub-folders in the projects.

Any thoughts on that?

Andreas


From HWillis at scripps.edu  Thu May 28 18:09:32 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 14:09:32 -0400
Subject: [Biojava-dev] hierarchical vs flat module organisation
References: <59a41c430905280953w964ab36q7baf1fd5eb21e62a@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C867@FLMAIL1.fl.ad.scripps.edu>

Andreas
 
I think the organization should make the most sense to the user of BioJava and should be functionally grouped. I show up looking for specific biology algorithms/code. Blast, Sequence Alignment, Tree construction etc. In that module I would then find different features that I can then explore to solve the problem. The question becomes would I pick a module based on how it solved the problem. Given that BioJava does not have a native solution do to BLAST nor does the developer want to deal with all the configuration the BLAST-web services call simply becomes the only choice. The results of parsing a BLAST output and making a BLAST web service call should be the same structured result where I would then use other BioJava api's against the results.
 
I think we should group by function, and that gives the developer a collection of tools to work with.
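
One way to read the "same structured result" point: whichever route produces the hits, a local report parser or a remote web-service call, the caller sees the same result type. Below is a minimal Java sketch of that idea. Every name in it (BlastHit, BlastSource, TabularReportSource, BestHit) is hypothetical and not part of the BioJava API, and the tab-separated input is an assumed, simplified stand-in for real BLAST output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type shared by every BLAST source; not a BioJava class.
final class BlastHit {
    final String subjectId;
    final double eValue;

    BlastHit(String subjectId, double eValue) {
        this.subjectId = subjectId;
        this.eValue = eValue;
    }
}

// Hypothetical abstraction: a report parser and a web-service client
// would both implement it and return the same kind of result.
interface BlastSource {
    List<BlastHit> hits();
}

// Reads hits from simplified "subjectId<TAB>eValue" lines (an assumed format).
final class TabularReportSource implements BlastSource {
    private final String report;

    TabularReportSource(String report) {
        this.report = report;
    }

    @Override
    public List<BlastHit> hits() {
        List<BlastHit> result = new ArrayList<>();
        for (String line : report.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            result.add(new BlastHit(fields[0], Double.parseDouble(fields[1])));
        }
        return result;
    }
}

// Downstream code depends only on BlastSource, not on how the hits were obtained.
final class BestHit {
    static BlastHit of(BlastSource source) {
        BlastHit best = null;
        for (BlastHit hit : source.hits()) {
            if (best == null || hit.eValue < best.eValue) {
                best = hit;
            }
        }
        return best;
    }
}
```

A web-service client would be a second BlastSource implementation, and BestHit.of would work against it unchanged.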
 
Scooter



From HWillis at scripps.edu  Thu May 28 17:57:27 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 May 2009 13:57:27 -0400
Subject: [Biojava-dev]  next steps
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu><f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com><061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>


Andreas
 
I think each jar probably needs its own svn trunk. This is how Apache Commons is set up. The advantage is that everything is modularized with well-defined boundaries on dependencies. If you have one source tree that builds multiple jars, it becomes very easy to grab a class from another jar and force additional dependencies.
 
You also don't need to worry about a single user having access to the entire source tree. If a new developer wants to get involved with a specific interest, it is easy to give him access to that package without worrying about breaking other packages.
 
Do you think we should call the functional groupings packages, modules, or something else?
 
If you take a whack at the refactoring based on X number of modules, then you could check each one into a different subversion trunk. Each module will probably have a dependency on biojava-core, which will also be a separate subversion trunk. In NetBeans I would set up a project for each and then add the biojava-core project as an external project dependency. This also allows each module to be released independently and more frequently. We probably need to come up with a versioning convention that is part of the jar name. I am not sure if any of the ant build tools automate the upticking of the major/minor version number when packaging jars.
 
A user of biojava would download a single jar for the module of interest, where the download contains all the external jars that are required, including biojava-core. For maven that would be done via the POM.
 
As part of the refactoring now is the time to make any major namespace changes you want to make. I assume that eclipse refactoring makes this easy. Check all the code in and BioJava3 has begun!
 
Scooter

________________________________

From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Thu 5/28/2009 12:31 AM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] next steps


Hi Scooter,

quick update: There is also an Eclipse plugin for JDepend that
provides a user interface to browse through the dependencies.

As I already mentioned earlier, I made some quick progress with the
maven plugin to convert the project to maven and create a first pom.
At the moment I am testing how best to create sub-projects that
should contain the modules. The plugin does not seem to make it easy
to create new modules, so I agree with your earlier suggestion that it
is best to modularize first and then mavenize second. Should we create
a branch in svn and play around with refactoring there, and once we are
happy with it, switch that branch to become the trunk?

Andreas


On Mon, May 25, 2009 at 3:59 PM, Scooter Willis <HWillis at scripps.edu> wrote:
> I attached the JDepend output for BioJava. This will help on the circular
> dependencies where core classes should not have dependencies on other
> packages and if they do it should be refactored into the core class.
>
> Scooter
> ________________________________
> From: mike.smoot at gmail.com on behalf of Mike Smoot
> Sent: Mon 5/25/2009 1:07 PM
> To: Scooter Willis
> Cc: Andreas Prlic; biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] next steps
>
>
>
> On Mon, May 25, 2009 at 7:48 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>
>> I was looking at the biojava code yesterday to see how easy it would be to
>> divide up into functionally grouped jars based on package hierarchy. I tried
>> to find some refactoring tools that would give a network graph view of class
>> relationships. It is simple enough to parse source for import statements and
>> build some sort of graph relationship tool. It is also easy enough to start
>> dragging packages around to different projects in netbeans and resolve
>> compiler errors.
>
> JDepend is a nice tool for evaluating package dependencies.
>
> http://www.clarkware.com/software/JDepend.html
>
>
> Mike
>
> --
> ____________________________________________________________
> Michael Smoot, Ph.D.               Bioengineering Department
> tel: 858-822-4756         University of California San Diego
>
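
The import-parsing idea from the quoted thread above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how JDepend works: the class name ImportGraph and its methods are hypothetical, the regular expressions only handle plain single-line package and import statements, and a static import is handled crudely (its enclosing class is treated as if it were a package).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a package -> imported-packages map from Java source text.
final class ImportGraph {
    private static final Pattern PACKAGE =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // Group 1 is everything before the last dot, i.e. the imported package.
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+(?:static\\s+)?([\\w.]+)\\.[\\w*]+\\s*;",
                    Pattern.MULTILINE);

    private final Map<String, Set<String>> edges = new TreeMap<>();

    // Records one edge per import statement found in the given source file text.
    void addSource(String source) {
        Matcher pkg = PACKAGE.matcher(source);
        String from = pkg.find() ? pkg.group(1) : "(default)";
        Set<String> targets = edges.computeIfAbsent(from, k -> new TreeSet<>());
        Matcher imp = IMPORT.matcher(source);
        while (imp.find()) {
            String to = imp.group(1);
            if (!to.equals(from)) {
                targets.add(to); // ignore imports from the file's own package
            }
        }
    }

    Set<String> dependenciesOf(String pkg) {
        return edges.getOrDefault(pkg, new TreeSet<>());
    }

    // True if a and b import each other directly (the simplest circular dependency).
    boolean mutuallyDependent(String a, String b) {
        return dependenciesOf(a).contains(b) && dependenciesOf(b).contains(a);
    }
}
```

Feeding every source file of a project through addSource gives a rough picture of which packages could be split into separate jars; longer cycles would need a proper graph search, which this sketch leaves out.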


From andreas.prlic at gmail.com  Fri May 29 04:53:22 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 28 May 2009 21:53:22 -0700
Subject: [Biojava-dev] next steps
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430905242122oed51ea4o169ef94386133982@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85E@FLMAIL1.fl.ad.scripps.edu>
	<f9ac1d730905251007t15897898s693e54ba352916f7@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C85F@FLMAIL1.fl.ad.scripps.edu>
	<59a41c430905272131q5c00e587r1e22f3fc84dc2818@mail.gmail.com>
	<061BFD133FA1584693D19C79A0072F5F76C864@FLMAIL1.fl.ad.scripps.edu>
	<061BFD133FA1584693D19C79A0072F5F76C866@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com>

> I think each jar probably needs its own svn trunk. This is how apache
> commons is setup. The advantage of this is that everything is modularized
> with nice defined boundaries on dependencies. If you have one source tree
> that builds multiple jars then it becomes very easy to grab a class from
> another jar and force additional dependencies.

Sounds good. I guess it would also be good not to have too many .jar
files in the end.

> You also don't need to worry about a single user having access to the entire
> source tree. If you have a new developer who wants to get involved with a
> specific interest then easy to give him access to that package without
> worrying about breaking other packages.

That might be useful in the future. For now I think it is good enough to
give developers write access to all of biojava.


>
> Do you think we should call the functional grouping packages or modules or
> something else?

What about this: we call a top-level project a package. A package can
then consist of several modules. I am not sure if we should have a jar
per package or per module.


> If you take a whack at the refactoring based on X number of modules then you
> could check each one in a different subversion trunk. Each module will
> probably have a dependency on biojava-core which will also be a separate
> subversion trunk. In Netbeans I would setup a project for each and then I
> can add the biojava-core project as an external project dependency.

Sounds good, and you would do the same in Eclipse.

> This also allows each module to be released independently and more
> frequently. We probably need to come up with a versioning convention
> that is part of the jar name.

I think we should stick to the  maven naming conventions.
http://maven.apache.org/guides/mini/guide-naming-conventions.html
e.g.
groupId org.biojava.phylo for the phylogenetic package
artifactId biojava-phylo
version 3.0.0  or 3.0.0-SNAPSHOT if it is a nightly build
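
To make the convention concrete, here is a small Java sketch of how the three coordinates combine into a jar file name. The class ModuleCoordinates is hypothetical and exists only for illustration; the groupId/artifactId/version triple, the default artifactId-version.jar file name and the -SNAPSHOT suffix are the Maven conventions being referred to.

```java
// Hypothetical value class for the Maven coordinates proposed above.
final class ModuleCoordinates {
    final String groupId;
    final String artifactId;
    final String version;

    ModuleCoordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    // Maven marks development builds with a -SNAPSHOT version suffix.
    boolean isSnapshot() {
        return version.endsWith("-SNAPSHOT");
    }

    // Default Maven jar file name: artifactId-version.jar
    String jarName() {
        return artifactId + "-" + version + ".jar";
    }

    // Conventional short form: groupId:artifactId:version
    String gav() {
        return groupId + ":" + artifactId + ":" + version;
    }
}
```

With the values from the example, jarName() gives biojava-phylo-3.0.0.jar for a release and biojava-phylo-3.0.0-SNAPSHOT.jar for a development build, so the version is part of the jar name without any extra convention.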


> Not sure if any of the ant build tools automate the upticking of
> major/minor version number when packaging jars.

Not sure about ant, but maven has a built-in release plugin. If it is
set up correctly you can just write
mvn release:prepare
and the release gets prepared.


> As part of the refactoring now is the time to make any major namespace
> changes you want to make. I assume that eclipse refactoring makes this easy.

Namespace changes are tricky. In principle I don't want to break
backwards compatibility with the existing code base. On the other hand,
having package names starting with org.biojava.structure, rather than
org.biojava.bio.structure, would be simpler. If in doubt I am for
backwards compatibility. One case where I would like to see a change
is the core blast parsing modules: org.biojava.bio.program.sax does
not indicate at all that it has to do with blast.

Andreas


From heuermh at acm.org  Fri May 29 16:29:04 2009
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 29 May 2009 12:29:04 -0400 (EDT)
Subject: [Biojava-dev] next steps
In-Reply-To: <59a41c430905282153r5c82b7cfp1648807b6042eaf5@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0905291225190.13945-100000@shell3.shore.net>

Andreas Prlic wrote:

> > I think each jar probably needs its own svn trunk. This is how apache
> > commons is setup. The advantage of this is that everything is modularized
> > with nice defined boundaries on dependencies. If you have one source tree
> > that builds multiple jars then it becomes very easy to grab a class from
> > another jar and force additional dependencies.
>
> sounds good.  Guess it might be good not  to have too many .jar files
> in the end as well.
>
> > You also don't need to worry about a single user having access to the entire
> > source tree. If you have a new developer who wants to get involved with a
> > specific interest then easy to give him access to that package without
> > worrying about breaking other packages.
>
> might be useful in the future. For now I think it is good enough to
> give developers write  access to all of biojava.
>
>
> >
> > Do you think we should call the functional grouping packages or modules or
> > something else?
>
> What about: we call a toplevel project, a package. A package can then
> consist of several modules. Not sure if we should have a jar per
> package or per module.
>
>
> > If you take a whack at the refactoring based on X number of modules then you
> > could check each one in a different subversion trunk. Each module will
> > probably have a dependency on biojava-core which will also be a separate
> > subversion trunk. In Netbeans I would setup a project for each and then I
> > can add the biojava-core project as an external project dependency.
>
> Sounds good and you would do the same in eclipse.
>
> > This also allows each module to be released independently and more
> > frequently. We probably need to come up with a versioning convention
> > that is part of the jar name.
>
> I think we should stick to the  maven naming conventions.
> http://maven.apache.org/guides/mini/guide-naming-conventions.html
> e.g.
> groupId org.biojava.phylo for the phylogenetic package
> artifactId biojava-phylo
> version 3.0.0  or 3.0.0-SNAPSHOT if it is a nightly build
>
>
> > Not sure if any of the ant build tools automate the upticking of
> > major/minor version number when packaging jars.
>
> Not sure about ant, but maven has a built in release plugin.  if it is
> set up correctly you can just write
> mvn release:prepare
> and the release is being prepared...
>
>
> > As part of the refactoring now is the time to make any major namespace
> > changes you want to make. I assume that eclipse refactoring makes this easy.
>
> Namespace changes are tricky. In principle I don't want to break
> backwards compatibility with the existing code base. On the other hand,
> having package names starting with org.biojava.structure, rather than
> org.biojava.bio.structure would be simpler. If in doubt I am for
> backwards compatibility. One case where I would like to see a change
> is the core blast parsing modules. org.biojava.bio.program.sax does
> not indicate at all that this has to do with blast.