[Biojava-dev] The future of BioJava

Sun Sep 23 12:06:21 UTC 2007

>    1. I like the idea of making readers more pluggable, and Dozer
>    definitely looks interesting.  Is this going to be supported via the Service
>    Provider Interface approach (used by Taverna and other projects)?
>

An SPI would be a great addition. I believe Taverna's is quite a nice
feature, and it would be good to have something similar.
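
As a rough illustration, here is a minimal sketch of reader discovery
through java.util.ServiceLoader (available since Java 6). The names
SequenceFormatReader, GenBankLikeReader and ReaderRegistry are
hypothetical, made up for this example; they are not existing BioJava
or Taverna classes, and Taverna's own mechanism may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface; not part of BioJava.
interface SequenceFormatReader {
    /** True if this reader recognises the first line of a file. */
    boolean canRead(String firstLine);

    /** Short name of the format this reader handles. */
    String formatName();
}

// Example provider. In a real plugin JAR this would be a public class
// listed in a provider-configuration file under META-INF/services,
// named after the fully qualified name of the service interface.
class GenBankLikeReader implements SequenceFormatReader {
    public boolean canRead(String firstLine) {
        return firstLine.startsWith("LOCUS");
    }

    public String formatName() {
        return "genbank";
    }
}

class ReaderRegistry {
    /** Collects every provider that ServiceLoader finds on the classpath. */
    static List<SequenceFormatReader> discover() {
        List<SequenceFormatReader> found = new ArrayList<SequenceFormatReader>();
        for (SequenceFormatReader reader : ServiceLoader.load(SequenceFormatReader.class)) {
            found.add(reader);
        }
        return found;
    }

    /** Returns the format name of the first reader that accepts the line, or null. */
    static String detect(List<SequenceFormatReader> readers, String firstLine) {
        for (SequenceFormatReader reader : readers) {
            if (reader.canRead(firstLine)) {
                return reader.formatName();
            }
        }
        return null;
    }
}

class SpiDemo {
    public static void main(String[] args) {
        List<SequenceFormatReader> readers = ReaderRegistry.discover();
        // No provider-configuration file ships with this sketch, so one
        // reader is registered by hand to keep the demo self-contained.
        readers.add(new GenBankLikeReader());
        System.out.println(ReaderRegistry.detect(readers, "LOCUS       AB000001"));
    }
}
```

With this arrangement a third party could support a VectorNTI-style
variant by dropping an extra JAR on the classpath, without touching the
core code.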

>    2. Andy brought up the point of people who create non-standard
>    variations of EMBL-formatted files.  I was wondering if these files were
>    created in programming languages other than Java?  If so, would those users
>    be willing to use a Jython, JRuby, or a Perl-like scripting language like
>    Sleep?  This would allow them to use BioJava as a library, and still use a
>    scripting language whose syntax they were familiar with.  They would also be
>    producing files in a more standardized format.  This might cut down on the
>    number of parsing mistakes caused by "unsupported" file variations.  You can
>    go to http://scripting.dev.java.net for more information on the
>    scripting languages that the Java VM supports.
>

I think if we designed it right you could do a lot with Groovy, with
the added benefit of a very Java-like syntax.  Richard and I did discuss
the possibility of having all file I/O processing written in Groovy
and compiled to classes.

>    3. Was there any reason why non-standard files were being created?
>    Perhaps some use-case not being covered?
>

Nonstandard GenBank-type files are made by VectorNTI. Formats also
change over the years; I think this recently happened with the EMBL
format.  Unfortunately flatfiles, unlike XML, have no versioning and
do not need to validate against a definition.

>    4. If BioJava is split up into a variety of smaller JARs, how would
>    you ensure that the users had all of the JARs that they needed?  Would an
>    installer be provided to allow users to select groups of JARs?  There are a
>    number of open source installers that would make this process easier.  Using
>    Maven is suitable if you're a developer, if you're a scripter it's a little
>    more difficult to deal with.
>

Many projects are distributed as multiple jars (e.g. Hibernate).
Typically the user would download the core bundle and put its jars in a
lib folder. Additional jars could be downloaded for extra functionality.

>
>    6. When it comes to unit testing and continuous building, is the
>    bio*.org server going to handle that automated build & burn, or is someone
>    in the group going to have to do it?  I think the inability to have the
>    build setup on the server had us stymied before.

The open-bio servers are a natural choice, but I think a discussion of
the pros and cons of the alternatives is a good idea.

>
>    7. Now that Java also includes the Derby database, and the Java
>    Persistence API (JPA), has anyone considered migrating the BioSQL support
>    from Hibernate to JPA, and using Derby as the default database?  This would
>    make it a little easier to maintain and would minimize the setup work that a
>    new user would have to do.
>

I agree with this. It is also a good argument for making our classes
more bean-like so they can be easily turned into enterprise beans.  A
nice part of JPA is that you can use Hibernate to do the persistence.
Having the Derby database built in offers other interesting
possibilities as well.

>    8. Richard, you mention in the "Reasoning" section that "users have
>    moved on".  What types of use-cases beyond basic sequence analysis, should
>    BioJava support?  Would support for more lab-related processes expand the
>    user base and number of committers?  Would support for parsing different
>    types of instrument files be a useful addition? I could imagine use cases
>    where users would like to be able to parse an Affy file and fetch probe
>    information, gene information, and perhaps pathway data.
>
>    9. Are there any thoughts about using annotations (perhaps in
>    combination with ontologies) to handle semantic validation of arguments?
>    For example, you might have an annotation like
>
> @id(ontologyURI="http://www.mygrid.org.uk/ontology#LocusLink_record_id")
>
> indicating that the attribute or method argument is a LocusLink id.
>

I think this is an excellent example of how we can use annotations. It
would allow quite a bit of flexibility for integration tasks.
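
To make the idea concrete, here is a minimal sketch of a
runtime-retained annotation carrying an ontology URI, read back through
reflection. OntologyId, GeneRecord and OntologyLookup are hypothetical
names invented for this example, and the lookup only retrieves the URI;
any actual semantic validation against an ontology is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation modelled on the example quoted above.
// RUNTIME retention is required so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.PARAMETER})
@interface OntologyId {
    String ontologyURI();
}

// Hypothetical record class with one semantically tagged field.
class GeneRecord {
    @OntologyId(ontologyURI = "http://www.mygrid.org.uk/ontology#LocusLink_record_id")
    String locusLinkId;

    String description; // deliberately left without an annotation
}

class OntologyLookup {
    /** Returns the ontology URI declared on the named field, or null if it has none. */
    static String ontologyUriOf(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            OntologyId id = field.getAnnotation(OntologyId.class);
            return id == null ? null : id.ontologyURI();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }
}

class AnnotationDemo {
    public static void main(String[] args) {
        System.out.println(OntologyLookup.ontologyUriOf(GeneRecord.class, "locusLinkId"));
        System.out.println(OntologyLookup.ontologyUriOf(GeneRecord.class, "description"));
    }
}
```

An integration layer could compare the URIs on two fields before
wiring one component's output to another's input.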

- Mark Schreiber

> Mark Fortner
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>