[Biojava-dev] The future of BioJava
Mark Schreiber
markjschreiber at gmail.com
Sun Sep 23 12:06:21 UTC 2007
> 1. I like the idea of making readers more pluggable, and Dozer
> definitely looks interesting. Is this going to be supported via the Service
> Provider Interface approach (used by Taverna and other projects)?
>
An SPI would be a great addition. I believe Taverna's is quite a nice
feature.
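To make the SPI idea concrete, here is a rough sketch using
java.util.ServiceLoader from the standard library. All of the names
(SequenceFormatReader, EmblReader, discover, pick) are made up for
illustration; this is not BioJava or Taverna code, and Taverna's own
SPI mechanism may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch: none of these names come from BioJava or Taverna.
class ReaderSpiSketch {

    /** Hypothetical service interface that a format plugin would implement. */
    interface SequenceFormatReader {
        /** Short format name, e.g. "embl" or "genbank". */
        String formatName();

        /** Cheap check on the first line of a file. */
        boolean canRead(String firstLine);
    }

    /** Example provider; a real one would ship in its own JAR. */
    static class EmblReader implements SequenceFormatReader {
        public String formatName() { return "embl"; }

        public boolean canRead(String firstLine) {
            return firstLine != null && firstLine.startsWith("ID ");
        }
    }

    /**
     * Collects every provider registered in a META-INF/services file
     * named after the service interface, as found on the class path.
     */
    static List<SequenceFormatReader> discover() {
        List<SequenceFormatReader> found = new ArrayList<>();
        for (SequenceFormatReader r : ServiceLoader.load(SequenceFormatReader.class)) {
            found.add(r);
        }
        return found;
    }

    /** Returns the first reader that claims the line, or null if none does. */
    static SequenceFormatReader pick(List<SequenceFormatReader> readers, String firstLine) {
        for (SequenceFormatReader r : readers) {
            if (r.canRead(firstLine)) {
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SequenceFormatReader> readers = discover();
        // With no provider files on the class path the list is empty,
        // so register the built-in example by hand for this demo.
        if (readers.isEmpty()) {
            readers.add(new EmblReader());
        }
        SequenceFormatReader r = pick(readers, "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.");
        System.out.println(r == null ? "no reader" : r.formatName());
    }
}
```

The appeal of this layout is that a third party could support a
non-standard variant by dropping in a JAR with its own provider,
without touching the core parser.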
> 2. Andy brought up the point of people who create non-standard
> variations of EMBL-formatted files. I was wondering if these files were
> created in programming languages other than Java? If so, would those users
> be willing to use Jython, JRuby, or a Perl-like scripting language like
> Sleep? This would allow them to use BioJava as a library, and still use a
> scripting language whose syntax they were familiar with. They would also be
> producing files in a more standardized format. This might cut down on the
> number of parsing mistakes caused by "unsupported" file variations. You can
> go to http://scripting.dev.java.net for more information on the
> scripting languages that the Java VM supports.
>
I think if we designed it right you could do a lot with Groovy, with
the added benefit of a very Java-like syntax. Richard and I did discuss
the possibility of having all file I/O processing written in Groovy
and compiled to classes.
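For reference, the scripting route mentioned in the question can be
sketched with the standard javax.script API. This is only an
illustration of calling a script engine from Java, not the
compiled-Groovy design discussed above. The class and method names are
invented, and it assumes a Groovy JSR-223 engine registered under the
name "groovy" is on the class path; if it is not, the helper simply
returns null.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical sketch; the class and method names are not from BioJava.
class ScriptingSketch {

    /** Names of the script engines visible on the current class path. */
    static List<String> installedEngines() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            names.add(f.getEngineName());
        }
        return names;
    }

    /**
     * Evaluates a script with the variable "line" bound to one flat-file
     * line. Returns null when the requested engine is not installed.
     */
    static Object evalOnLine(String engineName, String script, String line) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null;
        }
        engine.put("line", line);
        try {
            return engine.eval(script);
        } catch (ScriptException e) {
            throw new IllegalStateException("script failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Engines: " + installedEngines());
        // Assumption: a Groovy engine is registered under the name "groovy".
        Object tag = evalOnLine("groovy", "line.substring(0, 2)", "ID   X56734; SV 1;");
        System.out.println(tag == null ? "Groovy engine not on the class path" : "Tag: " + tag);
    }
}
```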
> 3. Was there any reason why non-standard files were being created?
> Perhaps some use-case not being covered?
>
Non-standard GenBank-type files are made by VectorNTI. Formats also
change over the years; I think this recently happened with the EMBL
format. Unfortunately flat files, unlike XML, have no versioning and
no need to validate against a definition.
> 4. If BioJava is split up into a variety of smaller JARs, how would
> you ensure that the users had all of the JARs that they needed? Would an
> installer be provided to allow users to select groups of JARs? There are a
> number of open source installers that would make this process easier. Using
> Maven is suitable if you're a developer, if you're a scripter it's a little
> more difficult to deal with.
>
Many projects are distributed as multiple JARs (e.g. Hibernate).
Typically the user would download the core bundle and put it in a
lib folder. Additional JARs could be downloaded for extra activities.
>
> 6. When it comes to unit testing and continuous building, is the
> bio*.org server going to handle that automated build & burn, or is someone
> in the group going to have to do it? I think the inability to have the
> build setup on the server had us stymied before.
The open-bio servers are a natural choice but I think a discussion of
the pros and cons of others is a good idea.
>
> 7. Now that Java also includes the Derby database, and the Java
> Persistence API (JPA), has anyone considered migrating the BioSQL support
> from Hibernate to JPA, and using Derby as the default database? This would
> make it a little easier to maintain and would minimize the setup work that a
> new user would have to do.
>
I agree on this. This is also a good argument for making our classes
more bean-like so they can be easily turned into enterprise beans. A
nice part of JPA is that you can still use Hibernate to do the
persistence. Having the Derby database built in offers other
interesting possibilities as well.
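As a rough illustration of what "bean-like" means here: a public
no-arg constructor plus get/set pairs, which is also the shape JPA
expects of an entity. SequenceRecord and its fields are invented for
this sketch and are not an existing BioJava class. The JPA annotations
themselves are left out because they need a persistence API JAR; the
standard java.beans.Introspector is used instead to show that the
properties are discoverable.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; SequenceRecord is not an actual BioJava class.
class BeanSketch {

    /** Bean-style record: no-arg constructor and get/set pairs. */
    public static class SequenceRecord implements Serializable {
        private static final long serialVersionUID = 1L;

        private String accession;
        private String description;
        private int length;

        public SequenceRecord() { }

        public String getAccession() { return accession; }
        public void setAccession(String accession) { this.accession = accession; }

        public String getDescription() { return description; }
        public void setDescription(String description) { this.description = description; }

        public int getLength() { return length; }
        public void setLength(int length) { this.length = length; }
    }

    /** Property names a bean-aware framework would find, in sorted order. */
    static Set<String> propertyNames(Class<?> beanClass) {
        try {
            // Stop at Object so the implicit "class" property is excluded.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(propertyNames(SequenceRecord.class));
    }
}
```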
> 8. Richard, you mention in the "Reasoning" section that "users have
> moved on". What types of use-cases beyond basic sequence analysis, should
> BioJava support? Would support for more lab-related processes expand the
> user base and number of committers? Would support for parsing different
> types of instrument files be a useful addition? I could imagine use cases
> where users would like to be able to parse an Affy file and fetch probe
> information, gene information, and perhaps pathway data.
>
> 9. Are there any thoughts about using annotations (perhaps in
> combination with ontologies) to handle semantic validation of arguments?
> For example, you might have an annotation like
>
> @id {ontologyURI="http://www.mygrid.org.uk/ontology#LocusLink_record_id"}
>
> indicating that the attribute or method argument is a LocusLink id.
>
I think this is an excellent example of how we can use Annotations. It
would allow quite a bit of flexibility for integration tasks.
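A minimal sketch of how such an annotation might look in Java, with
the term read back by reflection. OntologyTerm, GeneLookup and
termOfFirstParameter are hypothetical names chosen for this example;
only the ontology URI is taken from the message above. Actual
validation against the ontology is out of scope here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical sketch; none of these types exist in BioJava.
class OntologyAnnotationSketch {

    /** Marks a parameter or field as carrying a value of a given ontology term. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.PARAMETER, ElementType.FIELD})
    @interface OntologyTerm {
        String uri();
    }

    /** Example class whose method argument is declared to be a LocusLink id. */
    static class GeneLookup {
        String describe(
                @OntologyTerm(uri = "http://www.mygrid.org.uk/ontology#LocusLink_record_id")
                String id) {
            return "gene:" + id;
        }
    }

    /** Returns the ontology URI on the first parameter of the named method, or null. */
    static String termOfFirstParameter(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() > 0) {
                OntologyTerm term = m.getParameters()[0].getAnnotation(OntologyTerm.class);
                return term == null ? null : term.uri();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(termOfFirstParameter(GeneLookup.class, "describe"));
    }
}
```

An integration layer could use the URI to check that the output of
one service is semantically compatible with the input of another
before wiring them together.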
- Mark Schreiber
> Mark Fortner
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>