[Biojava-dev] BioJava 3 Begins - Volunteers please!

Tue Oct 21 03:17:28 UTC 2008

Hi,

A couple of thoughts regarding biojava v3:

License: Since it seems we will end up copying code from biojava 1.6
to biojava 3.0, we need to keep the license the same (LGPL 2.1). That
is, people should still use the same biojava license headers when
committing new files, and any code without a header will be considered
to be LGPL. Do NOT commit code under other licenses.

Installation: We need some installation instructions on the wiki site,
e.g. how to get the maven setup running.  What are the code
conventions for the new version?

Blast: the Blast parsing modules are among the most frequently used
ones in biojava 1.6. To get people to adopt biojava v3, it will be
crucial to port them to the new version. Does anybody want to take
care of that?

Automated builds: would it be useful to have automated builds set up
for the new version at this stage, or should we wait until the code is
more mature? I could easily add another auto-build similar to the one
for biojava 1.6 at http://www.spice-3d.org/cruise/

Andreas

On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland
<holland at eaglegenomics.com> wrote:
> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
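
To make the hint above concrete, here is a minimal sketch of the general
idea: an alphabet held as a plain java.util.Set of symbols, with a
round trip between Strings and symbol lists. Every name in it
(SymbolSketch, Nucleotide, DNA_ALPHABET, parse, format) is made up for
illustration. It is not the code committed to the biojava3 branch, and
the real SymbolFormat and SymbolList classes may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; this is NOT the BioJava 3 API.
final class SymbolSketch {

    /** A symbol is just a value; here, the four unambiguous DNA bases. */
    public enum Nucleotide {
        A, C, G, T;

        /** Maps one character (either case) to a symbol, rejecting anything else. */
        public static Nucleotide fromChar(char c) {
            switch (Character.toUpperCase(c)) {
                case 'A': return A;
                case 'C': return C;
                case 'G': return G;
                case 'T': return T;
                default:
                    throw new IllegalArgumentException("Not a DNA symbol: " + c);
            }
        }
    }

    /** An alphabet modelled as a read-only Set of symbols. */
    public static final Set<Nucleotide> DNA_ALPHABET =
            Collections.unmodifiableSet(EnumSet.allOf(Nucleotide.class));

    /** Converts a String into a list of symbols. */
    public static List<Nucleotide> parse(String text) {
        List<Nucleotide> symbols = new ArrayList<Nucleotide>(text.length());
        for (char c : text.toCharArray()) {
            symbols.add(Nucleotide.fromChar(c));
        }
        return symbols;
    }

    /** Converts a list of symbols back into an upper-case String. */
    public static String format(List<Nucleotide> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (Nucleotide n : symbols) {
            sb.append(n.name());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Nucleotide> seq = parse("gattaca");
        System.out.println(format(seq));                    // prints GATTACA
        System.out.println(DNA_ALPHABET.containsAll(seq));  // prints true
    }
}
```

Because the symbol list is an ordinary List, the usual Collections
operations (subList, Collections.reverse, and so on) work on it without
any extra API.
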
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether by copying and pasting existing classes and modifying them to suit
> the new style, or by writing completely new ones to provide equivalent
> functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There are going to be clever
> tricks when it comes to converting data between different object models
> (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> parsing examples up.
>
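
As a rough illustration of the guidelines above, a normalised object
model might look something like the sketch below: the comments are held
as an ordered collection of small comment objects, and the sequence is
kept as a plain String. The class and field names (FlatFileRecord,
Comment, accession) are hypothetical and chosen only for this example;
they are not taken from the biojava3 branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a flat-file record model; not part of BioJava.
final class FlatFileRecord {

    /** One comment exactly as it appeared in the source file. */
    public static final class Comment {
        private final String text;

        public Comment(String text) {
            this.text = text;
        }

        public String getText() {
            return text;
        }
    }

    private String accession = "";
    // Raw residues kept as a basic String, not parsed into symbols.
    private String sequence = "";
    // An ArrayList keeps the comments in the order they were added.
    private final List<Comment> comments = new ArrayList<Comment>();

    public String getAccession() { return accession; }
    public void setAccession(String accession) { this.accession = accession; }

    public String getSequence() { return sequence; }
    public void setSequence(String sequence) { this.sequence = sequence; }

    public void addComment(Comment comment) { comments.add(comment); }

    /** Read-only view of the comments, in file order. */
    public List<Comment> getComments() {
        return Collections.unmodifiableList(comments);
    }

    public static void main(String[] args) {
        FlatFileRecord record = new FlatFileRecord();
        record.setAccession("EXAMPLE1");   // made-up identifier
        record.setSequence("gattaca");
        record.addComment(new Comment("first comment"));
        record.addComment(new Comment("second comment"));
        for (Comment c : record.getComments()) {
            System.out.println(c.getText());
        }
    }
}
```

Keeping the model this plain means a parser only has to copy fields
across, and any conversion to richer types can happen in a separate
step.
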
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to desupport the existing BJ code, plenty of notice
> will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>