[Bioperl-l] GSoC/BioPerl Reorganization Project

Wed Apr 27 19:35:43 UTC 2011

Sheena,

Congrats on being accepted! We've talked about doing this over the years, but it's not an easy task and it needs a dedicated project to get the ball rolling, so to speak.  Hopefully this isn't tl;dr.  I'll start off with a few of my questions/thoughts (Rob could probably chime in as well, but I think his general thoughts on the project parallel mine):

1) The current BioPerl CPAN distribution could just be a simple install script, acting like a 'Task' or 'Bundle' module that installs the actual Bio-specific distributions.  Doing it this way would allow you to iteratively split off additional code but retain the original Task/Bundle-based approach to installation.  For instance, the first pass could split out Root and then have a dependency-light and an 'extras' distribution, a 2nd round could split further based on function, and so on:

  1st round (v 1.9)   :  BioPerl (just an installer) -> installs root, min-deps, extra-deps
  2nd round (v 1.901) :  BioPerl (just an installer) -> root, seq/feature, other-min-deps, extra-deps
  ...
  Xth round (v 1.99)  :  BioPerl (just an installer) -> root, tools, seq, tree, align, coord, map, everything-else
  ...

Also, one could potentially install modules in various ways: interactively, in predetermined groups, using a user-defined list, etc. (one could effectively create custom BioPerl installs for GBrowse or other tools, for instance).  Of course I would only pick the easiest route to start, but maybe that gives some ideas.  Regardless, if the dependency tree is set up correctly, any reliance on other Bio* modules would be declared in the various Build.PL/Makefile.PL files and then installed via CPAN (as is any dependency).

2) The Bio::Root modules are probably the true core modules and are the most stable with regards to changes, so those could be moved to something like BioPerl-Core.  Beyond that, what are the proposed splits?  (we've discussed this on-list before, but it's appropriate to bring this up again)

3) How do we want to handle versioning?  We can't (and probably shouldn't) release everything on a synchronized versioning scheme (via Bio::Root::Version, for instance); that'll quickly fall apart.  Personally I can foresee each split-off dist having its own version, with the BioPerl network of modules being in effect its own mini-CPAN.

4) Related to versioning, in my opinion we should maybe aim to eventually call this BioPerl v2.0 and start with a simpler X.Y versioning scheme.  Lincoln has already done something like this with Bio::Graphics, which was originally part of BioPerl but was split off prior to v 1.6.0.

5) In some cases I can see particularly thorny problems, such as circular dependencies.  I can think of a few ways to address that (creating a simple lightweight Bio::Species class as a fallback if Bio::Tree code isn't present, for instance), but any additional thoughts on this would be helpful.

6) Do we want to set up something like 'git submodule' for the devs to pull down all BioPerl-relevant code?
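
To make the installer idea and the circular-dependency worry above a bit more concrete, here is a rough sketch of how an install order could be derived from declared dependencies, and how a cycle would show up.  It's in Python rather than Perl purely for illustration, and the distribution names and dependency edges are made up for the example; they are not a proposed split:

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical split-off distributions mapped to the Bio* dists they need.
# Names and edges are illustrative assumptions, not an agreed layout.
DEPS = {
    "Bio-Root": set(),
    "Bio-Seq": {"Bio-Root"},
    "Bio-Tree": {"Bio-Root"},
    "Bio-Align": {"Bio-Root", "Bio-Seq"},
    # The top-level 'installer' dist only declares dependencies.
    "BioPerl": {"Bio-Root", "Bio-Seq", "Bio-Tree", "Bio-Align"},
}


def install_order(deps):
    """Return dist names ordered so each comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def find_cycle(deps):
    """Return one circular dependency as a list of names, or None."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # graphlib reports the cycle as a list whose first and last
        # entries are the same node.
        return err.args[1]
    return None


if __name__ == "__main__":
    print(install_order(DEPS))
    print(find_cycle(DEPS))
```

Running something like find_cycle over the real dependency declarations would be one way to spot which dists need a lightweight fallback class (or a different split) before they can be released separately.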

Other thoughts?

chris

On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:

> Hey everyone,
> 
> I wanted to take a minute to introduce myself as one of the Google Summer of
> Code interns. I was the lucky one chosen to work on the BioPerl
> Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and
> somewhat new to this level of programming so bear with me as I learn the
> technical jargon. Luckily I have both Rob and Chris to mentor me this
> summer!
> 
> Reading through the mailing list archives, I see there have been many
> discussions and differing opinions about tackling this project. Given the
> time frame for GSoC and my limited experience, there is no way I will
> complete this project on my own but I will at least be able to start it,
> which will hopefully motivate others to pitch in. So far, the plan for the
> GSoC project is to start by breaking out Bio::Root, followed by a couple
> other modules based on their dependencies and the time allowed. Each will be
> published to CPAN independently. You can follow the project (once it starts)
> on GitHub at https://github.com/sheenams.
> 
> I look forward to collaborating with many of you on the reorganization (hint
> hint)!
> 
> Sheena