[Bioperl-l] bioperl reorganization

Sat Jul 18 01:31:29 UTC 2009

On Jul 17, 2009, at 4:08 AM, Robert Buels wrote:

> Chris Fields wrote:
>> Yes, I agree.  However a large set of modules in bioperl were  
>> effectively donated by the author, so they will fall to the core  
>> devs to maintain by sheer property of legacy.
>
> This is a very sticky point.  The only way I can think of would be  
> to have each distro have a "principal maintainer", that is the go-to  
> guy for issues related to keeping it running, but can beg and cajole  
> others to help.  At least there will be fewer problems per  
> distribution, since they would be smaller.  If a maintainer has to  
> stop, he has to find somebody else to do it, or the package sits  
> there and bit rot sets in. That's just how it goes.  If it's  
> important enough (like if it's depended on by a dist that IS  
> maintained), somebody will pick it up.

Just so this isn't misunderstood, much of that code is fairly stable  
so I don't think it will be a significant problem, and it can be  
addressed at a later point.

I think if we trim off enough of the current distribution the issue  
won't matter in the long term.  I do think any legacy code will have  
to fall to the core devs, mainly because if bit rot does set in (and  
no one is maintaining critical modules) we can easily switch  
maintainers, fix bugs, and push a CPAN release.

We have a bioperl-specific account on CPAN that makes it easy.  All of  
the code is currently under that name anyway so it might as well stay  
there for the time being.

>> On bugs:
> <snip>
>> On API and the 'chicken-or-egg' issue:
> <snip>
>> What I would like is have the various breakaway Bio::* either fall  
>> back to Module::Build if Bio::Root::Build isn't present, or just  
>> use Module::Build.  My suggestion is to just use Module::Build  
>> directly, but we could scale down Bio::Root::Build to respect the  
>> Module::Build API (thus allowing it as a fallback).
> I'm not sure about this; I'm not an expert on the ins and outs of  
> subclassing Module::Build.
>
> One idea I do have, however, is that we might think about using an  
> xt/ directory for intensive and network-based tests that are not  
> meant to be run by automated installers, which could help simplify  
> the test and build code.  I've heard that this is a pretty common  
> practice in other projects.

That's a possibility.  I have already started on a few of those bug  
fixes, but I would rather they be *Module::Build* bugs, not bioperl  
ones (i.e. if we go with their API, it should be their bug ;)
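
For the xt/ idea, here is a minimal sketch of how the split might be driven  
from the shell.  The RELEASE_TESTING switch and the test_dirs helper are  
assumptions for illustration, not something bioperl does today; the sketch  
only prints the prove command it would run.

```shell
#!/bin/sh
# Sketch: choose which test directories to run.
# Assumption: RELEASE_TESTING is the opt-in switch for the xt/ tests.
test_dirs() {
    if [ -n "${RELEASE_TESTING:-}" ]; then
        echo "t xt"    # maintainers: regular plus intensive/network tests
    else
        echo "t"       # automated installers: regular tests only
    fi
}

# Print the prove invocation instead of running it (no side effects).
echo "prove -lr $(test_dirs)"
```

With this layout an automated installer would only ever see t/, while a  
maintainer could opt in to the heavier tests before a release.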

> =====================
>
> Anyway, let's develop some concrete plans. I would say that the plan  
> at http://www.bioperl.org/wiki/Proposed_core_modules_changes is a  
> half-measure, in light of the successful (painless?) Bio::Graphics  
> extraction.
>
> Here's a new proposal:
>
> 1.) renew/construct the Bundle/Task::Bioperl, get it pulling in all  
> the current Bioperl modules as dependencies (or however it works)
>
> 2.) start repeating the same extraction procedure used with  
> Bio::Graphics:
>  * identify a candidate set of modules in bioperl-live to be  
> extracted into their own distribution, propose the extraction on the  
> mailing list, get some kind of agreement
>  * make a new component in the svn repository (alongside the  
> bioperl-live and other dirs) named something like  
> Bio-Something-Something, with trunk/, branches/, and tags/ subdirs.
>  * svn cp modules into the new trunk/lib/, tests into trunk/t,  
> scripts into trunk/scripts, and write a Build.PL just like the one  
> Lincoln wrote for Bio::Graphics.
>  * when the extracted copy looks good, use svn merge to port any  
> changes that happened in trunk to the new extracted modules if  
> necessary and test.
>  * delete the old copy from bioperl-live/trunk.
>  * identify a new candidate set of modules, propose on the mailing  
> list, and repeat
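
The extraction steps quoted above could be sketched as a dry run like the  
one below.  It only prints svn commands and executes nothing.  The REPO URL  
is a placeholder, and the assumption that modules sit under Bio/ and tests  
under t/<namespace> in bioperl-live/trunk is mine; scripts, the Build.PL,  
and the svn merge step are left out.

```shell
#!/bin/sh
# Dry run: prints the svn commands for one extraction; nothing is executed.
# Assumptions: REPO is a placeholder URL; bioperl-live/trunk keeps modules
# under Bio/<namespace> and tests under t/<namespace>.
REPO="${REPO:-https://svn.example.org/bioperl}"

extract_plan() {
    dist="$1"                       # new component, e.g. Bio-Something-Something
    ns="$2"                         # namespace directory to move, e.g. Something
    old="$REPO/bioperl-live/trunk"
    new="$REPO/$dist"

    # 1. create trunk/, branches/, and tags/ for the new component
    echo "svn mkdir --parents -m 'Lay out $dist' $new/trunk/lib/Bio $new/branches $new/tags"
    # 2. copy modules and tests, keeping their history
    echo "svn cp -m 'Extract Bio::$ns modules' $old/Bio/$ns $new/trunk/lib/Bio/$ns"
    echo "svn cp -m 'Extract Bio::$ns tests' $old/t/$ns $new/trunk/t"
    # 3. once the extracted copy passes its tests, delete the old copy
    echo "svn rm -m 'Bio::$ns now lives in $dist' $old/Bio/$ns"
}

extract_plan Bio-Something-Something Something
```

Printing the commands first makes it easy to post the plan to the list for  
agreement before anything in the repository changes.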

We may have to think a bit outside of just namespaces alone.  Some  
(like EUtilities) are present in more than one. These would also have  
to be in line with what others want (so now's the time to chime in).

If going strictly on namespace, these may be easiest:

* Assembly
* Biblio
* Cluster/ClusterIO
* Coordinate
* Draw (modules in the Graphics namespace that weren't related to  
Bio::Graphics)
* Expression
* Map
* Matrix
* Microarray
* Restriction
* MolEvol
* Phenotype
* PhyloNetwork
* PopGen
* SeqEvolution
* Structure
* Symbol (may be deprecated)
* Taxonomy (is deprecated, so don't bother)
* Variation

These are probably a little trickier:

* Search/SearchIO
* Align/AlignIO
* Index/DB/Das (General)
* Tools (very tricky, as there are several outside requirements)

It's possible (and probably best) that these be grouped by function.  
MolEvol, Phenotype, PopGen, PhyloNetwork, and SeqEvolution, for  
instance, could go into a general evol package.

> 2.5) continue releasing 1.6.X bugfix releases while this is going on.

Speaking of which, I want to push an alpha out in the next week or two  
for 1.6.x (it may be 1.6.2, in order to sync bioperl-run with the  
others).

> 3.) when bioperl-live is down to a truly reasonable core set, (fewer  
> than 10 modules might be a good target), rename it to Bio-Perl-Core,  
> go through a round of testing, and push them all to CPAN at once.  
> Task::BioPerl will have dependencies on the module names, I think,  
> so it will continue to install the same from users' perspectives; it  
> will just be downloading different dists.

I don't think it's possible to get it down to 10, as there are too  
many interrelated modules.  That is, unless you subscribe to the more  
extreme view that core==Root, and we whittle core down to those  
root-based modules.

> 4.) repeat steps 1-3 with bioperl-run, and maybe others.

bioperl-db will probably need to stay largely intact, with the  
possible exception of the DB-specific modules for mysql, pg, oracle,  
etc. bioperl-network is pretty self-contained as well; it doesn't make  
much sense to split it up completely.

> Thoughts?  If people like it, I or somebody else could put it on the  
> wiki.
>
> And of course, I volunteer to put in a lot of work on this.  I'll  
> try to see if I can identify some other likely extraction candidates  
> as a preliminary step and report back to the list.
>
> Also, we need some more people besides just me and Chris talking and  
> thinking about this; these are large reshufflings being proposed.
>
> Rob

As mentioned before, I think most are on board.  It comes down to  
exactly how we package these smaller distributions.  I don't think we  
can simply dump individual modules into CPAN; IIRC Sendu corresponded  
with Andreas König about this and was strongly dissuaded from doing it  
(focused packages were promoted instead, but maybe Sendu can elaborate  
more on that).

chris