[Bioperl-l] bioperl reorganization

Jay Hannah jay at jays.net
Fri Jul 17 15:55:38 EDT 2009


Jason Stajich wrote:
> I'm curious how it will work that we'll have dozens of separate 
> distros that we'll have a hard time keeping track of what directory 
> things are in? Will there have to be a master list of what version and 
> what modules are in what distro now?
>
> When I do a SVN (or git) checkout do I need to checkout each of these 
> in its own directory?  Or will there be a master packaging script that 
> makes the necessary zip files for CPAN submission?


Perhaps my Catalyst experience would be a useful addition to this 
discussion. Catalyst is a popular web framework composed of dozens of 
CPAN distributions.

   http://www.catalystframework.org/

Users install Catalyst (cpan Catalyst), which is everything a user needs 
to build a basic website. The list of classes the user just installed is 
here:

   http://search.cpan.org/~flora/Catalyst-Runtime-5.80007/

Which lives in SVN here:

   http://dev.catalyst.perl.org/repos/Catalyst/Catalyst-Runtime/

As each user finds additional shiny things relevant to them on CPAN 
(Catalyst::* e.g. Catalyst::Plugin::FillInForm), they install those, 
individually (cpan Catalyst::Plugin::FillInForm).

All Catalyst::* distributions live in the same SVN repository, as 
entirely independent, ready-to-ship CPAN distributions:

   http://dev.catalyst.perl.org/repos/Catalyst/
   http://dev.catalyst.perl.org/repos/Catalyst/trunk/

So, as a new or veteran developer, when I find a bug in 
Catalyst::Plugin::FillInForm I patch it in SVN

   http://dev.catalyst.perl.org/repos/Catalyst/Catalyst-Plugin-FillInForm/trunk/

and then, like any other CPAN distribution, I prep and push that 
distribution to PAUSE.

   -make my code changes-
   -vi Changes-
   -vi lib/Catalyst/Plugin/FillInForm.pm, increment VERSION-
   svn diff
   svn commit
   perl Makefile.PL
   make
   make test
   make manifest
   make dist
   make disttest
   ftp Catalyst-Plugin-FillInForm-0.11.tar.gz to pause.cpan.org:/incoming

That's it. I just upgraded Catalyst::Plugin::FillInForm from 0.10 to 0.11.

There is no "master list of what version and what modules are in what 
distro now". CPAN itself is that resource.

Bottom line, small parts of Catalyst are pushed out to CPAN *every day*. 
Very cool. Shocking when compared to the BioPerl release history on CPAN.

(Catalyst::Plugin::FillInForm happens to use Module::Install. But 
another author may prefer ExtUtils::MakeMaker, or Dist::Zilla, or 
Module::Build, or whatever. Each* Catalyst:: is an independent 
distribution that is free to shift slowly, or quickly, over time as 
developer interest dictates.)

(* Each meaning "tiny, highly inter-relevant groups of classes.")

Large, seismic shifts in Catalyst itself (Catalyst-Runtime) happen on a 
new branch in SVN, which can take a few months. This year's total 
reworking of Catalyst to use Moose internally (the move from the 5.70 
branch to the 5.80 branch) is one example.

But "total reworkings" of Catalyst can and do continue to happen because 
the "Catalyst" distribution (Catalyst-Runtime) is independent from the 
dozens of other great Catalyst:: packages available on CPAN.

So Catalyst:: is a loose federation of cooperative modules on CPAN tied 
together by namespace and the API of Catalyst-Runtime.


>   If they are in separate directories are we organizing by conceptual 
> topic (phylogenetics, alignment, database search) or by namespace of 
> the modules? Do all the 'database' modules live together - probably 
> not  - so do we name bioperl-db-remote bioperl-db-local-index, 
> bioperl-db-local-sql, etc?  really bioperl-db is somewhat focused on 
> sequences and features, but what about things that integrate multiple 
> data types - like biosql?

In the Catalyst development model, the CPAN namespace (package name), 
the SVN path, and the distribution name are all the same. (Hopefully 
namespaces somewhat match conceptual topics. -grin-)
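
A rough sketch of that naming convention, using the FillInForm plugin 
from above. The helper names are made up for this illustration and are 
not part of any Catalyst or BioPerl tooling, and the SVN layout is an 
assumption generalized from the single FillInForm URL. Catalyst-Runtime 
itself is an exception, since its namespace is plain "Catalyst".

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Catalyst::Plugin::FillInForm -> Catalyst-Plugin-FillInForm
dist_name() {
    printf '%s\n' "$1" | sed 's/::/-/g'
}

# Catalyst::Plugin::FillInForm -> lib/Catalyst/Plugin/FillInForm.pm
module_file() {
    printf 'lib/%s.pm\n' "$(printf '%s\n' "$1" | sed 's|::|/|g')"
}

# Assumes every distribution follows the <repo>/<dist>/trunk/ layout
# seen for Catalyst-Plugin-FillInForm.
svn_trunk() {
    printf 'http://dev.catalyst.perl.org/repos/Catalyst/%s/trunk/\n' \
        "$(dist_name "$1")"
}

dist_name   Catalyst::Plugin::FillInForm
module_file Catalyst::Plugin::FillInForm
svn_trunk   Catalyst::Plugin::FillInForm
```

Given only the package name, a developer can guess both where the code 
lives and what the tarball will be called.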


> If they are in separate directories, what about all the test data that 
> might be shared, is this replicated among all the sub-directories - 
> how do we do a good job keeping that up to date, could we have a 
> test-data distro instead with symlinks within SVN?

I don't believe Catalyst packages ever share test data. Is there lots of 
re-use of large amounts of test data by what should be separate 
distributions in BioPerl? I'm not familiar with SVN symlinks. I don't 
think Catalyst SVN has any.

(
   14:33 <@t0m> jhannah: you mean svn:externals, and yes, it's used by a
                load of the engines to steal the TestApp from -Runtime
   14:34 <@t0m> I'd be more tempted to make the test data it's own dist
                if that's sane.
)


> My nightmare is that we're going to have to manage a lot of 'use XX 
> 1.01' enforcing version requiring when dealing with the dependancies 
> on the interface classes and having to keep these all up to date?  The 
> version was implicit when they are all part of the same big distro.

Catalyst::Plugin::FillInForm has this in its Makefile.PL:

   requires 'Catalyst' => '5.7012';

The CPAN client then checks and auto-installs dependencies for the 
user. Like everything else on CPAN, Catalyst leaves dependency 
enforcement to the installer.

Doesn't that render most 'use XX 1.01' statements obsolete?
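
To make the check concrete, here is a minimal sketch of the comparison 
an installer has to make for that requires line. It assumes plain 
decimal version numbers, which as far as I understand are compared as 
numbers. The function name is hypothetical and this is not how any 
CPAN client is actually implemented.

```shell
#!/bin/sh
# Hypothetical helper, not a CPAN tool: succeeds when the installed
# version ($1) is at least the required version ($2).
# Assumes plain decimal versions such as 5.7012, compared as numbers.
version_at_least() {
    awk -v have="$1" -v want="$2" \
        'BEGIN { exit !((have + 0) >= (want + 0)) }'
}

# 5.80007 is the Catalyst-Runtime release linked above; 5.7012 is the
# minimum declared in the plugin's Makefile.PL.
if version_at_least 5.80007 5.7012; then
    echo "Catalyst 5.80007 satisfies requires 'Catalyst' => '5.7012'"
fi
```

Under that assumption a user on the 5.80 series already satisfies the 
plugin's requirement, and an older install would trigger an upgrade of 
Catalyst before the plugin is built.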


> Also the splits need not only include one namespace if need be I guess 
> but we have generally grouped things by namespace.

I believe all Catalyst distributions are *very* cleanly split on 
namespace. I imagine not doing so would be a nightmare.


> I'm not convinced that the Bio::Graphics splitoff has been painless so 
> we should take stock of how that is working.

I'd like to hear about any pain so I could compare to Catalyst...


> How much of this effort is worth triaging on the current code versus 
> the efforts we want to make on a cleaner, simpler bioperl system that 
> appears to scare so many users (and potential developers) off.

One of the amazing things that happen in Catalyst and Moose frequently 
is that random people wander into irc.perl.org #catalyst or #moose and 
say "this is broke". If they seem to be clued then they get an SVN (or 
git) commit bit (on the specific directory of that distribution) after 
submitting a single patch, and then become CPAN co-maintainers of that 
package after the second patch. Soon they're improving that part of CPAN 
on their own.

The risk to the community is mitigated by the fact that even if jhannah 
breaks Catalyst::Plugin::FillInForm 0.11, most Catalyst users don't use 
that specific Plugin anyway. Also, CPAN has copies of the 4 previous 
versions of C::P::F sitting around all over the world for people to fall 
back to.

This leaves the hard-core Catalyst developers free to improve the 
central engine, rather than forcing them to focus on POD patches to the 
500 peripheral bits all the time.

Catalyst and Moose are amazingly delegated. It's hard not to end up with 
commit bits and CPAN co-maint of their small distributions when you 
express good ideas.

The small/rare bits get fixed by the few people that care. The wizards 
focus on the big picture changes.
 
...

I know all this is already happening in BioPerl SVN. It'd just be great 
if it happened on CPAN too.

Since I've been using bioperl-live directly for years I really haven't 
cared about CPAN release schedules. But if you're not a BioPerl 
developer, you probably pull CPAN only.


> Okay I rambled, hope that was helpful.

Ditto!! Only worse rambling, and probably less helpful.   :)

Jay Hannah
http://bioperl.org/wiki/User:Jhannah




