[Bioperl-l] Tree refactor? was Re: Bootstrap, root, reroot...
Chris Fields
cjfields at illinois.edu
Thu Jul 16 05:44:40 UTC 2009
(Warning, longish response, I'll probably add to my blog at some point)
Robert,
I agree with you, but we've had this discussion before. Repeatedly,
actually. I have a page in the wiki dedicated to it, having first
raised the issue myself:
http://www.bioperl.org/wiki/Proposed_core_modules_changes
It also has a mention on the Core page:
http://www.bioperl.org/wiki/Core_package
In fact, I was planning on writing up a blog entry this week on this
very thing to get the ball rolling again, but it probably should go
here first anyway...
First: the problem we have consistently run into is exactly how to
deliver a core set of modules in a way that works both for users and
for release managers. We have settled on one of the original
proposals noted above, starting by roughly splitting up the current
'core' into something based on similar functions and level of
development/support. bioperl-dev was part of that, for instance, and
represents code we consider 'developer-only' or experimental. The
true 'core' would be a base set of modules with minimal additional
dependencies (see below for how nebulous this becomes).
If you haven't already noticed, prior to 1.6.0 Bio::Graphics basically
started the process (it's now an independent release on CPAN) and we
already have a bioperl-dev. As you've noted we can't split everything
up right from the beginning, but we have started down that path.
Second: Bio::Tree seems independent of the other modules, but that's
highly misleading. Bio::Species and Bio::Taxon (and thus anything that
will use said objects, like Bio::Seqs, which are very much core) are
now completely dependent on Bio::Tree code. Both are-a
Bio::Tree::NodeI, I believe since 1.5.2. If we split that code off it
then creates a circular dependency (Bio::Species, in core, requires
Bio::Tree in the bio-tree package, which in turn requires
Bio::Root::Root in the core package). Bio::Tree code also has-a
Bio::DB::Taxonomy, thus expanding core a little bit more. Similarly,
Bio::Ontology classes are used by several key modules
(Bio::Annotation::OntologyTerm comes to mind, but also
Bio::Annotation::OntologyTerm). In other words, there are some parts
of core that can't easily be split off w/o repercussions (and thus
probably won't be).
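As a rough illustration of the circular dependency described above,
the sketch below models distributions as a small graph and searches
it for a cycle. The distribution names "bioperl-core" and "bio-tree"
and the helper find_cycle are hypothetical, chosen only for this
example; real dependencies would be declared per distribution rather
than in a hand-written table.

```python
# Hypothetical distribution-level dependency graph, for illustration only.
# The names "bioperl-core" and "bio-tree" are assumptions, not real packages.
deps = {
    "bioperl-core": ["bio-tree"],   # Bio::Species is-a Bio::Tree::NodeI
    "bio-tree": ["bioperl-core"],   # Bio::Tree requires Bio::Root::Root
}


def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2    # unvisited, on current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            seen = state.get(dep, WHITE)
            if seen == GREY:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if seen == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in graph:
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(deps))
```

Running a check like this against the declared requirements of each
candidate split would flag a core <-> bio-tree loop before anything
is released.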
Third, and the largest issue in my opinion: what really constitutes
'core', not just to us but to current bioperl users. To me, the idea
of a true 'core' is the bare essentials (Seq, Features, Annotations,
and some basic IO modules, the most common interfaces). Should 'core'
include SearchIO, or AlignIO? Remote and/or local DB functionality?
Bio::Tools? All of those are feasibly independent sets of modules,
and I would definitely support moving them into their own
subdistributions, where it would be easier to fix bugs and release
updates. I may be in the minority, though, as they are extremely
popular and many users still consider them 'core'. We need a
workaround for that.
Finally (a wrap-up of bits and pieces): a) how are the various bio-*
packages to be maintained? Would there be several release pumpkins,
one for each release? b) How do we sort out versioning? For
instance, would bio-foo have a separate version (like Bio::Graphics
now does) and require a specific core version? c) I'm sure I have
forgotten a few things, but I've rambled on enough already.
</breather>
Now, my suggestions. We have settled on a general layout, so...
* Each subdistribution would have a separate version and require a
specific core (Bio::Root::Root) version. Note that Bio::Graphics is
using a different versioning scheme than BioPerl, but we may want to
stick to a similar tripartite numbering scheme as for core. Whatever
happens, this must be decided on first, as there will be no turning
back.
* We repurpose Bundle::BioPerl (or a similar Bundle::* package) or
make the BioPerl distribution itself a bundle-like installation. This
would be for packaging up an old-style 'everything and the kitchen
sink' core package from the various distributions. Anytime we split
off something into its own distribution we release a newly
trimmed-down core and add the new distribution to the bundle or
BioPerl. Point everyone to the bundle if they want the old-style
installation.
* Other current subdistributions (run, db, network, etc) follow the
same pattern as the above. Releases for non-core distributions do not
have to be tied together with core except where needed.
* Avoid any circular dependencies (Bio::ASN1::EntrezGene, I'm staring
at you).
* As you mention, work these out on branches to test things out.
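To make the versioning suggestion a little more concrete, here is a
minimal sketch of checking a subdistribution's core requirement
against an installed core, assuming tripartite versions such as
1.6.0 and assuming that "require a specific core version" means a
minimum version. The function names parse_version and core_satisfies
are hypothetical and belong to no existing tool.

```python
def parse_version(text):
    """Split a tripartite version such as '1.6.0' into a tuple of ints."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError("expected a MAJOR.MINOR.PATCH version, got %r" % text)
    return tuple(int(part) for part in parts)


def core_satisfies(installed, required):
    """True if the installed core is at least the required version.

    Assumption: the requirement is a minimum, not an exact match.
    """
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # Components are compared as numbers, so 1.10.0 is newer than 1.9.0.
    print(core_satisfies("1.10.0", "1.9.0"))
```

Comparing the parts as integers rather than as strings or floats is
the main reason to settle the numbering scheme up front: mixing
schemes makes comparisons like this ambiguous.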
And finally, and I am saying this with the utmost respect and
sincerest thanks for everything Sendu is doing and has done for
BioPerl, but I'm not convinced we should keep using Bio::Root::Build.
It does make some things convenient, but at the cost of additional
bugs (2-3 at last count), some API breakage (some methods conflict
with Module::Build), and a bit of a chicken-and-egg dilemma that
particularly impacts subdistributions (attempting to fall back to
Module::Build doesn't work due to API issues). I can elaborate on
that more if asked, but I think this post is already long enough, so
I'll leave that to later.
chris
On Jul 15, 2009, at 8:05 PM, Robert Buels wrote:
> Rather than putting this in bioperl-dev, perhaps this would be a
> nice opportunity to make a new distribution called something
> standard like "Bio-Tree", with a standard directory structure, and a
> sane number of modules in it.
>
> I hadn't planned to start an actual battle about this yet, but I
> would just like to get it out there that the current 'huge
> monolithic distributions' model of BioPerl is completely insane.
> Talking to people about BioPerl at YAPC::NA last month, I saw that
> this is quite puzzling to the wider Perl community. I was going to
> say it was a laughingstock, but that's not actually the case. They
> are mostly puzzled and strongly suspect that it's not right. Well,
> the diplomatic ones do, anyway. Matt Trout (of DBIx::Class and
> Catalyst fame) would probably yell and curse about it in a very
> entertaining way.
>
> If things were in smaller distributions, making and testing releases
> would be a lot easier, because the pieces of code you're testing and
> releasing are smaller, and the dependencies among the pieces are
> characterized, codified, and enforced via the Build.PL files of each
> distribution.
>
> There, I said it.
>
> But aside from my inflammatory remarks above, this sort of thing
> need not happen all at once. The "Bio-Tree" distribution is a nice
> example of how things could be extracted from or begun outside the
> bioperl-* distributions, with the bioperl-* monolithic balls of mud
> getting smaller as things are moved from them into their own
> distributions. This needs to be done carefully, so things like
> this should probably be done only with major releases, and with lots
> of notifications and release notes and things like that.
>
> OK, now that I've said "this sucks and needs to change", I now go on
> to volunteer to do work to make it happen. I will take and execute
> orders from you core developers saying things like "make a branch,
> take this list of modules, copy them into a new distribution, move
> their tests over, and write a Build.PL with the correct
> dependencies", and later "merge the moved_thing_somewhere branch
> into the some_other_branch and test it". I bet somebody whose name
> rhymes with "Jay Hannah" would probably do grunt work to help with
> this also, but of course he would have to volunteer first. ;-) I
> also volunteer to help teach others how to do this, but they have to
> figure out how to use IRC.
>
> Oh, and I also volunteer to keep writing inflammatory emails.
>
> Rob