[Bioperl-l] bioperl reorganization
Robert Buels
rmb32 at cornell.edu
Fri Jul 17 17:23:01 EDT 2009
I was going to write a longer post, but Jay wrote everything I was going
to write, plus more, and did a better job.
Jason Stajich wrote:
> For some other obvious modules that can be split off and self-contained,
> each of these could be a package. I would estimate more than 20
> packages depending on how Bio::Tools are carved up.
> - I think Bio::DB::SeqFeature needs to be split off for sure; this is a
> nice logical peeling off. Could be another test case since it is a
> GBrowse dependency.
> - Bio::DB::GFF as well for the same reasons.
> - Bio::PopGen - self contained for the most part, but depends on
> Bio::Tree and Bio::Align objects
> - Bio::Variation
> - Bio::Map and Bio::MapIO
> - Bio::Cluster and Bio::ClusterIO
> - Bio::Assembly
> - Bio::Coordinate
Oh, this is a nice list. <saves it>
> What do you want to do about bioperl-run? Do we make a set of
> parallel splits from all of these? I think at the outset we need to
> coordinate the applications supported here in some sort of loose
> ontology - the namespaces were not consistently applied so we have some
> alignment tools in different directories, etc. So the namespace sort of
> classifies them but it could be better. One of the challenges of
> multiple developers without a totally shared vision on how it should be
> done.
I would say that alignment tools (for example) should probably not all
go into the same distribution. If Alice wrote some alignment thing and
Bob wrote some other thing, and they're not really related beyond the
fact that they do similar things and possibly depend on similar things,
they should go in separate distributions.
> I'm not convinced that the Bio::Graphics splitoff has been painless so
> we should take stock of how that is working.
Yes, let's. I would like to hear more about that.
> It seems like this split off would be a way to better streamline things
> in bioperl so that modern versions of bioperl might be able to better
> interface with things like Ensembl again too.
Once things are less monolithic, developing and releasing *should* be a
LOT easier. As Jay also mentioned a bit, it's more like this: on Tuesday
Charlie notices a bug in Bio::Foo::Bar, fixes it, and pushes the fix to
CPAN (with a small version bump) immediately afterward. Users pick it up
via Task::BioPerl. That's it.
Or, how about a slightly longer case study:
Say on Wednesday Charlie notices that the design of Bio::Foo::Bar sucks
and it really needs some work. He codes furiously for however long it
takes, makes Bio::Fooer::Bar or something like that, in a new
distribution, and pushes it to CPAN. Initially, no other modules are
going to be using it, but then say Jason, the maintainer of
Bio::SeqIO::fasta, notices that hey, Bio::Fooer::Bar is a lot better
than Bio::Foo::Bar. Then he can just use it, test his new
Bio::SeqIO::fasta with it, put it in his dist's Build.PL as a
dependency, and push to CPAN. Now it's getting pulled in with
Task::BioPerl and *USERS* now have been given that improvement, probably
in only a matter of days. There are automated tests at every step of
the process to ensure quality throughout.
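To make the Build.PL step concrete, here is a rough sketch of what the
dependency declaration might look like. It assumes Bio::SeqIO::fasta has
been split into its own distribution (with lib/Bio/SeqIO/fasta.pm in the
usual place) and that it builds with Module::Build. Bio::Fooer::Bar and
the version number are the made-up names from the example above, not
real modules.

```perl
# Build.PL for a hypothetical stand-alone Bio::SeqIO::fasta distribution.
# Illustrative only: the module names and versions are assumptions.
use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    module_name => 'Bio::SeqIO::fasta',   # hypothetical split-off dist
    license     => 'perl',
    requires    => {
        # The new dependency: CPAN clients install it automatically
        # before this distribution is built and tested.
        'Bio::Fooer::Bar' => '0.01',      # made-up module and version
    },
    build_requires => {
        'Test::More' => 0,                # needed only to run the tests
    },
);

# Writes the ./Build script that drives build, test and install.
$build->create_build_script;
```

With something like this in place, running "perl Build.PL && ./Build test"
would exercise the module against whatever version of the dependency is
installed, which is where the automated testing mentioned above comes in.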
Or for larger changes, coordination among several distros may be
necessary, but the nice thing is, exactly which ones those are is
codified in all their Build.PL files! Much less guessing and worrying
about unintended consequences. Things are abstracted into smaller
chunks, which are much easier for developers to wrap their minds around,
which means developing is easier, which leads to more contributors and
accelerated development.
> How much of this effort is worth triaging on the current code versus the
> efforts we want to make on a cleaner, simpler bioperl system that
> appears to scare so many users (and potential developers) off?
If there were not so many person-years of development time already in
BioPerl, I would probably be pushing for a ground-up rewrite to simplify
things. But as chromatic frequently says (he's fantastic, look him up),
ground-up rewrites of large projects almost never work. You lose a year
(or multiple years) of person time rewriting instead of adding features,
or if you also add features to the old version in parallel, you have to
also port those features to the new version (over a really long time
period). It's theoretically possible to do, but in practice it almost
never works, he says. I don't know, I've never been involved in an
attempt like that from start to finish.
> Okay I rambled, hope that was helpful.
Quite helpful! Please keep it up if you can!
Rob