[Biopython-dev] Fwd: Breaking up the code base / modular installation

Thu Jul 17 09:53:11 UTC 2014

Dear Biopythoneers,

To anyone who was in Boston last week for BOSC 2014 or the
preceding CodeFest (organised by Brad Chapman) and whom I
missed saying hello to - sorry.

This email is to bring up one of the things we (me, Bow Arindrarto,
Eric Talevich, and others including Laurent Gautier from rpy2,
Chris Fields from BioPerl, Pjotr Prins from BioRuby/BioGems
and briefly also Brad Chapman) talked about at the CodeFest.

** Breaking up the code base / modular installation **

BioPerl is currently undergoing a similar process using
the CPAN infrastructure, but this has proved complex because
of many circular interdependencies which are hard to
untangle.

Meanwhile BioRuby has shifted over to their BioGem setup
using the quite mature Ruby Gem packaging system, leaving
BioRuby's core as a relatively small, stable unit of code:
http://biogems.info/

You might enjoy the video of Pjotr giving the BioRuby project
update (sadly there was no BioPerl update talk this year):
http://video.open-bio.org/video/15/bioruby-and-distributed-development

I believe Pjotr is interested in the possibility of expanding the
BioGems website to track BioPerl or Biopython modules in
the future - it was designed with the goal of highlighting the
user contributed modules with links to automated test results
etc.

Biopython has tried to balance conflicting needs with a
minimal install-time dependency set (a C compiler and
NumPy under CPython, not available on Jython and PyPy)
and the inclusion of "beta" modules under an experimental
warnings system.

However, we have accumulated a growing number of soft
dependencies which are only imported on use (e.g. ReportLab,
matplotlib, SciPy, ...). On many platforms (e.g. Linux) it
would be nice to have these installed automatically (even if
this is harder on Windows).
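
To illustrate what "imported on use" means, here is a minimal
sketch of the pattern. The names require_optional and
MissingSoftDependencyError are hypothetical, made up for this
example, and are not Biopython's actual API:

```python
import importlib


class MissingSoftDependencyError(ImportError):
    """Hypothetical error raised when an optional package is absent."""


def require_optional(module_name, feature):
    """Import an optional module on first use, or explain what is missing.

    Nothing is imported when the calling library itself is imported, so
    the optional package is only needed by people who use the feature.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise MissingSoftDependencyError(
            "Install %s if you want to use %s." % (module_name, feature)
        ) from err
```

A plotting function would then call something like
require_optional("reportlab", "the diagram code") at the point
of use rather than at the top of the module. Automatic
installation of such packages is a separate question which this
sketch does not address.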

During the CodeFest, Bow was able to try out a couple of
experiments using the Test Python Package Index (test PyPI,
https://wiki.python.org/moin/TestPyPI ) for a potential division
of the Biopython "monolith" into a set of inter-dependent
modules.

i.e. As a proof of principle, he showed that a shared
namespace is feasible within the PyPI system.
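
For anyone unfamiliar with shared namespaces, here is a small
self-contained sketch of the idea. It uses the implicit
namespace packages of Python 3.3+ (PEP 420), which is an
assumption on my part and not necessarily the mechanism used in
Bow's experiment. The namespace demo_bio and the module names
are invented for the example, and two temporary directories
stand in for two separately installed distributions:

```python
import importlib
import os
import sys
import tempfile


def make_distribution(namespace, module, source):
    """Create a throwaway directory holding namespace/module.py.

    There is deliberately no __init__.py in the namespace directory,
    which is what makes it an implicit namespace package.
    """
    root = tempfile.mkdtemp()
    package_dir = os.path.join(root, namespace)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, module + ".py"), "w") as handle:
        handle.write(source)
    return root


# Two independent "distributions" contributing to one namespace.
dist_a = make_distribution("demo_bio", "seq", "NAME = 'seq'\n")
dist_b = make_distribution("demo_bio", "sql", "NAME = 'sql'\n")

sys.path[:0] = [dist_a, dist_b]
importlib.invalidate_caches()

seq = importlib.import_module("demo_bio.seq")
sql = importlib.import_module("demo_bio.sql")
```

Both modules import under the same top-level name even though
they live in different directories, which is the property a
split Bio.* namespace would rely on.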

(Bow was also trying alternative namespaces, and while
that and a version bump might be appropriate if we go this
route, we could equally stick with the Bio.* namespace for
now.)

The state of Python packaging and best practices remains
a little murky, with competing efforts serving different design
goals. This has put me off attempting to break up the Biopython
codebase before. However, with Python 3.4 it seems pip is
going to be the future standard we should target?
https://docs.python.org/3.4/installing/index.html

Note that there is some precedent in how some Linux distributions
package Biopython now - e.g. Debian splits off our BioSQL
wrappers from the main Bio.* modules. That would be one
natural split I would expect to see as a unit under these
proposals (Biopython's BioSQL module as a separate unit).

Note that for any modularisation plan I would want us to
continue to have a "standard set" of modules equivalent to
the current Biopython which can easily be installed in one go,
especially under Windows where a single Windows Installer
is highly desirable.

GitHub is perhaps not the best place for an open debate, so
let's try to continue this discussion here on the mailing list.

Have a read of https://github.com/biopython/biopython/issues/349
and then post back here please.

Let's be clear: this will be a lot of work, but on the other
hand the status quo is causing some people frustration.

Thanks,

Peter