[Bio-packaging] Making the case for GNU Guix ... advice sought

Ricardo Wurmus ricardo.wurmus at mdc-berlin.de
Tue Feb 9 15:03:13 UTC 2016


Cook, Malcolm <MEC at stowers.org> writes:

> I have been asked to write up an argument for the advantages that GNU
> Guix confers for deploying linux software, especially scientific
> computing and including bioinformatics software.
>
> To that end I have written up this document
>
> 	https://github.com/malcook/sce/blob/master/MakingTheCase.org

Nice!

Here’s a first correction for a mistake that I’m responsible for: the
number of bioinfo packages is actually closer to 114 (not 54).  Guix web
was misconfigured on that host and it would show the status of a very,
very outdated version of Guix rather than the latest and greatest.

(Note that a very small number of the packages listed there are released
under restrictive “academic only” licenses or have undeclared licenses;
the license field says “non-free” or “undeclared”.  Having them on this
page does not mean I endorse the use of these tools, and users should be
careful to check the license.)

> I have some notes contrasting GNU Guix with other similar/related
> tool-sets (modules, lmod, homebrew, spack, easyBuild, rolling rpms),
> as well as our current practices (sort of hybrid of rpms, bastard son
> of homebrew, and "just wedge it in there").  However I am not now
> intent upon setting up such a contrast, rather, I hope to focus on the
> advantages of GNU Guix in general.

I’d also like to point you at https://hal.inria.fr/hal-01161771/en, in
case you haven’t read it yet.  It includes a very general comparison
with tools like environment modules, spack, and easyBuild with a focus
on reproducibility.

> + Guix detects and prohibits program name collisions; loading
>   conflicting packages into a user's environment is _impossible_.  For
>   example, this prevents the ambiguity and associated error that can
>   arise when two packages define different programs by the same name
>   and both are placed in the user's execution PATH.

Well... I don’t know if this is worth pointing out.  Guix does detect
collisions but it doesn’t do anything intelligent here, maybe because
there isn’t much that can be done when a collision happens.  You mention
support for multiple profiles later, and I think that maybe this could
be merged.  I haven’t found collision detection very useful because it
happens *during* profile generation, not *before* I commit to installing
a package.
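A toy model may make the timing point clearer.  It treats a profile as
a directory of symlinks into store items and is *not* Guix's actual
profile code; the store layout, the package names and the
`build_profile` helper are all made up.  A collision is just two items
shipping the same relative path, and it can only show up while the
union is being assembled:

```shell
#!/usr/bin/env bash
# Toy model: a profile as a union of symlinks into "store" directories.
# Not Guix's implementation; store layout and package names are hypothetical.
set -eu

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
store="$root/store"
profile="$root/profile"

# Two made-up store items that both ship a program called bin/align.
mkdir -p "$store/aaa-tool-a-1.0/bin" "$store/bbb-tool-b-2.3/bin"
echo 'echo tool-a' > "$store/aaa-tool-a-1.0/bin/align"
echo 'echo tool-b' > "$store/bbb-tool-b-2.3/bin/align"

# Assemble the profile by linking every file of every item in turn.
# The clash is only noticed here, while the union is being built.
build_profile() {
  mkdir -p "$profile"
  local item rel target
  for item in "$@"; do
    while read -r rel; do
      target="$profile/$rel"
      mkdir -p "$(dirname "$target")"
      if [ -L "$target" ]; then
        echo "collision: $rel already links to $(readlink "$target")"
      else
        ln -s "$item/$rel" "$target"
      fi
    done < <(cd "$item" && find . -type f | sed 's|^\./||')
  done
}

build_profile "$store/aaa-tool-a-1.0" "$store/bbb-tool-b-2.3"
```

In this sketch the first item simply wins and the clash is only
reported; the point is that nothing can be detected before the union
for the new profile is put together.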

> + Guix packages naturally extend to include all package resources,
>   including man pages, libraries, as well as binaries.  This is a
>   result of their using the underlying build engine's (i.e. gnu make)
>   installation targets.

Is there any package manager that does not handle this?

> + Guix packages, being independent of the host on which they are built,
>   can be downloaded already built by upstream servers known as
>   [[https://www.gnu.org/software/guix/manual/html_node/Substitutes.html#Substitutes][substitutes]], with the assurance of their being bit-level identical
>   to the results of the generally longer process of configuration and
>   compilation on local servers.

Building packages in isolated environments is a necessary requirement
for bit-reproducibility, but it is not sufficient in itself.  To speak
of “assurance” is maybe a bit strong.
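A small illustration of why isolation alone is not enough.  Both pairs
of "builds" below run the same recipe on the same input, but the first
pair embeds the current time, so the outputs differ bit for bit.
Pinning the timestamp via the SOURCE_DATE_EPOCH convention from the
reproducible-builds effort makes them identical.  The `build` function
is a hypothetical stand-in that only writes a text file, and `date +%N`
assumes GNU coreutils:

```shell
#!/usr/bin/env bash
# Same recipe and input, "built" twice; compare the output hashes.
# build() is a hypothetical stand-in that only writes a text file.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Embeds a build timestamp, as many real build scripts do, unless
# SOURCE_DATE_EPOCH pins it to a fixed value.
build() {
  printf 'tool 1.0\nbuilt: %s\n' "${SOURCE_DATE_EPOCH:-$(date +%s%N)}" > "$1"
}

digest() { sha256sum "$1" | cut -d' ' -f1; }

compare() {
  if [ "$(digest "$1")" = "$(digest "$2")" ]; then echo identical; else echo different; fi
}

unset SOURCE_DATE_EPOCH
build "$work/out1"; build "$work/out2"
echo "timestamp embedded: $(compare "$work/out1" "$work/out2")"

export SOURCE_DATE_EPOCH=1454972400   # an arbitrary fixed point in time
build "$work/out3"; build "$work/out4"
echo "timestamp pinned:   $(compare "$work/out3" "$work/out4")"
```

Both runs happen in the same environment; only the second is
bit-reproducible, which is why isolation is necessary but not
sufficient.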

> + Guix packages, though dependent upon the machine architecture, are
>   independent of the linux distribution, or its version.  Thus, for
>   example, packages built under CentOS 6.5 on x86_64 will run under
>   other operating systems on x86_64, for example, CentOS 7.x and/or
>   Ubuntu 15.10 or 14.04.

This is true and it’s a great feature.  We’re using the very same Guix
store on various minor releases of CentOS 6.x, a wide range of Ubuntus, and
Fedora.  “Linux distribution” is confusing, though.  I’d write
“GNU/Linux distribution”.  There *are* some kernel requirements, so the
version of Linux itself (i.e. the kernel version) does matter up to a
point for some of the advanced features of Guix (such as container
support).

> + Guix allows for unprivileged package management; users do not need
>   special elevated privileges (root or sudo) to create custom
>   environment profiles.  Especially noteworthy is that this allows
>   for rationally sharing package management as a distributed
>   responsibility.  This includes installing new site-available
>   applications (for those available in the guix repositories).  Thus,
>   if one user observes that a new release of a site-installed
>   application has become available, that user can safely install the
>   upgrade centrally and immediately start using it, without affecting
            ^—— what does this mean?  At the centre of ... what?

>   any other user's environment; without any further coordination,
>   when another user _is_ ready to adopt the upgrade, they will find
>   that installation is unnecessary, as it has already occurred.

I would clearly separate building and installing.  Installation is the
act of creating a new profile generation that contains a link to a
previously built (or substituted) item in the store.  Building (or
substituting) is what happens only once because of the purely functional
properties of the Guix package “functions”.
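A toy model of that separation, assuming a store whose item names are
derived from a hash of the recipe (in Guix the hash covers the whole
derivation, i.e. all inputs recursively; here it is just a string).
Building happens at most once; "installing" for each user only creates
a new profile generation that links to the existing item.  The helpers
`realize` and `install_for`, the recipe and the user names are all
hypothetical:

```shell
#!/usr/bin/env bash
# Toy model of "build once, install many times".
# Store paths are named after a hash of the recipe, so an identical recipe
# maps to the same path and is built at most once. All names are hypothetical.
set -eu

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
store="$root/store"; profiles="$root/profiles"
mkdir -p "$store" "$profiles"
: > "$root/build.log"

# Build the store item for a recipe unless it already exists; print its path.
realize() {
  local recipe=$1 name=${1%% *} path
  path="$store/$(printf '%s' "$recipe" | sha256sum | cut -c1-12)-$name"
  if [ ! -d "$path" ]; then
    mkdir -p "$path/bin"
    printf '#!/bin/sh\necho %s\n' "$name" > "$path/bin/$name"
    echo "$path" >> "$root/build.log"   # record that a build happened
  fi
  echo "$path"
}

# "Install": create the user's next profile generation, which only links
# to the existing store item, then point the user's profile at it.
install_for() {
  local user=$1 item=$2 n gen
  n=$(find "$profiles" -maxdepth 1 -name "$user-*-link" | wc -l)
  gen="$profiles/$user-$((n + 1))-link"
  mkdir "$gen"
  ln -s "$item" "$gen/$(basename "$item")"
  ln -sfn "$gen" "$profiles/$user"
}

recipe="align 2.0 gnu-build-system"
item=$(realize "$recipe"); install_for alice "$item"
item=$(realize "$recipe"); install_for bob "$item"   # reuses alice's build
echo "builds performed: $(wc -l < "$root/build.log")"
```

Alice and Bob end up with separate profile generations pointing at one
and the same store item, and only one build was recorded.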

> + Guix provides a set of [[https://www.gnu.org/software/guix/manual/guix.html#Build-Systems][build systems]] providing support for
>   language-specific package management systems, including R, perl,
>   python, ruby, haskell, and emacs.  This should allow a single approach
>   for managing computing environments for each of these languages/tools,
>   as opposed to needing to master the idiosyncrasies of multiple
>   approaches, i.e. perlbrew for perl, pyenv or pythonbrew for python,
>   rbenv for ruby, etc.

Build systems are unrelated to “guix environment” or “rbenv”,
“virtualenv” and the like.  A build system is just a generalisation of
the steps that need to be performed to build something.  For GNU-style
packages that’s

    ./configure --prefix=/something && make && make install

for Python it’s something like

    python setup.py install

etc.  The build systems make *packaging* easier as we no longer need to
express the build procedure for R packages in terms of the GNU build
system (which would be inappropriate).  This gives us the power of
abstraction, something that package managers like “conda” do not aim to
provide: there you need to ship a script along with the metadata file,
which is then run to build the package.  The script may need to do
little more than

    ./configure && make && make install

but it is a much less principled approach, as bash offers few means
for clear and easy abstraction.  (And having to run arbitrary shell
scripts without any sort of isolation is not very encouraging.)
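To sketch the abstraction in the shell itself (with all of the
limitations just mentioned): a package only names its build system, and
the generic steps live in one reusable function per system.  This is a
hypothetical illustration, not how Guix implements its build systems;
the package and function names are made up, and the commands are
printed rather than run:

```shell
#!/usr/bin/env bash
# Sketch of the build-system idea: a package only names its build system,
# and the generic steps live in one reusable function per system.
# Package and function names are hypothetical; commands are printed, not run.
set -eu

gnu_build() {       # GNU-style packages
  echo "./configure --prefix=$1"
  echo "make"
  echo "make install"
}

python_build() {    # setuptools-based packages
  echo "python setup.py install --prefix=$1"
}

r_build() {         # R packages
  echo "R CMD INSTALL --library=$1/site-library ."
}

# A "package" is just metadata: name, version and build system.
build_package() {
  local name=$1 version=$2 system=$3
  local prefix="/store/$name-$version"
  case $system in
    gnu)    gnu_build "$prefix" ;;
    python) python_build "$prefix" ;;
    r)      r_build "$prefix" ;;
    *)      echo "unknown build system: $system" >&2; return 1 ;;
  esac
}

build_package some-aligner 1.0 gnu
build_package some-pytool 0.1 python
build_package some-rpkg 2.4 r
```

Adding another package that follows one of these conventions then needs
no build script of its own; only the metadata changes.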

The fact that build processes run as unprivileged, dedicated build users
is another feature: you don’t need to be afraid that the build script
breaks your precious home directory.  (See
https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac
for a shell script bug that isn’t all that rare and could really spoil
your day if you’re running a script like that without isolation.)

> + Guix control of build processes down to level of
>   binary-compatibility, along with the management tools for
>   environment profiles can provide the basis for improvements in
>   reproducible pipelines, such as are beginning to appear in
>   computational centers of excellence.

This sounds a bit confused.  When applying functional package management
ideas to a package and *all* of its dependencies recursively, you end up
with a directed acyclic graph (loops are “unrolled” by bootstrapping
with existing packages where possible) that is rooted in a very small
number of essential bootstrap binaries.  Together with build-time
isolation we get very close to a reproducible software stack.  The key
idea here, though, is functional package management because without it
build-time isolation wouldn’t be of much use.
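The graph part can be illustrated with tsort(1).  The miniature graph
below is made up and rooted in a single "bootstrap-binaries" node; each
line is an "input dependent" edge, and tsort produces an order in which
every package comes after all of its inputs.  It says nothing about how
Guix actually schedules builds, only that such an order exists and is
fully determined by the inputs.  The loop check assumes GNU coreutils
tsort, which reports a loop and exits with a nonzero status:

```shell
#!/usr/bin/env bash
# Sketch: a miniature package graph as "input dependent" edges. tsort(1)
# orders the nodes so that every package comes after all of its inputs.
# The graph is made up and rooted in a single "bootstrap-binaries" node.
set -eu

edges='
bootstrap-binaries glibc
bootstrap-binaries gcc
glibc gcc
glibc zlib
gcc zlib
zlib some-aligner
gcc some-aligner
'

# Build order: inputs first, the package we asked for last.
printf '%s\n' "$edges" | tsort

# A genuine cycle has no such order; GNU tsort reports it and fails,
# which is why loops have to be broken by bootstrapping.
if printf 'a b\nb a\n' | tsort >/dev/null 2>&1; then
  echo "no cycle"
else
  echo "cycle detected"
fi
```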

> + Guix provides commands to
>   [[https://www.gnu.org/software/guix/manual/guix.html#Emacs-Commands][display
>   package information]] which can be used to automate the production
>   and publishing of software package catalogs.  Such catalogs may then
>   be shared, printed, emailed, or embedded into community-visible web
>   pages as various means of advertising package availability to the
>   research community.

Guix is also a library, which is what makes both the Emacs interface and
the Guix web interface possible (the MDC uses the latter to display what
software we have available in Guix).

> + Guix package specifications, being written in [[https://www.gnu.org/software/guix/manual/guix.html#Defining-Packages][Guile/scheme]], do not
>   depend upon the user's SHELL (i.e. guix works equally well with zsh,
>   bash, tcsh, etc) (TODO: confirm).

The user’s shell does not matter, but this is nothing to do with being
written in Scheme.  The Guix daemon spawns a shell on its own that does
not depend on the user’s environment.

> + Guix has a future - it is the package manager for the new
>   GNU-backed linux distribution, [[https://www.gnu.org/software/guix/][GuixSD]].

“linux distribution” ...  GuixSD provides a variant of the GNU system
(which by default happens to use Linux as its kernel).

> + Guix is [[https://www.gnu.org/software/guix/manual/][well documented]].

Note that the manual on the website is somewhat outdated.  It’s the
manual for the latest release, 0.9.0, which came out quite some time
ago.  The current version of the manual is only available in the git repository.
Maybe we should change this.

> + Guix community has already prepared [[https://www.gnu.org/software/guix/packages/][many recipes]], of which
>   currently [[http://guix.mdc-berlin.de/packages?/?search=bioinfo][54 are bioinformatics]] packages.

As I wrote above: it’s 114 (thanks for prompting me to fix the
configuration error that made it show 54 packages only).

> + Guix packaging is relatively easy to learn. It is reasonably
>   documented and there are `lint` style tools that check recipes for
>   being well-structured; they identify common errors in package
>   specification.

We also have importers (some with great, others with good results) so
that often there isn’t much work to be done at all.  We have importers
for CRAN and bioconductor, which worked pretty well for me; also the
hackage importer is great and it saved me a lot of time when I packaged
all missing dependencies for pandoc in about an afternoon.

The importers are really very useful and I’d definitely mention them.

> + Guix development is open source.  It is open to input from all
>   community members.

It’s free software ;)

Maybe it’s also good to mention that you do not need to rely on Guix
upstream to add packages.  It is trivial to use custom packages with
GUIX_PACKAGE_PATH.

> + Guix exposes the actual system calls to the package developer.

What does this mean?

~~ Ricardo
