[Bio-packaging] Making the case for GNU Guix ... advice sought

Cook, Malcolm MEC at stowers.org
Thu Feb 18 23:29:49 UTC 2016


Hi Ricardo,

Great specifics.  If you can help me solve a few of my remaining questions, in-line below, I'd be much obliged....

> Cook, Malcolm <MEC at stowers.org> writes:
 > 
 > > I have been asked to write up an argument for the advantages that GNU
 > > Guix confers for deploying linux software, especially scientific
 > > computing and including bioinformatics software.
 > >
 > > To that end I have written up this document
 > >
 > > 	https://github.com/malcook/sce/blob/master/MakingTheCase.org
 > 
 > Nice!

Thanks.  I hope it helps fellow-travelers later...

 > Here’s a first correction for a mistake that I’m responsible for: the
 > number of bioinfo packages is actually closer to 114 (not 54).  Guix web
 > was misconfigured on that host and it would show the status of a very,
 > very outdated version of Guix rather than the latest and greatest.
 > 
 > (Note that a very small number of the packages listed there are released
 > under restrictive “academic only” licenses or have undeclared licenses;
 > the license field says “non-free” or “undeclared”.  Having them on this
 > page does not mean I endorse the use of these tools and users should be
 > careful to check the license.)
 > 
 > > I have some notes contrasting GNU Guix with other similar/related
 > > tool-sets (modules, lmod, homebrew, spack, easyBuild, rolling rpms),
 > > as well as our current practices (sort of hybrid of rpms, bastard son
 > > of homebrew, and "just wedge it in there").  However, I do not now
 > > intend to set up such a contrast; rather, I hope to focus on the
 > > advantages of GNU Guix in general.
 > 
 > I’d also like to point you at https://hal.inria.fr/hal-01161771/en, in
 > case you haven’t read it yet.  It includes a very general comparison
 > with tools like environment modules, spack, and easyBuild with a focus
 > on reproducibility.

Excellent - thanks - nice write-up - I'd somehow missed that earlier.
 
 > > + Guix detects and prohibits program name collisions; loading
 > >   conflicting packages into a user's environment is _impossible_.  For
 > >   example, this prevents the ambiguity and associated error that can
 > >   arise when two packages define different programs by the same name
 > >   and both are placed in the user's execution PATH.
 > 
 > Well... I don’t know if this is worth pointing out.  Guix does detect
 > collisions but it doesn’t do anything intelligent here, maybe because
 > there isn’t much that can be done when a collision happens.  You mention
 > support for multiple profiles later, and I think that maybe this could
 > be merged.  I found collision detection to be not so useful because it
 > happens *during* profile generation, not *before* I commit to installing
 > a package.
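To make the timing point concrete, here is a toy Python model of why a collision can only surface while a profile is being assembled. It is not Guix's actual implementation, which is written in Guile Scheme. The model treats a profile as the union of the files each package provides, and a clash is noticed only when two packages try to claim the same path. The package names `tool-a` and `tool-b` are hypothetical, as are `build_profile` and `CollisionError`.

```python
from pathlib import PurePosixPath


class CollisionError(Exception):
    """Raised when two packages provide the same file in one profile."""


def build_profile(packages):
    """Merge each package's files into a single profile (a toy 'union').

    `packages` maps a package name to the relative paths it installs.
    The check runs while the union is assembled, i.e. after the user has
    already asked for the packages, which is why it cannot warn earlier.
    """
    profile = {}  # relative path in the profile -> package providing it
    for name, files in packages.items():
        for f in files:
            path = PurePosixPath(f)
            owner = profile.get(path)
            if owner is not None and owner != name:
                raise CollisionError(
                    f"{path} provided by both {owner} and {name}")
            profile[path] = name
    return profile


if __name__ == "__main__":
    ok = build_profile({"tool-a": ["bin/align"], "tool-b": ["bin/sort"]})
    print(len(ok))  # 2
    try:
        build_profile({"tool-a": ["bin/align"], "tool-b": ["bin/align"]})
    except CollisionError as err:
        print(err)  # bin/align provided by both tool-a and tool-b
```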
 > > + Guix packages naturally extend to include all package resources,
 > >   including man pages and libraries as well as binaries.  This is a
 > >   result of their using the underlying build engine's installation
 > >   targets (e.g. GNU make's "install" target).
 > 
 > Is there any package manager that does not handle this?

Alas, I have inherited a homebrew variant that falls short in this regard... but enough of that!

 > > + Guix packages, being independent of the host on which they are built,
 > >   can be downloaded pre-built from upstream servers as
 > >   [[https://www.gnu.org/software/guix/manual/html_node/Substitutes.html#Substitutes][substitutes]],
 > >   with the assurance of their being bit-level identical
 > >   to the results of the generally longer process of configuration and
 > >   compilation on local servers.
 > 
 > Building packages in isolated environments is a necessary requirement
 > for bit-reproducibility, but it is not sufficient in itself.  To speak
 > of “assurance” is maybe a bit strong.
 > 
 > > + Guix packages, though dependent upon the machine architecture, are
 > >   independent of the linux distribution and its version.  Thus, for
 > >   example, packages built under CentOS 6.5 on x86_64 will run under
 > >   any other operating system on x86_64, such as CentOS 7.x or Ubuntu
 > >   15.10 or 14.04.
 > 
 > This is true and it’s a great feature.  We’re using the very same Guix
 > store on various subversions of CentOS 6.x, a wide range of Ubuntus, and
 > Fedora.  “Linux distribution” is confusing, though.  I’d write
 > “GNU/Linux distribution”.  There *are* some kernel requirements, so the
 > version of Linux itself (i.e. the kernel version) does matter up to a
 > point for some of the advanced features of Guix (such as container
 > support).
 > 
 > > + Guix allows for unprivileged package management; users do not need
 > >   special elevated privileges (root or sudo) to create custom
 > >   environment profiles.  Especially noteworthy is that this allows
 > >   for rationally sharing package management as a distributed
 > >   responsibility.  This includes installing new site-available
 > >   applications (for those available in the guix repositories).  Thus,
 > >   if one user observes that a new release of a site-installed
 > >   application has become available, that user can safely install the
 > >   upgrade centrally and immediately start using it, without affecting
 >             ^—— what does this mean?  At the centre of ... what?
 > 
 > >   any other user's environment; without any further coordination,
 > >   when another user _is_ ready to adopt the upgrade, they will find
 > >   that installation is unnecessary, as it has already occurred.
 > 
 > I would clearly separate building and installing.  Installation is the
 > act of creating a new profile generation that contains a link to a
 > previously built (or substituted) item in the store.  Building (or
 > substituting) is what happens only once because of the purely functional
 > properties of the Guix package “functions”.
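Here is a toy Python model of that separation, under heavily simplified assumptions. The store path is derived from a hash of everything that affects the build. In this model that means only the name, version, recipe string, and input paths. As a result, building (or substituting) happens at most once per distinct set of inputs, and each user's install merely adds a new profile generation that points at the existing store item. This is not Guix's real derivation or hashing scheme. `store_path`, `build`, `Profile`, and the `hello` example are illustrative only.

```python
import hashlib

STORE = {}      # store path -> build result (stands in for the store)
BUILD_LOG = []  # every real build, to show each happens only once


def store_path(name, version, recipe, inputs):
    """Derive a path from everything that affects the build.

    Toy stand-in for Guix's derivation hashing: identical inputs always
    yield the identical path, different inputs a different one.
    """
    h = hashlib.sha256()
    for part in (name, version, recipe, *sorted(inputs)):
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") != ("a", "bc")
    return f"/gnu/store/{h.hexdigest()[:32]}-{name}-{version}"


def build(name, version, recipe="./configure && make && make install",
          inputs=()):
    """Build (or substitute) only if the store item is not already there."""
    path = store_path(name, version, recipe, inputs)
    if path not in STORE:
        BUILD_LOG.append(path)
        STORE[path] = f"built {name}-{version}"
    return path


class Profile:
    """One user's profile: a list of generations, each a set of store paths."""

    def __init__(self):
        self.generations = [frozenset()]

    def install(self, path):
        # Installing never builds; it only adds a generation that also
        # links to an item already present in the store.
        self.generations.append(self.generations[-1] | {path})


if __name__ == "__main__":
    alice, bob = Profile(), Profile()
    alice.install(build("hello", "2.10"))
    bob.install(build("hello", "2.10"))  # same path, no second build
    print(len(BUILD_LOG))                # 1
    print(len(bob.generations))          # 2
```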
 > 
 > > + Guix provides a set of
 > >   [[https://www.gnu.org/software/guix/manual/guix.html#Build-Systems][build systems]]
 > >   providing support for language-specific package management
 > >   systems, including R, Perl, Python, Ruby, Haskell, and Emacs.  This
 > >   should allow a single approach for managing computing environments
 > >   for each of these languages/tools, as opposed to needing to master
 > >   the idiosyncrasies of multiple approaches, e.g. perlbrew for Perl,
 > >   pyenv or pythonbrew for Python, rbenv for Ruby, etc.
 > 
 > Build systems are unrelated to “guix environment” or “rbenv”,
 > “virtualenv” and the like.  A build system is just a generalisation of
 > the steps that need to performed to build something.  For GNU-style
 > packages that’s
 > 
 >     ./configure --prefix=/something && make && make install
 > 
 > for Python it’s something like
 > 
 >     python setup.py install
 > 
> etc.  The build systems make *packaging* easier as we no longer need to
 > express the build procedure for R packages in terms of the GNU build
 > system (which would be inappropriate).  

Yes.  Agreed.  However, my emphasis here was intended to be that Guix can be used to obviate the need for rbenv, virtualenv, and friends.  I thought that `guix environment` was going to be an effective replacement for them.  Am I mistaken in this?  I hope not!  Assuming not, and if I understand your point, then I should instead write that this is achieved by virtue of Guix's ability to set up and tear down environments/profiles that specify not only versions of applications, but also libraries/plug-ins/modules for a variety of languages (Ruby, Perl, etc.) and tools (Emacs, etc.).  You mention the importance of 'importers' below... perhaps it is the combination of the available importers (for scaffolding the packaging from external repos) with the ability to use `guix environment` to make those packages available in specified contexts.

Let me try.  Does this do better justice to the separation of concerns and its attendant advantages?

 + Guix abstracts the idea of [[https://www.gnu.org/software/guix/manual/guix.html#Build-Systems][build systems]] to encompass not only the
   standard GNU recipe "(configure; make; make check; make install)"
   but also the build procedures of common language/tool-specific
   package/module/add-on managers, including those for R, Perl,
   Python, Ruby, Haskell, and Emacs.  This should allow a single
   command-line protocol for managing the building and deployment of
   such packages.  Additionally, when deployed using Guix, such packages can
   also be enabled with Guix's `[[https://www.gnu.org/software/guix/manual/guix.html#Invoking-guix-environment][environment]]` sub-command,
   providing a unified approach where previously multiple tools might
   have been employed (perlbrew for Perl, pyenv or pythonbrew for
   Python, rbenv for Ruby, etc.)


> This gives us the power of
 > abstraction, something that package managers like “conda” do not aim to
 > provide — there you need to ship a script along with the metadata file,
 > which is then run to build the package.  The script may need to do
 > little more than
 > 
 >     ./configure && make && make install
 > 
 > but it is a much less principled approach, as bash hardly has the means
 > to provide for clear and easy abstraction.  (And to have to run
 > arbitrary shell scripts without any sort of isolation is not very
 > encouraging.)
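Here is a minimal Python sketch of the abstraction being described. It assumes a much simpler model than Guix's real Scheme build systems. Each build system supplies the generic steps for a family of packages. An individual package declares only metadata plus optional parameters (here, extra configure flags) rather than shipping its own shell script. The `BUILD_SYSTEMS` table, the step strings, and the `Package`/`build_steps` names are all hypothetical.

```python
from dataclasses import dataclass, field

# Each hypothetical "build system" holds the generic steps for one family
# of packages; "{out}" is the package's output directory.
BUILD_SYSTEMS = {
    "gnu": ["./configure --prefix={out}", "make", "make check",
            "make install"],
    "python": ["python setup.py build",
               "python setup.py install --prefix={out}"],
}


@dataclass
class Package:
    """A package declares metadata, not a build script."""
    name: str
    version: str
    build_system: str
    configure_flags: list = field(default_factory=list)


def build_steps(pkg, out):
    """Instantiate the package's build system for one output directory."""
    steps = [step.format(out=out) for step in BUILD_SYSTEMS[pkg.build_system]]
    if pkg.build_system == "gnu" and pkg.configure_flags:
        # A per-package tweak is a parameter, not a rewritten script.
        steps[0] += " " + " ".join(pkg.configure_flags)
    return steps


if __name__ == "__main__":
    for step in build_steps(Package("hello", "2.10", "gnu"), "/tmp/out"):
        print(step)
```

The design point is that fixing or extending a build system once benefits every package that uses it. With per-package scripts, each script would need the same fix.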
 > 
 > The fact that build processes run as unprivileged, dedicated build users
 > is another feature: you don’t need to be afraid that the build script
 > breaks your precious home directory.  (See
 > https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac
 > for a shell script bug that isn’t all that rare and could really spoil
 > your day if you’re running a script like that without isolation.)

Heh - /usr rhymes with /lusr !

 > > + Guix's control of build processes down to the level of
 > >   binary compatibility, along with its management tools for
 > >   environment profiles, can provide the basis for improvements in
 > >   reproducible pipelines, such as are beginning to appear in
 > >   computational centers of excellence.
 > 
 > This sounds a bit confused.  When applying functional package management
 > ideas to a package and *all* of its dependencies recursively, you end up
 > with a directed acyclic graph (loops are “unrolled” by bootstrapping
 > with existing packages where possible) that is rooted in a very small
 > number of essential bootstrap binaries.  Together with build-time
 > isolation we get very close to a reproducible software stack.  The key
 > idea here, though, is functional package management because without it
 > build-time isolation wouldn’t be of much use.
 > 

Thanks.  Noted.
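Here is a small Python sketch of the dependency-graph point. It uses a hypothetical, heavily simplified graph; real package graphs are far larger and the edges here are only illustrative. The sketch shows three things: the graph must be acyclic, every package can be built after its inputs in topological order, and everything ultimately bottoms out in a single bootstrap node. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The package names and edges are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical, heavily simplified graph: package -> packages it is built with.
DEPENDENCIES = {
    "bootstrap-binaries": set(),
    "gcc": {"bootstrap-binaries"},
    "glibc": {"bootstrap-binaries"},
    "zlib": {"gcc", "glibc"},
    "htslib": {"gcc", "glibc", "zlib"},
    "samtools": {"gcc", "glibc", "zlib", "htslib"},
}


def build_order(graph):
    """Order packages so each comes after all of its inputs.

    static_order() raises graphlib.CycleError if the graph has a loop,
    which is why loops must be broken (e.g. by bootstrapping) first.
    """
    return list(TopologicalSorter(graph).static_order())


def closure(graph, pkg):
    """Every package `pkg` depends on, directly or transitively."""
    seen, stack = set(), [pkg]
    while stack:
        for dep in graph[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(build_order(DEPENDENCIES)[0])  # bootstrap-binaries
    print(sorted(closure(DEPENDENCIES, "samtools")))
    # ['bootstrap-binaries', 'gcc', 'glibc', 'htslib', 'zlib']
```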

 > > + Guix provides commands to
 > >   [[https://www.gnu.org/software/guix/manual/guix.html#Emacs-Commands][display
 > >   package information]] which can be used to automate the production
 > >   and publishing of software package catalogs.  Such catalogs may then
 > >   be shared, printed, emailed, or embedded into community-visible web
 > >   pages as various means of advertising package availability to the
 > >   research community.
 > 
 > Guix is a library, which allows for the Emacs interface and the Guix web
 > interface (which the MDC uses to display what software we have available
 > in Guix).
 > 
 > > + Guix package specifications, being written in
 > >   [[https://www.gnu.org/software/guix/manual/guix.html#Defining-Packages][Guile/scheme]],
 > >   do not depend upon the user's SHELL (i.e. Guix works equally well
 > >   with zsh, bash, tcsh, etc.) (TODO: confirm).
 > 
 > The user’s shell does not matter, but this is nothing to do with being
 > written in Scheme.  The Guix daemon spawns a shell on its own that does
 > not depend on the user’s environment.


What is not clear to me is whether bash is assumed elsewhere in Guix; for instance, consider the fact that

	guix package --search-paths

reports environment variables in bash syntax.  Any insights here?


 > > + Guix has a future: it is the package manager for the new
 > >   GNU-backed linux distribution, [[https://www.gnu.org/software/guix/][GuixSD]].
 > 
 > “linux distribution” ...  GuixSD provides a variant of the GNU system
 > (which by default happens to use Linux as its kernel).

I will try and bear this distinction in mind.  

 > 
 > > + Guix is [[https://www.gnu.org/software/guix/manual/][well documented]].
 > 
 > Note that the manual on the website is somewhat outdated.  It’s the
 > manual for the latest 0.9.0 release (which is already quite some time
 > ago).  The latest version is only available in the git repository.
 > Maybe we should change this.
 > 
 > > + The Guix community has already prepared
 > >   [[https://www.gnu.org/software/guix/packages/][many recipes]], of which
 > >   currently [[http://guix.mdc-berlin.de/packages?/?search=bioinfo][54 are
 > >   bioinformatics]] packages.
 > 
 > As I wrote above: it’s 114 (thanks for prompting me to fix the
 > configuration error that made it show 54 packages only).
 > 
 > > + Guix packaging is relatively easy to learn. It is reasonably
 > >   documented and there are `lint` style tools that check recipes for
 > >   being well-structured; they identify common errors in package
 > >   specification.
 > 
 > We also have importers (some with great, others with good results) so
 > that often there isn’t much work to be done at all.  We have importers
 > for CRAN and bioconductor, which worked pretty well for me; also the
 > hackage importer is great and it saved me a lot of time when I packaged
 > all missing dependencies for pandoc in about an afternoon.

Looking forward to that - getting pandoc and TeX and Haskell and everything else lined up was a bear for me.

 > 
 > The importers are really very useful and I’d definitely mention them.
 > 
 > > + Guix development is open source.  It is open to input from all
 > >   community members.
 > 
 > It’s free software ;)
 > 
 > Maybe it’s also good to mention that you do not need to rely on Guix
 > upstream to add packages.  It is trivial to use custom packages with
 > GUIX_PACKAGE_PATH.
 > 
> > + Guix exposes the actual system calls to the package developer.
 > 
 > What does this mean?

Never mind.  Deleted.

> ~~ Ricardo

~~~Malcolm


