knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(rix)
If you are familiar with the concept of package managers, you can skip to the next section.
To ensure that a project is reproducible you need to deal with at least four things:
{rJava}
R package on Linux);For the three first bullet points, the consensus seems to be a mixture of Docker
to deal with system dependencies, {renv}
for the packages (or {groundhog}
,
or a fixed CRAN snapshot like those Posit
provides)
and the R installation manager to install the
correct version of R (unless you use a Docker image as base that already ships
the required version by default).
As for the last bullet point, the only way out is to be able to compile the software for the target architecture. There's a lot of moving pieces, and knowledge that you need to have to get it right.
But it turns out that this is not the only solution. Docker + {renv}
(or some
other way to deal with R packages) is likely the most popular way to ensure
reproducibility of your projects, but there are other tools to achieve this. One
such tool is called Nix.
Nix is a package manager for Linux distributions, macOS and it even works on Windows if you enable WSL2. What's a package manager? If you're not a Linux user, you may not be aware. Let me explain it this way: in R, if you want to install a package to provide some functionality not included with a vanilla installation of R, you'd run this:
install.packages("dplyr")
It turns out that Linux distributions, like Ubuntu for example, work in a similar way, but for software that you'd usually install using an installer (at least on Windows). For example you could install Firefox on Ubuntu using:
sudo apt-get install firefox
(there's also graphical interfaces that make this process "more user-friendly").
In Linux jargon, packages
are simply what we call software (or I guess
it's all "apps" these days). These packages get downloaded from so-called
repositories (think of CRAN, the repository of R packages) but for any type of
software that you might need to make your computer work: web browsers, office
suites, multimedia software and so on.
So Nix is just another package manager that you can use to install software.
But what interests us is not using Nix to install Firefox, but instead to install R and the R packages that we require for our analysis (or any other programming language that we need). But why use Nix instead of the usual ways to install software on our operating systems?
The first thing that you should know is that Nix's repository, nixpkgs
, is
huge. Humongously huge. As I'm writing these lines, there's more than 120'000
pieces of software available, and the
entirety of CRAN and Bioconductor is also available through nixpkgs
. So
instead of installing R as you usually do and then use install.packages()
to
install packages, you could use Nix to handle everything. But still, why use Nix
at all?
Nix has an interesting feature: using Nix, it is possible to install software in (relatively) isolated environments. So using Nix, you can install as many versions of R and R packages that you need. Suppose that you start working on a new project. As you start the project, with Nix, you would install a project-specific version of R and R packages that you would only use for that particular project. If you switch projects, you'd switch versions of R and R packages.
If you are familiar with {renv}
, you should see that this is exactly
the same thing: the difference is that not only will you have a project-specific
library of R packages, you will also have a project-specific R version. So if
you start a project now, you'd have R version 4.2.3 installed (the latest
version available in nixpkgs
but not the latest version available, more on
this later), with the accompagnying versions of R packages, for as long as the
project lives (which can be a long time). If you start a project next year, then
that project will have its own R, maybe R version 4.4.2 or something like that,
and the set of required R packages that would be current at that time. This is
because Nix always installs the software that you need in separate, (isolated)
environments on your computer. So you can define an environment for one specific
project.
But Nix goes even further: not only can you install R and R packages using
Nix (in isolated) project-specific environments, Nix even installs the required
system dependencies. So for example if I need {rJava}
, Nix will make sure to
install the correct version of Java as well, always in that project-specific
environment (so if you already some Java version installed on your system, there
won't be any interference).
What's also pretty awesome, is that you can use a specific version of nixpkgs
to always get exactly the same versions of all the software whenever you
build that environment to run your project's code. The environment gets defined
in a simple plain-text file, and anyone using that file to build the environment
will get exactly, byte by byte, the same environment as you when you initially
started the project. And this also regardless of the operating system that is
used.
Nix is a package manager that can be installed on your computer (regardless of
OS) and can be used to install software like with any other package manager. If
you're familiar with the Ubuntu Linux distribution, you likely have used
apt-get
to install software. On macOS, you may have used homebrew
for
similar purposes. Nix functions in a similar way, but has many advantages over
classic package managers, as it focuses on reproducible builds and downloads
packages from nixpkgs
,
currently the largest software repository.
This means that using Nix, it is possible to install not only R, but also all
the packages required for your project. The obvious question is why use Nix
instead of simply installing R and R packages as usual. The answer is that Nix
makes sure to install every dependency of any package, up to required system
libraries. For example, the {xlsx}
package requires the Java programming
language to be installed on your computer to successfully install. This can be
difficult to achieve, and {xlsx}
bullied many R developers throughout the
years (especially those using a Linux distribution, sudo R CMD javareconf
still plagues my nightmares).
But with Nix, it suffices to declare that we want the {xlsx}
package for our
project, and Nix figures out automatically that Java is required and installs
and configures it. It all just happens without any required intervention from
the user. The second advantage of Nix is that it is possible to pin a certain
revision of the Nix packages' repository (called nixpkgs
) for our project.
Pinning a revision ensures that every package that Nix installs will always be
at exactly the same versions, regardless of when in the future the packages get
installed.
The idea of {rix}
is for you to declare the environment you need using the
provided rix()
function. rix()
is the package's main function and generates
a file called default.nix
which is then used by the Nix package manager to
build that environment. Ideally, you would set up such an environment for each
of your projects. You can then use this environment to either work
interactively, or run R scripts. It is possible to have as many environments as
projects, and software that is common to environments will simply be re-used and
not get re-installed to save space. Environments are isolated for each other,
but can still interact with your system's files, unlike with Docker where a
volume must be mounted. Environments can also interact with the software
installed on your computer through the usual means, which can sometimes lead to
issues. For example, if you already have R installed, and a user library of R
packages, more caution is required to properly use environments managed by Nix.
It is important at this stage to understand that you should not call
install.packages()
from a running Nix environment. If you want to add packages
to a Nix environment while analyzing data, you need to add it the default.nix
expression and rebuild the environment. This is explained in greater detail in
vignette("d1-installing-r-packages-in-a-nix-environment")
.
To avoid interference between your main R library of packages and your Nix
environments, calling rix()
will also run rix_init()
, which will create a
custom .Rprofile
in the project's directory. This .Rprofile
will ensure that
if you have a user library of packages, these won't get loaded by an R version
running in a Nix shell. It will also redefine install.packages()
to throw an
error if you try to use it.
rix()
has several arguments:
.tar.gz
format to install;default.nix
file (by default the current working directory)For example:
rix( r_ver = "latest-upstream", r_pkgs = c("dplyr", "chronicler"), ide = "none" )
The call above writes a default.nix
file in the current working directory.
This default.nix
can in turn be used by Nix to build an environment containing
the latest version of R included in the upstream nixpkgs
, with the {dplyr}
and {chronicler}
packages.
Take note of the ide = "none"
argument: this argument, and the values it
can take, are further discussed in the vignette vignette("e-configuring-ide")
but continue reading this vignette and then vignettes numbered by a "d".
You can instead provide a specific R version:
rix( r_ver = "4.4.1", r_pkgs = c("dplyr", "chronicler"), ide = "none" )
or a date:
rix( date = "2024-12-14", r_pkgs = c("dplyr", "chronicler"), ide = "none" )
Providing a specific R version or a date will not use the upstream/official
nixpkgs
package repository, but our rstats-on-nix
fork. We decided to use a
fork instead of the official nixpkgs
for older releases, because it allows us
to backport fixes and improve package compatibility, which is espcially
important for Apple Silicon. It also allows us to provide newer releases of R
more quickly than through the official channels. For example, as of writing
(December 18th 2024), R version 4.4.2 is not yet included in upstream nixpkgs
but
is already available through our fork. Thanks to our fork, we are also able to
snapshot CRAN more frequently, and thus include more CRAN and Bioconductor
package versions than in the upstream nixpkgs
repository.
{rix}
also includes an renv2nix()
function that converts an renv.lock
file into a valid Nix expression. Read the vignette vignette("f-renv2nix")
to learn more.
The Nix package manager can be used to build reproducible development
environments according to the specifications found in the generated
default.nix
files, which contain a Nix expression. An expression is Nix
jargon for a function with multiple inputs and one output, this output being our
development environment. {rix}
does not require Nix to be installed to
generate valid expressions (but does require an internet connection), so you
could generate expressions and use them on other machines. To actually build an
environment using a default.nix
file, go to where you chose to write it
(ideally in a new, empty folder that will be the root folder of your project)
and use the Nix package manager to build the environment. Call the following
function in a terminal:
nix-build
Nix install packages in a dedicated folder on your computer, called the Nix store.
Once Nix is done building the environment, you can start working on it interactively by using the following command in a terminal emulator (not the R console):
nix-shell
You will drop into a Nix shell which provides the installed software. It is not
mandatory to call nix-build
first: you can immediately call nix-shell
. The
advantage of using nix-build
first is that it creates a file called result
which will prevent the environment to get garbage collected if you clean
the Nix store.
If you want to build an environment for an older version of R, you might get a warning telling you that you cannot build the expression, but that you can directly drop into it.
If you want to completely isolate your Nix environment from the rest of the
system, we recommend using nix-shell --pure
to drop into the environment, as
described in the documentation of rix_init()
.
Finally, if you want to delete an environment, delete the result
file first
(if you used nix-build
) and then call nix-store --gc
, which will delete all
the orphaned packages.
Now that you know more about Nix and {rix}
, it is time to get these tools
installed on your system.
vignette("b1-setting-up-and-using-rix-on-linux-and-windows")
vignette("b2-setting-up-and-using-rix-on-macos")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.