knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it's likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.
Are you ready to test your custom R functions and system commands in a a different environment with isolated software builds that are both pure at build and at runtime, without leaving the R console?
Let's introduce with_nix()
. with_nix()
will evaluate custom R code or shell
commands with command line interfaces provided by Nixpkgs in a Nix environment,
and thereby bring the read-eval-print-loop feeling. Not only can you evaluate
custom R functions or shell commands in Nix environments, but you can also bring
the results back to your current R session as R objects.
We aim to accommodate various use cases, considering a gradient of declarativity in individual or sets of software environments based on personal preferences. There are two main modes for defining and comparing code running through R and system commands (command line interfaces; CLIs)
with_nix()
from, too. You are probably on the way of getting a
passionate Nix user.Carefully curated software improves over time, so does R. We pick an example from the R changelog, the following literal entry in R 4.2.0:
as.vector()
gains a data.frame
method which returns a simple named
list, also clearing a long standing 'FIXME' to enable
as.vector(<data.frame>, mode ="list")
. This breaks code relying on
as.vector(<data.frame>)
to return the unchanged data frame."The goal is to illustrate this change in behavior from R versions 4.1.3 and before to R versions 4.2.0 and later.
First, we write a default.nix
file containing Nix expressions that pin R
version 4.1.3 from Nixpkgs.
library("rix") path_env_1 <- file.path(".", "_env_1_R-4-1-3") rix( r_ver = "4.1.3", overwrite = TRUE, project_path = path_env_1 )
The following expression is written to default.nix in the subfolder
./_env_1_R-4-1-3/
.
let pkgs = import (fetchTarball "https://github.com/rstats-on-nix/nixpkgs/archive/2022-04-19.tar.gz") {}; system_packages = builtins.attrValues { inherit (pkgs) glibcLocales nix R; }; in pkgs.mkShell { LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else ""; LANG = "en_US.UTF-8"; LC_ALL = "en_US.UTF-8"; LC_TIME = "en_US.UTF-8"; LC_MONETARY = "en_US.UTF-8"; LC_PAPER = "en_US.UTF-8"; LC_MEASUREMENT = "en_US.UTF-8"; buildInputs = [ system_packages ]; }
This also includes a custom .Rprofile
file that ensure that this subshell
will not load any packages installed to the user's library of packages.
We now have set up the configuration for R 4.1.3 set up in a default.nix
file
in the folder ./_env_1_R-4-1-3
. Since you are sure you are using an R version
higher 4.2.0 available on your system, you can check what that
as.vector.data.frame()
S3 method returns a list.
df <- data.frame(a = 1:3, b = 4:6) as.vector(x = df, mode = "list") #> $a #> [1] 1 2 3 #> #> $b #> [1] 4 5 6
This is different for R versions 4.1.3 and below, where you should get an identical data frame back.
To formally validate in a 'System-to-Nix' approach that the object returned from
as.vector.data.frame()
is before R
\< 4.2.0, we define a function that runs
the computation above.
df_as_vector <- function(x) { out <- as.vector(x = x, mode = "list") return(out) } (out_system_1 <- df_as_vector(x = df)) #> $a #> [1] 1 2 3 #> #> $b #> [1] 4 5 6
Then, we will evaluate this test code through a nix-shell
R session. This adds
both build-time and run-time purity with the declarative Nix software
configuration we have made earlier. with_nix()
leverages the following
principles under the hood:
Computing on the Language: Manipulating language objects using code.
Static Code Analysis: Detecting global objects and package environments in the function call stack of 'expr'. This involves utilizing essential functionality from the 'codetools' package, which is recursively iterated.
Serialization of Dependent R objects: Saving them to disk and
deserializing them back into the R session's RAM via a temporary folder.
This process establishes isolation between two distinct computational
environments, accommodating both 'System-to-Nix' and 'Nix-to-Nix'
computational modes. Simultaneously, it facilitates the transfer of input
arguments, dependencies across the call stack, and outputs of expr
between
the Nix-R and the system's R sessions.
This approach guarantees reproducible side effects and effectively streams
messages and errors into the R session. Thereby, the {sys} package facilitates
capturing standard outputs and errors as text output messages. Please be aware
that with_nix()
will invoke nix-shell
, which will itself run nix-build
in
case the Nix derivation (package) for R version 4.1.3 is not yet in your Nix
store. This will take a bit of time to get the cache. You will see in your
current R console the specific Nix paths that will be downloaded and copied into
your Nix store automatically.
# now run it in `nix-shell`; `with_nix()` takes care # of exporting global objects of `df_as_vector` recursively out_nix_1 <- with_nix( expr = function() df_as_vector(x = df), # wrap to avoid evaluation program = "R", project_path = path_env_1, message_type = "simple" # you can do `"verbose"`, too ) # compare results of custom codebase with indentical # inputs and different software environments identical(out_system_1, out_nix_1) # should return `FALSE` if your system's R versions in # current interactive R session is R >= 4.2.0
expr
argument of with_nix()
In the previous code snippet we wrapped the top-level expr
function with
function()
or function(){}
. As an alternative, you can also provide default
arguments when assigning the function used as expr
input like this:
df_as_vector <- function(x = df) { out <- as.vector(x = x, mode = "list") return(out) }
Then, you just supply the name of the function to evaluate with default arguments.
out_nix_1_b <- with_nix( expr = df_as_vector, # provide name of function program = "R", project_path = path_env_1, message_type = "simple" # you can do `"verbose"`, too )
It yields the same results.
Reduce(f = identical, list(out_nix_1, out_nix_1_b))
as.vector.data.frame()
for both R versions 4.1.3 and 4.2.0 from NixpkgsHere follows an example a Nix-to-Nix
solution, with two subshells to track the
evolution of base R in this specific case. We can verify the breaking changes in
case study 1 in more declarative manner when we use both R 4.1.3 and R 4.2.0
from Nixpkgs. Since we already have defined R 4.1.3 in the env
_1_R-4-1-3
subshell, we can use it as a source environment where with_nix() is launched
from. Accordingly, we define the R 4.2.0 environment in a
env
_1_2_R-4-2-0
using Nix via rix::rix()
. The latter environment will be
the target environment where df_as_vector()
will be evaluated in.
library("rix") path_env_1_2 <- file.path(".", "_env_1_2_R-4-2-0") rix( r_ver = "4.2.0", overwrite = TRUE, project_path = path_env_1_2, shell_hook = "R" ) list.files(path_env_1_2)
"default.nix"
Now, initiate a new R session as development environment using nix-shell
. Open
a new terminal at the current working directory of your R session. The provided
expression default.nix
. defines R 4.1.3 in a "subfolder per subshell"
approach. nix-shell
will use the expression by default.nix
and prefer it
over any other .nix
files, except when you put a shell.nix
file in that
folder, which takes precedence.
```{sh parsermd-chunk-12, eval = FALSE} nix-shell --pure ./_env_1_R-4-1-3
After some time downloading caches and doing builds, you will enter an R console session with R 4.1.3. You did not need to type in R first, because we set up a R shell hook via `rix::rix()`. Next, we define again the target function to test in R 4.2.0, too. ```r # current Nix-R session with R 4.1.3 df_as_vector <- function(x) { out <- as.vector(x = x, mode = "list") return(out) } (out_nix_1 <- df_as_vector(x = df))
out_nix_1_2 <- with_nix( expr = function() df_as_vector(x = df), program = "R", project_path = path_env_1_2, message_type = "simple" # you can do `"verbose"`, too )
You can now formally compare the outputs of the computation of the same code in R 4.1.3 vs. R 4.2.0 environments controlled by Nix.
identical(out_nix_1, out_nix_1_2) # yields FALSE
We add one more layer to the reproducibility of the R ecosystem. User libraries from CRAN or GitHub, one thing that makes R shine is the huge collection of software packages available from the community.
There was a change introduce in {stringr} 1.5.0; in earlier versions, this line of code:
stringr::str_subset(c("", "a"), "")
would return the character "a"
. However, this behaviour is unexpected:
it really should return an error. This was addressed in versions after
1.5.0:
out_system_stringr <- tryCatch( expr = { stringr::str_subset(c("", "a"), "") }, error = function(e) NULL )
Since the code returns an error, we wrap it inside tryCatch()
and return
NULL
instead of an error (if we wouldn't do that, this vignette could not
compile!).
Let's build a subshell with the latest version of R, but an older version
of {stringr}
:
library("rix") path_env_stringr <- file.path(".", "_env_stringr_1.4.1") rix( r_ver = "4.3.1", r_pkgs = "stringr@1.4.1", overwrite = TRUE, project_path = path_env_stringr )
We can now run the code in the subshell
out_nix_stringr <- with_nix( expr = function() stringr::str_subset(c("", "a"), ""), program = "R", project_path = path_env_stringr, message_type = "simple" )
Here are the last few lines printed on screen:
==> `expr` succeeded! ### Finished code evaluation in `nix-shell` ### * Evaluating `expr` in `nix-shell` returns: [1] "a"
Not only do we see the result of evaluating the code in the subshell, we also
have access to it: out_nix_stringr
holds this result.
We can now compare the two: the result of the code running in our main session
with the latest version of {stringr}
and the result of the code running in the
subshell with the old version of {stringr}
:
identical(out_system_stringr, out_nix_stringr)
As expected, the result is FALSE
.
Nix subshells are quite useful in cases where you need to use a package that
might be difficult to install, such as {arrow}
, or other packages that must be
compiled. Depending on your operating system you need to compile {arrow}
from
source, which can be a frustrating experience, especially if you only need it to
load data and bring it down to a manageable size (using select()
and
filter()
for instance). This use cases illustrates how to achieve this.
Let's start by building a subshell that is based on a distinct revision of
nixpkgs
, for which we know that arrow compiles on both linux and macOS
(darwin).
library("rix") path_env_arrow <- file.path("env_arrow") rix( r_ver = "4.1.1", r_pkgs = c("dplyr", "arrow"), overwrite = TRUE, project_path = path_env_arrow )
This specific revision of R contains {arrow}
13. Let's now suppose that you
already have a script with some code to load and transform some data using
{arrow}
. It may look something like this:
library(arrow) library(dplyr) arrow_cars <- arrow_table(cars) arrow_cars %>% filter(speed > 10) %>% as.data.frame()
To run this code in a subshell, we recommend wrapping it inside a function:
arrow_script <- function() { library(arrow) library(dplyr) arrow_cars <- arrow_table(cars) arrow_cars %>% filter(speed > 10) %>% as.data.frame() }
Which we can then run in the subshell:
out_nix_arrow <- with_nix( expr = function() arrow_script(), program = "R", project_path = path_env_arrow, message_type = "simple" )
This will run the function in the subshell, and its output will be saved in the
out_nix_arrow
variable, for further manipulation in your main shell/session.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.