This vignette introduces the core functions required to build a {rixpress}
pipeline, but does not cover every feature yet. It also assumes that you've
read vignette("intro-concepts"). In the next vignette, vignette("tutorial"),
you'll learn how to set up a complete pipeline from start to finish.
{rixpress} provides several functions to help you write derivations. These
functions typically start with the prefix rxp_ and follow a similar structure.
The first step in any pipeline is usually to import data. To include data in a
{rixpress} pipeline, use rxp_r_file():
d0 <- rxp_r_file(
  name = mtcars,
  path = 'data/mtcars.csv',
  read_function = \(x) (read.csv(file = x, sep = "|"))
)
The read_function argument of rxp_r_file() requires an R function with a single
argument: the path to the file to be read. In this example, we assume the
columns in the mtcars.csv file are separated by the | symbol. Since read.csv()
expects commas by default, we wrap it in an anonymous function that sets
sep = "|". The result is a function whose only argument is the path, which
rxp_r_file() then uses to read the file at 'data/mtcars.csv'.
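If you want to check what the anonymous function does before using it in a
pipeline, you can call it directly. The base R sketch below writes a
pipe-separated copy of the built-in mtcars dataset to a temporary file, standing
in for data/mtcars.csv, and reads it back with the same function. The name
read_pipe_csv is only for illustration.

```r
# The same one-argument reader passed to read_function above:
# it fixes sep = "|" so that only the file path has to be supplied.
read_pipe_csv <- \(x) (read.csv(file = x, sep = "|"))

# Write a pipe-separated copy of the built-in mtcars data to a temporary
# file, standing in for 'data/mtcars.csv'.
path <- tempfile(fileext = ".csv")
write.table(mtcars, file = path, sep = "|", row.names = FALSE)

# Read it back by passing only the path, as rxp_r_file() does.
df <- read_pipe_csv(path)
str(df)
```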
Important: This approach means that the mtcars.csv file will be copied to
the Nix store. This is essential to how Nix works.
Note that rxp_r_file() is quite flexible: it works with any function that takes
a file path and reads the file, regardless of the file type. The path to the
file can also be a URL. See vignette("importing-data") for more details.
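A function that already takes the file path as its first argument can be passed
as-is, without an anonymous wrapper. As an illustration in base R, readRDS()
reads .rds files this way. The temporary file stands in for a real data file,
and the rxp_r_file() call in the comment uses a hypothetical path.

```r
# readRDS() takes the file path as its first argument, so it already has
# the one-argument shape that read_function expects; no wrapper is needed.
path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)

reader <- readRDS
df <- reader(path)

# In a pipeline, this could look like (hypothetical path):
# rxp_r_file(name = mtcars, path = "data/mtcars.rds", read_function = readRDS)
```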
Once the data is imported, we can start manipulating it. To generate a
derivation similar to the one described in vignette("intro-concepts"), but
using R and {dplyr} instead of awk, we would write:
d1 <- rxp_r(
  name = filtered_mtcars,
  expr = dplyr::filter(mtcars, am == 1)
)
This syntax should be familiar to users of the {targets} package: similar to
the tar_target() function, you simply provide a name for the derivation and
the expression to generate it. That's all you need to write for {rixpress} to
generate all the required Nix code automatically.
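Because expr is ordinary R code, you can try it interactively before turning it
into a derivation. The sketch below uses base R's subset() as a stand-in for
dplyr::filter() so that it runs without {dplyr}, and the built-in mtcars
dataset in place of the imported file. Inside the pipeline, mtcars refers to
the d0 derivation instead.

```r
# Base R equivalent of the d1 expression dplyr::filter(mtcars, am == 1):
# keep only the cars with a manual transmission (am == 1).
filtered_mtcars <- subset(mtcars, am == 1)

nrow(filtered_mtcars)
#> [1] 13
```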
To continue transforming the data, you only need to define a new derivation:
d2 <- rxp_r(
  name = mtcars_mpg,
  expr = dplyr::select(filtered_mtcars, mpg)
)
Notice how the name of d1 (filtered_mtcars) is used in d2: this is how
dependencies between derivations are defined.
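To make the idea concrete, here is a small, hypothetical sketch of how
dependencies can be read off the expressions: collect the symbols each
expression uses with base R's all.vars() and keep those that match the name of
another derivation. This only illustrates the principle and is not how
{rixpress} is implemented internally. The derivs_exprs list and the find_deps()
helper are invented for the example.

```r
# Hypothetical illustration: map each derivation name to its expression.
# The d0 derivation reads a file, so it has no expression to inspect.
derivs_exprs <- list(
  mtcars          = NULL,
  filtered_mtcars = quote(dplyr::filter(mtcars, am == 1)),
  mtcars_mpg      = quote(dplyr::select(filtered_mtcars, mpg))
)

# A derivation depends on every other derivation whose name appears
# as a symbol in its expression.
find_deps <- function(exprs) {
  lapply(exprs, function(e) intersect(all.vars(e), names(exprs)))
}

deps <- find_deps(derivs_exprs)
deps
```

A consequence of matching on names, in this sketch at least, is that a column
that happens to share a derivation's name would also be treated as a
dependency.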
Let's stop here and generate our pipeline. First, we need to define a list of derivations:
derivs <- list(d0, d1, d2)
and pass it to the rxp_populate() function:
rxp_populate(derivs)
To make the code more concise, you can directly define the list and pass it to
rxp_populate() using the pipe operator |>:
library(rixpress)
list(
  rxp_r_file(
    name = mtcars,
    path = 'data/mtcars.csv',
    read_function = \(x) (read.csv(file = x, sep = "|"))
  ),
  rxp_r(
    name = filtered_mtcars,
    expr = dplyr::filter(mtcars, am == 1)
  ),
  rxp_r(
    name = mtcars_mpg,
    expr = dplyr::select(filtered_mtcars, mpg)
  )
) |>
  rxp_populate()
Running rxp_populate() performs several actions:
- It creates a folder named _rixpress in the project's root directory. This
  folder contains automatically generated files needed for the pipeline to
  build successfully.
- It generates a file called pipeline.nix, which defines the entire pipeline
  in the Nix language.
- Optionally, if build = TRUE, it calls rxp_make() to build the pipeline.
However, if you try to run the code above, it will likely fail. This is because
a crucial piece is missing: the environment in which the pipeline must run!
Remember that the core purpose of using Nix is to ensure reproducibility by
forcing you to explicitly declare all dependencies. For our pipeline above, we
need to specify which version of R and which R packages should be used. The
pipeline uses filter() and select() from the {dplyr} package, so we must
declare {dplyr} as a dependency.
This is where the {rix} package comes in. {rix} allows you to define
reproducible development environments using simple R code. For example, we can
define an environment with R and {dplyr} like this:
library(rix)
rix(
  date = "2025-04-11",
  r_pkgs = "dplyr",
  ide = "rstudio",
  project_path = ".",
  overwrite = TRUE
)
Running this code generates a default.nix file that can be built using Nix
by calling nix-build. This creates a development environment containing
RStudio, R, and {dplyr} as they existed on April 11, 2025. You can use this
environment for interactive data analysis just as you would with a standard
installation of RStudio, R, and {dplyr}. To learn more about {rix}, visit
https://docs.ropensci.org/rix/.
The reproducible development environments generated by {rix} define all the
dependencies needed for your pipeline. To use this environment to build a
{rixpress} pipeline, you must also add {rixpress} to the list of packages in
the environment. Since {rixpress} is still under development, it must be
installed from GitHub. Here's how the complete environment setup script looks:
library(rix)
# Define execution environment
rix(
  date = "2025-04-11",
  r_pkgs = "dplyr",
  git_pkgs = list(
    package_name = "rixpress",
    repo_url = "https://github.com/ropensci/rixpress",
    commit = "HEAD"
  ),
  ide = "rstudio",
  project_path = ".",
  overwrite = TRUE
)
In the next vignette, we'll learn how to use {rix} effectively to provide a
reproducible execution environment for our pipelines. For now, let's assume
that we've used the code above to generate our environment, which we can
also use for interactive data analysis.
We can go back to our pipeline to finalise it:
library(rixpress)
# Define pipeline
list(
  rxp_r_file(
    name = mtcars,
    path = 'data/mtcars.csv',
    read_function = \(x) (read.csv(file = x, sep = "|"))
  ),
  rxp_r(
    name = filtered_mtcars,
    expr = dplyr::filter(mtcars, am == 1)
  ),
  rxp_r(
    name = mtcars_mpg,
    expr = dplyr::select(filtered_mtcars, mpg)
  )
) |>
  rxp_populate(project_path = ".")
I recommend always using two separate scripts:
- gen-env.R: uses {rix} to define the execution environment;
- gen-pipeline.R: uses {rixpress} to define the reproducible analytical
  pipeline.
You can quickly create these scripts using the rxp_init() function, which
generates both files with starter code.
It's often helpful to visualise your pipeline as a DAG (directed acyclic graph).
By default, the build argument of rxp_populate() is FALSE, so the following
call generates the pipeline's files without building it:
rxp_populate(derivs)
Among the files it generates is a JSON representation of the pipeline at
_rixpress/dag.json. Generating these files is quick, and lets you visualise the
graph using rxp_visnetwork(), which opens a new tab in your web browser
displaying the pipeline's DAG, drawn with the {visNetwork} package.
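Because the graph is acyclic, its derivations can always be listed in an order
where each one comes after everything it depends on, which is an order in which
they can be built. The hypothetical base R sketch below computes such an order
for the three derivations of this vignette with a simple variant of Kahn's
algorithm. It is meant to explain what the DAG encodes, not to reproduce what
{rixpress} or Nix do internally; the deps list and the build_order() helper are
written by hand for the example.

```r
# Hypothetical dependency list for the pipeline in this vignette:
# each derivation maps to the derivations it uses.
deps <- list(
  mtcars          = character(0),
  filtered_mtcars = "mtcars",
  mtcars_mpg      = "filtered_mtcars"
)

# Repeatedly take every derivation whose dependencies are already placed.
build_order <- function(deps) {
  remaining <- deps
  done <- character(0)
  while (length(remaining) > 0) {
    ready <- names(remaining)[vapply(
      remaining,
      function(d) all(d %in% done),
      logical(1)
    )]
    # If nothing is ready, the remaining derivations depend on each other.
    if (length(ready) == 0) stop("the graph contains a cycle")
    done <- c(done, ready)
    remaining <- remaining[setdiff(names(remaining), ready)]
  }
  done
}

# Returns c("mtcars", "filtered_mtcars", "mtcars_mpg")
build_order(deps)
```

The cycle check is what the "acyclic" in DAG guarantees will never trigger for
a valid pipeline.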