This vignette details how to effectively use the {cmdstanr} package within a
{rixpress} pipeline for Bayesian statistical modelling with Stan. For a general
introduction to {rixpress} and its core concepts, please refer to
vignette("intro-concepts") and vignette("core-functions").
{cmdstanr} provides a user-friendly R interface to cmdstan, Stan's
command-line interface. While powerful, its reliance on external processes and
file system interactions requires careful handling within the hermetic build
environment of {rixpress}.
As with any {rixpress} pipeline, the first step is to define the execution
environment using {rix}:
    library(rix)

    rix(
      date = "2025-04-29",
      r_pkgs = c("readr", "dplyr", "ggplot2"), # Add other R packages as needed
      system_pkgs = "cmdstan", # Crucial: include cmdstan as a system dependency
      git_pkgs = list(
        list(
          package_name = "cmdstanr",
          repo_url = "https://github.com/stan-dev/cmdstanr",
          commit = "79d37792d8e4ffcf3cf721b8d7ee4316a1234b0c" # Pin to a specific commit
        ),
        list(
          package_name = "rixpress",
          repo_url = "https://github.com/ropensci/rixpress",
          commit = "HEAD" # Or pin to a specific commit
        )
      ),
      ide = "none", # Or your preferred IDE
      project_path = ".",
      overwrite = TRUE
    )
Key points in this environment definition:
- cmdstan is included in system_pkgs. This makes the cmdstan executables
  available to the pipeline.
- {cmdstanr} is installed from its GitHub repository, as it's not available
  on CRAN. Pinning to a specific commit is recommended for maximum
  reproducibility.

With the environment set up, we can define the pipeline:
The Stan model code itself should reside in a .stan file. We use
rxp_r_file() with readLines to bring its contents into the pipeline as a
character vector, one element per line of the file.
rxp_r_file( bayesian_linear_regression_model, "model.stan", readLines )
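The contents of model.stan are not shown in this vignette. As a rough sketch, assuming a simple linear regression whose data block matches the N, x and y inputs prepared for the model, the file and the value that readLines() returns for it could look as follows. The Stan program and the temporary path are illustrative assumptions, not the actual model file.

```r
# Hypothetical contents of model.stan: a linear regression with a
# normal likelihood. This is an assumed example, not the original file.
stan_code <- c(
  "data {",
  "  int<lower=0> N;",
  "  vector[N] x;",
  "  vector[N] y;",
  "}",
  "parameters {",
  "  real alpha;",
  "  real beta;",
  "  real<lower=0> sigma;",
  "}",
  "model {",
  "  y ~ normal(alpha + beta * x, sigma);",
  "}"
)

# Write the sketch to a temporary location so the example is self-contained
stan_path <- file.path(tempdir(), "model.stan")
writeLines(stan_code, stan_path)

# readLines() returns a character vector with one element per line
model_lines <- readLines(stan_path)
str(model_lines)
```

Because writeLines() accepts such a vector directly, the object can later be written back to a .stan file without any further processing.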
Next, we define parameters and simulate some data for our model.
    rxp_r(
      parameters,
      list(
        N = 100,
        alpha = 2,
        beta = -0.5,
        sigma = 1.e-1
      )
    ),
    rxp_r(
      x,
      rnorm(parameters$N, 0, 1)
    ),
    rxp_r(
      y,
      rnorm(
        n = parameters$N,
        mean = parameters$alpha + parameters$beta * x,
        sd = parameters$sigma
      )
    ),
    rxp_r(
      # Prepare the data list for cmdstanr
      inputs,
      list(N = parameters$N, x = x, y = y)
    ),
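Before wiring these steps into the pipeline, the same simulation can be run in a plain R session as a sanity check. The sketch below is not part of the pipeline; the seed and the use of lm() as a quick check are assumptions added for illustration.

```r
set.seed(123) # Seed chosen arbitrarily for this sketch

parameters <- list(N = 100, alpha = 2, beta = -0.5, sigma = 1.e-1)

x <- rnorm(parameters$N, 0, 1)
y <- rnorm(
  n = parameters$N,
  mean = parameters$alpha + parameters$beta * x,
  sd = parameters$sigma
)

# Same structure as the `inputs` target in the pipeline
inputs <- list(N = parameters$N, x = x, y = y)

# An ordinary least-squares fit should land close to alpha and beta
ols <- coef(lm(y ~ x))
print(round(ols, 2))
```

With sigma this small, the posterior means for alpha and beta from the Stan model should also sit close to the simulated values, which makes the example easy to verify.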
Interfacing with cmdstan from within {rixpress} requires a specific strategy
due to the hermetic nature of Nix sandboxes. We'll use a wrapper function to
handle model compilation and sampling within a single rxp_r() step.
First, let's define the wrapper function (e.g., in a functions.R file that
we'll include):
    # In functions.R
    cmdstan_model_wrapper <- function(
      stan_string = NULL, # The Stan model code as a character string
      inputs,             # Data list for the model
      seed,               # Seed for reproducibility
      ...                 # Additional arguments for cmdstan_model or sample
    ) {
      # Create a temporary .stan file within the sandbox
      stan_file <- tempfile(pattern = "model_", fileext = ".stan")
      writeLines(stan_string, con = stan_file)

      # Compile the Stan model
      # cmdstanr will find cmdstan via the CMDSTAN environment variable
      model <- cmdstanr::cmdstan_model(
        stan_file = stan_file,
        ...
      )

      # Sample from the posterior
      fitted_model <- model$sample(
        data = inputs,
        seed = seed,
        ...
      )

      return(fitted_model)
    }
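Note that this wrapper forwards the same ... to both cmdstan_model() and $sample(), so an argument intended only for sampling (for example chains) would also reach the compile call. One possible variant, sketched here under the hypothetical name cmdstan_model_wrapper_split with hypothetical compile_args and sample_args parameters, keeps the two sets of arguments apart. It assumes {cmdstanr} and cmdstan are available when the function is called; the rest of this vignette continues to use the original wrapper.

```r
# Hypothetical variant of the wrapper (not part of cmdstanr or rixpress)
cmdstan_model_wrapper_split <- function(
  stan_string,
  inputs,
  seed,
  compile_args = list(), # Extra arguments for cmdstanr::cmdstan_model()
  sample_args = list()   # Extra arguments for model$sample()
) {
  # Write the Stan code to a temporary file inside the sandbox
  stan_file <- tempfile(pattern = "model_", fileext = ".stan")
  writeLines(stan_string, con = stan_file)

  # Compile, passing only the compile-related arguments
  model <- do.call(
    cmdstanr::cmdstan_model,
    c(list(stan_file = stan_file), compile_args)
  )

  # Sample, passing only the sampling-related arguments
  do.call(
    model$sample,
    c(list(data = inputs, seed = seed), sample_args)
  )
}
```

With this variant, something like sample_args = list(chains = 2, iter_sampling = 500) would affect sampling only.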
Now, we use this wrapper in our pipeline:
    # ... (continuation of pipeline_steps list)
    rxp_r(
      model, # Target name for the fitted model object
      cmdstan_model_wrapper(
        stan_string = bayesian_linear_regression_model,
        inputs = inputs,
        seed = 22
      ),
      user_functions = "functions.R",
      encoder = "save_model",
      env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan")
    )
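Since this step depends on the CMDSTAN environment variable being set, a small guard at the top of the wrapper can make a misconfigured step fail early with a clear message. The helper below is hypothetical (it is not part of {cmdstanr} or {rixpress}) and uses base R only; the demonstration points CMDSTAN at a temporary directory purely as a placeholder.

```r
# Hypothetical helper: stop early if CMDSTAN is missing or invalid
check_cmdstan_env <- function() {
  cmdstan_path <- Sys.getenv("CMDSTAN", unset = "")

  if (!nzchar(cmdstan_path)) {
    stop("CMDSTAN is not set; pass it via env_var in the rxp_r() step.")
  }
  if (!dir.exists(cmdstan_path)) {
    stop("CMDSTAN points to a directory that does not exist: ", cmdstan_path)
  }

  invisible(cmdstan_path)
}

# Demonstration with a placeholder directory standing in for the Nix store path
Sys.setenv(CMDSTAN = tempdir())
print(check_cmdstan_env())
```

Calling check_cmdstan_env() as the first line of cmdstan_model_wrapper() would be one way to use it.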
Explanation of the Wrapper Approach:
- stan_string = bayesian_linear_regression_model: We pass the model code
  (read by rxp_r_file()) to our wrapper.
- writeLines(stan_string, con = stan_file): Inside the wrapper, the Stan
  code is written to a temporary .stan file that exists within the sandbox
  of the current rxp_r() step. This is crucial because cmdstan_model() needs
  a file path. Passing the original model.stan path directly to
  cmdstan_model() via additional_files can lead to permission or path issues
  when cmdstan tries to compile it from a different working directory or
  context.
- cmdstanr::cmdstan_model(): Compiles the model from the temporary
  stan_file.
- model$sample(): Samples from the compiled model.
- Compilation and sampling happen in the same rxp_r() step (and thus the same
  sandbox). This is because the model object returned by cmdstan_model()
  contains paths to the compiled executable. If these were separate steps,
  the paths from the compilation sandbox wouldn't be valid in the sampling
  sandbox.
- env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan"): This
  sets the CMDSTAN environment variable within the sandbox for this specific
  step. {cmdstanr} uses this variable to locate the cmdstan installation.
  ${defaultPkgs.cmdstan} is a Nix interpolation that resolves to the
  path of the cmdstan package in the Nix store. If the environment providing
  cmdstan were defined in a differently named file, for example
  cmdstan-env.nix, you would need to use ${cmdstan_envPkgs.cmdstan} instead.

{cmdstanr} provides a specific method for saving fitted model objects to
ensure all necessary components are preserved. We define a simple wrapper for
this to use with {rixpress}.
    save_model <- function(fitted_model, path, ...) {
      fitted_model$save_object(file = path, ...)
    }
By specifying encoder = "save_model" in the rxp_r() call,
{rixpress} will use this function instead of the default saveRDS(). The fitted
model can then be read using rxp_read("model"), which will internally use
readRDS().
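The encoder contract can be tried out without cmdstan: as the save_model() signature suggests, the encoder receives the object and a path. The sketch below uses a made-up stand-in object whose save_object() simply calls saveRDS(). It only mimics the shape of a fitted model, and its values are placeholders, not results.

```r
# Same encoder as defined in this vignette
save_model <- function(fitted_model, path, ...) {
  fitted_model$save_object(file = path, ...)
}

# Made-up stand-in for a fitted model; the numbers are placeholders
make_mock_fit <- function(values) {
  self <- list(values = values)
  self$save_object <- function(file, ...) {
    saveRDS(self$values, file = file, ...)
  }
  self
}

mock_fit <- make_mock_fit(c(alpha = 2, beta = -0.5))

# Call the encoder the way a build step would: object first, then path
path <- tempfile(fileext = ".rds")
save_model(mock_fit, path)

# The resulting file can be read back with readRDS()
restored <- readRDS(path)
print(restored)
```

The point of the sketch is only that any function with this (object, path) shape that writes a file readable by readRDS() fits the pattern described here.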
Using {cmdstanr} with {rixpress} involves these key considerations:
- Include cmdstan in system_pkgs and {cmdstanr} (from Git) in your
  {rix} environment definition.
- Read your .stan file into the pipeline using rxp_r_file().
- Implement a wrapper function that:
  - Takes the Stan code as input.
  - Writes it to a temporary .stan file inside the wrapper.
  - Calls cmdstanr::cmdstan_model() on this temporary file.
  - Calls model$sample() to fit the model.
  - Returns the fitted model object.
- Perform model compilation and sampling within the same rxp_r() call using
  the wrapper.
- Set the CMDSTAN environment variable for the rxp_r() step that runs the
  wrapper, pointing to the Nix store path of cmdstan.
- Use {cmdstanr}'s $save_object() method via a custom encoder
  for robust saving of the fitted model.
This approach ensures that cmdstan can operate correctly within the isolated
and reproducible environment provided by {rixpress} and Nix.