This vignette is intended to be read after the Getting started with CmdStanR vignette. Please read that first for important background. In this document we provide additional details about compiling models, passing in data, and how CmdStan output is saved and read back into R.
We will only use the $sample()
method in examples, but all model fitting
methods work in a similar way under the hood.
library(cmdstanr) check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)
The cmdstan_model()
function creates a new CmdStanModel
object. The
CmdStanModel
object stores the path to a Stan program as well as the
path to a compiled executable.
exe <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli") unlink(exe)
stan_file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan") mod <- cmdstan_model(stan_file) mod$print() mod$stan_file() mod$exe_file()
Subsequently, if you create a CmdStanModel
object from the same Stan file
then compilation will be skipped (assuming the file hasn't changed).
mod <- cmdstan_model(stan_file)
Internally, cmdstan_model()
first creates the CmdStanModel
object from
just the Stan file and then calls its $compile()
method. Optional arguments to the $compile()
method can be passed via ...
.
mod <- cmdstan_model( stan_file, force_recompile = TRUE, include_paths = "paths/to/directories/with/included/files", cpp_options = list(stan_threads = TRUE, STANC2 = TRUE) )
It is also possible to delay compilation when creating the CmdStanModel
object
by specifying compile=FALSE
and then later calling the $compile()
method
directly.
unlink(mod$exe_file()) mod <- cmdstan_model(stan_file, compile = FALSE) mod$exe_file() # not yet created mod$compile() mod$exe_file()
If you are using CmdStan version 2.24 or later and CmdStanR version 0.2.1 or later, you can run a pedantic check for your model. CmdStanR will always check that your Stan program does not contain any invalid syntax but with pedantic mode enabled the check will also warn you about other potential issues in your model, for example:
~
symbols).For the latest information on the checks performed in pedantic mode see the Pedantic mode chapter in the Stan Reference Manual.
Pedantic mode is available when compiling the model or when using the separate
$check_syntax()
method of a CmdStanModel
object. Internally this corresponds
to setting the stanc
(Stan transpiler) option warn-pedantic
. Here we
demonstrate pedantic mode with a Stan program that is syntactically correct but
is missing a lower bound and a prior for a parameter.
stan_file_pedantic <- write_stan_file(" data { int N; array[N] int y; } parameters { // should have <lower=0> but omitting to demonstrate pedantic mode real lambda; } model { y ~ poisson(lambda); } ")
To turn on pedantic mode at compile time you can set pedantic=TRUE
in
the call to cmdstan_model()
(or when calling the $compile()
method directly
if using the delayed compilation approach described above).
mod_pedantic <- cmdstan_model(stan_file_pedantic, pedantic = TRUE)
To turn on pedantic mode separately from compilation use the pedantic
argument
to the $check_syntax()
method.
mod_pedantic$check_syntax(pedantic = TRUE)
Using pedantic=TRUE
via the $check_syntax()
method also has the advantage
that it can be used even if the model hasn't been compiled yet. This can be
helpful because the pedantic and syntax checks themselves are much faster than
compilation.
file.remove(mod_pedantic$exe_file()) # delete compiled executable rm(mod_pedantic) mod_pedantic <- cmdstan_model(stan_file_pedantic, compile = FALSE) mod_pedantic$check_syntax(pedantic = TRUE)
If using CmdStan 2.27 or newer, you can obtain the names, types
and dimensions of the data, parameters, transformed parameters
and generated quantities variables of a Stan model using the
$variables()
method of the CmdStanModel
object.
stan_file_variables <- write_stan_file(" data { int<lower=1> J; vector<lower=0>[J] sigma; vector[J] y; } parameters { real mu; real<lower=0> tau; vector[J] theta_raw; } transformed parameters { vector[J] theta = mu + tau * theta_raw; } model { target += normal_lpdf(tau | 0, 10); target += normal_lpdf(mu | 0, 10); target += normal_lpdf(theta_raw | 0, 1); target += normal_lpdf(y | theta, sigma); } ") mod_v <- cmdstan_model(stan_file_variables) variables <- mod_v$variables()
The $variables()
method returns a list with data
, parameters
,
transformed_parameters
and generated_quantities
elements, each
corresponding to variables in their respective block of the program. Transformed
data variables are not listed as they are not used in the model's input
or output.
names(variables) names(variables$data) names(variables$parameters) names(variables$transformed_parameters) names(variables$generated_quantities)
Each variable is represented as a list containing the type
information (currently limited to real
or int
) and the number of dimensions.
variables$data$J variables$data$sigma variables$parameters$tau variables$transformed_parameters$theta
By default, the executable is created in the same directory as the file
containing the Stan program. You can also specify a different location with the
dir
argument.
mod <- cmdstan_model(stan_file, dir = "path/to/directory/for/executable")
There are three data formats that CmdStanR allows when fitting a model:
Like the RStan interface, CmdStanR accepts a named list of R objects where the
names correspond to variables declared in the data block of the Stan program.
In the Bernoulli model the data is N
, the number of data points, and y
an integer array of observations.
mod$print()
# data block has 'N' and 'y' data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1)) fit <- mod$sample(data = data_list)
Because CmdStan doesn't accept lists of R objects, CmdStanR will first write the
data to a temporary JSON file using write_stan_json()
. This happens
internally, but it is also possible to call write_stan_json()
directly.
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1)) json_file <- tempfile(fileext = ".json") write_stan_json(data_list, json_file) cat(readLines(json_file), sep = "\n")
If you already have your data in a JSON file you can just pass that file
directly to CmdStanR instead of using a list of R objects. For example, we could
pass in the JSON file we created above using write_stan_json()
:
fit <- mod$sample(data = json_file)
Finally, it is also possible to use the R dump file format. This is not
recommended because CmdStan can process JSON faster than R dump, but CmdStanR
allows it because CmdStan will accept files created by rstan::stan_rdump()
:
rdump_file <- tempfile(fileext = ".data.R") rstan::stan_rdump(names(data_list), file = rdump_file, envir = list2env(data_list)) cat(readLines(rdump_file), sep = "\n") fit <- mod$sample(data = rdump_file)
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1)) fit <- mod$sample(data = data_list)
When fitting a model, the default behavior is to write the output from CmdStan to CSV files in a temporary directory.
fit$output_files()
These files will be lost if you end your R session or if you remove the
fit
object and force (or wait for) garbage collection.
files <- fit$output_files() file.exists(files) rm(fit) gc() file.exists(files)
To save these files to a non-temporary location there are two options. You
can either specify the output_dir
argument to mod$sample()
or use
fit$save_output_files()
after fitting the model.
# see ?save_output_files for info on optional arguments fit$save_output_files(dir = "path/to/directory")
fit <- mod$sample( data = data_list, output_dir = "path/to/directory" )
With the exception of some diagnostic information, the CSV files are not read
into R until their contents are requested by calling a method that requires them
(e.g., fit$draws()
, fit$summary()
, etc.). If we examine the structure of the
fit
object, notice how the Private
slot draws_
is NULL
, indicating that
the CSV files haven't yet been read into R.
fit <- mod$sample(data = data_list)
str(fit)
After we call a method that requires the draws then if we reexamine the
structure of the object we will see that the draws_
slot in Private
is no longer empty.
draws <- fit$draws() # force CSVs to be read into R str(fit)
For models with many parameters, transformed parameters, or generated
quantities, if only some are requested (e.g., by specifying the variables
argument to fit$draws()
) then CmdStanR will only read in the requested
variables (unless they have already been read in).
Internally, the read_cmdstan_csv()
function is used to read the CmdStan CSV
files into R. This function is exposed to users, so you can also call it
directly.
# see ?read_cmdstan_csv for info on optional arguments controlling # what information is read in csv_contents <- read_cmdstan_csv(fit$output_files()) str(csv_contents)
If you need to manually create fitted model objects from CmdStan CSV files use
as_cmdstan_fit()
.
fit2 <- as_cmdstan_fit(fit$output_files())
This is pointless in our case since we have the original fit
object, but this
function can be used to create fitted model objects (CmdStanMCMC
,
CmdStanMLE
, etc.) from any CmdStan CSV files.
If save_latent_dynamics
is set to TRUE
when running the $sample()
method
then additional CSV files are created (one per chain) that provide access to
quantities used under the hood by Stan's implementation of dynamic Hamiltonian
Monte Carlo.
CmdStanR does not yet provide a special method for processing these files but they can be read into R using R's standard CSV reading functions.
fit <- mod$sample(data = data_list, save_latent_dynamics = TRUE)
fit$latent_dynamics_files() # read one of the files in x <- utils::read.csv(fit$latent_dynamics_files()[1], comment.char = "#") head(x)
The column lp__
is also provided via fit$draws()
, and the columns
accept_stat__
, stepsize__
, treedepth__
, n_leapfrog__
, divergent__
, and
energy__
are also provided by fit$sampler_diagnostics()
, but there are
several columns unique to the latent dynamics file.
head(x[, c("theta", "p_theta", "g_theta")])
Our model has a single parameter theta
and the three columns above correspond
to theta
in the unconstrained space (theta
on the constrained space is
accessed via fit$draws()
), the auxiliary momentum p_theta
, and the gradient
g_theta
. In general, each of these three columns will exist for every
parameter in the model.
CmdStanR can of course be used for developing other packages that require compiling and running Stan models as well as using new or custom Stan features available through CmdStan.
You may compile a Stan model at runtime (e.g. just before sampling), or you may compile all the models inside the package file system in advance at installation time. The latter avoids compilations at runtime, which matters in centrally managed R installations where users should not compile their own software.
To pre-compile all the models in a package,
you may create top-level scripts configure
and configure.win
which run cmdstan_model()
with compile = TRUE
and save the compiled executables
somewhere inside the inst/
folder of the package source.
The instantiate
package helps developers
configure packages this way,
and it documents other topics such as submitting to CRAN and administering CmdStan.
Kevin Ushey's configure
package helps
create and manage package configuration files in general.
When developing or testing new features it might be useful to have more
information on how CmdStan is called internally and to see more information
printed when compiling or running models. This can be enabled for an entire R
session by setting the option "cmdstanr_verbose"
to TRUE
.
options("cmdstanr_verbose"=TRUE) mod <- cmdstan_model(stan_file, force_recompile = TRUE) fit <- mod$sample( data = data_list, chains = 1, iter_warmup = 100, iter_sampling = 100 )
options("cmdstanr_verbose" = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.