library(rstan) knitr::opts_chunk$set( echo = TRUE, error = TRUE, comment = NA, fig.align = "center", fig.height = 5, fig.width = 7 )
This vignette demonstrates how to access most of data stored in a stanfit
object. A stanfit object (an object of class "stanfit"
) contains the output
derived from fitting a Stan model using Markov chain Monte Carlo or one of
Stan's variational approximations (meanfield or full-rank). Throughout the
document we'll use the stanfit object obtained from fitting the Eight Schools
example model:
library(rstan) fit <- stan_demo("eight_schools", refresh = 0)
J <- 8 y <- c(28, 8, -3, 7, -1, 1, 18, 12) sigma <- c(15, 10, 16, 11, 9, 11, 10, 18) fit <- stan( file= "schools.stan", model_name = "eight_schools", data = c("y", "J", "sigma"), refresh = 0 )
class(fit)
There are several functions that can be used to access the draws from the
posterior distribution stored in a stanfit object. These are extract
,
as.matrix
, as.data.frame
, and as.array
, each of which returns the draws in
a different format.
The extract
function (with its default arguments) returns a list with named
components corresponding to the model parameters.
list_of_draws <- extract(fit) print(names(list_of_draws))
In this model the parameters mu
and tau
are scalars and theta
is a vector
with eight elements. This means that the draws for mu
and tau
will be
vectors (with length equal to the number of post-warmup iterations times the
number of chains) and the draws for theta
will be a matrix, with each column
corresponding to one of the eight components:
head(list_of_draws$mu) head(list_of_draws$tau) head(list_of_draws$theta)
The as.matrix
, as.data.frame
, and as.array
functions can also be used
to retrieve the posterior draws from a stanfit object:
matrix_of_draws <- as.matrix(fit) print(colnames(matrix_of_draws)) df_of_draws <- as.data.frame(fit) print(colnames(df_of_draws)) array_of_draws <- as.array(fit) print(dimnames(array_of_draws))
The as.matrix
and as.data.frame
methods essentially return the same
thing except in matrix and data frame form, respectively. The as.array
method returns the draws from each chain separately and so has an additional
dimension:
print(dim(matrix_of_draws)) print(dim(df_of_draws)) print(dim(array_of_draws))
By default all of the functions for retrieving the posterior draws return the
draws for all parameters (and generated quantities). The optional argument
pars
(a character vector) can be used if only a subset of the parameters is
desired, for example:
mu_and_theta1 <- as.matrix(fit, pars = c("mu", "theta[1]")) head(mu_and_theta1)
Summary statistics are obtained using the summary
function. The object
returned is a list with two components:
fit_summary <- summary(fit) print(names(fit_summary))
In fit_summary$summary
all chains are merged whereas fit_summary$c_summary
contains summaries for each chain individually. Typically we want the
summary for all chains merged, which is what we'll focus on here.
The summary is a matrix with rows corresponding to parameters and columns to the
various summary quantities. These include the posterior mean, the posterior
standard deviation, and various quantiles computed from the draws. The probs
argument can be used to specify which quantiles to compute and pars
can be
used to specify a subset of parameters to include in the summary.
For models fit using MCMC, also included in the summary are the Monte Carlo
standard error (se_mean
), the effective sample size (n_eff
), and the R-hat
statistic (Rhat
).
print(fit_summary$summary)
If, for example, we wanted the only quantiles included to be 10% and 90%, and for only the parameters included to be mu
and tau
, we would specify that like this:
mu_tau_summary <- summary(fit, pars = c("mu", "tau"), probs = c(0.1, 0.9))$summary print(mu_tau_summary)
Since mu_tau_summary
is a matrix we can pull out columns using their names:
mu_tau_80pct <- mu_tau_summary[, c("10%", "90%")] print(mu_tau_80pct)
For models fit using MCMC the stanfit object will also contain the values of
parameters used for the sampler. The get_sampler_params
function can
be used to access this information.
The object returned by get_sampler_params
is a list with one component (a
matrix) per chain. Each of the matrices has number of columns corresponding to
the number of sampler parameters and the column names provide the parameter
names. The optional argument inc_warmup (defaulting to TRUE
) indicates whether
to include the warmup period.
sampler_params <- get_sampler_params(fit, inc_warmup = FALSE) sampler_params_chain1 <- sampler_params[[1]] colnames(sampler_params_chain1)
To do things like calculate the average value of accept_stat__
for each chain
(or the maximum value of treedepth__
for each chain if using the NUTS
algorithm, etc.) the sapply
function is useful as it will apply the
same function to each component of sampler_params
:
mean_accept_stat_by_chain <- sapply(sampler_params, function(x) mean(x[, "accept_stat__"])) print(mean_accept_stat_by_chain) max_treedepth_by_chain <- sapply(sampler_params, function(x) max(x[, "treedepth__"])) print(max_treedepth_by_chain)
The Stan program itself is also stored in the stanfit object and can be
accessed using get_stancode
:
code <- get_stancode(fit)
The object code
is a single string and is not very intelligible when printed:
print(code)
A readable version can be printed using cat
:
cat(code)
The get_inits
function returns initial values as a list with one component per
chain. Each component is itself a (named) list containing the initial values for each parameter for the corresponding chain:
inits <- get_inits(fit) inits_chain1 <- inits[[1]] print(inits_chain1)
The get_seed
function returns the (P)RNG seed as an integer:
print(get_seed(fit))
The get_elapsed_time
function returns a matrix with the warmup and sampling
times for each chain:
print(get_elapsed_time(fit))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.