fit_models: Fit a stan model to multiple datasets

View source: R/fit_models.R

Description

fit_models() fits a stan model to multiple datasets, then collates and returns summary information and data for all fitted models as a stansim_simulation object. Basic reproducibility information, such as parameter inits and seeds, is recorded for every fitted model, along with parameter estimates and simulation information such as the time and date of the run.

Raw stan posterior samples are not returned. Instead, the user specifies the estimates to record (e.g. posterior percentiles, Rhat) and the parameters for which to record them. All data are collated into a single tidy dataframe for further analysis.

By default the function caches completed runs as it progresses, so that progress is not lost if the function fails. Re-running the function with the same call in the same working directory resumes from where it left off. When the function terminates as expected, the cache is removed.

Usage

fit_models(sim_name = paste0("Stansim_", Sys.time()), sim_data = NULL,
  stan_args = list(), calc_loo = FALSE, use_cores = 1L,
  parameters = "all", probs = c(0.025, 0.25, 0.5, 0.75, 0.975),
  estimates = c("mean", "se_mean", "sd", "n_eff", "Rhat"),
  stan_warnings = "catch", cache = TRUE,
  seed = floor(stats::runif(1, 1, 1e+05)))

Arguments

sim_name

A name attached to the stansim_simulation object to help identify it. An informative name is strongly recommended, especially if stansim_simulation objects are to be combined into a stansim_collection object for management of results.

sim_data

Either an object of class stansim_data or a vector of strings pointing to the location of .rds files containing the simulation data. See the vignette on producing simulation data for details on the formatting of these datasets.
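As a rough illustration of the file-based option, the sketch below writes a few datasets to .rds files and collects their paths into a character vector. The directory name, file names, and the shape of each dataset (a named list of the kind passed as data to rstan::sampling()) are assumptions made for the example; the vignette remains the reference for the required format.

```r
# Hypothetical directory for the example; any writable path would do.
data_dir <- file.path(tempdir(), "sim_data")
dir.create(data_dir, showWarnings = FALSE)

set.seed(1)
for (i in 1:3) {
  # Assumed format: a named list of data inputs for the stan model.
  dataset <- list(N = 10L, y = rnorm(10))
  saveRDS(dataset, file.path(data_dir, sprintf("dataset_%d.rds", i)))
}

# Character vector of file locations, as could be passed to sim_data.
datasets <- dir(data_dir, pattern = "\\.rds$", full.names = TRUE)
```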

stan_args

A list of function arguments to be used by the internal rstan::sampling() function when fitting the models. If not specified then the rstan::sampling() function defaults are used.

calc_loo

If TRUE, model fit statistics are calculated using the loo package. This requires a valid log_lik quantity in the generated quantities block of the provided stan model.

use_cores

Number of cores to use when running in parallel. Parallelisation is across models rather than within them: each stan model is fitted serially regardless of the number of chains run, as parallelising across models is more flexible.

parameters

A character vector indicating which parameters should have estimates returned and stored from the fitted models. By default all parameters are returned. For non-scalar parameters, subsets cannot be selected (e.g. request theta rather than theta[1]).

probs

A numeric vector of values between 0 and 1. Corresponding quantiles will be estimated and returned for all fitted models.
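To show what the probs values correspond to, here is a small sketch that computes the same kind of quantiles on simulated draws with stats::quantile(). It is not the package's internal code, and the draws vector is a stand-in for posterior samples of a single parameter.

```r
set.seed(42)
# Stand-in for posterior draws of one parameter (assumption for the example).
draws <- rnorm(4000, mean = 1, sd = 2)

probs <- c(0.025, 0.5, 0.975)
q <- stats::quantile(draws, probs = probs)

# The result is named by percentage: "2.5%", "50%", "97.5%".
print(q)
```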

estimates

A character vector of non-quantile estimates to be returned for each model parameter. Argument must be some subset of the default character vector.

stan_warnings

How warnings returned by individual stan instances should be handled. "catch" records all warnings in the returned object alongside other instance level data, "print" simply prints warnings to the console as the models are fit (default stan behaviour), and "suppress" suppresses all warnings without recording them.
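The three modes can be pictured with base R condition handling. The sketch below is one possible way to get this behaviour and is not taken from the package source; run_with_warnings() and noisy_fit() are hypothetical names introduced for the example.

```r
# Hypothetical helper: run a function under one of the three warning modes.
run_with_warnings <- function(fit_fun, mode = c("catch", "print", "suppress")) {
  mode <- match.arg(mode)
  caught <- character(0)
  value <- switch(mode,
    # Record each warning message, then stop it reaching the console.
    catch = withCallingHandlers(
      fit_fun(),
      warning = function(w) {
        caught <<- c(caught, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    # Leave warnings alone so they surface as usual.
    print = fit_fun(),
    # Discard warnings without recording them.
    suppress = suppressWarnings(fit_fun())
  )
  list(value = value, warnings = caught)
}

# Stand-in for a model fit that emits a warning.
noisy_fit <- function() {
  warning("simulated sampler warning")
  42
}

res <- run_with_warnings(noisy_fit, "catch")
```

Here res$warnings holds the recorded message and res$value the result, which mirrors the idea of storing warnings alongside other instance-level data.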

cache

If TRUE, the results for each instance are written to a local, temporary file so that data is not lost should the function not terminate properly. This temporary data is removed once the function terminates as expected. If FALSE, no data is written and results are only returned upon the correct termination of the whole function. The default value of TRUE is recommended unless there are relevant write-permission restrictions.
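The general pattern of caching each completed run and resuming on a later call can be sketched as follows. This is an illustration of the idea only: the cache location, file naming, and the fit_one() stand-in are assumptions and do not reflect how the package stores its cache.

```r
# Hypothetical cache location for the example.
cache_dir <- file.path(tempdir(), "hypothetical_cache")
dir.create(cache_dir, showWarnings = FALSE)

# Stand-in for fitting one model to one dataset.
fit_one <- function(i) i^2

run_all <- function(ids) {
  results <- lapply(ids, function(i) {
    cache_file <- file.path(cache_dir, paste0("run_", i, ".rds"))
    # Reuse a cached result if an earlier, interrupted call produced one.
    if (file.exists(cache_file)) return(readRDS(cache_file))
    res <- fit_one(i)
    saveRDS(res, cache_file)
    res
  })
  # All runs finished as expected, so the cache is removed.
  unlink(cache_dir, recursive = TRUE)
  results
}

out <- run_all(1:3)
```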

seed

A seed for the function, used for reproducibility. By default a random whole number between 1 and 1e+05 is drawn.

Value

An S3 object of class stansim_simulation recording relevant simulation data.

Examples

## Not run: 
# specify arguments for stan
StanArgs <- list(file = '8schools.stan',
                 iter = 1000, chains = 4)

# get number of cores
core_num <- parallel::detectCores()

# get the list of data file locations
datasets <- dir("data/repo", full.names = TRUE)

# fit the model to all datasets using specified stan arguments
# store the specified estimates for all parameters
simulation <- fit_models(
  sim_name = "stansim simulation",
  sim_data = datasets,
  stan_args = StanArgs,
  calc_loo = TRUE,
  use_cores = core_num,
  probs =  c(.025, .5, .975),
  estimates = c("mean", "n_eff", "Rhat")
)

## End(Not run)

rstansim documentation built on Sept. 22, 2017, 1:06 a.m.