PSSample: Sample with 'Stan'
In PStrata: Principal Stratification Analysis in R

PSSample

R Documentation

Sample with `Stan`

Description

Sample from the posterior distribution by calling stan. Check stan for details of the arguments.

Usage

PSSample(
  file,
  model_name = "anon_model",
  model_code = "",
  fit = NA,
  data = list(),
  pars = NA,
  chains = 4,
  iter = 2000,
  warmup = floor(iter/2),
  thin = 1,
  init = "random",
  seed = sample.int(.Machine$integer.max, 1),
  algorithm = c("NUTS", "HMC", "Fixed_param"),
  control = NULL,
  sample_file = NULL,
  diagnostic_file = NULL,
  save_dso = TRUE,
  verbose = FALSE,
  include = TRUE,
  cores = getOption("mc.cores", 1L),
  open_progress = interactive() && !isatty(stdout()) && !identical(Sys.getenv("RSTUDIO"),
    "1"),
  ...,
  boost_lib = NULL,
  eigen_lib = NULL
)

Arguments

`file`	The path to the Stan program to use. `file` should be a character string file name or a connection that R supports containing the text of a model specification in the Stan modeling language. A model may also be specified directly as a character string using the `model_code` argument, but we recommend always putting Stan programs in separate files with a `.stan` extension. The `stan` function can also use the Stan program from an existing `stanfit` object via the `fit` argument. When `fit` is specified, the `file` argument is ignored.
`model_name`	A string to use as the name of the model; defaults to `"anon_model"`. However, the model name will be derived from `file` or `model_code` (if `model_code` is the name of a character string object) if `model_name` is not specified. This is not a particularly important argument, although since it affects the name used in printed messages, developers of other packages that use rstan to fit models may want to use informative names.
`model_code`	A character string either containing the model definition or the name of a character string object in the workspace. This argument is used only if arguments `file` and `fit` are not specified.
`fit`	An instance of S4 class `stanfit` derived from a previous fit; defaults to `NA`. If `fit` is not `NA`, the compiled model associated with the fitted result is re-used; thus the time that would otherwise be spent recompiling the C++ code for the model can be saved.
`data`	A named `list` or `environment` providing the data for the model, or a character vector for all the names of objects to use as data. See the Passing data to Stan section below.
`pars`	A character vector specifying parameters of interest to be saved. The default is to save all parameters from the model. If `include = TRUE`, only samples for parameters named in `pars` are stored in the fitted results. Conversely, if `include = FALSE`, samples for all parameters except those named in `pars` are stored in the fitted results.
`chains`	A positive integer specifying the number of Markov chains. The default is 4.
`iter`	A positive integer specifying the number of iterations for each chain (including warmup). The default is 2000.
`warmup`	A positive integer specifying the number of warmup (aka burnin) iterations per chain. If step-size adaptation is on (which it is by default), this also controls the number of iterations for which adaptation is run (and hence these warmup samples should not be used for inference). The number of warmup iterations should be smaller than `iter` and the default is `iter/2`.
`thin`	A positive integer specifying the period for saving samples. The default is 1, which is usually the recommended value. Unless your posterior distribution takes up too much memory we do not recommend thinning as it throws away information. The tradition of thinning when running MCMC stems primarily from the use of samplers that require a large number of iterations to achieve the desired effective sample size. Because of the efficiency (effective samples per second) of Hamiltonian Monte Carlo, rarely should this be necessary when using Stan.
`init`	Specification of initial values for all or some parameters. Can be the digit `0`, the strings `"0"` or `"random"`, a function that returns a named list, or a list of named lists: `init="random"` (default): Let Stan generate random initial values for all parameters. The seed of the random number generator used by Stan can be specified via the `seed` argument. If the seed for Stan is fixed, the same initial values are used. The default is to randomly generate initial values between `-2` and `2` on the unconstrained support. The optional additional parameter `init_r` can be set to some value other than `2` to change the range of the randomly generated inits. `init="0", init=0`: Initialize all parameters to zero on the unconstrained support. inits via list: Set inital values by providing a list equal in length to the number of chains. The elements of this list should themselves be named lists, where each of these named lists has the name of a parameter and is used to specify the initial values for that parameter for the corresponding chain. inits via function: Set initial values by providing a function that returns a list for specifying the initial values of parameters for a chain. The function can take an optional parameter `chain_id` through which the `chain_id` (if specified) or the integers from 1 to `chains` will be supplied to the function for generating initial values. See the Examples section below for examples of defining such functions and using a list of lists for specifying initial values. When specifying initial values via a `list` or `function`, any parameters for which values are not specified will receive initial values generated as described in the `init="random"` description above.
`seed`	The seed for random number generation. The default is generated from 1 to the maximum integer supported by R on the machine. Even if multiple chains are used, only one seed is needed, with other chains having seeds derived from that of the first chain to avoid dependent samples. When a seed is specified by a number, `as.integer` will be applied to it. If `as.integer` produces `NA`, the seed is generated randomly. The seed can also be specified as a character string of digits, such as `"12345"`, which is converted to integer. Using R's `set.seed` function to set the seed for Stan will not work.
`algorithm`	One of the sampling algorithms that are implemented in Stan. The default and preferred algorithm is `"NUTS"`, which is the No-U-Turn sampler variant of Hamiltonian Monte Carlo (Hoffman and Gelman 2011, Betancourt 2017). Currently the other options are `"HMC"` (Hamiltonian Monte Carlo), and `"Fixed_param"`. When `"Fixed_param"` is used no MCMC sampling is performed (e.g., for simulating with in the generated quantities block).
`control`	A named `list` of parameters to control the sampler's behavior. It defaults to `NULL` so all the default values are used. First, the following are adaptation parameters for sampling algorithms. These are parameters used in Stan with similar names here. `adapt_engaged` (`logical`) `adapt_gamma` (`double`, positive, defaults to 0.05) `adapt_delta` (`double`, between 0 and 1, defaults to 0.8) `adapt_kappa` (`double`, positive, defaults to 0.75) `adapt_t0` (`double`, positive, defaults to 10) `adapt_init_buffer` (`integer`, positive, defaults to 75) `adapt_term_buffer` (`integer`, positive, defaults to 50) `adapt_window` (`integer`, positive, defaults to 25) In addition, algorithm HMC (called 'static HMC' in Stan) and NUTS share the following parameters: `stepsize` (`double`, positive, defaults to 1) Note: this controls the initial stepsize only, unless `adapt_engaged=FALSE`. `stepsize_jitter` (`double`, [0,1], defaults to 0) `metric` (`string`, one of "unit_e", "diag_e", "dense_e", defaults to "diag_e") For algorithm NUTS, we can also set: `max_treedepth` (`integer`, positive, defaults to 10) For algorithm HMC, we can also set: `int_time` (`double`, positive) For `test_grad` mode, the following parameters can be set: `epsilon` (`double`, defaults to 1e-6) `error` (`double`, defaults to 1e-6)
`sample_file`	An optional character string providing the name of a file. If specified the draws for all parameters and other saved quantities will be written to the file. If not provided, files are not created. When the folder specified is not writable, `tempdir()` is used. When there are multiple chains, an underscore and chain number are appended to the file name.
`diagnostic_file`	An optional character string providing the name of a file. If specified the diagnostics data for all parameters will be written to the file. If not provided, files are not created. When the folder specified is not writable, `tempdir()` is used. When there are multiple chains, an underscore and chain number are appended to the file name.
`save_dso`	Logical, with default `TRUE`, indicating whether the dynamic shared object (DSO) compiled from the C++ code for the model will be saved or not. If `TRUE`, we can draw samples from the same model in another R session using the saved DSO (i.e., without compiling the C++ code again). This parameter only takes effect if `fit` is not used; with `fit` defined, the DSO from the previous run is used. When `save_dso=TRUE`, the fitted object can be loaded from what is saved previously and used for sampling, if the compiling is done on the same platform, that is, same operating system and same architecture (32bits or 64bits).
`verbose`	`TRUE` or `FALSE`: flag indicating whether to print intermediate output from Stan on the console, which might be helpful for model debugging.
`include`	Logical scalar defaulting to `TRUE` indicating whether to include or exclude the parameters given by the `pars` argument. If `FALSE`, only entire multidimensional parameters can be excluded, rather than particular elements of them.
`cores`	The number of cores to use when executing the Markov chains in parallel. The default is to use the value of the `"mc.cores"` option if it has been set and otherwise to default to 1 core. However, we recommend setting it to be as many processors as the hardware and RAM allow (up to the number of chains). See `detectCores` if you don't know this number for your system.
`open_progress`	Logical scalar that only takes effect if `cores > 1` but is recommended to be `TRUE` in interactive use so that the progress of the chains will be redirected to a file that is automatically opened for inspection. For very short runs, the user might prefer `FALSE`.
`...`	Other optional parameters: `chain_id` (`integer`) `init_r` (`double`, positive) `test_grad` (`logical`) `append_samples` (`logical`) `refresh`(`integer`) `save_warmup`(`logical`) deprecated: `enable_random_init`(`logical`) `chain_id` can be a vector to specify the chain_id for all chains or an integer. For the former case, they should be unique. For the latter, the sequence of integers starting from the given `chain_id` are used for all chains. `init_r` is used only for generating random initial values, specifically when `init="random"` or not all parameters are initialized in the user-supplied list or function. If specified, the initial values are simulated uniformly from interval [-`init_r`, `init_r`] rather than using the default interval (see the manual of (cmd)Stan). `test_grad` (`logical`). If `test_grad=TRUE`, Stan will not do any sampling. Instead, the gradient calculation is tested and printed out and the fitted `stanfit` object is in test gradient mode. By default, it is `FALSE`. `append_samples` (`logical`). Only relevant if `sample_file` is specified and is an existing file. In that case, setting `append_samples=TRUE` will append the samples to the existing file rather than overwriting the contents of the file. `refresh` (`integer`) can be used to control how often the progress of the sampling is reported (i.e. show the progress every `refresh` iterations). By default, `refresh = max(iter/10, 1)`. The progress indicator is turned off if `refresh <= 0`. Deprecated: `enable_random_init` (`logical`) being `TRUE` enables specifying initial values randomly when the initial values are not fully specified from the user. `save_warmup` (`logical`) indicates whether to save draws during the warmup phase and defaults to `TRUE`. Some memory related problems can be avoided by setting it to `FALSE`, but some diagnostics are more limited if the warmup draws are not stored.
`boost_lib`	The path for an alternative version of the Boost C++ to use instead of the one in the BH package.
`eigen_lib`	The path for an alternative version of the Eigen C++ library to the one in RcppEigen.