View source: R/tar_stan_mcmc_rep_draws.R
tar_stan_mcmc_rep_draws | R Documentation |
tar_stan_mcmc_rep_draws() creates targets to run MCMC multiple times per model and save only the draws from each run.
tar_stan_mcmc_rep_draws(
name,
stan_files,
data = list(),
batches = 1L,
reps = 1L,
combine = FALSE,
compile = c("original", "copy"),
quiet = TRUE,
stdout = NULL,
stderr = NULL,
dir = NULL,
pedantic = FALSE,
include_paths = NULL,
cpp_options = list(),
stanc_options = list(),
force_recompile = FALSE,
seed = NULL,
refresh = NULL,
init = NULL,
save_latent_dynamics = FALSE,
output_dir = NULL,
output_basename = NULL,
sig_figs = NULL,
chains = 4,
parallel_chains = getOption("mc.cores", 1),
chain_ids = seq_len(chains),
threads_per_chain = NULL,
opencl_ids = NULL,
iter_warmup = NULL,
iter_sampling = NULL,
save_warmup = FALSE,
thin = NULL,
max_treedepth = NULL,
adapt_engaged = TRUE,
adapt_delta = NULL,
step_size = NULL,
metric = NULL,
metric_file = NULL,
inv_metric = NULL,
init_buffer = NULL,
term_buffer = NULL,
window = NULL,
fixed_param = FALSE,
show_messages = TRUE,
diagnostics = c("divergences", "treedepth", "ebfmi"),
inc_warmup = FALSE,
variables = NULL,
data_copy = character(0),
transform = NULL,
tidy_eval = targets::tar_option_get("tidy_eval"),
packages = targets::tar_option_get("packages"),
library = targets::tar_option_get("library"),
format = "qs",
format_df = "fst_tbl",
repository = targets::tar_option_get("repository"),
error = targets::tar_option_get("error"),
memory = "transient",
garbage_collection = TRUE,
deployment = targets::tar_option_get("deployment"),
priority = targets::tar_option_get("priority"),
resources = targets::tar_option_get("resources"),
storage = targets::tar_option_get("storage"),
retrieval = targets::tar_option_get("retrieval"),
cue = targets::tar_option_get("cue"),
description = targets::tar_option_get("description")
)
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
Code to generate a single replication of a simulated dataset.
The workflow simulates multiple datasets, and each
model runs on each dataset. To join data on to the model
summaries, include a |
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since it was last compiled? The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For MCMC there will be one file per chain; for other
methods there will be a single file. For interactive use this can typically
be left at
|
output_basename |
(string) A string to use as a prefix for the names of
the output CSV files of CmdStan. If |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
chains |
(positive integer) The number of Markov chains to run. The default is 4. |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
chain_ids |
(integer vector) A vector of chain IDs. Must contain as many
unique positive integers as the number of chains. If not set, the default
chain IDs are used (integers starting from |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
opencl_ids |
(integer vector of length 2) The platform and device IDs of
the OpenCL device to use for fitting. The model must be compiled with
|
iter_warmup |
(positive integer) The number of warmup iterations to run
per chain. Note: in the CmdStan User's Guide this is referred to as
|
iter_sampling |
(positive integer) The number of post-warmup iterations
to run per chain. Note: in the CmdStan User's Guide this is referred to as
|
save_warmup |
(logical) Should warmup iterations be saved? The default
is |
thin |
(positive integer) The period between saved samples. This should typically be left at its default (no thinning) unless memory is a problem. |
max_treedepth |
(positive integer) The maximum allowed tree depth for the NUTS engine. See the Tree Depth section of the CmdStan User's Guide for more details. |
adapt_engaged |
(logical) Do warmup adaptation? The default is |
adapt_delta |
(real in |
step_size |
(positive real) The initial step size for the discrete approximation to continuous Hamiltonian dynamics. This is further tuned during warmup. |
metric |
(string) One of |
metric_file |
(character vector) The paths to JSON or Rdump files (one
per chain) compatible with CmdStan that contain precomputed inverse
metrics. The |
inv_metric |
(vector, matrix) A vector (if |
init_buffer |
(nonnegative integer) Width of initial fast timestep adaptation interval during warmup. |
term_buffer |
(nonnegative integer) Width of final fast timestep adaptation interval during warmup. |
window |
(nonnegative integer) Initial width of slow timestep/metric adaptation interval. |
fixed_param |
(logical) When |
show_messages |
(logical) When |
diagnostics |
(character vector) The diagnostics to automatically check
and warn about after sampling. Setting this to an empty string These diagnostics are also available after fitting. The
Diagnostics like R-hat and effective sample size are not currently
available via the |
inc_warmup |
(logical) Should warmup draws be included? Defaults to
|
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
data_copy |
Character vector of names of scalars in |
transform |
Symbol or |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy. Possible values:
For cloud-based dynamic files
(e.g. |
garbage_collection |
Logical: |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character string to control when the output of the target
is saved to storage. Only relevant when using
|
retrieval |
Character string to control when the current target
loads its dependencies into memory before running.
(Here, a "dependency" is another target upstream that the current one
depends on.) Only relevant when using
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Draws can take up a lot of storage. If storage becomes excessive, consider thinning the draws or using tar_stan_mcmc_rep_summary() instead.
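To get a feel for how quickly draws accumulate, the sketch below gives a rough row count for one Stan model. It is only a back-of-the-envelope estimate: the helper estimate_draw_rows() is hypothetical, iter_sampling = 1000 is assumed rather than taken from this page, and the number of draws kept per chain under thinning is approximated as iter_sampling / thin.

```r
# Rough count of draw rows stored across all reps of one model.
# estimate_draw_rows() is a hypothetical helper, not part of stantargets.
estimate_draw_rows <- function(batches, reps, chains = 4,
                               iter_sampling = 1000, thin = 1) {
  # Assumption: each chain keeps about iter_sampling / thin draws.
  draws_per_chain <- ceiling(iter_sampling / thin)
  batches * reps * chains * draws_per_chain
}

rows_full <- estimate_draw_rows(batches = 40, reps = 10)
rows_thinned <- estimate_draw_rows(batches = 40, reps = 10, thin = 4)
print(c(full = rows_full, thinned = rows_thinned))
```

Each row also carries one column per saved variable, so restricting the variables argument is another way to keep the stored output small.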
Most of the arguments are passed to the $compile() and $sample() methods of the CmdStanModel class. If you previously compiled the model in an upstream tar_stan_compile() target, then the model should not recompile.
tar_stan_mcmc_rep_draws() returns a list of target objects. See the "Target objects" section for background.
The target names use the name argument as a prefix, and the individual elements of stan_files appear in the suffixes where applicable.
As an example, the specific target objects returned by tar_stan_mcmc_rep_draws(name = x, stan_files = "y.stan") are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with the paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: use dynamic branching to generate multiple datasets by repeatedly running the R expression in the data argument. Each dynamic branch returns a batch of Stan data lists that x_y supplies to the model.
x_y: dynamic branching target to run MCMC once per dataset. Each dynamic branch returns a tidy data frame of draws corresponding to a batch of Stan data from x_data.
x: combine all branches of x_y into a single non-dynamic target. Suppressed if combine is FALSE. Returns a long tidy data frame of draws.
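The naming pattern above can be mimicked with a few lines of base R. This is an illustration only: sketch_target_names() is a hypothetical helper, not a stantargets function, and it assumes the model suffix is the Stan file name without its directory or extension, as in the y.stan example.

```r
# Hypothetical helper that reproduces the naming pattern described above.
# It does not create targets; it only builds the expected names.
sketch_target_names <- function(name, stan_files,
                                compile = "original", combine = FALSE) {
  # Assumption: the suffix is the file name without directory or extension.
  suffix <- tools::file_path_sans_ext(basename(stan_files))
  out <- paste0(name, "_file_", suffix)
  if (compile == "copy") {
    # The lines target is omitted when compile = "original".
    out <- c(out, paste0(name, "_lines_", suffix))
  }
  out <- c(out, paste0(name, "_data"), paste0(name, "_", suffix))
  if (combine) {
    # The combined target is suppressed when combine is FALSE.
    out <- c(out, name)
  }
  out
}

default_names <- sketch_target_names("x", "y.stan")
all_names <- sketch_target_names("x", "y.stan", compile = "copy", combine = TRUE)
print(default_names)
print(all_names)
```

Knowing the names in advance is handy when you want to refer to a specific target, such as x_y, in downstream targets.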
Rep-specific random number generator seeds for the data and models are automatically set based on the seed argument, batch, rep, parent target name, and tar_option_get("seed"). This ensures the rep-specific seeds do not change when you change the batching configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20 reps each). Each data seed is in the .seed list element of the output, and each Stan seed is in the .seed column of each Stan model output.
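The invariance property can be illustrated with a toy scheme. The exact derivation of the seeds is internal to stantargets and is not reproduced here; rep_seeds() and base_seed below are hypothetical, and the sketch only assumes that each seed depends on the overall position of a rep rather than on how reps are grouped into batches.

```r
# Toy seed scheme (an assumption for illustration, not the stantargets algorithm).
rep_seeds <- function(batches, reps, base_seed = 1) {
  # One row per (batch, rep) pair, with rep varying fastest.
  grid <- expand.grid(rep = seq_len(reps), batch = seq_len(batches))
  # Overall index of the rep across all batches.
  global_rep <- (grid$batch - 1) * reps + grid$rep
  base_seed + global_rep
}

seeds_a <- rep_seeds(batches = 40, reps = 10)
seeds_b <- rep_seeds(batches = 20, reps = 20)
print(identical(seeds_a, seeds_b))
```

Both configurations cover 400 reps, and under this toy scheme the i-th rep overall receives the same seed in either layout.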
Most stantargets functions are target factories, which means they return target objects or lists of target objects. Target objects represent skippable steps of the analysis pipeline as described at https://books.ropensci.org/targets/. Please read the walkthrough at https://books.ropensci.org/targets/walkthrough.html to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other MCMC: tar_stan_mcmc(), tar_stan_mcmc_rep_diagnostics(), tar_stan_mcmc_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for Stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mcmc_rep_draws(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}