README.md

Bayesim

A simulation framework for Bayesian models based on brms.

The main function is full_simulation. Its main arguments are data_gen_confs, data_gen_fun, fit_confs and metrics. Bayesim generates datasets by passing the rows of data_gen_confs to data_gen_fun and fits each model defined by fit_confs to each generated dataset. It then calculates all of the requested metrics for each fitted model. Currently, this is done in a fully crossed fashion: every fit configuration is applied to every generated dataset.
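To make the fully crossed design concrete, here is a minimal base R sketch. It does not call Bayesim. The data_gen_confs columns mirror the arguments of the example data generating function in this README, while the fit_confs column names (fit_family, fit_link) are assumptions made up for illustration.

```r
# Hypothetical configurations. The fit_confs column names are assumptions,
# not the columns Bayesim necessarily expects.
data_gen_confs <- data.frame(
  data_N = c(50, 100),
  data_link = "identity",
  data_family = "gaussian"
)
fit_confs <- data.frame(
  fit_family = c("gaussian", "student"),
  fit_link = "identity"
)

# merge() with by = NULL returns the Cartesian product, so every data
# configuration is paired with every fit configuration.
crossed <- merge(data_gen_confs, fit_confs, by = NULL)

# 2 data configurations x 2 fit configurations = 4 rows
nrow(crossed)
```

The number of model fits therefore grows multiplicatively with the number of rows in both configuration data frames.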

Define Data Simulation

Data simulation consists of two parts: a data_gen_fun function and a data_gen_confs data frame. Bayesim feeds each row of data_gen_confs into data_gen_fun to generate one dataset per row.
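The row-by-row mechanism can be sketched in base R as follows. This only illustrates the idea and is not necessarily how Bayesim implements it internally. toy_dgp and the mean column are hypothetical names introduced for the example.

```r
# Hypothetical data generating function: one argument per configuration column.
toy_dgp <- function(data_N, mean = 0, ...) {
  list(dataset = data.frame(y = rnorm(n = data_N, mean = mean, sd = 1)))
}

# Each row describes one dataset to generate.
confs <- data.frame(data_N = c(5, 10), mean = c(0, 3))

# Turn each row into a named list of arguments and pass it to the function.
datasets <- lapply(seq_len(nrow(confs)), function(i) {
  do.call(toy_dgp, as.list(confs[i, , drop = FALSE]))
})

length(datasets)
```

Because the column names are matched to argument names, any extra column in the configuration data frame ends up in the ... of the data generating function.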

The only strictly necessary columns in data_gen_confs are:

data_gen_fun should output a named list that contains the following parts: dataset, testing_data and data_gen_output. For example:

constant_linpred_dgp <- function(data_N,
                                 data_link,
                                 data_family,
                                 seed = NULL,
                                 testing_data = TRUE,
                                 vars_of_interest = list("mu"),
                                 mean = 0,
                                 ...) {
  arguments <- as.list(c(as.list(environment()), list(...)))
  arguments$seed <- NULL

  if (!is.null(seed)) {
    set.seed(seed)
  }

  if (testing_data) {
    data_gen_size <- data_N * 2
  } else {
    data_gen_size <- data_N
  }
  # Draw the constant linear predictor around `mean`, then the outcomes
  # around that linear predictor.
  mu <- rnorm(n = 1, mean = mean, sd = 1)
  y <- rnorm(n = data_gen_size, mean = mu, sd = 1)
  dataset <- data.frame(y = y)

  # This creates a list of values for each of the vars_of_interest.
  arguments$references <- lapply(
    unlist(vars_of_interest),
    function(x) get(x)
  )

  data_gen_output <- list(
    # Anything in addition to the function arguments you want to save about
    # the data generation process, e.g. the number of invalid samples if you
    # are resampling.
  )
  data_gen_output <- c(data_gen_output, arguments)

  if (testing_data) {
    return(
      list(
        dataset = list(y = dataset[1:data_N, ]),
        testing_data = list(y = dataset[(data_N + 1):data_gen_size, ]),
        data_gen_output = data_gen_output
      )
    )
  } else {
    return(
      list(
        dataset = dataset,
        testing_data = NULL,
        data_gen_output = data_gen_output
      )
    )
  }
}
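When writing your own data generating function, it can help to check that its return value has the expected shape. The helper below is a hypothetical sketch, not part of Bayesim. It only assumes the three list parts returned by the example above (dataset, testing_data and data_gen_output).

```r
# Hypothetical helper, not part of Bayesim: verifies that a data generating
# function returned the three named parts used in the example above.
check_dgp_output <- function(out) {
  expected <- c("dataset", "testing_data", "data_gen_output")
  missing_parts <- setdiff(expected, names(out))
  if (length(missing_parts) > 0) {
    stop("Missing parts: ", paste(missing_parts, collapse = ", "))
  }
  invisible(TRUE)
}

# A hand-built output with no testing data, mirroring the else branch above.
example_output <- list(
  dataset = data.frame(y = rnorm(10)),
  testing_data = NULL,
  data_gen_output = list(data_N = 10)
)

check_dgp_output(example_output)
```

Note that list() keeps an element that is explicitly set to NULL, so testing_data = NULL still counts as present.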

Define Fit Configurations

Fit configurations are currently data frames with the following columns:

Define Metrics

Metrics are defined via a list of string identifiers. The supported metrics are:

Variable summaries

"v_mean" "v_sd" "v_median" "v_mad" "v_pos_prob" "v_quantiles" "v_bias" "v_rmse" "v_mae" "v_mse" "v_true_percentile"

Global MCMC Diagnostics

"divergent_transitions_rel" "divergent_transitions_abs" "rstar" "bad_pareto_ks" "pareto_k_values" "time_per_sample"

Variable MCMC Diagnostics

"rhat" "ess_bulk" "ess_tail"

Predictive Metrics

"elpd_loo" "elpd_loo_pointwise" "elpd_loo_pointwise_summary" "elpd_test" "elpd_test_pointwise_summary" "rmse_loo" "rmse_loo_pointwise" "rmse_loo_pointwise_summary" "rmse_test" "rmse_test_pointwise_summary" "r2_loo" "r2_loo_pointwise" "r2_loo_pointwise_summary" "r2_test" "r2_test_pointwise_summary"

Posterior sample based metrics

"log_lik_pointwise" "log_lik_summary" "ppred_summary_y_scaled" "ppred_pointwise" "residuals" "posterior_linpred" "posterior_linpred_transformed"

Observations

"y_pointwise" "y_pointwise_z_scaled" "y_summaries"

Data

"data_gen"

Fits

"fit_gen"

Or see metric_lookup for all currently implemented metrics.
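As a small illustration, a metrics list is just a list of strings. The identifiers below are taken from the groups above. Which ones you choose is up to you, and the selection here is only an example.

```r
# A list of string identifiers, mixing variable summaries, MCMC diagnostics
# and predictive metrics from the groups listed above.
metrics <- list(
  "v_mean",
  "v_sd",
  "rhat",
  "ess_bulk",
  "elpd_loo"
)

# Every entry should be a single character string.
all(vapply(metrics, function(m) is.character(m) && length(m) == 1, logical(1)))
```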

Additional Arguments

seed: sets a seed that makes the rest of the simulation deterministic, conditional on that seed. This allows individual results or the entire simulation run to be reproduced later on.
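The underlying idea is the same as with set.seed in base R: resetting the seed before a random draw reproduces that draw exactly. The sketch below shows only this base R behaviour, not Bayesim's internal seed handling. The seed value is arbitrary.

```r
# Draw three values, reset the seed, and draw again.
set.seed(1235)
first_run <- rnorm(3)

set.seed(1235)
second_run <- rnorm(3)

# The two runs are identical because they started from the same seed.
identical(first_run, second_run)
```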

Output

Stan Options

stan_pars should be a named list that contains the following arguments:

Using multiple Cores

Related Work

Bayesim has been used in the following projects:



sims1253/bayesim documentation built on Aug. 13, 2024, 5:59 p.m.