compute.ensemble.weights: Compute model averaging weights
In hdbayes: Bayesian Analysis of Generalized Linear Models with Historical Data

View source: R/compute_ensemble_weights.R

compute.ensemble.weights

R Documentation

Compute model averaging weights

Description

Compute model averaging weights for a set of Bayesian models using Bayesian model averaging (BMA), pseudo-BMA, pseudo-BMA+ (pseudo-BMA with the Bayesian bootstrap), or stacking. This function takes a list of model fit objects, each containing posterior samples from a generalized linear model (GLM) or survival model, and returns normalized weights that can be used for model comparison or combining posterior samples using functions like sample.ensemble().

Usage

compute.ensemble.weights(
  fit.list,
  type = c("bma", "pseudobma", "pseudobma+", "stacking"),
  prior.prob = NULL,
  bridge.args = NULL,
  loo.args = NULL,
  loo.wts.args = NULL,
  iter_warmup = 1000,
  iter_sampling = 1000,
  chains = 4,
  ...
)

Arguments

`fit.list`	a list of model fit objects returned by functions in the `hdbayes` package. Each fit contains posterior samples from a generalized linear model (GLM) (e.g., via `glm.pp()`), an accelerated failure time (AFT) model (e.g., via `aft.pp()`), a piecewise exponential (PWE) model (e.g., via `pwe.pp()`), or a mixture cure rate model with a PWE component for the non-cured population (e.g., via `curepwe.pp()`). Each fit also includes two attributes: `data`, a list of variables specified in the data block of the Stan program, and `model`, a character string indicating the model name. To compute pseudo-BMA, pseudo-BMA+, or stacking weights, the fitting function must be called with `get.loglik = TRUE`.
`type`	a character string specifying the ensemble method used to compute model weights. Options are "bma" (Bayesian model averaging (BMA)), "pseudobma" (pseudo-BMA without the Bayesian bootstrap), "pseudobma+" (pseudo-BMA with the Bayesian bootstrap), and "stacking".
`prior.prob`	a numeric vector of prior model probabilities, used only when `type = "bma"`. Must be non-negative and sum to 1. If set to NULL, a uniform prior is used (i.e., all models are equally likely). Defaults to NULL.
`bridge.args`	a `list` of optional arguments (excluding `samples`, `log_posterior`, `data`, `lb`, and `ub`) to be passed to `bridgesampling::bridge_sampler()`. These arguments are used when estimating the log marginal likelihood, which is required if `type = "bma"`.
`loo.args`	a `list` of optional arguments (excluding `x`) to be passed to `loo::loo()` when computing pseudo-BMA, pseudo-BMA+, or stacking weights.
`loo.wts.args`	a `list` of optional arguments (excluding `x`, `method`, and `BB`) to be passed to `loo::loo_model_weights()` when computing pseudo-BMA, pseudo-BMA+, or stacking weights.
`iter_warmup`	number of warmup iterations to run per chain. Used only when computing the log marginal likelihood (i.e., when `type = "bma"`). Defaults to 1000. See the argument `iter_warmup` in `sample()` method in cmdstanr package.
`iter_sampling`	number of post-warmup iterations to run per chain. Used only when computing the log marginal likelihood (i.e., when `type = "bma"`). Defaults to 1000. See the argument `iter_sampling` in `sample()` method in cmdstanr package.
`chains`	number of Markov chains to run. Used only when computing the log marginal likelihood (i.e., when `type = "bma"`). Defaults to 4. See the argument `chains` in `sample()` method in cmdstanr package.
`...`	arguments passed to `sample()` method in cmdstanr package (e.g., `seed`, `refresh`, `init`). These are used only when computing the log marginal likelihood (i.e., when `type = "bma"`).

Details

The input fit.list should be a list of outputs from model fitting functions in the hdbayes package, such as glm.pp() (for generalized linear models), aft.pp() (for accelerated failure time models), pwe.pp() (for piecewise exponential (PWE) models), or curepwe.pp() (for mixture cure rate models with a PWE component for the non-cured population). To compute pseudo-BMA, pseudo-BMA+, or stacking weights, each fit must include pointwise log-likelihood values. To ensure this, the fitting function must be called with get.loglik = TRUE.

The arguments related to Markov chain Monte Carlo (MCMC) sampling are utilized to compute the logarithm of the normalizing constant for BMA, if applicable.

Value

The function returns a list with the following objects

weights: a numeric vector of normalized model weights corresponding to the models in fit.list. The names of the weights are made unique based on the model identifiers.
type: a character string indicating the method used to compute the model weights (e.g., "bma", "pseudobma", "pseudobma+", or "stacking")
res.logml: a list of log marginal likelihood estimation results, returned only when type = "bma"
loo.list: a list of outputs from loo::loo(), returned only when type is "pseudobma", "pseudobma+", or "stacking"

References

Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, 13(3), 917–1007.

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.

Examples

if (instantiate::stan_cmdstan_exists()) {
  if(requireNamespace("survival")){
    library(survival)
    data(E1684)
    data(E1690)
    ## replace 0 failure times with 0.50 days
    E1684$failtime[E1684$failtime == 0] = 0.50/365.25
    E1690$failtime[E1690$failtime == 0] = 0.50/365.25
    E1684$cage = as.numeric(scale(E1684$age))
    E1690$cage = as.numeric(scale(E1690$age))
    data_list = list(currdata = E1690, histdata = E1684)
    nbreaks = 3
    probs   = 1:nbreaks / nbreaks
    breaks  = as.numeric(
      quantile(E1690[E1690$failcens==1, ]$failtime, probs = probs)
    )
    breaks  = c(0, breaks)
    breaks[length(breaks)] = max(10000, 1000 * breaks[length(breaks)])
    fit.pwe.pp = pwe.pp(
      formula = survival::Surv(failtime, failcens) ~ treatment + sex + cage + node_bin,
      data.list = data_list,
      breaks = breaks,
      a0 = 0.5,
      get.loglik = TRUE,
      chains = 1, iter_warmup = 1000, iter_sampling = 2000
    )
    fit.pwe.post = pwe.post(
      formula = survival::Surv(failtime, failcens) ~ treatment + sex + cage + node_bin,
      data.list = data_list,
      breaks = breaks,
      get.loglik = TRUE,
      chains = 1, iter_warmup = 1000, iter_sampling = 2000
    )
    fit.aft.post = aft.post(
      formula = survival::Surv(failtime, failcens) ~ treatment + sex + cage + node_bin,
      data.list = data_list,
      dist = "weibull",
      beta.sd = 10,
      get.loglik = TRUE,
      chains = 1, iter_warmup = 1000, iter_sampling = 2000
    )
    compute.ensemble.weights(
      fit.list = list(fit.pwe.post, fit.pwe.pp, fit.aft.post),
      type = "pseudobma+",
      loo.args = list(save_psis = FALSE),
      loo.wts.args = list(optim_method="BFGS")
    )
  }
}

hdbayes documentation built on Nov. 21, 2025, 1:07 a.m.