cat_glm_bayes: Bayesian Estimation for Catalytic Generalized Linear Models...


cat_glm_bayes    R Documentation

Bayesian Estimation for Catalytic Generalized Linear Models (GLMs) with Fixed tau

Description

Fits a Bayesian generalized linear model to observed data augmented with synthetic data, using the initialization object from cat_glm_initialization, a formula, and further model specifications. Only the Gaussian and Binomial GLM families are supported.

Usage

cat_glm_bayes(
  formula,
  cat_init,
  tau = NULL,
  chains = 4,
  iter = 2000,
  warmup = 1000,
  algorithm = "NUTS",
  gaussian_variance_alpha = NULL,
  gaussian_variance_beta = NULL,
  ...
)

Arguments

formula

A formula specifying the GLM. At minimum it must give the model terms; a one-sided formula such as ~ . is accepted, as in the example below.

cat_init

A list generated from cat_glm_initialization.

tau

Optional numeric scalar controlling the weight of the synthetic data in the coefficient estimation. Defaults to one quarter of the number of predictors for Gaussian models, and to the number of predictors otherwise.
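As a rough illustration of what tau does, the sketch below assumes the usual catalytic-prior weighting, in which each of the M synthetic rows is down-weighted to tau / M so that the synthetic sample as a whole counts as tau observations. The helper default_tau() and the stand-in synthetic data are hypothetical and only mirror the defaults described above; whether the intercept counts as a predictor is not stated here. The weighted least-squares fit stands in for a point estimate, not for the posterior sampling that cat_glm_bayes performs.

```r
# Hypothetical helper mirroring the documented default for tau.
default_tau <- function(n_predictors, family = c("gaussian", "binomial")) {
  family <- match.arg(family)
  if (family == "gaussian") n_predictors / 4 else n_predictors
}

set.seed(1)
n <- 10    # observed sample size
M <- 100   # synthetic sample size
obs <- data.frame(X1 = stats::rnorm(n), Y = stats::rnorm(n))
syn <- data.frame(X1 = stats::rnorm(M), Y = stats::rnorm(M))  # stand-in synthetic data

tau <- default_tau(n_predictors = 1, family = "gaussian")

# Assumed weighting: observed rows get weight 1, synthetic rows get tau / M.
w <- c(rep(1, n), rep(tau / M, M))
fit <- stats::lm(Y ~ X1, data = rbind(obs, syn), weights = w)

# The synthetic rows together carry a total weight of tau.
total_syn_weight <- sum(w[(n + 1):(n + M)])
```

A larger tau pulls the estimate toward the fit on the synthetic data; a tau close to zero leaves it close to the fit on the observed data alone.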

chains

Number of Markov chains to run. Default is 4.

iter

Total number of iterations per chain. Default is 2000.

warmup

Number of warm-up iterations per chain (discarded from final analysis). Default is 1000.
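The relationship between chains, iter, and warmup can be sketched with a little arithmetic. The helper kept_draws() is hypothetical and assumes no thinning (thin = 1), so every post-warm-up iteration is retained.

```r
# Post-warm-up draws retained across all chains, assuming thin = 1.
kept_draws <- function(chains, iter, warmup) {
  stopifnot(warmup < iter)
  chains * (iter - warmup)
}

default_draws <- kept_draws(chains = 4, iter = 2000, warmup = 1000)  # 4000
example_draws <- kept_draws(chains = 1, iter = 10, warmup = 5)       # 5
```

The second call uses the settings from the Examples section, which keep the run fast but leave only five draws, far too few for real inference.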

algorithm

The sampling algorithm to use in rstan::sampling. Default is "NUTS" (No-U-Turn Sampler).

gaussian_variance_alpha

The shape parameter for the inverse-gamma prior on variance if the variance is unknown in Gaussian models. Defaults to the number of predictors.

gaussian_variance_beta

The scale parameter for the inverse-gamma prior on variance if the variance is unknown in Gaussian models. Defaults to the number of predictors times the variance of the observed response.
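To get a feel for the default inverse-gamma prior, the sketch below builds the shape and scale from a toy response and draws from the prior using base R. The predictor count p = 2 and the simulated response are assumptions for illustration only, and whether the intercept is counted among the predictors is not stated here.

```r
set.seed(1)
y <- stats::rnorm(10)  # stand-in observed response
p <- 2                 # assumed number of predictors

alpha <- p                   # default shape
beta  <- p * stats::var(y)   # default scale

# If G ~ Gamma(shape = alpha, rate = beta), then 1 / G ~ Inverse-Gamma(alpha, beta).
sigma2_draws <- 1 / stats::rgamma(1e4, shape = alpha, rate = beta)

# Prior mean (defined for alpha > 1) and prior median of the variance.
prior_mean   <- beta / (alpha - 1)
prior_median <- beta / stats::qgamma(0.5, shape = alpha)
```

With alpha = 2 the prior has a finite mean but infinite variance, so the median of the draws is a more stable summary than their average.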

...

Additional parameters to pass to rstan::sampling.
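The forwarding of ... can be sketched with a stand-in sampler that simply records what it receives. Both fake_sampling() and wrapper() are hypothetical; they only illustrate how extra named arguments such as seed, refresh, or control (all arguments that rstan::sampling understands) would travel through a wrapper of this shape.

```r
# Stand-in for the real sampler: returns the arguments it was given.
fake_sampling <- function(chains, iter, warmup, ...) {
  list(chains = chains, iter = iter, warmup = warmup, extra = list(...))
}

# Hypothetical wrapper that forwards unmatched arguments through `...`.
wrapper <- function(chains = 4, iter = 2000, warmup = 1000, ...) {
  fake_sampling(chains = chains, iter = iter, warmup = warmup, ...)
}

res <- wrapper(
  chains = 1,
  seed = 123,                          # reproducible sampling
  refresh = 0,                         # silence progress output
  control = list(adapt_delta = 0.95)   # more conservative step-size adaptation
)
extra_names <- names(res$extra)
```

In the same way, a call such as cat_glm_bayes(formula, cat_init, seed = 123) would be expected to hand seed on to the sampler.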

Value

A list containing the values of all the arguments and the following components:

stan_data

The data list passed to the RStan sampler when fitting the model.

stan_model

Compiled RStan model object for GLMs.

stan_sample_model

Fitted RStan sampling model containing posterior samples.

coefficients

Posterior mean estimates of the model coefficients, computed from stan_sample_model.
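A posterior mean of this kind is just the average of the draws for each coefficient. The sketch below uses a simulated draws matrix as a stand-in; with a real fit the draws could be obtained with rstan::extract on stan_sample_model, but the parameter names inside the Stan model are not documented here, so none are assumed.

```r
set.seed(1)
# Stand-in posterior draws: one row per draw, one column per coefficient.
draws <- cbind(
  `(Intercept)` = stats::rnorm(1000, mean = 0.5, sd = 0.1),
  X1            = stats::rnorm(1000, mean = -1,  sd = 0.2)
)

# Posterior mean of each coefficient.
coefs <- colMeans(draws)

# A 95% equal-tailed credible interval per coefficient, for comparison.
intervals <- apply(draws, 2, stats::quantile, probs = c(0.025, 0.975))
```

Reporting the interval next to the mean is a cheap way to show how much the point estimate in coefficients hides.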

Examples


library(catalytic)

set.seed(1) # Make the simulated data reproducible
gaussian_data <- data.frame(
  X1 = stats::rnorm(10),
  X2 = stats::rnorm(10),
  Y = stats::rnorm(10)
)

cat_init <- cat_glm_initialization(
  formula = Y ~ 1, # formula for simple model
  data = gaussian_data,
  syn_size = 100, # Synthetic data size
  custom_variance = NULL, # User customized variance value
  gaussian_known_variance = FALSE, # Whether the data variance is known (FALSE = unknown)
  x_degree = c(1, 1), # Degrees for polynomial expansion of predictors
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = stats::na.omit # How to handle NA values in data
)

cat_model <- cat_glm_bayes(
  formula = ~.,
  cat_init = cat_init, # Only accepts an object generated by `cat_glm_initialization`
  tau = 1, # Weight for synthetic data
  chains = 1, # Number of Markov chains to run in the RStan sampling
  iter = 10, # Total number of iterations per chain (kept small for speed)
  warmup = 5, # Number of warm-up (burn-in) iterations per chain
  algorithm = "NUTS", # Sampling algorithm to use in `rstan::sampling`
  gaussian_variance_alpha = 1, # Shape parameter of the inverse-gamma prior on the variance
  gaussian_variance_beta = 2 # Scale parameter of the inverse-gamma prior on the variance
)
cat_model


catalytic documentation built on April 4, 2025, 5:51 a.m.