| glm_b | R Documentation |
glm_b is used to fit Bayesian generalized linear models. It can be used to carry out
regression for gaussian, binomial, poisson, and negative binomial data. Note that if
the family is gaussian, this is just a wrapper for lm_b.
glm_b(
formula,
data,
family,
trials,
prior = c("zellner", "normal", "improper")[1],
zellner_g,
prior_beta_mean,
prior_beta_precision,
prior_phi_mean = 1,
ROPE,
CI_level = 0.95,
vb_maximum_iterations = 1000,
algorithm = "VB",
proposal_df = 5,
seed = 1,
mc_error = 0.01,
save_memory = FALSE
)
formula |
A formula specifying the model. |
data |
A data frame in which the variables specified in the formula will be found. If missing, the variables are searched for in the standard way. However, it is strongly recommended that you use this argument so that other generics for bayesics objects work correctly. |
family |
A description of the error distribution and link function
to be used in the model. |
trials |
Either a character string naming the variable in data that contains the number of trials for each observation, or a numeric vector of trial counts. Used only for the binomial family. |
prior |
character. One of "zellner", "normal", or "improper", giving the type of prior used on the regression coefficients. |
zellner_g |
numeric. Positive number giving the value of "g" in Zellner's g prior. Ignored unless prior = "zellner". Default is the number of observations. |
prior_beta_mean |
numeric vector with length equal to the number of regression coefficients (denoted p). Unless otherwise specified, automatically set to rep(0, p). Ignored unless prior = "normal". |
prior_beta_precision |
p x p matrix giving the a priori precision matrix, to be scaled by the residual precision. Ignored unless prior = "normal". |
prior_phi_mean |
For negative binomial distributed outcomes, an
exponential distribution is used as the prior of the dispersion parameter;
prior_phi_mean gives its prior mean. Ignored for other families. |
ROPE |
vector of positive values giving the ROPE boundaries for each regression coefficient. The boundary for the intercept may optionally be omitted. If missing, defaults follow those suggested by Kruschke (2018). |
CI_level |
numeric. Credible interval level. |
vb_maximum_iterations |
integer. If algorithm = "VB", the maximum number of iterations used by the variational Bayes routine. |
algorithm |
Either "VB" (default) for fixed-form variational Bayes, "IS" for importance sampling, or "LSA" for large sample approximation. |
proposal_df |
degrees of freedom used in the multivariate t proposal
distribution when algorithm = "IS". |
seed |
integer. Always set your seed! Not used when
algorithm = "LSA". |
mc_error |
If importance sampling is used, the number of posterior
draws will be chosen to ensure that, with 99% probability, the bounds of the
credible intervals are within mc_error of their exact values. |
save_memory |
logical. If TRUE, a more memory-efficient approach will be taken at the expense of computation time (importance sampling only). If memory is an issue, it is likely because you have a large sample size, in which case the large sample approximation without importance sampling should work well. |
glm_b() returns an object of class "glm_b", which behaves as a list with the following elements:
summary - tibble giving results for regression coefficients
posterior_draws
ROPE
hyperparameters - list giving the user input or default hyperparameters used
fitted - posterior mean of the individuals' means
residuals - posterior mean of the residuals
If algorithm = "IS", the following:
proposal_draws - draws from
the importance sampling proposal distribution (i.e., a multivariate
t centered at the posterior mode, with precision equal to the
negative Hessian and degrees of freedom set to the user input proposal_df).
importance_sampling_weights - importance sampling
weights that match the rows of the returned proposal_draws.
effective_sample_size
mc_error
other inputs into glm_b
Importance sampling:
When algorithm = "IS", glm_b performs importance sampling.
The proposal is a multivariate t distribution, centered at the
posterior mode, with the negative Hessian as its precision matrix. Do NOT
treat the proposal_draws as posterior draws; they are only meaningful in
combination with the importance_sampling_weights.
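To see how the returned pieces fit together, here is a minimal sketch of turning proposal draws into approximate posterior draws by weighted resampling. The draws and weights below are stand-ins generated for illustration, not output from a fitted glm_b object:

```r
# Stand-ins for a fitted object's proposal_draws and
# importance_sampling_weights (hypothetical values for illustration)
set.seed(1)
draws <- matrix(rnorm(2000), ncol = 2)   # proposal draws (1000 x 2)
logw  <- -0.5 * rowSums(draws^2)         # unnormalized log-weights
w     <- exp(logw - max(logw))
w     <- w / sum(w)                      # normalized importance weights

ess <- 1 / sum(w^2)                      # effective sample size

# Weighted resampling yields approximate posterior draws; the raw
# proposal draws alone are NOT posterior draws.
idx <- sample(nrow(draws), size = 1000, replace = TRUE, prob = w)
posterior_draws <- draws[idx, , drop = FALSE]
```

The effective sample size computed this way corresponds to the effective_sample_size element returned when algorithm = "IS".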
Priors:
If the prior is set to either "zellner" or "normal", a normal distribution
is used as the prior of \beta, specifically
\beta \sim N(\mu, V^{-1})
where \mu is prior_beta_mean and V is the prior_beta_precision (a precision, not covariance) matrix.
zellner: glm_b sets \mu=0 and V = \frac{1}{g} X^{\top} X.
normal: If prior_beta_mean is missing, glm_b sets \mu=0,
and if prior_beta_precision is missing, V will be a diagonal matrix. The first element,
corresponding to the intercept, will be (2.5\tilde{s}_y)^{-2}, where
\tilde{s}_y is the maximum of 1 and the standard deviation of y. The remaining diagonal elements
will equal (2.5 s_y/s_x)^{-2}, where s_x is the standard deviation
of the corresponding covariate. This equates to being roughly 95% certain a priori that a change in
x of one standard deviation (s_x) would not change the linear predictor by
more than 5 standard deviations (5s_y). This imposes weak regularization that adapts to the scale
of the data.
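The default precision construction above can be sketched in a few lines. This is illustrative only; default_beta_precision is a hypothetical helper, not part of the package:

```r
# Hypothetical helper illustrating the default diagonal prior precision
# used when prior = "normal"; not the package's internal code.
default_beta_precision <- function(y, X) {
  s_y  <- sd(y)
  s_yt <- max(1, s_y)                 # \tilde{s}_y = max(1, sd(y))
  s_x  <- apply(X, 2, sd)             # covariate standard deviations
  diag(c((2.5 * s_yt)^-2,             # intercept element
         (2.5 * s_y / s_x)^-2))       # one element per covariate
}

set.seed(1)
X <- cbind(x1 = rnorm(50), x2 = rnorm(50, sd = 3))
y <- rnorm(50, sd = 2)
V <- default_beta_precision(y, X)     # 3 x 3 diagonal precision matrix
```

Note how covariates with larger spread (x2 here) receive larger prior precision on their coefficients, which is the scale adaptation described above.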
ROPE:
If missing, the ROPE bounds will be given under the principle of "half of a small effect size."
Gaussian. Using Cohen's d of 0.2 as a small effect size, the ROPE is
built under the principle that moving across the full range of x (i.e., \pm 2 s_x)
should not move the mean of y by more than \pm 0.1 s_y about its overall mean.
The result is a ROPE equal to |\beta_j| < 0.05 s_y/s_x. If the covariate is
binary, this is simply |\beta_j| < 0.2 s_y.
Poisson. FDA guidance suggests a small effect is a rate ratio of less
than 1.25. We use half this effect, a rate ratio of 1.125, and consider the ROPE to indicate
that moving across the full range of x (\pm 2 s_x) will not change the rate
ratio by more than this amount. Thus the ROPE for the regression
coefficient equals |\beta| < \frac{\log(1.125)}{4 s_x}. For binary
covariates, this is simply |\beta| < \log(1.125).
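The default bounds above can be computed directly. rope_gaussian and rope_poisson are hypothetical helper names used only for this sketch:

```r
# Hypothetical helpers computing the default ROPE half-widths described
# above; not part of the package.
rope_gaussian <- function(s_y, s_x, binary = FALSE) {
  # continuous: |beta| < 0.05 * s_y / s_x; binary: |beta| < 0.2 * s_y
  if (binary) 0.2 * s_y else 0.05 * s_y / s_x
}
rope_poisson <- function(s_x, binary = FALSE) {
  # continuous: |beta| < log(1.125) / (4 * s_x); binary: |beta| < log(1.125)
  if (binary) log(1.125) else log(1.125) / (4 * s_x)
}

rope_gaussian(s_y = 2, s_x = 4)   # continuous covariate: 0.025
rope_poisson(s_x = 1)             # log(1.125) / 4
```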
Kruschke JK. Rejecting or Accepting Parameter Values in Bayesian Estimation. Advances in Methods and Practices in Psychological Science. 2018;1(2):270-280. doi:10.1177/2515245918771304
Salimans T, Knowles DA. Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. Bayesian Analysis. 2013;8(4):837-882. doi:10.1214/13-BA858
# Generate some negative-binomial data
set.seed(2025)
N = 500
test_data =
data.frame(x1 = rnorm(N),
x2 = rnorm(N),
x3 = letters[1:5],
time = rexp(N))
test_data$outcome =
rnbinom(N,
mu = exp(-2 + test_data$x1 + 2 * (test_data$x3 %in% c("d","e"))) * test_data$time,
size = 0.7)
# Fit using variational Bayes (default)
fit_vb1 <-
glm_b(outcome ~ x1 + x2 + x3 + offset(log(time)),
data = test_data,
family = negbinom(),
seed = 2025)
fit_vb1
summary(fit_vb1,
CI_level = 0.9)
plot(fit_vb1)
coef(fit_vb1)
credint(fit_vb1,
CI_level = 0.99)
bayes_factors(fit_vb1,
by = "v")
preds =
predict(fit_vb1)
# Try different priors
fit_vb2 <-
glm_b(outcome ~ x1 + x2 + x3 + offset(log(time)),
data = test_data,
family = negbinom(),
seed = 2025,
prior = "normal")
fit_vb2
fit_vb3 <-
glm_b(outcome ~ x1 + x2 + x3 + offset(log(time)),
data = test_data,
family = negbinom(),
seed = 2025,
prior = "improper")
fit_vb3
# Use importance sampling instead of VB
fit_is <-
glm_b(outcome ~ x1 + x2 + x3 + offset(log(time)),
data = test_data,
family = negbinom(),
algorithm = "IS",
seed = 2025)
summary(fit_is)
# Use large sample approximation instead of VB
fit_lsa <-
glm_b(outcome ~ x1 + x2 + x3 + offset(log(time)),
data = test_data,
family = negbinom(),
algorithm = "LSA",
seed = 2025)
summary(fit_lsa)