brm | R Documentation |
Fit Bayesian generalized (non-)linear multivariate multilevel models using Stan for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit – among others – linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, extended-support beta regression, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distributions can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation.
brm(
formula,
data,
family = gaussian(),
prior = NULL,
autocor = NULL,
data2 = NULL,
cov_ranef = NULL,
sample_prior = "no",
sparse = NULL,
knots = NULL,
drop_unused_levels = TRUE,
stanvars = NULL,
stan_funs = NULL,
fit = NA,
save_pars = getOption("brms.save_pars", NULL),
save_ranef = NULL,
save_mevars = NULL,
save_all_pars = NULL,
init = NULL,
inits = NULL,
chains = 4,
iter = getOption("brms.iter", 2000),
warmup = floor(iter/2),
thin = 1,
cores = getOption("mc.cores", 1),
threads = getOption("brms.threads", NULL),
opencl = getOption("brms.opencl", NULL),
normalize = getOption("brms.normalize", TRUE),
control = NULL,
algorithm = getOption("brms.algorithm", "sampling"),
backend = getOption("brms.backend", "rstan"),
future = getOption("future", FALSE),
silent = 1,
seed = NA,
save_model = NULL,
stan_model_args = list(),
file = NULL,
file_compress = TRUE,
file_refit = getOption("brms.file_refit", "never"),
empty = FALSE,
rename = TRUE,
...
)
formula |
An object of class |
data |
An object of class |
family |
A description of the response distribution and link function to
be used in the model. This can be a family function, a call to a family
function or a character string naming the family. Every family function has
a |
prior |
One or more |
autocor |
(Deprecated) An optional |
data2 |
A named |
cov_ranef |
(Deprecated) A list of matrices that are proportional to the
(within) covariance structure of the group-level effects. The names of the
matrices should correspond to columns in |
sample_prior |
Indicate if draws from priors should be drawn
additionally to the posterior draws. Options are |
sparse |
(Deprecated) Logical; indicates whether the population-level
design matrices should be treated as sparse (defaults to |
knots |
Optional list containing user specified knot values to be used
for basis construction of smoothing terms. See
|
drop_unused_levels |
Should unused factors levels in the data be
dropped? Defaults to |
stanvars |
An optional |
stan_funs |
(Deprecated) An optional character string containing
self-defined Stan functions, which will be included in the functions
block of the generated Stan code. It is now recommended to use the
|
fit |
An instance of S3 class |
save_pars |
An object generated by |
save_ranef |
(Deprecated) A flag to indicate if group-level effects for
each level of the grouping factor(s) should be saved (default is
|
save_mevars |
(Deprecated) A flag to indicate if draws of latent
noise-free variables obtained by using |
save_all_pars |
(Deprecated) A flag to indicate if draws from all
variables defined in Stan's |
init |
Initial values for the sampler. If |
inits |
(Deprecated) Alias of |
chains |
Number of Markov chains (defaults to 4). |
iter |
Number of total iterations per chain (including warmup; defaults
to 2000). Can be set globally for the current R session via the
|
warmup |
A positive integer specifying number of warmup (aka burnin)
iterations. This also specifies the number of iterations used for stepsize
adaptation, so warmup draws should not be used for inference. The number
of warmup should not be larger than |
thin |
Thinning rate. Must be a positive integer. Set |
cores |
Number of cores to use when executing the chains in parallel,
which defaults to 1 but we recommend setting the |
threads |
Number of threads to use in within-chain parallelization. For
more control over the threading process, |
opencl |
The platform and device IDs of the OpenCL device to use for
fitting using GPU support. If you don't know the IDs of your OpenCL device,
|
normalize |
Logical. Indicates whether normalization constants should
be included in the Stan code (defaults to |
control |
A named |
algorithm |
Character string naming the estimation approach to use.
Options are |
backend |
Character string naming the package to use as the backend for
fitting the Stan model. Options are |
future |
Logical; If |
silent |
Verbosity level between |
seed |
The seed for random number generation to make results
reproducible. If |
save_model |
Either |
stan_model_args |
A |
file |
Either |
file_compress |
Logical or a character string, specifying one of the
compression algorithms supported by |
file_refit |
Modifies when the fit stored via the |
empty |
Logical. If |
rename |
For internal use only. |
... |
Further arguments passed to Stan.
For |
Fit a generalized (non-)linear multivariate multilevel model via
full Bayesian inference using Stan. A general overview is provided in the
vignettes vignette("brms_overview")
and
vignette("brms_multilevel")
. For a full list of available vignettes
see vignette(package = "brms")
.
Formula syntax of brms models
Details of the formula syntax applied in brms can be found in
brmsformula
.
Families and link functions
Details of families supported by brms can be found in
brmsfamily
.
Prior distributions
Priors should be specified using the
set_prior
function. Its documentation
contains detailed information on how to correctly specify priors. To find
out on which parameters or parameter classes priors can be defined, use
default_prior
. Default priors are chosen to be
non or very weakly informative so that their influence on the results will
be negligible and you usually don't have to worry about them. However,
after getting more familiar with Bayesian statistics, I recommend you to
start thinking about reasonable informative priors for your model
parameters: Nearly always, there is at least some prior information
available that can be used to improve your inference.
Adjusting the sampling behavior of Stan
In addition to choosing the number of iterations, warmup draws, and
chains, users can control the behavior of the NUTS sampler, by using the
control
argument. The most important reason to use control
is
to decrease (or eliminate at best) the number of divergent transitions that
cause a bias in the obtained posterior draws. Whenever you see the
warning "There were x divergent transitions after warmup." you should
really think about increasing adapt_delta
. To do this, write
control = list(adapt_delta = <x>)
, where <x>
should usually
be value between 0.8
(current default) and 1
. Increasing
adapt_delta
will slow down the sampler but will decrease the number
of divergent transitions threatening the validity of your posterior
draws.
Another problem arises when the depth of the tree being evaluated in each
iteration is exceeded. This is less common than having divergent
transitions, but may also bias the posterior draws. When it happens,
Stan will throw out a warning suggesting to increase
max_treedepth
, which can be accomplished by writing control =
list(max_treedepth = <x>)
with a positive integer <x>
that should
usually be larger than the current default of 10
. For more details
on the control
argument see stan
.
An object of class brmsfit
, which contains the posterior
draws along with many other useful information about the model. Use
methods(class = "brmsfit")
for an overview on available methods.
Paul-Christian Buerkner paul.buerkner@gmail.com
Paul-Christian Buerkner (2017). brms: An R Package for Bayesian Multilevel
Models Using Stan. Journal of Statistical Software, 80(1), 1-28.
doi:10.18637/jss.v080.i01
Paul-Christian Buerkner (2018). Advanced Bayesian Multilevel Modeling
with the R Package brms. The R Journal. 10(1), 395–411.
doi:10.32614/RJ-2018-017
brms
, brmsformula
,
brmsfamily
, brmsfit
## Not run:
# Poisson regression for the number of seizures in epileptic patients
fit1 <- brm(
count ~ zBase * Trt + (1|patient),
data = epilepsy, family = poisson(),
prior = prior(normal(0, 10), class = b) +
prior(cauchy(0, 2), class = sd)
)
# generate a summary of the results
summary(fit1)
# plot the MCMC chains as well as the posterior distributions
plot(fit1)
# predict responses based on the fitted model
head(predict(fit1))
# plot conditional effects for each predictor
plot(conditional_effects(fit1), ask = FALSE)
# investigate model fit
loo(fit1)
pp_check(fit1)
# Ordinal regression modeling patient's rating of inhaler instructions
# category specific effects are estimated for variable 'treat'
fit2 <- brm(rating ~ period + carry + cs(treat),
data = inhaler, family = sratio("logit"),
prior = set_prior("normal(0,5)"), chains = 2)
summary(fit2)
plot(fit2, ask = FALSE)
WAIC(fit2)
# Survival regression modeling the time between the first
# and second recurrence of an infection in kidney patients.
fit3 <- brm(time | cens(censored) ~ age * sex + disease + (1|patient),
data = kidney, family = lognormal())
summary(fit3)
plot(fit3, ask = FALSE)
plot(conditional_effects(fit3), ask = FALSE)
# Probit regression using the binomial family
ntrials <- sample(1:10, 100, TRUE)
success <- rbinom(100, size = ntrials, prob = 0.4)
x <- rnorm(100)
data4 <- data.frame(ntrials, success, x)
fit4 <- brm(success | trials(ntrials) ~ x, data = data4,
family = binomial("probit"))
summary(fit4)
# Non-linear Gaussian model
fit5 <- brm(
bf(cum ~ ult * (1 - exp(-(dev/theta)^omega)),
ult ~ 1 + (1|AY), omega ~ 1, theta ~ 1,
nl = TRUE),
data = loss, family = gaussian(),
prior = c(
prior(normal(5000, 1000), nlpar = "ult"),
prior(normal(1, 2), nlpar = "omega"),
prior(normal(45, 10), nlpar = "theta")
),
control = list(adapt_delta = 0.9)
)
summary(fit5)
conditional_effects(fit5)
# Normal model with heterogeneous variances
data_het <- data.frame(
y = c(rnorm(50), rnorm(50, 1, 2)),
x = factor(rep(c("a", "b"), each = 50))
)
fit6 <- brm(bf(y ~ x, sigma ~ 0 + x), data = data_het)
summary(fit6)
plot(fit6)
conditional_effects(fit6)
# extract estimated residual SDs of both groups
sigmas <- exp(as.data.frame(fit6, variable = "^b_sigma_", regex = TRUE))
ggplot(stack(sigmas), aes(values)) +
geom_density(aes(fill = ind))
# Quantile regression predicting the 25%-quantile
fit7 <- brm(bf(y ~ x, quantile = 0.25), data = data_het,
family = asym_laplace())
summary(fit7)
conditional_effects(fit7)
# use the future package for more flexible parallelization
library(future)
plan(multisession, workers = 4)
fit7 <- update(fit7, future = TRUE)
# fit a model manually via rstan
scode <- stancode(count ~ Trt, data = epilepsy)
sdata <- standata(count ~ Trt, data = epilepsy)
stanfit <- rstan::stan(model_code = scode, data = sdata)
# feed the Stan model back into brms
fit8 <- brm(count ~ Trt, data = epilepsy, empty = TRUE)
fit8$fit <- stanfit
fit8 <- rename_pars(fit8)
summary(fit8)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.