create_sampler: Create a sampler object
In mcmcsae: Markov Chain Monte Carlo Small Area Estimation

create_sampler

R Documentation

Create a sampler object

Description

This function sets up a sampler object, based on the specification of a model. The object contains functions to draw a set of model parameters from their prior and conditional posterior distributions, and to generate starting values for the MCMC simulation. The functions share a common environment containing precomputed quantities such as design matrices based on the model and the data. The sampler object is the main input for the MCMC simulation function MCMCsim.

Usage

create_sampler(
  formula,
  data = NULL,
  family = "gaussian",
  ny = NULL,
  ry = NULL,
  r.mod = NULL,
  sigma.fixed = NULL,
  sigma.mod = NULL,
  Q0 = NULL,
  formula.V = NULL,
  logJacobian = NULL,
  linpred = NULL,
  compute.weights = FALSE,
  block = NULL,
  prior.only = FALSE,
  control = sampler_control()
)

Arguments

`formula`	formula to specify the response variable and additive model components. The model components form the linear predictor part of the model. A model component on the right hand side can be either a regression term specified by `reg(...)`, a covariates subject to error term specified by `mec(...)`, or a generic random effect term specified by `gen(...)`. See for details the help pages for these model component creation functions. An offset can be specified as `offset(...)`. Other terms in the formula are collectively interpreted as ordinary regression effects, treated in the same way as a `reg(...)` term, but without the option to change the prior.
`data`	data frame with n rows in which the variables specified in model components can be found.
`family`	character string describing the data distribution. The default is 'gaussian'. Other options are 'binomial', 'multinomial', 'negbinomial' for the negative binomial distribution, 'poisson', and 'gamma'. Alternatively, functions starting with 'f_' followed by the family name can be used to specify the sampling distribution and possibly further options. See `f_gaussian`, `f_binomial`, `f_multinomial`, `f_negbinomial`, `f_poisson`, `f_gamma` and `f_gaussian_gamma`. For categorical or multinomial data, use `family="multinomial"` or `family=f_multinomial()` where the second form allows to additionally specify the number of trials (if not equal to 1). A stick-breaking representation of the multinomial distribution is used for model fitting, and the logistic link function relates each category except the last to a linear predictor. The categories can be referenced in the model specification formula by 'cat_'.
`ny`	NO LONGER USED. Please use `f_binomial` to specify the numbers of trials.
`ry`	NO LONGER USED. Please use `f_negbinomial` to specify further options for the negative binomial sampling distribution.
`r.mod`	NO LONGER USED. Please use `f_negbinomial` to specify further options for the negative binomial sampling distribution.
`sigma.fixed`	for Gaussian models, if `TRUE` the residual standard deviation parameter 'sigma_' is fixed at 1. In that case argument `sigma.mod` is ignored. This is convenient for Fay-Herriot type models with (sampling) variances assumed to be known. Default is `FALSE`. DEPRECATED, please use `f_gaussian` to specify variance options. In particular, a fixed scalar variance parameter with value 1 can be specified with `var.prior=pr_fixed(value=1)`.
`sigma.mod`	prior for the variance parameter of a gaussian sampling distribution. This can be specified by a call to one of the prior specification functions `pr_invchisq`, `pr_exp`, `pr_gig` or `pr_fixed` for inverse chi-squared, exponential, generalised inverse gaussian or degenerate prior distribution, respectively. The default is an improper prior `pr_invchisq(df=0, scale=1)`. A half-t prior on the standard deviation can be specified using `pr_invchisq` with a chi-squared distributed scale parameter. DEPRECATED, please use `f_gaussian` to specify variance options. In particular, to change the default prior for the scalar variance parameter use argument `var.prior`.
`Q0`	n x n data-level precision matrix for a Gaussian model. It defaults to the unit matrix. If an n-vector is provided it will be expanded to a (sparse) diagonal matrix with Q0 on its diagonal. If a name is supplied it will be looked up in `data` and subsequently expanded to a diagonal matrix. DEPRECATED, please use `f_gaussian`, in particular its `precision` argument, to specify unequal variances, or a non-diagonal precision matrix.
`formula.V`	a formula specifying the terms of a variance model in the case of a Gaussian likelihood. Currently two types of terms are supported: a regression term for the log-variance specified with `vreg(...)`, and a term `vfac(...)` for multiplicative modelled factors at a certain level specified by a factor variable. By using unit-level inverse-chi-squared factors the marginal sampling distribution becomes a Student-t distribution, and by using unit-level exponential factors it becomes a Laplace or double exponential distribution. DEPRECATED, please use `f_gaussian`, in particular its `var.model` argument, to specify a variance model.
`logJacobian`	if the data are transformed the logarithm of the Jacobian can be supplied so that it is incorporated in all log-likelihood computations. This can be useful for comparing information criteria for different transformations. It should be supplied as a vector of the same size as the response variable, and is currently only supported if `family="gaussian"`. For example, when a log-transformation is used on response vector `y`, the vector `-log(y)` should be supplied. DEPRECATED, this argument has moved to `f_gaussian`.
`linpred`	a list of matrices defining (possibly out-of-sample) linear predictors to be simulated. This allows inference on e.g. (sub)population totals or means. The list must be of the form `list(name_1=X_1, ...)` where the names refer to the model component names and predictions are computed by summing `X_i %*% p[[name_i]]`. Alternatively, `linpred="fitted"` can be used as a short-cut for simulations of the full in-sample linear predictor.
`compute.weights`	if `TRUE` weights are computed for each element of `linpred`. Note that for a large dataset in combination with vector-valued linear predictors the weights can take up a lot of memory. By default only means are stored in the simulation carried out using `MCMCsim`.
`block`	DEPRECATED, please use argument `control` instead, see also `sampler_control`. Note that this parameter is now by default set to `TRUE`.
`prior.only`	whether a sampler is set up only for sampling from the prior or for sampling from both prior and posterior distributions. Default `FALSE`. If `TRUE` there is no need to specify a response in `formula`. This is used by `generate_data`, which samples from the prior predictive distribution.
`control`	a list with further computational options. These options can be specified using function `sampler_control`.

Details

The right hand side of the formula argument to create_sampler can be used to specify additive model components. Currently four model components are supported: reg(...) for regression or 'fixed' effects, gen(...) for generic random effects, mec(...) for measurement in covariates effects, and brt(...) for a Bayesian additive regression trees component. Note that an offset can be added separately, in the usual way using offset(...).

For gaussian models, formula.V can be used to specify the variance structure of the model. Currently two specialised variance model components are supported, vreg(...) for regression effects predicting the log-variance and vfac(...) for modelled variance factors.

Value

A sampler object, which is the main input for the MCMC simulation function MCMCsim. The sampler object is an environment with precomputed quantities and functions. The main functions are rprior, which returns a sample from the prior distributions, draw, which returns a sample from the full conditional posterior distributions, and start, which returns a list with starting values for the Gibbs sampler. If prior.only is TRUE, functions draw and start are not created.

References

J.H. Albert and S. Chib (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association 88(422), 669-679.

D. Bates, M. Maechler, B. Bolker and S.C. Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67(1), 1-48.

S.W. Linderman, M.J. Johnson and R.P. Adams (2015). Dependent multinomial models made easy: Stick-breaking with the Polya-Gamma augmentation. Advances in Neural Information Processing Systems, 3456-3464.

P.A. Parker, S.H. Holan and R. Janicki (2024). Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure. Journal of Survey Statistics and Methodology 12(4), 1061-1080.

N. Polson, J.G. Scott and J. Windle (2013). Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association 108(504), 1339-1349.

H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.

Examples

# first generate some data
n <- 200
x <- rnorm(n)
y <- 0.5 + 2*x + 0.3*rnorm(n)
# create a sampler for a simple linear regression model
sampler <- create_sampler(y ~ x)
sim <- MCMCsim(sampler)
(summary(sim))

y <- rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2*x))))
# create a sampler for a binary logistic regression model
sampler <- create_sampler(y ~ x, family="binomial")
sim <- MCMCsim(sampler)
(summary(sim))

mcmcsae documentation built on June 8, 2025, 10:55 a.m.