dmWAIC: WAIC for Dirichlet-Multinomial Regression Models

dmWAIC    R Documentation

Description

Computes the widely applicable information criterion (WAIC) for Dirichlet-multinomial regression models. Wraps dmreg, dmpredict, ddirmult, and waic so that WAIC can be computed in a single call. The rstan package must be installed to use this function.

Usage

dmWAIC(
  Y,
  X,
  H,
  ones = TRUE,
  method = 2,
  priors = c(B.mu = 0, B.sd = 1, theta.mu = 0, theta.sd = 1, sigma2.alpha = 0.01,
    sigma2.beta = 0.01),
  control = list(adapt_delta = 0.95, max_treedepth = 20),
  ...
)

Arguments

Y

Numeric response matrix. Each row represents an observation, and each column represents a response dimension. Matrix cells contain integer counts.

X

Numeric predictor matrix. Each row represents an observation, and each column represents a predictor variable. Matrix cells contain predictor values.

H

Numeric vector or matrix (optional). If provided, hierarchical effects are included in the model. Elements contain integer identifiers for the levels of hierarchical variables. If a vector is supplied, a single hierarchical variable is included, with each element representing an observation. If a matrix is supplied, each row represents an observation, and each column represents a hierarchical variable. Up to four hierarchical variables are supported (each with an arbitrary number of hierarchical levels).

ones

Logical scalar. If TRUE (the default), then one is added to each cell of the response matrix. This avoids numerical errors which occur when distributional parameters in the model approach zero. For more information, see Harrison et al. (2020). If the response matrix contains no zeros, then ones may be set to FALSE.

method

Numeric scalar. Options are 1 or 2, representing the alternative WAIC bias correction formulas (pWAIC1 and pWAIC2, respectively) described in Gelman et al. (2014). As recommended by Gelman et al. (2014), the default method (2) uses the pWAIC2 bias correction formula.

priors

Named numeric vector. Elements represent the prior values of their respective named parameters. When predictors are centered and scaled, the defaults generally represent weakly informative priors. Regression coefficients (B) and the precision parameter (theta) receive normal priors (with standard normal as the default). If hierarchical variables (argument H) are provided, then the common variances receive inverse-gamma priors (with default alpha and beta parameters of 0.01).

control

Named list of parameters which control the behavior of the Stan sampler. Passed to the control argument of the rstan::sampling function.

...

Additional arguments passed to the rstan::sampling function.
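
As a rough illustration of the input conventions described for X, H, and priors, the following base R sketch centers and scales a predictor matrix (the condition under which the default priors are described as weakly informative) and converts grouping labels into integer identifiers for H. The toy values and the names site_labels and plot_labels are hypothetical. Whether dmWAIC requires identifiers to be consecutive integers starting at 1 is not stated on this page, so that coding is an assumption.

```r
# Toy predictor matrix: 6 observations, 2 predictors (hypothetical values).
X_raw <- cbind(temp = c(10, 12, 15, 11, 14, 13),
               ph   = c(6.5, 7.0, 6.8, 7.2, 6.9, 7.1))

# Center each column to mean 0 and scale it to standard deviation 1.
X <- scale(X_raw)

# Hypothetical grouping labels for two hierarchical variables.
site_labels <- c("A", "A", "B", "B", "C", "C")
plot_labels <- c("p1", "p2", "p1", "p2", "p1", "p2")

# Convert labels to integer identifiers, one column per hierarchical variable.
H <- cbind(site = as.integer(factor(site_labels)),
           plot = as.integer(factor(plot_labels)))
```

If the defaults are not appropriate, the priors argument can be overridden by passing a named numeric vector with the same six names shown in Usage.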

Details

For convenience, wraps the steps involved in WAIC calculations for Bayesian Dirichlet-multinomial regression models. Begins by fitting a Bayesian Dirichlet-multinomial regression model with the dmreg function, then generates resubstitution posterior predictions using the dmpredict function. The pointwise log-likelihood is calculated with the ddirmult function given the response matrix, posterior predictions, and precision parameter. WAIC is calculated from the pointwise log-likelihood using the waic function.
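
The pointwise log-likelihood step can be sketched in base R. The helper log_ddirmult_sketch below is hypothetical and is not the package's ddirmult function. It assumes the Dirichlet-multinomial distribution is parameterized by concentration parameters equal to the precision parameter multiplied by the predicted proportions, a common convention that this page does not confirm.

```r
# Hypothetical helper: log probability mass of one count vector y under a
# Dirichlet-multinomial with concentration alpha = theta * p (assumption).
log_ddirmult_sketch <- function(y, p, theta) {
  alpha <- theta * p   # concentration parameters
  n <- sum(y)          # total count for this observation
  A <- sum(alpha)      # total concentration
  lgamma(n + 1) + lgamma(A) - lgamma(n + A) +
    sum(lgamma(y + alpha) - lgamma(alpha) - lgamma(y + 1))
}

# Sanity check: enumerate every outcome for 2 trials over 2 categories
# and sum the probabilities.
outcomes <- rbind(c(2, 0), c(1, 1), c(0, 2))
probs <- apply(outcomes, 1, function(y) {
  exp(log_ddirmult_sketch(y, p = c(0.3, 0.7), theta = 5))
})
total <- sum(probs)
```

Evaluating such a function for every posterior draw and every observation yields a draws-by-observations matrix of log-likelihood values, which is the kind of input a WAIC calculation needs.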

Value

Returns a numeric scalar: the widely applicable information criterion (WAIC) of the fitted model.
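
For reference, the two bias corrections selected by the method argument can be sketched from a pointwise log-likelihood matrix, following the formulas in Gelman et al. (2014). The function waic_sketch, the draws-in-rows layout, and the deviance scale (multiplication by -2) are assumptions for illustration and may differ from the package's waic function.

```r
# Hypothetical sketch: WAIC from an S x N matrix of pointwise log-likelihoods
# (S posterior draws in rows, N observations in columns).
waic_sketch <- function(ll, method = 2) {
  # Log pointwise predictive density per observation, using a
  # log-mean-exp that subtracts the column maximum for stability.
  lpd_i <- apply(ll, 2, function(x) {
    m <- max(x)
    m + log(mean(exp(x - m)))
  })
  lppd <- sum(lpd_i)
  if (method == 1) {
    # pWAIC1: twice the gap between the log of the mean likelihood
    # and the mean log-likelihood.
    p_waic <- 2 * sum(lpd_i - colMeans(ll))
  } else {
    # pWAIC2: posterior variance of the log-likelihood, summed over observations.
    p_waic <- sum(apply(ll, 2, var))
  }
  -2 * (lppd - p_waic)
}

# Simulated log-likelihood values, used only to exercise the function.
set.seed(1)
ll <- matrix(rnorm(200 * 10, mean = -3, sd = 0.2), nrow = 200)
waic1 <- waic_sketch(ll, method = 1)
waic2 <- waic_sketch(ll, method = 2)
```

Both corrections estimate the effective number of parameters, so the two values are typically close when the posterior is well behaved.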

References

Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, and Riddell A. 2017. Stan: A probabilistic programming language. Journal of Statistical Software, 76: 1-32. DOI: 10.18637/jss.v076.i01

Gelman A, Hwang J, and Vehtari A. 2014. Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6): 997-1016. DOI: 10.1007/s11222-013-9416-2

Goodwin KB, Hutchinson JD, and Gompert Z. 2022. Spatiotemporal and ontogenetic variation, microbial selection, and predicted Bd-inhibitory function in the skin-associated microbiome of a Rocky Mountain amphibian. Frontiers in Microbiology, 13: 1020329. DOI: 10.3389/fmicb.2022.1020329

Harrison JG, Calder WJ, Shastry V, and Buerkle CA. 2020. Dirichlet-multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data. Molecular Ecology Resources, 20(2): 481-497. DOI: 10.1111/1755-0998.13128

Watanabe S. 2010. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(116): 3571-3594.

See Also

dmreg for fitting Dirichlet-multinomial regression models.

dmpredict for generating predictions from Dirichlet-multinomial regression models.

ddirmult for probability mass function of the Dirichlet-multinomial distribution.

waic for generic function to compute widely applicable information criterion.

Examples


# Define example data file path.
path<-system.file("extdata",
                  "example_regression_data.rds",
                  package="LocaTT",
                  mustWork=TRUE)

# Read in example regression data.
data<-readRDS(file=path)

# Compute WAIC for Dirichlet-multinomial regression.
out<-dmWAIC(Y=data$Y,X=data$X,H=data$H)


LocaTT documentation built on June 14, 2026, 1:06 a.m.