dmreg: Fitting Dirichlet-Multinomial Regression Models

Description

Fit a Bayesian Dirichlet-multinomial regression model. Both fixed and hierarchical effects are supported. Installation of the rstan package is required to use this function.

Usage

dmreg(
  Y,
  X,
  H,
  ones = TRUE,
  priors = c(B.mu = 0, B.sd = 1, theta.mu = 0, theta.sd = 1, sigma2.alpha = 0.01,
    sigma2.beta = 0.01),
  control = list(adapt_delta = 0.95, max_treedepth = 20),
  ...
)

Arguments

Y

Numeric response matrix. Each row represents an observation, and each column represents a response dimension. Matrix elements contain integer counts.

X

Numeric predictor matrix. Each row represents an observation, and each column represents a predictor variable. Matrix elements contain predictor values.
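As a rough illustration of the layout expected for Y and X, the sketch below builds a small toy data set and checks it before fitting. The data values and the check_inputs helper are hypothetical and are not part of LocaTT; they only restate the requirements described above (one row per observation in both matrices, and whole-number counts in Y).

```r
# Hypothetical toy data: 5 observations, 3 response dimensions, 2 predictors.
set.seed(1)
Y <- matrix(rpois(15, lambda = 20), nrow = 5, ncol = 3)
X <- cbind(elevation = rnorm(5), moisture = rnorm(5))

# Hypothetical helper (not part of LocaTT): basic shape and type checks.
check_inputs <- function(Y, X) {
  stopifnot(
    is.matrix(Y), is.numeric(Y),
    is.matrix(X), is.numeric(X),
    nrow(Y) == nrow(X),   # one row per observation in both matrices
    all(Y >= 0),          # counts cannot be negative
    all(Y == round(Y))    # counts must be whole numbers
  )
  invisible(TRUE)
}

ok <- check_inputs(Y, X)
```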

H

Numeric vector or matrix (optional). If provided, hierarchical effects are included in the model. Elements contain integer identifiers for the levels of hierarchical variables. If a vector, a single hierarchical variable is included, with each element representing an observation. If a matrix, each row represents an observation, and each column represents a hierarchical variable. Up to four hierarchical variables are supported (each with an arbitrary number of hierarchical levels).
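One possible way to build H from grouping labels is sketched below. The site and year labels are hypothetical, and the sketch assumes that consecutive integer codes starting at 1 (as produced by factor) are acceptable identifiers; the documentation above only states that identifiers must be integers.

```r
# Hypothetical grouping labels for six observations.
site <- c("A", "A", "B", "B", "C", "C")
year <- c(2019, 2020, 2019, 2020, 2019, 2020)

# Single hierarchical variable: an integer vector, one element per observation.
H_vec <- as.integer(factor(site))

# Two hierarchical variables: an integer matrix, one column per variable.
H_mat <- cbind(site = as.integer(factor(site)),
               year = as.integer(factor(year)))
```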

ones

Logical scalar. If TRUE (the default), one is added to each element of the response matrix. This avoids numerical errors that occur when distributional parameters in the model approach zero. For more information, see Harrison et al. (2020). If the response matrix contains no zeros, ones may be set to FALSE.
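The snippet below sketches how one might decide on the ones argument for a given response matrix. The toy matrix is hypothetical, and Y_plus_one only mirrors the transformation described above; how dmreg applies it internally is not shown here.

```r
# Hypothetical response matrix with some zero counts.
Y <- matrix(c(12, 0, 5,
               0, 7, 3), nrow = 2, byrow = TRUE)

# If no element is zero, ones = FALSE is permitted.
has_zeros <- any(Y == 0)

# The transformation described for ones = TRUE: add one to every element.
Y_plus_one <- Y + 1
```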

priors

Named numeric vector. Each element sets the prior hyperparameter identified by its name. When predictors are centered and scaled, the defaults generally represent weakly informative priors. Regression coefficients (B) and the precision parameter (theta) receive normal priors (with standard normal as the default). If hierarchical variables (argument H) are provided, then the common variances receive inverse-gamma priors (with sigma2.alpha and sigma2.beta both defaulting to 0.01).
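A minimal sketch of preparing predictors and a custom priors vector follows. The predictor values are hypothetical. The sketch supplies all six named elements from the Usage section, since whether a partial vector is accepted is not documented here, and it assumes B.sd is a standard deviation, as its name suggests.

```r
# Hypothetical raw predictors on very different scales.
set.seed(1)
X_raw <- cbind(elevation = rnorm(10, mean = 2000, sd = 300),
               moisture  = runif(10, min = 0, max = 100))

# Center and scale each predictor column (mean 0, standard deviation 1).
X <- scale(X_raw)

# Custom priors: same names as the defaults, with a wider prior on B.
my_priors <- c(B.mu = 0, B.sd = 2, theta.mu = 0, theta.sd = 1,
               sigma2.alpha = 0.01, sigma2.beta = 0.01)
```

The resulting objects could then be supplied as dmreg(Y = Y, X = X, priors = my_priors).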

control

Named list of parameters which control the behavior of the Stan sampler. Passed to the control argument of the rstan::sampling function.

...

Additional arguments passed to the rstan::sampling function.
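The sketch below shows one way the control and ... arguments might be used together. The specific values are illustrative only. chains, iter, warmup, and seed are standard rstan::sampling arguments; the model fit itself is skipped by default because sampling can be slow and requires rstan.

```r
# Sampler settings for the control argument (forwarded to rstan::sampling).
my_control <- list(adapt_delta = 0.99, max_treedepth = 15)

# Sampling is skipped unless run_fit is set to TRUE.
run_fit <- FALSE
if (run_fit) {
  data <- readRDS(system.file("extdata", "example_regression_data.rds",
                              package = "LocaTT", mustWork = TRUE))
  # chains, iter, warmup, and seed travel through `...` to rstan::sampling.
  fit <- LocaTT::dmreg(Y = data$Y, X = data$X, H = data$H,
                       control = my_control,
                       chains = 4, iter = 2000, warmup = 1000, seed = 123)
}
```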

Details

Fits the Bayesian Dirichlet-multinomial regression model of Goodwin et al. (2022) using the rstan interface to Stan (Carpenter et al. 2017). A stanfit object of the fitted model is returned, which can be used with standard rstan functions to evaluate model convergence (e.g., posterior trace plots, R-hat convergence diagnostics, and effective sample sizes). The model formulation is identical to that of Goodwin et al. (2022), except that the hard sum-to-zero constraint on hierarchical effects was removed to preserve the prior marginal variance of the final element. Up to four hierarchical variables are supported.
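The convergence checks mentioned above could be sketched as follows. flag_unconverged is a hypothetical helper, not part of LocaTT or rstan, and the thresholds (R-hat above 1.01, effective sample size below 400) are common rules of thumb rather than values taken from this documentation. The parameter names in the stand-in matrix are assumptions based on the names used in the priors argument.

```r
# Hypothetical helper. `summ` is the matrix returned by
# rstan::summary(fit)$summary, which has "n_eff" and "Rhat" columns.
flag_unconverged <- function(summ, rhat_max = 1.01, n_eff_min = 400) {
  bad <- summ[, "Rhat"] > rhat_max | summ[, "n_eff"] < n_eff_min
  rownames(summ)[which(bad)]  # which() drops NA entries (e.g., fixed values)
}

# Stand-in summary matrix so the sketch runs without fitting a model.
summ <- cbind(n_eff = c(1800, 150, 2100),
              Rhat  = c(1.00, 1.08, 1.00))
rownames(summ) <- c("theta", "B[1,1]", "B[2,1]")

flagged <- flag_unconverged(summ)

# With a real fit: flag_unconverged(rstan::summary(fit)$summary)
# Trace plots:     rstan::traceplot(fit, pars = "theta")
```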

For each observation, counts are distributed according to the Dirichlet-multinomial distribution with alpha parameters defined as the product of an expected proportions vector and an exponentiated precision parameter. The precision parameter controls the degree of overdispersion relative to the multinomial distribution. The softmax function normalizes linear predictor combinations into expected proportions. For the model to be identifiable, the regression coefficients of the final dimension are set to zero. By default, weakly informative priors are used on the regression coefficients (B), precision parameter (theta), and hierarchical variances (sigma2). See the supplement of Goodwin et al. (2022) for details.
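The mapping from predictors to alpha parameters described above can be sketched for a single observation. All numeric values are hypothetical, the orientation of B (predictors in rows, response dimensions in columns) is an assumption, and intercept handling and hierarchical effects are left out for brevity; this is an illustration of the description, not the package's Stan code.

```r
# Softmax: maps a vector of linear predictors to proportions that sum to one.
softmax <- function(eta) {
  z <- exp(eta - max(eta))  # subtract the maximum for numerical stability
  z / sum(z)
}

# Hypothetical values: one observation, two predictors, three dimensions.
x <- c(0.5, -1.2)
B <- cbind(c(0.8, 0.1),   # coefficients for dimension 1
           c(-0.3, 0.4),  # coefficients for dimension 2
           c(0, 0))       # final dimension fixed at zero for identifiability
theta <- 2                # precision parameter on the log scale

eta   <- as.vector(x %*% B)  # linear predictors, one per dimension
p     <- softmax(eta)        # expected proportions
alpha <- p * exp(theta)      # Dirichlet-multinomial alpha parameters
```

Because the alpha parameters sum to exp(theta), larger values of theta concentrate the distribution around the expected proportions and bring it closer to a plain multinomial.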

Value

Returns a stanfit object of the fitted Bayesian Dirichlet-multinomial regression model.

References

Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, and Riddell A. 2017. Stan: A probabilistic programming language. Journal of Statistical Software, 76: 1-32. DOI: 10.18637/jss.v076.i01

Goodwin KB, Hutchinson JD, and Gompert Z. 2022. Spatiotemporal and ontogenetic variation, microbial selection, and predicted Bd-inhibitory function in the skin-associated microbiome of a Rocky Mountain amphibian. Frontiers in Microbiology, 13: 1020329. DOI: 10.3389/fmicb.2022.1020329

Harrison JG, Calder WJ, Shastry V, and Buerkle CA. 2020. Dirichlet-multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data. Molecular Ecology Resources, 20(2): 481-497. DOI: 10.1111/1755-0998.13128

See Also

dmpredict for generating predictions from Dirichlet-multinomial regression models.

dmWAIC for computing widely applicable information criteria for Dirichlet-multinomial regression models.

Examples


# Define example data file path.
path <- system.file("extdata",
                    "example_regression_data.rds",
                    package = "LocaTT",
                    mustWork = TRUE)

# Read in example regression data.
data <- readRDS(file = path)

# Fit Dirichlet-multinomial regression.
out <- dmreg(Y = data$Y, X = data$X, H = data$H)


LocaTT documentation built on June 14, 2026, 1:06 a.m.