bmdd: Bimodal Dirichlet Distribution Estimation
In MicrobiomeStat: Statistical Methods for Microbiome Compositional Data

bmdd	R Documentation

Bimodal Dirichlet Distribution Estimation

Description

Estimates parameters of the bimodal Dirichlet distribution using a variational EM algorithm. Automatically selects the optimal implementation: high-performance C++ (NLopt) when possible, or R when covariates are needed.

Usage

bmdd(W, type = c("count", "proportion"),
     Z = NULL, formula.Z = NULL, U = NULL, formula.U = NULL,
     Z.standardizing = TRUE, U.standardizing = TRUE,
     alp.eta = FALSE, alp.kap = FALSE,
     pi.xi = FALSE, pi.zeta = FALSE,
     para.alp.init = NULL, para.pi.init = NULL, gam.init = NULL,
     iterlim = 500, tol = 1e-6, trace = FALSE,
     method = c("auto", "nlopt", "R"),
     inner.loop = TRUE, inner.iterlim = 20, inner.tol = 1e-6,
     alp.iterlim = 100, alp.tol = 1e-6,
     alp.min = 1e-3, alp.max = 1e3, ...)

Arguments

`W`	Matrix of observed data (m taxa × n samples)
`type`	Data type: "count" or "proportion"
`Z`	Optional matrix of covariates for alpha (forces R implementation)
`formula.Z`	Optional formula for Z covariates
`U`	Optional matrix of covariates for pi (forces R implementation)
`formula.U`	Optional formula for U covariates
`Z.standardizing`	Standardize Z covariates (default TRUE)
`U.standardizing`	Standardize U covariates (default TRUE)
`alp.eta`	Model alpha0 as function of Z (default FALSE)
`alp.kap`	Model alpha1 as function of Z (default FALSE)
`pi.xi`	Model pi as function of U (default FALSE)
`pi.zeta`	Model pi variance as function of U (default FALSE)
`para.alp.init`	Initial alpha values
`para.pi.init`	Initial pi values
`gam.init`	Initial gamma values
`iterlim`	Maximum iterations (default 500)
`tol`	Convergence tolerance (default 1e-6)
`trace`	Print progress (default FALSE)
`method`	Force method: "auto", "nlopt", or "R" (default "auto")
`inner.loop`	Use inner loop for NLopt (default TRUE)
`inner.iterlim`	Inner loop max iterations (default 20)
`inner.tol`	Inner loop tolerance (default 1e-6)
`alp.iterlim`	Alpha optimization iterations (default 100)
`alp.tol`	Alpha tolerance (default 1e-6)
`alp.min`	Minimum alpha (default 1e-3)
`alp.max`	Maximum alpha (default 1e3)
`...`	Additional arguments (ignored)

Details

Automatically chooses best implementation:

NLopt C++: When no covariates. 50-90x faster using L-BFGS-B with analytical gradients. Scientifically equivalent to R (r > 0.999).
R: When covariates needed or explicitly requested. Supports full covariate modeling.

Requires NLopt library for high-performance mode:

macOS: brew install nlopt
Linux: sudo apt-get install libnlopt-dev

Refer to https://github.com/zhouhj1994/BMDD for detailed examples about zero imputation and posterior sample generation.

Value

A list containing:

`gamma`	Estimated gamma parameters (bimodality indicators)
`pi`	Estimated pi parameters (mixing proportions)
`beta`	Estimated posterior Dirichlet parameters
`alpha0`	Estimated alpha0 parameters (mode 0)
`alpha1`	Estimated alpha1 parameters (mode 1)
`converge`	Logical indicating convergence
`iter`	Number of iterations
`method`	Method used: "nlopt" or "R"

References

Zhou, H., Chen, J., & Zhang, X. (2025). BMDD: A probabilistic framework for accurate imputation of zero-inflated microbiome sequencing data. PLoOS Computational Biology, 21(10), e1013124.

Examples

## Not run: 
# Simulate data
m <- 100  # taxa
n <- 50   # samples
W <- matrix(rpois(m*n, 100), m, n)

# Auto-select method (uses NLopt for speed)
result <- bmdd(W, type = "count")

# Access results
head(result$beta)    # Posterior parameters
head(result$gamma)   # Bimodality indicators
result$method        # Shows which method was used

# Force specific method
result_nlopt <- bmdd(W, method = "nlopt")  # Force NLopt
result_r <- bmdd(W, method = "R")          # Force R

# With covariates (automatically uses R)
Z <- matrix(rnorm(m * 2), m, 2)
result_cov <- bmdd(W, Z = Z, alp.eta = TRUE)

## End(Not run)

MicrobiomeStat documentation built on Jan. 9, 2026, 1:07 a.m.