bmdd: Bimodal Dirichlet Distribution Estimation

View source: R/bmdd_wrapper.R

bmddR Documentation

Bimodal Dirichlet Distribution Estimation

Description

Estimates parameters of the bimodal Dirichlet distribution using a variational EM algorithm. Automatically selects the optimal implementation: high-performance C++ (NLopt) when possible, or R when covariates are needed.

Usage

bmdd(W, type = c("count", "proportion"),
     Z = NULL, formula.Z = NULL, U = NULL, formula.U = NULL,
     Z.standardizing = TRUE, U.standardizing = TRUE,
     alp.eta = FALSE, alp.kap = FALSE,
     pi.xi = FALSE, pi.zeta = FALSE,
     para.alp.init = NULL, para.pi.init = NULL, gam.init = NULL,
     iterlim = 500, tol = 1e-6, trace = FALSE,
     method = c("auto", "nlopt", "R"),
     inner.loop = TRUE, inner.iterlim = 20, inner.tol = 1e-6,
     alp.iterlim = 100, alp.tol = 1e-6,
     alp.min = 1e-3, alp.max = 1e3, ...)

Arguments

W

Matrix of observed data (m taxa × n samples)

type

Data type: "count" or "proportion"

Z

Optional matrix of covariates for alpha (forces R implementation)

formula.Z

Optional formula for Z covariates

U

Optional matrix of covariates for pi (forces R implementation)

formula.U

Optional formula for U covariates

Z.standardizing

Standardize Z covariates (default TRUE)

U.standardizing

Standardize U covariates (default TRUE)

alp.eta

Model alpha0 as function of Z (default FALSE)

alp.kap

Model alpha1 as function of Z (default FALSE)

pi.xi

Model pi as function of U (default FALSE)

pi.zeta

Model pi variance as function of U (default FALSE)

para.alp.init

Initial alpha values

para.pi.init

Initial pi values

gam.init

Initial gamma values

iterlim

Maximum iterations (default 500)

tol

Convergence tolerance (default 1e-6)

trace

Print progress (default FALSE)

method

Force method: "auto", "nlopt", or "R" (default "auto")

inner.loop

Use inner loop for NLopt (default TRUE)

inner.iterlim

Inner loop max iterations (default 20)

inner.tol

Inner loop tolerance (default 1e-6)

alp.iterlim

Alpha optimization iterations (default 100)

alp.tol

Alpha tolerance (default 1e-6)

alp.min

Minimum alpha (default 1e-3)

alp.max

Maximum alpha (default 1e3)

...

Additional arguments (ignored)

Details

Automatically chooses best implementation:

  • NLopt C++: When no covariates. 50-90x faster using L-BFGS-B with analytical gradients. Scientifically equivalent to R (r > 0.999).

  • R: When covariates needed or explicitly requested. Supports full covariate modeling.

Requires NLopt library for high-performance mode:

  • macOS: brew install nlopt

  • Linux: sudo apt-get install libnlopt-dev

Refer to https://github.com/zhouhj1994/BMDD for detailed examples about zero imputation and posterior sample generation.

Value

A list containing:

gamma

Estimated gamma parameters (bimodality indicators)

pi

Estimated pi parameters (mixing proportions)

beta

Estimated posterior Dirichlet parameters

alpha0

Estimated alpha0 parameters (mode 0)

alpha1

Estimated alpha1 parameters (mode 1)

converge

Logical indicating convergence

iter

Number of iterations

method

Method used: "nlopt" or "R"

References

Zhou, H., Chen, J., & Zhang, X. (2025). BMDD: A probabilistic framework for accurate imputation of zero-inflated microbiome sequencing data. PLoOS Computational Biology, 21(10), e1013124.

Examples

## Not run: 
# Simulate data
m <- 100  # taxa
n <- 50   # samples
W <- matrix(rpois(m*n, 100), m, n)

# Auto-select method (uses NLopt for speed)
result <- bmdd(W, type = "count")

# Access results
head(result$beta)    # Posterior parameters
head(result$gamma)   # Bimodality indicators
result$method        # Shows which method was used

# Force specific method
result_nlopt <- bmdd(W, method = "nlopt")  # Force NLopt
result_r <- bmdd(W, method = "R")          # Force R

# With covariates (automatically uses R)
Z <- matrix(rnorm(m * 2), m, 2)
result_cov <- bmdd(W, Z = Z, alp.eta = TRUE)

## End(Not run)

MicrobiomeStat documentation built on Jan. 9, 2026, 1:07 a.m.