sim_plnm: Simulate Microbial Absolute Abundance Data by Poisson...

View source: R/sim_data.R

sim_plnm    R Documentation

Simulate Microbial Absolute Abundance Data by Poisson Lognormal (PLN) Model Based on a Real Dataset


Generate microbial absolute abundances using the Poisson lognormal (PLN) model based on the mechanism described in the LDM paper (supplementary text S2).


sim_plnm(abn_table, taxa_are_rows = TRUE, prv_cut = 0.1, n, lib_mean, disp)



abn_table: the input microbial count table, in either matrix or data.frame format. It is used to estimate the variance-covariance matrix.


taxa_are_rows: logical. TRUE if the rows of the input dataset represent taxa. Default is TRUE.


prv_cut: a numerical fraction between 0 and 1. Taxa with prevalence less than prv_cut are excluded from the analysis. For instance, with 100 samples and the default cutoff, a taxon with nonzero counts in fewer than 10 samples is not analyzed further. Default is 0.10.


n: numeric. The desired sample size for the simulated data.


lib_mean: numeric. Mean of the library size. Library sizes are generated from the negative binomial distribution with parameters lib_mean and disp. For details, see ?rnbinom.


disp: numeric. The dispersion parameter for the library size. For details, see ?rnbinom.
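
As a rough illustration of what prv_cut, lib_mean, and disp control, the base-R sketch below filters a toy count table by prevalence and draws library sizes with rnbinom. The toy table, the helper name filter_by_prevalence, and the mapping of disp onto the size argument of rnbinom are assumptions made for this example; they are not taken from the package source.

```r
# Hypothetical helper: keep taxa (rows) whose prevalence is at least prv_cut.
filter_by_prevalence <- function(counts, prv_cut = 0.1) {
  # Prevalence = fraction of samples in which the taxon has a nonzero count.
  prevalence <- rowMeans(counts != 0, na.rm = TRUE)
  counts[prevalence >= prv_cut, , drop = FALSE]
}

# Toy table: 3 taxa (rows) x 10 samples (columns).
toy <- rbind(
  taxon_a = c(5, 0, 3, 8, 1, 0, 2, 4, 6, 7),  # nonzero in 8 of 10 samples
  taxon_b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 9),  # nonzero in 1 of 10 samples
  taxon_c = rep(0, 10)                        # never observed
)
kept <- filter_by_prevalence(toy, prv_cut = 0.2)
print(rownames(kept))

# Library sizes from a negative binomial with mean lib_mean.
# Assumption: disp plays the role of rnbinom's `size` argument,
# so smaller values give more variable library sizes.
set.seed(123)
lib_sizes <- rnbinom(n = 10000, size = 0.5, mu = 1e8)
print(summary(lib_sizes))
```

With a cutoff of 0.2, only taxon_a survives the filter, since the other two taxa are nonzero in fewer than 2 of the 10 samples.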


The PLN model relates the abundance vector to a Gaussian latent vector. Because of this latent layer, the PLN model has a larger variance than a Poisson model with the same mean (over-dispersion). In addition, the covariance (correlation) between abundances has the same sign as the covariance (correlation) between the corresponding latent variables. This property gives considerable flexibility in modeling the variance-covariance structure of microbial abundances, since it is easy to specify different variance-covariance matrices in the multivariate Gaussian distribution.
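
The two properties described above can be checked numerically with a small simulation. The following is a minimal base-R sketch of a two-taxon PLN model, not the code used by sim_plnm; the latent mean, the latent covariance matrix, and the number of draws are illustrative values chosen for this example.

```r
set.seed(1)
n <- 20000                                # number of simulated samples
mu <- c(1, 1)                             # latent means (illustrative)
Sigma <- matrix(c( 0.5, -0.3,
                  -0.3,  0.5), nrow = 2)  # latent covariance, negative off-diagonal

# Draw latent Gaussian vectors Z ~ N(mu, Sigma) via a Cholesky factor.
# chol() returns an upper-triangular R with t(R) %*% R == Sigma,
# so the rows of E %*% R have covariance Sigma when E is standard normal.
Z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma)
Z <- sweep(Z, 2, mu, "+")

# Conditional on Z, counts are independent Poisson with mean exp(Z).
Y <- matrix(rpois(n * 2, lambda = exp(Z)), ncol = 2)

# Over-dispersion: the variance exceeds the mean for each taxon.
print(rbind(mean = colMeans(Y), variance = apply(Y, 2, var)))

# The count correlation has the same sign as the latent correlation.
print(c(latent = cor(Z)[1, 2], counts = cor(Y)[1, 2]))
```

A pure Poisson model would give a variance equal to the mean; here the extra variance comes entirely from the latent Gaussian layer, and the negative latent covariance carries over to a negative correlation between the counts.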

However, instead of manually specifying the variance-covariance matrix, we choose to estimate it from a real dataset, so that the simulated data more closely resemble real data.
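
To make the idea of estimating the covariance from data concrete, here is a simplified base-R sketch. It is an assumption-laden stand-in, not the estimator used by sim_plnm: the "observed" table is itself simulated, the latent moments are taken from log counts with a pseudocount of 1, and library-size scaling is omitted.

```r
set.seed(42)
# Stand-in for a real count table: 5 taxa (rows) x 50 samples (columns).
observed <- matrix(rnbinom(5 * 50, size = 2, mu = 20), nrow = 5,
                   dimnames = list(paste0("Taxon", 1:5), NULL))

# Assumption: approximate the latent moments from log counts (pseudocount 1).
log_obs <- log(observed + 1)
mu_hat <- rowMeans(log_obs)
Sigma_hat <- cov(t(log_obs))   # taxa-by-taxa covariance across samples

# Simulate n new samples from the fitted latent Gaussian, then Poisson counts.
n <- 100
d <- nrow(observed)
Z <- matrix(rnorm(n * d), ncol = d) %*% chol(Sigma_hat)  # n x d latent draws
Z <- sweep(Z, 2, mu_hat, "+")
simulated <- t(matrix(rpois(n * d, lambda = exp(Z)), ncol = d))  # taxa in rows
rownames(simulated) <- rownames(observed)
print(dim(simulated))
```

The covariance of log counts includes Poisson sampling noise, so it tends to overstate the latent covariance; a dedicated estimator would correct for this.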


a matrix of microbial absolute abundances, where taxa are in rows and samples are in columns.


Huang Lin





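library(ANCOMBC)
# QMP is the example count table shipped with ANCOMBC (samples in rows)
data(QMP)
abn_data = sim_plnm(abn_table = QMP, taxa_are_rows = FALSE, prv_cut = 0.05,
                    n = 100, lib_mean = 1e8, disp = 0.5)
# Label the simulated matrix: taxa in rows, samples in columns
rownames(abn_data) = paste0("Taxon", seq_len(nrow(abn_data)))
colnames(abn_data) = paste0("Sample", seq_len(ncol(abn_data)))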

FrederickHuangLin/ANCOMBC documentation built on Feb. 23, 2023, 11:13 p.m.