sim_plnm: Simulate Microbial Absolute Abundance Data by Poisson...

View source: R/sim_data.R

sim_plnm    R Documentation

Simulate Microbial Absolute Abundance Data by the Poisson Lognormal (PLN) Model Based on a Real Dataset

Description

Generate microbial absolute abundances using the Poisson lognormal (PLN) model based on the mechanism described in the LDM paper (supplementary text S2).

Usage

sim_plnm(abn_table, taxa_are_rows = TRUE, prv_cut = 0.1, n, lib_mean, disp)

Arguments

abn_table

the input microbial count table, in either matrix or data.frame format. It is used to estimate the variance-covariance matrix.

taxa_are_rows

logical. TRUE if the rows of the input dataset represent taxa. Default is TRUE.

prv_cut

a numerical fraction between 0 and 1. Taxa with prevalence less than prv_cut are excluded from the analysis. For instance, with 100 samples and prv_cut = 0.10, a taxon with nonzero counts in fewer than 10 samples is not analyzed further. Default is 0.10.

n

numeric. The desired sample size for the simulated data.

lib_mean

numeric. Mean of the library size. Library sizes are generated from the negative binomial distribution with parameters lib_mean and disp. For details, see ?rnbinom.

disp

numeric. The dispersion parameter for the library size. For details, see ?rnbinom.
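As a rough illustration of how prv_cut, n, lib_mean, and disp act, the sketch below applies a prevalence filter to a made-up count table and then draws library sizes with rnbinom. The toy table and the mapping lib_mean to mu and disp to size are assumptions made for this example; they are not taken from the sim_plnm source.

```r
# Toy count table: 4 taxa (rows) x 10 samples (columns); values are made up
counts <- rbind(
  taxon_a = c(5, 0, 3, 8, 2, 0, 7, 1, 4, 6),
  taxon_b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 9),
  taxon_c = c(1, 2, 0, 0, 3, 0, 0, 4, 0, 0),
  taxon_d = rep(0, 10)
)

# Prevalence = fraction of samples in which a taxon has a nonzero count
prv_cut <- 0.2
prevalence <- rowMeans(counts != 0)

# Taxa with prevalence less than prv_cut are dropped
kept <- counts[prevalence >= prv_cut, , drop = FALSE]
rownames(kept)

# Library sizes for n simulated samples.
# Assumption: lib_mean plays the role of `mu` and disp the role of `size`
# in rnbinom, so the variance is lib_mean + lib_mean^2 / disp.
set.seed(123)
n <- 1000
lib_mean <- 1e4
disp <- 0.5
lib_size <- rnbinom(n = n, mu = lib_mean, size = disp)
summary(lib_size)
```

Under this parameterization a small disp gives a heavy right tail, so a few simulated samples have much deeper libraries than the rest.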

Details

The PLN model relates the abundance vector to a Gaussian latent vector. Because of the latent layer, the variance under the PLN model is larger than under the Poisson model (over-dispersion). In addition, the covariance (correlation) between abundances has the same sign as the covariance (correlation) between the corresponding latent variables. This property gives considerable flexibility in modeling the variance-covariance structure of microbial abundances, since different variance-covariance matrices are easy to specify in the multivariate Gaussian distribution.
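The two properties above can be read off the moments of the standard PLN formulation, written out here as a sketch. The symbols Y_j (abundance of taxon j), Z_j (latent variable), \mu, and \Sigma = (\sigma_{jk}) are notation chosen for this illustration; the model used by sim_plnm may include further terms, such as the library size.

```latex
\begin{aligned}
Z &\sim \mathcal{N}(\mu, \Sigma), \qquad
Y_j \mid Z \sim \mathrm{Poisson}\left(e^{Z_j}\right), \\
\mathrm{E}[Y_j] &= \exp\left(\mu_j + \tfrac{1}{2}\sigma_{jj}\right), \\
\mathrm{Var}(Y_j) &= \mathrm{E}[Y_j] + \mathrm{E}[Y_j]^2\left(e^{\sigma_{jj}} - 1\right) \;>\; \mathrm{E}[Y_j]
  \quad \text{when } \sigma_{jj} > 0, \\
\mathrm{Cov}(Y_j, Y_k) &= \mathrm{E}[Y_j]\,\mathrm{E}[Y_k]\left(e^{\sigma_{jk}} - 1\right), \qquad j \neq k.
\end{aligned}
```

Since e^{\sigma_{jk}} - 1 has the same sign as \sigma_{jk}, the covariance between two abundances has the same sign as the covariance between their latent variables, and the extra term in the variance is the over-dispersion relative to a Poisson model.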

However, instead of specifying the variance-covariance matrix manually, we estimate it from a real dataset, so that the simulated data more closely resemble real data.
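The idea of estimating the covariance from data and then simulating can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the estimator used by sim_plnm: the "real" table is itself simulated, the latent mean and covariance are crudely estimated from log(counts + 1), and library sizes are ignored.

```r
set.seed(42)

# Stand-in for a real count table: 5 taxa (rows) x 50 samples (columns).
# Each row shares one Poisson mean; the values are made up.
p <- 5
n_real <- 50
real_counts <- matrix(
  rpois(p * n_real, lambda = rep(c(5, 20, 50, 100, 200), times = n_real)),
  nrow = p
)

# Crude estimates of the latent mean vector and covariance matrix
# (assumption: log(count + 1) approximates the latent Gaussian layer)
log_counts <- log(real_counts + 1)
mu_hat <- rowMeans(log_counts)
sigma_hat <- cov(t(log_counts))  # p x p matrix

# Draw n latent Gaussian vectors via the Cholesky factor of sigma_hat
n <- 100
R <- chol(sigma_hat)  # upper triangular, t(R) %*% R equals sigma_hat
z <- mu_hat + t(R) %*% matrix(rnorm(p * n), nrow = p)

# Poisson layer: counts with taxa in rows and samples in columns
abn <- matrix(rpois(p * n, lambda = exp(z)), nrow = p)
dim(abn)
```

Using the Cholesky factor keeps the sketch free of extra packages; it requires sigma_hat to be positive definite, which generally needs more samples than taxa in the table used for estimation.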

Value

a matrix of microbial absolute abundances, where taxa are in rows and samples are in columns.

Author(s)

Huang Lin

References

Hu Y-J, Satten GA (2020). Testing hypotheses about the microbiome using the linear decomposition model (LDM). Bioinformatics.

Examples

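library(ANCOMBC)
# QMP is passed with taxa_are_rows = FALSE, i.e. samples are in its rows
data(QMP)
# Simulate 100 samples, keeping taxa with prevalence of at least 5%
abn_data = sim_plnm(abn_table = QMP, taxa_are_rows = FALSE, prv_cut = 0.05,
                    n = 100, lib_mean = 1e8, disp = 0.5)
# The result has taxa in rows and samples in columns; label both
rownames(abn_data) = paste0("Taxon", seq_len(nrow(abn_data)))
colnames(abn_data) = paste0("Sample", seq_len(ncol(abn_data)))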


FrederickHuangLin/ANCOMBC documentation built on Oct. 22, 2024, 3:11 a.m.