linda: Linear (Lin) model for differential abundance (DA) analysis
In zhouhj1994/LinDA: Linear model for differential abundance analysis

View source: R/linda.R

linda

R Documentation

Linear (Lin) model for differential abundance (DA) analysis

Description

The function implements a simple, robust and highly scalable approach to tackle the compositional effects in differential abundance analysis. It fits linear regression models on the centered log2-ratio transformed data, identifies a bias term due to the transformation and compositional effect, and corrects the bias using the mode of the regression coefficients. It could fit mixed-effect models.

Usage

linda(
  otu.tab,
  meta,
  formula,
  type = "count",
  adaptive = TRUE,
  imputation = FALSE,
  pseudo.cnt = 0.5,
  corr.cut = 0.1,
  p.adj.method = "BH",
  alpha = 0.05,
  prev.cut = 0,
  lib.cut = 1,
  winsor.quan = NULL,
  n.cores = 1
)

Arguments

`otu.tab`	data frame or matrix representing observed OTU table. Row: taxa; column: samples. NAs are not expected in OTU tables so are not allowed in function `linda`.
`meta`	data frame of covariates. The rows of `meta` correspond to the columns of `otu.tab`. NAs are allowed. If there are NAs, the corresponding samples will be removed in the analysis.
`formula`	character. For example: `formula = '~x1*x2+x3+(1\|id)'`. At least one fixed effect is required.
`type`	character; the type of `otu.tab`. It could be "count" or "proportion". The default is "count".
`adaptive`	TRUE or FALSE. The default is TRUE. If TRUE, the parameter `imputation` will be treated as FALSE no matter what it is actually set to be. Then the significant correlations between the sequencing depth and explanatory variables will be tested via the linear regression between the log of the sequencing depths and `formula`. If any p-value is smaller than or equal to `corr.cut`, the imputation approach will be used; otherwise, the pseudo-count approach will be used. The information of whether the imputation or pseudo-count approach is used will be printed.
`imputation`	TRUE or FALSE. The default is FALSE. If TRUE, then we use the imputation approach, i.e., zeros in `otu.tab` will be imputed using the formula in the referenced paper.
`pseudo.cnt`	a positive real value. The default is 0.5. If `adaptive` and `imputation` are both FALSE, then we use the pseudo-count approach, i.e., we add `pseudo.cnt` to each value in `otu.tab`.
`corr.cut`	a real value between 0 and 1; significance level of correlations between the sequencing depth and explanatory variables. The default is 0.1.
`p.adj.method`	character; p-value adjusting approach. See R function `p.adjust`. The default is 'BH'.
`alpha`	a real value between 0 and 1; significance level of differential abundance. The default is 0.05.
`prev.cut`	a real value between 0 and 1; taxa with prevalence (percentage of nonzeros) less than prev.cut are excluded. The default is 0 (no taxa will be excluded).
`lib.cut`	a non-negative real value; samples with less than `lib.cut` read counts are excluded. The default is 1 (no samples will be excluded).
`winsor.quan`	a real value between 0 and 1; winsorization cutoff (quantile) for `otu.tab`, e.g., 0.97. The default is NULL. If NULL, winsorization process will not be conducted.
`n.cores`	a positive integer. If `n.cores > 1` and formula is in a form of mixed-effect model, `n.cores` parallels will be conducted. The default is 1.

Value

A list with the elements

`variables`	A vector of variable names of all fixed effects in `formula`. For example: `formula = '~x1*x2+x3+(1\|id)'`. Suppose `x1` and `x2` are numerical, and `x3` is a categorical variable of three levels: a, b and c. Then the elements of `variables` would be `('x1', 'x2', 'x3b', 'x3c', 'x1:x2')`.
`bias`	numeric vector; each element corresponds to one variable in `variables`; the estimated bias of the regression coefficients due to the compositional effect.
`output`	a list of data frames with columns 'baseMean', 'log2FoldChange', 'lfcSE', 'stat', 'pvalue', 'padj', 'reject', 'df'; `names(output)` is equal to `variables`; the rows of the data frame corresponds to taxa. Note: if there are taxa being excluded due to `prev.cut`, the number of the rows of the output data frame will be not equal to the number of the rows of `otu.tab`. Taxa are identified by the rownames. If the rownames of `otu.tab` are NULL, then `1 : nrow(otu.tab)` is set as the rownames of `otu.tab`. baseMean: 2 to the power of the intercept coefficients (normalized by one million) log2FoldChange: bias-corrected coefficients lfcSE: standard errors of the coefficients stat: `log2FoldChange / lfcSE` pvalue: `2 * pt(-abs(stat), df)` padj: `p.adjust(pvalue, method = p.adj.method)` reject: `padj <= alpha` df: degrees of freedom. The number of samples minus the number of explanatory variables (intercept included) for fixed-effect models; estimates from R package `lmerTest` with Satterthwaite method of approximation for mixed-effect models.
`covariance`	a list of data frames; the data frame records the covariances between a certain regression coefficient and other coefficients; `names(covariance)` is equal to `variables`; the rows of the data frame corresponds to taxa. If the length of `variables` is equal to 1, then the `covariance` is NULL.
`otu.tab.use`	the OTU table used in the abundance analysis (the `otu.tab` after the preprocessing: samples that have NAs in the variables in `formula` or have less than `lib.cut` read counts are removed; taxa with prevalence less than `prev.cut` are removed and data is winsorized if `!is.null(winsor.quan)`; and zeros are treated, i.e., imputed or pseudo-count added).
`meta.use`	the meta data used in the abundance analysis (only variables in `formula` are stored; samples that have NAs or have less than `lib.cut` read counts are removed; numerical variables are scaled).
`wald`	a list for use in Wald test. If the fitting model is a linear model, then it includes beta: a matrix of the biased regression coefficients including intercept and all fixed effects; the culumns correspond to taxa sig: the standard errors; the elements corresponding to taxa X: the design matrix of the fitting model bias: the estimated biases of the regression coefficients including intercept and all fixed effects If the fitting model is a linear mixed-effect model, then it includes beta: a matrix of the biased regression coefficients including intercept and all fixed effects; the culumns correspond to taxa beta.cov: a list of covariance matrices of `beta`; the elements corresponding to taxa rand.cov: a list with covariance matrices of variance parameters of random effects; the elements corresponding to taxa; see more details in the paper of 'lmerTest' Joc.beta.cov.rand: a list of a list of Jacobian matrices of `beta.cov` with respect to the variance parameters; the elements corresponding to taxa bias: the estimated biases of the regression coefficients including intercept and all fixed effects

Author(s)

Huijuan Zhou huijuanzhou2019@gmail.com Jun Chen Chen.Jun2@mayo.edu Xianyang Zhang zhangxiany@stat.tamu.edu

References

Huijuan Zhou, Kejun He, Jun Chen, and Xianyang Zhang. LinDA: Linear Models for Differential Abundance Analysis of Microbiome Compositional Data.

Examples


#install package "phyloseq" for importing "smokers" dataset
ind <- smokers$meta$AIRWAYSITE == 'Throat'
otu.tab <- as.data.frame(smokers$otu[, ind])
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]),
                         Site = factor(smokers$meta$SIDEOFBODY[ind]),
                         SubjectID = factor(smokers$meta$HOST_SUBJECT_ID[ind]))
ind1 <- which(meta$Site == 'Left')
res.left <- linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex', alpha = 0.1,
                  prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)
ind2 <- which(meta$Site == 'Right')
res.right <- linda(otu.tab[, ind2], meta[ind2, ], formula = '~Smoke+Sex', alpha = 0.1,
                  prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)
rownames(res.left$output[[1]])[which(res.left$output[[1]]$reject)]
rownames(res.right$output[[1]])[which(res.right$output[[1]]$reject)]

linda.obj <- linda(otu.tab, meta, formula = '~Smoke+Sex+(1|SubjectID)', alpha = 0.1,
                   prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)
linda.plot(linda.obj, c('Smokey', 'Sexmale'),
           titles = c('Smoke: n v.s. y', 'Sex: female v.s. male'), alpha = 0.1, lfc.cut = 1,
           legend = TRUE, directory = NULL, width = 11, height = 8)

zhouhj1994/LinDA documentation built on Jan. 30, 2024, 5:54 p.m.