# Overview

The ZINQ package (v2.0) contains functions that test for association between individual microbiome taxa and clinical variable(s). The microbiome data can be un-normalized or normalized by any existing method, such as rarefaction, TSS, or CSS. The clinical variable(s) can be a single binary, multi-class discrete, or continuous variable, or a set of variables of any of these types. Both unadjusted and adjusted tests can be implemented.

In addition, ZINQ v2.0 updates the logistic component to Firth logistic regression, in order to mitigate bias when the presence-absence status of a taxon is imbalanced across the groups of the clinical variable(s) of interest.

The following packages are required for the functions and examples in the ZINQ package: quantreg, MASS, and logistf. All are available on CRAN.

```r
library(ZINQ)
```
It is recommended to run a sanity check using ZINQ_check before applying ZINQ. The inputs are the un-normalized taxa read count table and the metadata. The function will print warnings when
Type warnings() after the sanity check to read the warnings. The warnings are recommendations for applying ZINQ to the data.
The sanity check is mainly about zero inflation. Most normalization methods keep the original zeroes, so inspecting the un-normalized taxa read count table is enough to judge whether ZINQ is suitable. As in other regression frameworks, ZINQ will also return singularity errors when too many covariates are specified. For more details, refer to the .pdf manual.
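Because the check centers on zero inflation, it can help to look at the share of zeros per taxon directly. The following is a minimal base-R sketch and is not part of ZINQ: the count table `counts` holds invented toy values, and `zero_fraction` is a hypothetical helper introduced only for illustration.

```r
# Toy un-normalized count table: rows are subjects, columns are taxa.
# The values are invented for illustration only.
counts <- matrix(
  c(0, 12, 0,
    5,  0, 0,
    0,  7, 0,
    3,  9, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("taxon_a", "taxon_b", "taxon_c"))
)

# Hypothetical helper: fraction of zero counts in each column (taxon).
zero_fraction <- function(tab) colMeans(tab == 0)

zf <- zero_fraction(counts)
print(zf)
```

A taxon with a very high or very low zero fraction is the kind of case where the warnings from the sanity check deserve a closer look.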
We will use the sample data in the package to demonstrate the ZINQ test. The data contain the normalized abundances of two taxa: the rarefied abundance of taxon 1 and the CSS-normalized abundance of taxon 2. They also contain a binary clinical variable, which is the variable of interest, and three continuous covariates for adjustment. In total, there are 531 subjects. We want to determine whether the two taxa are associated with the binary clinical variable.
```r
data(Sample_Data)
summary(Sample_Data)
```
We first use ZINQ_tests to conduct marginal tests in the Firth logistic and quantile regression components, for each of the two taxa.
```r
# Columns 1-2 hold the two taxa; the remaining columns are the clinical
# variable X and the covariates Z1-Z3.
covariates = Sample_Data[, -c(1:2)]
result = vector(mode = "list", length = 2)

# Taxon 1 (rarefied abundance)
dat = cbind(Y=Sample_Data[, 1], covariates)
result[[1]] = ZINQ_tests(formula.logistic=Y~X+Z1+Z2+Z3, formula.quantile=Y~X+Z1+Z2+Z3, C="X", y_CorD = "D", data=dat)

# Taxon 2 (CSS normalized abundance)
dat = cbind(Y=Sample_Data[, 2], covariates)
result[[2]] = ZINQ_tests(formula.logistic=Y~X+Z1+Z2+Z3, formula.quantile=Y~X+Z1+Z2+Z3, C="X", data=dat)
```
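To make the two components concrete, here is a rough base-R sketch of the two-part idea on invented data. It is not what ZINQ_tests does internally: it substitutes ordinary logistic regression (`glm`) for Firth logistic regression, and it simply compares empirical quantiles of the non-zero values instead of running a quantile regression test. All object names and values are hypothetical.

```r
# Invented toy data: 200 subjects per group of a binary variable X.
X <- rep(c(0, 1), each = 200)

# Presence/absence: 40% of group 0 and 70% of group 1 carry the taxon.
present <- c(rep(c(1, 0), times = c(80, 120)),
             rep(c(1, 0), times = c(140, 60)))

# Non-zero abundances: evenly spaced exponential quantiles, with a larger
# scale in group 1. Subjects without the taxon keep abundance 0.
Y <- numeric(length(X))
Y[X == 0 & present == 1] <- qexp(ppoints(80),  rate = 1)
Y[X == 1 & present == 1] <- qexp(ppoints(140), rate = 0.5)

# Part 1: does X predict presence? (ordinary logistic regression here,
# used as a stand-in for the Firth version)
fit_presence <- glm(I(Y > 0) ~ X, family = binomial)
log_odds_ratio <- unname(coef(fit_presence)["X"])

# Part 2: among non-zero values, do the quantiles differ by group?
taus <- c(0.25, 0.5, 0.75)
q0 <- quantile(Y[Y > 0 & X == 0], taus)
q1 <- quantile(Y[Y > 0 & X == 1], taus)

print(log_odds_ratio)
print(rbind(X0 = q0, X1 = q1))
```

Splitting the problem this way keeps the excess zeros out of the quantile comparison, which is why the two parts are tested separately before being combined.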
Next, we use the output from ZINQ_tests as the input of ZINQ_combination to obtain the final p-values.
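```r
# Taxon 1: Cauchy combination over the specified quantile levels
ZINQ_combination(result[[1]], method="Cauchy", taus=c(0.25, 0.5, 0.75))

# Taxon 2: MinP combination
ZINQ_combination(result[[2]], method="MinP")
```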
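For intuition about the Cauchy option, the generic Cauchy combination rule can be written in a few lines of base R. This sketch uses equal weights and made-up p-values. The weights that ZINQ_combination actually applies are not described in this vignette, so it should not be read as the package's implementation; `cauchy_combine` is a hypothetical helper.

```r
# Generic Cauchy combination of p-values (equal weights by default).
# Each p-value is mapped to a standard Cauchy variate, the variates are
# averaged with weights w, and the upper tail of the Cauchy gives the result.
cauchy_combine <- function(p, w = rep(1 / length(p), length(p))) {
  stopifnot(all(p > 0 & p < 1), abs(sum(w) - 1) < 1e-8)
  stat <- sum(w * tan((0.5 - p) * pi))
  pcauchy(stat, lower.tail = FALSE)
}

# Made-up marginal p-values, for illustration only.
p_values <- c(0.01, 0.20, 0.65)
p_combined <- cauchy_combine(p_values)
print(p_combined)
```

A single p-value is returned unchanged by this rule, and a very small marginal p-value dominates the combined result.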
ZINQ can detect higher-order associations beyond a simple mean association. The two taxa in the sample data have typical abundance profiles that highlight the power of ZINQ. Stratified by the two levels of the clinical variable, their quantile functions either form a spindle shape or cross each other. We will use linear regression to show that mean-based methods are inadequate in both cases.
```r
taus = seq(0.01, 0.99, by=0.01)
par(mfrow=c(1,2))

# For each taxon, plot the quantile function and the mean within each group of X
for (ii in 1:2){
  id1 = which(Sample_Data$X == 1)
  id0 = which(Sample_Data$X == 0)
  abundance1 = Sample_Data[id1, ii]
  abundance0 = Sample_Data[id0, ii]
  q1 = quantile(abundance1, taus)
  q0 = quantile(abundance0, taus)

  plot(taus, q1, type="l", main=names(Sample_Data)[ii], ylab="quantile", xlab="", ylim=c(0, max(q1, q0)))
  mtext(text=expression(tau), side=1, cex=1.5, line=3)
  lines(taus, q0, col=2)
  abline(h=mean(abundance1), lty=2)
  abline(h=mean(abundance0), lty=2, col=2)
  legend('topleft', c('quantile X=1', 'mean X=1', 'quantile X=0', 'mean X=0'), col=c(1, 1, 2, 2), lty=c(1, 2, 1, 2), bty='n')
}

# Mean-based linear regression for taxon 1
dat = cbind(Y=Sample_Data[, 1], covariates)
summary(lm(Y~X+Z1+Z2+Z3, data=dat))

# Mean-based linear regression for taxon 2
dat = cbind(Y=Sample_Data[, 2], covariates)
summary(lm(Y~X+Z1+Z2+Z3, data=dat))
```
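The same point can be reproduced on a toy example that needs no package data. The sketch below is an assumption-laden illustration, not an analysis of Sample_Data: the two groups are built from normal quantiles so that they share the same mean but differ in spread, which makes their quantile functions cross at the median. All names and values are invented.

```r
# Invented toy abundances: both groups have mean 10, but group 1 is three
# times as spread out, so the quantile functions cross at the median.
probs <- ppoints(500)
y0 <- qnorm(probs, mean = 10, sd = 1)
y1 <- qnorm(probs, mean = 10, sd = 3)
toy <- data.frame(Y = c(y0, y1), X = rep(c(0, 1), each = 500))

# Mean-based test: linear regression of Y on the group indicator.
p_mean <- summary(lm(Y ~ X, data = toy))$coefficients["X", "Pr(>|t|)"]

# Quantile view: group difference at the lower and upper quartiles.
taus <- c(0.25, 0.75)
q_diff <- quantile(y1, taus) - quantile(y0, taus)

print(p_mean)
print(q_diff)
```

The regression sees no group effect because the means coincide, while the quartile differences have opposite signs. This is the kind of pattern a mean-based test cannot pick up.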