```r
options(width=999,
        rmarkdown.html_vignette.check_title = FALSE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


# Overview

The ZINQ package (v2.0) contains functions for testing the association between individual microbial taxa and clinical variable(s). The microbiome data can be unnormalized or normalized by any existing method, such as rarefaction, TSS or CSS. The clinical variable(s) can be a single binary, multi-class discrete or continuous variable, or a set of variables of any of the aforementioned types. Both unadjusted and covariate-adjusted tests can be performed.
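To make two of the normalizations named above concrete, here is a minimal base-R sketch of total-sum scaling (TSS) and rarefaction on a small, made-up count table. The helpers `tss` and `rarefy_sample` are hypothetical and not part of ZINQ; CSS is not shown.

```r
# Toy raw count table (made-up numbers): rows = samples, columns = taxa
counts <- matrix(c(10,  0, 5,  85,
                    0,  3, 0,  47,
                   20, 10, 0, 170),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("s", 1:3), paste0("taxon", 1:4)))

# Hypothetical helper: total-sum scaling (TSS) divides each sample by its library size
tss <- function(counts) sweep(counts, 1, rowSums(counts), "/")

# Hypothetical helper: rarefy one sample by drawing `depth` reads without replacement
rarefy_sample <- function(x, depth) {
  reads <- rep(seq_along(x), times = x)            # one entry per read, labelled by taxon
  drawn <- reads[sample.int(length(reads), depth)] # subsample the reads
  tabulate(drawn, nbins = length(x))               # back to counts per taxon
}

set.seed(1)
depth <- min(rowSums(counts))                      # common depth = smallest library size
rarefied <- t(apply(counts, 1, rarefy_sample, depth = depth))
colnames(rarefied) <- colnames(counts)

tss(counts)
rarefied
```

TSS leaves the pattern of zeros unchanged, while rarefaction can only add zeros: a taxon with no reads in a sample cannot gain any by subsampling.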

In addition, ZINQ v2.0 replaces the logistic component with Firth logistic regression, to mitigate estimation bias when the presence/absence status of a taxon is imbalanced across the groups of the clinical variable(s) of interest.
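For readers unfamiliar with Firth's correction, the base-R sketch below solves Firth's modified score equations with Fisher-scoring steps. It is only an illustration of the technique: ZINQ itself relies on the logistf package, and the function `firth_logit` is hypothetical.

```r
# Hypothetical helper: Firth-penalized logistic regression (illustration only)
# X: n x k design matrix (include a column of 1s for the intercept); y: 0/1 vector
firth_logit <- function(X, y, max_iter = 100, tol = 1e-8, max_step = 5) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% beta))            # fitted probabilities
    w <- p * (1 - p)                          # logistic weights
    info_inv <- solve(crossprod(X, X * w))    # (X'WX)^{-1}
    # Leverages: diagonal of W^{1/2} X (X'WX)^{-1} X' W^{1/2}
    h <- w * rowSums((X %*% info_inv) * X)
    # Firth's modified score: X'(y - p + h * (1/2 - p))
    score <- crossprod(X, y - p + h * (0.5 - p))
    step <- drop(info_inv %*% score)
    # Cap very large steps for numerical stability
    if (max(abs(step)) > max_step) step <- step * max_step / max(abs(step))
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}

# Completely separated toy data: ordinary glm() estimates would diverge
x <- rep(0:1, each = 4)
y <- x
firth_logit(cbind(1, x), y)
```

Even under complete separation the estimates stay finite. For this 2 x 2 table they equal the log-odds obtained after adding 1/2 to each cell, i.e. an intercept of -log(9) and a slope of log(81).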

The functions and examples in the ZINQ package require the packages quantreg, MASS and logistf, all of which are available on CRAN.

```r
library(ZINQ)
```

# Implementation of ZINQ test

## Sanity check

We recommend running a sanity check with `ZINQ_check` before applying ZINQ. Its inputs are the un-normalized taxa read count table and the metadata. The function will print warnings when

Type `warnings()` after the sanity check to read the warnings. They are recommendations for applying ZINQ to the data at hand.

The sanity check focuses mainly on zero inflation. Because most normalization methods keep the original zeros, inspecting the un-normalized read count table gives enough clues about whether ZINQ is appropriate. As in other regression frameworks, ZINQ will also return singularity errors when too many covariates are specified. For more details, refer to the PDF manual.
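The exact rules used by `ZINQ_check` are described in the PDF manual and are not reproduced here. As a rough, hypothetical illustration of what inspecting zero inflation in a raw count table involves, the base-R helper `zero_summary` below reports per-taxon zero proportions and flags taxa that have no zeros or very few positive values; the cut-off `min_nonzero` is arbitrary.

```r
# Hypothetical helper (not ZINQ_check): per-taxon zero summary of a raw count table
# counts: samples in rows, taxa in columns
zero_summary <- function(counts, min_nonzero = 10) {
  n_zero <- unname(colSums(counts == 0))
  n_nonzero <- nrow(counts) - n_zero
  data.frame(
    taxon = colnames(counts),
    prop_zero = n_zero / nrow(counts),
    n_nonzero = n_nonzero,
    # Illustrative flag: no zeros at all, or too few positive values
    flag = n_zero == 0 | n_nonzero < min_nonzero
  )
}

# Made-up counts for three taxa in six samples
counts <- cbind(taxonA = c(0, 0, 3, 5, 0, 2),
                taxonB = c(1, 4, 2, 7, 3, 9),
                taxonC = c(0, 0, 0, 0, 0, 1))
zero_summary(counts, min_nonzero = 2)
```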

## Sample data

We will use the sample data in the package to demonstrate the ZINQ test. The data contain the normalized abundances of two taxa: the rarefied abundance of taxon 1 and the CSS-normalized abundance of taxon 2. They also contain a binary clinical variable of interest and three continuous covariates for adjustment. In total, there are 531 subjects. We want to determine whether each of the two taxa is associated with the binary clinical variable.

```r
data(Sample_Data)
summary(Sample_Data)
```

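## Use of `ZINQ_tests`

We first use `ZINQ_tests` to conduct the marginal tests in the Firth logistic and quantile regression components for each of the two taxa.

```r
# Clinical variable of interest (X) and covariates for adjustment (Z1, Z2, Z3)
covariates = Sample_Data[, -c(1:2)]

result = vector(mode = "list", length = 2)

# Taxon 1: rarefied counts, so the outcome is declared discrete (y_CorD = "D");
# C = "X" names the clinical variable to be tested
dat = cbind(Y=Sample_Data[, 1], covariates)
result[[1]] = ZINQ_tests(formula.logistic=Y~X+Z1+Z2+Z3, formula.quantile=Y~X+Z1+Z2+Z3, C="X", y_CorD = "D", data=dat)

# Taxon 2: CSS-normalized abundance, y_CorD left at its default
dat = cbind(Y=Sample_Data[, 2], covariates)
result[[2]] = ZINQ_tests(formula.logistic=Y~X+Z1+Z2+Z3, formula.quantile=Y~X+Z1+Z2+Z3, C="X", data=dat)
```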
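To make the two components more concrete, here is a simplified base-R sketch of the two-part idea. It is our own illustration, not ZINQ's implementation: it uses ordinary (not Firth) logistic regression for presence/absence, runs an unadjusted quantile rank-score test on the positive part at a single quantile level, and ignores ties. The function `two_part_sketch` is hypothetical.

```r
# Hypothetical, simplified two-part test (NOT the ZINQ implementation)
two_part_sketch <- function(y, x, tau = 0.5) {
  # Part 1: ordinary logistic regression of presence/absence, Wald p-value for x
  present <- as.integer(y > 0)
  fit <- glm(present ~ x, family = binomial)
  p_logistic <- summary(fit)$coefficients["x", "Pr(>|z|)"]

  # Part 2: rank-score test among subjects with y > 0. Without covariates, the
  # null tau-quantile fit is simply the sample tau-quantile of the positives.
  y_pos <- y[y > 0]
  x_pos <- x[y > 0]
  q_tau <- quantile(y_pos, tau, names = FALSE)
  psi <- tau - (y_pos < q_tau)          # quantile score: tau - I(residual < 0)
  x_c <- x_pos - mean(x_pos)            # x with the intercept projected out
  S <- sum(x_c * psi)                   # score statistic
  V <- tau * (1 - tau) * sum(x_c^2)     # its variance under the null
  p_quantile <- 2 * pnorm(-abs(S / sqrt(V)))

  list(p_logistic = p_logistic, p_quantile = p_quantile)
}

# Made-up data: equal presence rates, but positive values shifted upward when x = 1
y <- c(rep(0, 20), 1:100, rep(0, 20), 101:200)
x <- rep(0:1, each = 120)
two_part_sketch(y, x, tau = 0.5)
```

In this toy example the logistic part sees no difference (both groups have 100 non-zeros out of 120), while the quantile part detects the shift in the positive values.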

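## Use of `ZINQ_combination`

Next, we pass the output of `ZINQ_tests` to `ZINQ_combination` to obtain the final p-values.

```r
# Taxon 1: Cauchy combination over the quantile levels 0.25, 0.5 and 0.75
ZINQ_combination(result[[1]], method="Cauchy", taus=c(0.25, 0.5, 0.75))
# Taxon 2: MinP combination with the default quantile levels
ZINQ_combination(result[[2]], method="MinP")
```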
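The Cauchy combination converts each p-value p_j into tan((0.5 - p_j) * pi), takes a weighted sum, and refers the result to a standard Cauchy distribution. Below is a minimal equal-weight sketch; the helper `cauchy_combine` is hypothetical, and how `ZINQ_combination` weights and pools the logistic and quantile p-values is not reproduced here.

```r
# Hypothetical helper: Cauchy combination of p-values (equal weights by default)
cauchy_combine <- function(p, w = rep(1 / length(p), length(p))) {
  stopifnot(all(p > 0 & p < 1), length(w) == length(p))
  T_stat <- sum(w * tan((0.5 - p) * pi))
  # Under the null, T_stat / sum(w) is approximately standard Cauchy
  0.5 - atan(T_stat / sum(w)) / pi
}

cauchy_combine(c(0.01, 0.5, 0.9))
```

A single p-value, or several identical ones, is returned unchanged, which is a quick sanity check of the transformation.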

# Characteristics of ZINQ test

ZINQ can detect higher-order associations beyond a simple difference in means. The two taxa in the sample data have typical abundance profiles that highlight this: when their quantile functions are stratified by the two levels of the clinical variable, the curves either form a spindle shape or cross each other. Below we plot these stratified quantile functions and then use linear regression to show the inadequacy of mean-based methods in both cases.

```r
taus = seq(0.01, 0.99, by=0.01)

# Subjects in each group of the binary clinical variable X
id1 = which(Sample_Data$X == 1)
id0 = which(Sample_Data$X == 0)

old_par = par(mfrow=c(1,2))

for (ii in 1:2){
  # Abundance of taxon ii in each group
  abundance1 = Sample_Data[id1, ii]
  abundance0 = Sample_Data[id0, ii]

  # Empirical quantile functions of the two groups
  q1 = quantile(abundance1, taus)
  q0 = quantile(abundance0, taus)

  # Solid lines: quantile functions; dashed lines: group means
  plot(taus, q1, type="l", main=names(Sample_Data)[ii], ylab="quantile", xlab="", ylim=c(0, max(q1, q0)))
  mtext(text=expression(tau), side=1, cex=1.5, line=3)
  lines(taus, q0, col=2)

  abline(h=mean(abundance1), lty=2)
  abline(h=mean(abundance0), lty=2, col=2)

  legend('topleft', c('quantile X=1', 'mean X=1', 'quantile X=0', 'mean X=0'), col=c(1, 1, 2, 2), lty=c(1, 2, 1, 2), bty='n')
}

par(old_par)
```

```r
# Mean-based comparison: linear regression of each taxon on X, adjusted for Z1-Z3
dat = cbind(Y=Sample_Data[, 1], covariates)
summary(lm(Y~X+Z1+Z2+Z3, data=dat))

dat = cbind(Y=Sample_Data[, 2], covariates)
summary(lm(Y~X+Z1+Z2+Z3, data=dat))
```
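The same phenomenon can be reproduced without the package data. The synthetic, fully deterministic example below (our own construction, not from the package) builds two groups with identical means but different spreads, i.e. a spindle of quantile functions. Linear regression sees no difference, while the tail quantiles differ clearly.

```r
# Synthetic, deterministic data: same mean, different spread ("spindle" shape)
n <- 200
z <- qnorm(ppoints(n))                  # symmetric standard-normal scores
y0 <- 10 + 1 * z                        # group X = 0
y1 <- 10 + 3 * z                        # group X = 1: same centre, wider spread
dat_sim <- data.frame(Y = c(y0, y1), X = rep(0:1, each = n))

# Mean-based view: the X coefficient is numerically zero
coef(lm(Y ~ X, data = dat_sim))["X"]

# Quantile view: the groups differ in both tails but share the median
taus_demo <- c(0.1, 0.5, 0.9)
rbind(X0 = quantile(y0, taus_demo), X1 = quantile(y1, taus_demo))
```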


wdl2459/ZINQ-v2 documentation built on March 25, 2024, 6:23 p.m.