# fitZIBB: The main function to fit ZIBB model In ZIBBSeqDiscovery: Zero-Inflated Beta-Binomial Modeling of Microbiome Count Data

## Description

We use zero-inflated beta-binomial (ZIBB) to account for overdispersion and excessive zeros in the microbiome count data. The parameter estimation method is maximum likelihood. Two approaches are proposed to estimate the overdispersion parameters: free approach and constrained approach. For free approach, user does not need to provide initial values for unknown parameters because our program will come up with naive initial values automatically. For constrained approach, user should provide the estimations from free approach as the initial values. This function gives the estimations of the parameters, as well as the corresponding p values, which can be used to test the association between the composition of microbiome counts data and the interested covariates.

## Usage

 1 2 3 4 fitZIBB(dataMatrix, X, ziMatrix, mode = "free", gn = 3, betastart = matrix(NA, 0, 0), psi.start = vector(mode = "numeric", length = 0), eta.start = matrix(NA, 0, 0))

## Arguments

 dataMatrix The count matrix (m by n, m is the number of OTUs and n is the number of samples). X The design matrix (n by p, p is the number of covariates) for the count model (e.g., beta-binomial), and intercept is included. The second column is assumed to be the covariate of interest. ziMatrix The design matrix (n by q, q is the number of covariates) for the zero model, and intercept is included. mode Indicates which approach is used to estimate overdispersion parameters. mode can be set as either "free" or "constrained". gn In constrained approach, we use a polynomial with degree of freedom gn to fit the mean-overdispersion relationship. The default value for gn is 3. Note that gn is only valid when mode = "constrained". betastart Initial values for beta estimation matrix (p by m), where beta are the effects (or coefficients) for the count model. betastart is required only in constrained approach, and it should be assigned as the beta estimation matrix from free approach. psi.start Initial values for the logit of overdispersion parameters vector (with length m). psi.start is required only in constrained approach, and it should be assigned as the psi estimation vector from free approach. eta.start Initial values for eta estimation matrix (q by m), where eta are the effects (or coefficients) for the zero model. eta.start is required only in constrained approach, and it should be assigned as the eta estimation matrix from free approach.

## Details

In this package, we refer the covariate of interest as phenotype (only one phenotype is assumed currently, thouugh we can extend it to include multiple phenotypes), and the phenotype is assumed to corresonding to the second column of the degisn matrix X (note that the first column corresonds to the intercept). Assuming the parameters corresonding to the phenotype are \{β_{1i}\}_{i=1,…,m}, this function tests the null hypothesis H_0: β_{1i} = 0 for each OTU i.

## Value

 betahat Estimation matrix of beta (p by m) for count model. bvar Estimation matrix of the variance of estimated betahat (p by m). p The vector (with length of m) of p values corresponding to the phenotype (aka, covariate of interest). In this package, we assume it corresponds to the second column of design matrix X (because intercept is included), though you can always include multiple phenotypes or other covariates. The i'th p value corresponds to the hypothesis test of H_0: β_{1i} = 0 for OTU i psi Estimation vector of the logit of the overdispersion parameters (with length m). zeroCoef Estimation matrix of eta (q by m) for zero model. gamma Estimation vector of the coefficients in the mean-overdispersion relationship in constrained approach (with length gn+1). So gamma is only available when mode="constrained".

## Author(s)

Tao Hu, Yihui Zhou

## Examples

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 ## Load the data ## data.Y is a count matrix with 100 OTUs and 20 samples randomly selected ## from kostic data data(data.Y) ## set random seed set.seed(1) ## construct design matrix for count model ## data.X is a 20-by-2 matrix, phenotype is group, and the first 10 samples ## come from group 1 and the rest samples come from group 2 data.X <- matrix(c(rep(1, 20), rep(0,10), rep(1, 10)), 20, 2) ## construct design matrix for zero model ## data.ziMatrix is a 20-by-2 matrix, the covariate is log of library size data.ziMatrix <- matrix(1, 20, 2) data.ziMatrix[, 2] <- log(colSums(data.Y)) ## fit ZIBB with free approach out.free <- fitZIBB(data.Y, data.X, data.ziMatrix, mode = "free") ## fit ZIBB with constrained approach out.constrained <- fitZIBB(data.Y, data.X, data.ziMatrix, mode = "constrained", gn = 3, betastart = out.free\$betahat, psi.start = out.free\$psi, eta.start = out.free\$zeroCoef) ## print OTUs which has p values smaller than 0.05 out.constrained\$p[which(out.constrained\$p < 0.05)]

ZIBBSeqDiscovery documentation built on May 2, 2019, 4:21 a.m.