SDA: Semi-parametric differential abundance analysis


Description

This function fits a two-part semi-parametric model to metabolomics and proteomics data. A kernel-smoothed method is used to estimate the regression coefficients, and a likelihood ratio test is constructed for differential abundance analysis.

Usage

SDA(sumExp, VOI = NULL, ...)

Arguments

sumExp

An object of 'SummarizedExperiment' class.

VOI

Variable of interest. The default is NULL, which is appropriate when colData contains only one covariate; otherwise VOI must be one of the column names in colData.

...

Additional arguments passed to qvalue.

Details

Differential abundance analysis compares metabolomic or proteomic profiles between experimental groups. It uses a two-part model: a logistic regression model for the proportion of zeros and a semi-parametric model for the non-zero values. Let Y_ig be the random variable representing the abundance of feature g in subject i. The two-part model has the following form:

log(pi_ig/(1-pi_ig)) = gamma_0g + gamma_g^T X_i

log(Y_ig) = beta_g^T X_i + epsilon_ig, for Y_ig > 0

where pi_ig = Pr(Y_ig = 0) is the probability of a point mass at zero, and X_i = (X_i1, X_i2, ..., X_iQ)^T is a Q-vector of covariates specifying the treatment conditions applied to subject i. The corresponding Q-vector of model parameters gamma_g = (gamma_1g, gamma_2g, ..., gamma_Qg)^T quantifies the covariates' effects on the fraction of zero values for feature g, and gamma_0g is the intercept. Likewise, beta_g = (beta_1g, beta_2g, ..., beta_Qg)^T is a Q-vector of model parameters quantifying the covariates' effects on the non-zero values of the feature. The epsilon_ig are independent error terms with a common but completely unspecified density function f_g. The non-zero part has no separate intercept because any location shift is absorbed into f_g.
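The sketch below simulates a single feature from this two-part model with one binary covariate (Q = 1). It then fits the binary part by ordinary logistic regression with base R's glm. All parameter values and variable names are illustrative assumptions, as is the centered exponential error density. The sketch does not reproduce the kernel-smoothed estimator that SDA uses for the non-zero part.

```r
## Illustrative sketch (not SDAMS code): simulate one feature g from the
## two-part model with a single binary covariate, Q = 1.
set.seed(1)
n <- 200
x <- rep(c(0, 1), each = n / 2)      # X_i: treatment indicator
gamma0 <- -1                         # assumed intercept gamma_0g
gamma1 <- 1                          # assumed effect on the zero proportion
beta1 <- 0.8                         # assumed effect on log non-zero abundance

# Binary part: pi_ig = Pr(Y_ig = 0) follows a logistic model in X_i
pi_i <- plogis(gamma0 + gamma1 * x)
z <- rbinom(n, size = 1, prob = pi_i)   # z = 1 marks a zero observation

# Non-zero part: f_g is unspecified by the model; a centered, right-skewed
# exponential error is an arbitrary choice made only for this sketch
eps <- rexp(n) - 1
y <- ifelse(z == 1, 0, exp(beta1 * x + eps))

# Fit the binary part and form its 1-df likelihood-ratio statistic for gamma_1g
fit_full <- glm(z ~ x, family = binomial)
fit_null <- glm(z ~ 1, family = binomial)
lrt_gamma <- deviance(fit_null) - deviance(fit_full)
coef(fit_full)   # estimates of gamma_0g and gamma_1g
```

The two parts have no parameters in common, so the binary part can be fitted on its own in this way.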

Hypothesis testing on the effect of the qth covariate on the gth feature is performed by assessing gamma_qg and beta_qg. The null hypothesis H_0: gamma_qg = 0 and beta_qg = 0 is tested against the alternative H_1: at least one of the two parameters is non-zero. The p-value is calculated from a chi-square distribution with 2 degrees of freedom. To adjust for multiple comparisons across features, the false discovery rate (FDR) q-value is calculated with the qvalue function in R/Bioconductor.
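The 2-degree-of-freedom test can be sketched as follows. The sketch assumes that the two-part likelihood-ratio statistic is the sum of the two one-part statistics, as is usual for two-part models whose likelihood factorizes into a binary part and a non-zero part. The helper name two_part_pvalue and the input statistics are hypothetical. Its output names mirror the components listed under Value only for readability. Only base R's pchisq is used.

```r
## Hypothetical helper (not part of SDAMS): turn per-feature likelihood-ratio
## statistics from the two parts into one-part and two-part p-values.
two_part_pvalue <- function(lrt_gamma, lrt_beta) {
  list(
    pv_gamma = pchisq(lrt_gamma, df = 1, lower.tail = FALSE),  # binary part alone
    pv_beta  = pchisq(lrt_beta,  df = 1, lower.tail = FALSE),  # non-zero part alone
    # H_0: gamma_qg = 0 and beta_qg = 0; the summed statistic is referred to chi-square(2)
    pv_2part = pchisq(lrt_gamma + lrt_beta, df = 2, lower.tail = FALSE)
  )
}

# Illustrative statistics for three features
res <- two_part_pvalue(lrt_gamma = c(0.5, 4.0, 9.0), lrt_beta = c(0.2, 1.0, 6.0))
res$pv_2part
```

As the text above describes, SDA then converts p-values such as these into FDR q-values with the qvalue function.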

Value

A list containing the following components:

gamma

a vector of point estimates of gamma_g in the logistic model (binary part)

beta

a vector of point estimates of beta_g in the semi-parametric model (non-zero part)

pv_gamma

a vector of one-part p-values for gamma_g

pv_beta

a vector of one-part p-values for beta_g

qv_gamma

a vector of one-part q-values for gamma_g

qv_beta

a vector of one-part q-values for beta_g

pv_2part

a vector of two-part p-values for overall test

qv_2part

a vector of two-part q-values for overall test

feat.names

a vector of feature names

Author(s)

Yuntong Li <yuntong.li@uky.edu>, Chi Wang <chi.wang@uky.edu>, Li Chen <lichenuky@uky.edu>

Examples

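##--------- load package and data ------------
library(SDAMS)
data(exampleSumExp)

results <- SDA(exampleSumExp)

##------ two-part q-values -------
results$qv_2part

##------ feature names with a two-part q-value below 0.05 -------
results$feat.names[results$qv_2part < 0.05]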

Stat-Li/SDAMS documentation built on May 26, 2019, 11:58 p.m.