ancombc2  R Documentation 
Determine taxa whose absolute abundances, per unit volume, of
the ecosystem (e.g., gut) are significantly different with changes in the
covariate of interest (e.g., group). The current version of
ancombc2
function implements Analysis of Compositions of Microbiomes
with Bias Correction (ANCOMBC2) in crosssectional and repeated measurements
data. In addition to the twogroup comparison, ANCOMBC2 also supports
testing for continuous covariates and multigroup comparisons,
including the global test, pairwise directional test, Dunnett's type of
test, and trend test.
ancombc2( data, assay_name = "counts", tax_level = NULL, fix_formula, rand_formula = NULL, p_adj_method = "holm", pseudo = 0, pseudo_sens = TRUE, prv_cut = 0.1, lib_cut = 0, s0_perc = 0.05, group = NULL, struc_zero = FALSE, neg_lb = FALSE, alpha = 0.05, n_cl = 1, verbose = FALSE, global = FALSE, pairwise = FALSE, dunnet = FALSE, trend = FALSE, iter_control = list(tol = 0.01, max_iter = 20, verbose = FALSE), em_control = list(tol = 1e05, max_iter = 100), lme_control = lme4::lmerControl(), mdfdr_control = list(fwer_ctrl_method = "holm", B = 100), trend_control = list(contrast = NULL, node = NULL, solver = "ECOS", B = 100) )
data 
the input data. A

assay_name 
character. Name of the count table in the data object
(only applicable if data object is a 
tax_level 
character. The taxonomic level of interest. The input data
can be agglomerated at different taxonomic levels based on your research
interest. Default is NULL, i.e., do not perform agglomeration, and the
ANCOMBC2 anlysis will be performed at the lowest taxonomic level of the
input 
fix_formula 
the character string expresses how the microbial absolute abundances for each taxon depend on the fixed effects in metadata. 
rand_formula 
the character string expresses how the microbial absolute
abundances for each taxon depend on the random effects in metadata. ANCOMBC2
follows the 
p_adj_method 
character. method to adjust pvalues. Default is "holm".
Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
"fdr", "none". See 
pseudo 
numeric. Add pseudocounts to the data. Default is 0 (no pseudocount addition). 
pseudo_sens 
logical. Whether to perform the sensitivity analysis to
the pseudocount addition. Default is 
prv_cut 
a numerical fraction between 0 and 1. Taxa with prevalences
less than 
lib_cut 
a numerical threshold for filtering samples based on library
sizes. Samples with library sizes less than 
s0_perc 
a numerical fraction between 0 and 1. Inspired by
Significance
Analysis of Microarrays (SAM) methodology, a small positive constant is
added to the denominator of ANCOMBC2 test statistic corresponding to
each taxon to avoid the significance due to extremely small standard errors,
especially for rare taxa. This small positive constant is chosen as

group 
character. The name of the group variable in metadata.

struc_zero 
logical. Whether to detect structural zeros based on

neg_lb 
logical. Whether to classify a taxon as a structural zero using its asymptotic lower bound. Default is FALSE. 
alpha 
numeric. Level of significance. Default is 0.05. 
n_cl 
numeric. The number of nodes to be forked. For details, see

verbose 
logical. Whether to generate verbose output during the ANCOMBC2 fitting process. Default is FALSE. 
global 
logical. Whether to perform the global test. Default is FALSE. 
pairwise 
logical. Whether to perform the pairwise directional test. Default is FALSE. 
dunnet 
logical. Whether to perform the Dunnett's type of test. Default is FALSE. 
trend 
logical. Whether to perform trend test. Default is FALSE. 
iter_control 
a named list of control parameters for the iterative
MLE or RMEL algorithm, including 1) 
em_control 
a named list of control parameters for the EM algorithm,
including 1) 
lme_control 
a list of control parameters for mixed model fitting.
See 
mdfdr_control 
a named list of control parameters for mixed directional
false discover rate (mdFDR), including 1) 
trend_control 
a named list of control parameters for the trend test,
including 1) 
A taxon is considered to have structural zeros in some (>=1)
groups if it is completely (or nearly completely) missing in these groups.
For instance, suppose there are three groups: g1, g2, and g3.
If the counts of taxon A in g1 are 0 but nonzero in g2 and g3,
then taxon A will be considered to contain structural zeros in g1.
In this example, taxon A is declared to be differentially abundant between
g1 and g2, g1 and g3, and consequently, it is globally differentially
abundant with respect to this group variable.
Such taxa are not further analyzed using ANCOMBC2, but the results are
summarized in the overall summary. For more details about the structural
zeros, please go to the
ANCOMII paper.
Setting neg_lb = TRUE
indicates that you are using both criteria
stated in section 3.2 of
ANCOMII
to detect structural zeros; otherwise, the algorithm will only use the
equation 1 in section 3.2 for declaring structural zeros. Generally, it is
recommended to set neg_lb = TRUE
when the sample size per group is
relatively large (e.g. > 30).
Like other differential abundance analysis methods, ANCOMBC2 log transforms the observed counts. However, to deal with zero counts, a pseudocount is added before the log transformation. Several studies have shown that differential abundance results could be sensitive to the choice of pseudocount (Costea et al. (2014); Paulson, Bravo, and Pop (2014)), resulting in an inflated false positive rate. To avoid such false positives, we conduct a sensitivity analysis and provide a sensitivity score for each taxon to determine if a particular taxon is sensitive to the choice of pseudocount. The larger the score, the more likely the significant result is a false positive.
When performning pairwise directional (or Dunnett's type of) test, the mixed directional false discover rate (mdFDR) should be taken into account. The mdFDR is the combination of false discovery rate due to multiple testing, multiple pairwise comparisons, and directional tests within each pairwise comparison. For example, suppose we have five taxa and three experimental groups: g1, g2, and g3. Thus, we are performing five tests corresponding to five taxa. For each taxon, we are also conducting three pairwise comparisons (g1 vs. g2, g2 vs. g3, and g1 vs. g3). Within each pairwise comparison, we wish to determine if the abundance has increased or decreased or did not change (direction of the effect size). Errors could occur in each step. The overall false discovery rate is controlled by the mdFDR methodology we adopted from Guo, Sarkar, and Peddada (2010) and Grandhi, Guo, and Peddada (2016).
a list
with components:
feature_table
, a data.frame
of preprocessed
(based on prv_cut
and lib_cut
) microbial count table.
zero_ind
, a logical data.frame
with TRUE
indicating the taxon is detected to contain structural zeros in
some specific groups.
samp_frac
, a numeric vector of estimated sampling
fractions in log scale (natural log).
delta_em
, estimated samplespecific biases
through EM algorithm.
delta_wls
, estimated samplespecific biases through
weighted least squares (WLS) algorithm.
pseudo_sens_tab
, the results of sensitivity analysis
for the pseudocount addition.
res
, a data.frame
containing ANCOMBC2 primary
result:
columns started with lfc
: log fold changes
obtained from the ANCOMBC2 loglinear (natural log) model.
columns started with se
: standard errors (SEs) of
lfc
.
columns started with W
: test statistics.
W = lfc/se
.
columns started with p
: pvalues. Pvalues are
obtained from twosided Ztest using the test statistic W
.
columns started with q
: adjusted pvalues.
Adjusted pvalues are obtained by applying p_adj_method
to p
.
columns started with diff
: TRUE if the
taxon is significant (has q
less than alpha
).
res_global
, a data.frame
containing ANCOMBC2
global test result for the variable specified in group
,
each column is:
W
, test statistics.
p_val
, pvalues, which are obtained from twosided
Chisquare test using W
.
q_val
, adjusted pvalues. Adjusted pvalues are
obtained by applying p_adj_method
to p_val
.
diff_abn
, A logical vector. TRUE if the taxon has
q_val
less than alpha
.
res_pair
, a data.frame
containing ANCOMBC2
pairwise directional test result for the variable specified in
group
:
columns started with lfc
: log fold changes.
columns started with se
: standard errors (SEs).
columns started with W
: test statistics.
columns started with p
: pvalues.
columns started with q
: adjusted pvalues.
columns started with diff
: TRUE if the
taxon is significant (has q
less than alpha
).
res_dunn
, a data.frame
containing ANCOMBC2
Dunnett's type of test result for the variable specified in
group
:
columns started with lfc
: log fold changes.
columns started with se
: standard errors (SEs).
columns started with W
: test statistics.
columns started with p
: pvalues.
columns started with q
: adjusted pvalues.
columns started with diff
: TRUE if the
taxon is significant (has q
less than alpha
).
res_trend
, a data.frame
containing ANCOMBC2
trend test result for the variable specified in
group
:
columns started with lfc
: log fold changes.
columns started with se
: standard errors (SEs).
W
: test statistics.
p_val
: pvalues.
q_val
: adjusted pvalues.
diff_abn
: TRUE if the
taxon is significant (has q
less than alpha
).
Huang Lin
kaul2017analysisANCOMBC
\insertReflin2020analysisANCOMBC
\insertReftusher2001significanceANCOMBC
\insertRefcostea2014fairANCOMBC
\insertRefpaulson2014replyANCOMBC
\insertRefguo2010controllingANCOMBC
\insertRefgrandhi2016multipleANCOMBC
ancom
ancombc
#===========Build a TreeSummarizedExperiment Object from Scratch============= library(mia) # microbial count table otu_mat = matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10) rownames(otu_mat) = paste0("taxon", 1:nrow(otu_mat)) colnames(otu_mat) = paste0("sample", 1:ncol(otu_mat)) assays = SimpleList(counts = otu_mat) # sample metadata smd = data.frame(group = sample(LETTERS[1:4], size = 10, replace = TRUE), row.names = paste0("sample", 1:ncol(otu_mat)), stringsAsFactors = FALSE) smd = DataFrame(smd) # taxonomy table tax_tab = matrix(sample(letters, 70, replace = TRUE), nrow = nrow(otu_mat), ncol = 7) rownames(tax_tab) = rownames(otu_mat) colnames(tax_tab) = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species") tax_tab = DataFrame(tax_tab) # create TSE tse = TreeSummarizedExperiment(assays = assays, colData = smd, rowData = tax_tab) # convert TSE to phyloseq pseq = makePhyloseqFromTreeSummarizedExperiment(tse) #=======================Run ANCOMBC2 Using a Real Data======================= library(ANCOMBC) data(dietswap) colData(dietswap)$bmi_group = factor(colData(dietswap)$bmi_group, levels = c("obese", "overweight", "lean")) set.seed(123) # Note that setting pseudo_sens = FALSE, max_iter = 1, and B = 1 is # only for the sake of speed # Set pseudo_sens = TRUE, and use default or larger values for max_iter and B # for better performance out = ancombc2(data = dietswap, assay_name = "counts", tax_level = "Phylum", fix_formula = "nationality + timepoint + bmi_group", rand_formula = "(timepoint  subject)", p_adj_method = "holm", pseudo = 0, pseudo_sens = FALSE, prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05, group = "bmi_group", struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 1, verbose = TRUE, global = TRUE, pairwise = TRUE, dunnet = TRUE, trend = TRUE, iter_control = list(tol = 1e2, max_iter = 1, verbose = TRUE), em_control = list(tol = 1e5, max_iter = 1), lme_control = lme4::lmerControl(), mdfdr_control = list(fwer_ctrl_method = "holm", B = 1), trend_control = list(contrast = list(matrix(c(1, 0, 1, 1), nrow = 2, byrow = TRUE)), node = list(2), solver = "ECOS", B = 1)) res_prim = out$res res_global = out$res_global res_pair = out$res_pair res_dunn = out$res_dunn res_trend = out$res_trend
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.