ancombc2 | R Documentation |
Determine taxa whose absolute abundances, per unit volume, of
the ecosystem (e.g., gut) are significantly different with changes in the
covariate of interest (e.g., group). The current version of
ancombc2
function implements Analysis of Compositions of Microbiomes
with Bias Correction (ANCOM-BC2) in cross-sectional and repeated measurements
data. In addition to the two-group comparison, ANCOM-BC2 also supports
testing for continuous covariates and multi-group comparisons,
including the global test, pairwise directional test, Dunnett's type of
test, and trend test.
ancombc2(
data,
taxa_are_rows = TRUE,
assay.type = assay_name,
assay_name = "counts",
rank = tax_level,
tax_level = NULL,
aggregate_data = NULL,
meta_data = NULL,
fix_formula,
rand_formula = NULL,
p_adj_method = "holm",
pseudo = 0,
pseudo_sens = TRUE,
prv_cut = 0.1,
lib_cut = 0,
s0_perc = 0.05,
group = NULL,
struc_zero = FALSE,
neg_lb = FALSE,
alpha = 0.05,
n_cl = 1,
verbose = TRUE,
global = FALSE,
pairwise = FALSE,
dunnet = FALSE,
trend = FALSE,
iter_control = list(tol = 0.01, max_iter = 20, verbose = FALSE),
em_control = list(tol = 1e-05, max_iter = 100),
lme_control = lme4::lmerControl(),
mdfdr_control = list(fwer_ctrl_method = "holm", B = 100),
trend_control = list(contrast = NULL, node = NULL, solver = "ECOS", B = 100)
)
data |
the input data. The |
taxa_are_rows |
logical. Whether taxa are positioned in the rows of the feature table. Default is TRUE. |
assay.type |
alias for |
assay_name |
character. Name of the count table in the data object
(only applicable if data object is a |
rank |
alias for |
tax_level |
character. The taxonomic or non taxonomic(rowData) level of interest. The input data
can be analyzed at any taxonomic or rowData level without prior agglomeration.
Note that |
aggregate_data |
The abundance data that has been aggregated to the desired
taxonomic level. This parameter is required only when the input data is in
|
meta_data |
a |
fix_formula |
the character string expresses how the microbial absolute
abundances for each taxon depend on the fixed effects in metadata. When
specifying the |
rand_formula |
the character string expresses how the microbial absolute
abundances for each taxon depend on the random effects in metadata. ANCOM-BC2
follows the |
p_adj_method |
character. method to adjust p-values. Default is "holm".
Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
"fdr", "none". See |
pseudo |
numeric. Add pseudo-counts to the data. Please note that this option is not recommended in ANCOM-BC2. The software will utilize the complete data (nonzero counts) as its default analysis input. |
pseudo_sens |
logical. Whether to perform the sensitivity analysis to
the pseudo-count addition. Default is |
prv_cut |
a numerical fraction between 0 and 1. Taxa with prevalences
(the proportion of samples in which the taxon is present)
less than |
lib_cut |
a numerical threshold for filtering samples based on library
sizes. Samples with library sizes less than |
s0_perc |
a numerical fraction between 0 and 1. Inspired by
Significance
Analysis of Microarrays (SAM) methodology, a small positive constant is
added to the denominator of ANCOM-BC2 test statistic corresponding to
each taxon to avoid the significance due to extremely small standard errors,
especially for rare taxa. This small positive constant is chosen as
|
group |
character. the name of the group variable in metadata.
The |
struc_zero |
logical. Whether to detect structural zeros based on
|
neg_lb |
logical. Whether to classify a taxon as a structural zero using its asymptotic lower bound. Default is FALSE. |
alpha |
numeric. Level of significance. Default is 0.05. |
n_cl |
numeric. The number of nodes to be forked. For details, see
|
verbose |
logical. Whether to generate verbose output during the ANCOM-BC2 fitting process. Default is FALSE. |
global |
logical. Whether to perform the global test. Default is FALSE. |
pairwise |
logical. Whether to perform the pairwise directional test. Default is FALSE. |
dunnet |
logical. Whether to perform the Dunnett's type of test. Default is FALSE. |
trend |
logical. Whether to perform trend test. Default is FALSE. |
iter_control |
a named list of control parameters for the iterative
MLE or RMEL algorithm, including 1) |
em_control |
a named list of control parameters for the E-M algorithm,
including 1) |
lme_control |
a list of control parameters for mixed model fitting.
See |
mdfdr_control |
a named list of control parameters for mixed directional
false discover rate (mdFDR), including 1) |
trend_control |
a named list of control parameters for the trend test,
including 1) |
A taxon is considered to have structural zeros in some (>=1)
groups if it is completely (or nearly completely) missing in these groups.
For instance, suppose there are three groups: g1, g2, and g3.
If the counts of taxon A in g1 are 0 but nonzero in g2 and g3,
then taxon A will be considered to contain structural zeros in g1.
In this example, taxon A is declared to be differentially abundant between
g1 and g2, g1 and g3, and consequently, it is globally differentially
abundant with respect to this group variable.
Such taxa are not further analyzed using ANCOM-BC2, but the results are
summarized in the overall summary. For more details about the structural
zeros, please go to the
ANCOM-II paper.
Setting neg_lb = TRUE
indicates that you are using both criteria
stated in section 3.2 of
ANCOM-II
to detect structural zeros; otherwise, the algorithm will only use the
equation 1 in section 3.2 for declaring structural zeros. Generally, it is
recommended to set neg_lb = TRUE
when the sample size per group is
relatively large (e.g. > 30).
Like other differential abundance analysis methods, ANCOM-BC2 applies a log transformation to the observed counts. However, the presence of zero counts poses a challenge, and researchers often consider adding a pseudo-count before the log transformation. However, it has been shown that the choice of pseudo-count can impact the results and lead to an inflated false positive rate (Costea et al. (2014); Paulson, Bravo, and Pop (2014)). To address this issue, we conduct a sensitivity analysis to assess the impact of different pseudo-counts on zero counts for each taxon. This analysis involves adding a series of pseudo-counts (ranging from 0.01 to 0.5 in increments of 0.01) to the zero counts of each taxon. Linear regression models are then performed on the bias-corrected log abundance table using the different pseudo-counts. The sensitivity score for each taxon is calculated as the proportion of times that the p-value exceeds the specified significance level (alpha). If all p-values consistently show significance or nonsignificance across different pseudo-counts and are consistent with the results obtained without adding pseudo-counts to zero counts (using the default settings), then the taxon is considered not sensitive to the pseudo-count addition.
When performning pairwise directional (or Dunnett's type of) test, the mixed directional false discover rate (mdFDR) should be taken into account. The mdFDR is the combination of false discovery rate due to multiple testing, multiple pairwise comparisons, and directional tests within each pairwise comparison. For example, suppose we have five taxa and three experimental groups: g1, g2, and g3. Thus, we are performing five tests corresponding to five taxa. For each taxon, we are also conducting three pairwise comparisons (g1 vs. g2, g2 vs. g3, and g1 vs. g3). Within each pairwise comparison, we wish to determine if the abundance has increased or decreased or did not change (direction of the effect size). Errors could occur in each step. The overall false discovery rate is controlled by the mdFDR methodology we adopted from Guo, Sarkar, and Peddada (2010) and Grandhi, Guo, and Peddada (2016).
a list
with components:
feature_table
, a data.frame
of pre-processed
(based on prv_cut
and lib_cut
) microbial count table.
bias_correct_log_table
, a data.frame
of
bias-corrected log abundance table.
ss_tab
, a data.frame
of sensitivity scores for
pseudo-count addition to 0s.
zero_ind
, a logical data.frame
with TRUE
indicating the taxon is detected to contain structural zeros in
some specific groups.
samp_frac
, a numeric vector of estimated sampling
fractions in log scale (natural log).
delta_em
, estimated sample-specific biases
through E-M algorithm.
delta_wls
, estimated sample-specific biases through
weighted least squares (WLS) algorithm.
res
, a data.frame
containing ANCOM-BC2 primary
result:
columns started with lfc
: log fold changes
obtained from the ANCOM-BC2 log-linear (natural log) model.
columns started with se
: standard errors (SEs) of
lfc
.
columns started with W
: test statistics.
W = lfc/se
.
columns started with p
: p-values. P-values are
obtained from two-sided Z-test using the test statistic W
.
columns started with q
: adjusted p-values.
Adjusted p-values are obtained by applying p_adj_method
to p
.
columns started with diff
: TRUE if the
taxon is significant (has q
less than alpha
).
columns started with passed_ss
: TRUE if the
taxon passed the sensitivity analysis, i.e., adding different
pseudo-counts to 0s would not change the results.
res_global
, a data.frame
containing ANCOM-BC2
global test result for the variable specified in group
,
each column is:
W
, test statistics.
p_val
, p-values, which are obtained from two-sided
Chi-square test using W
.
q_val
, adjusted p-values. Adjusted p-values are
obtained by applying p_adj_method
to p_val
.
diff_abn
, A logical vector. TRUE if the taxon has
q_val
less than alpha
.
passed_ss
, A logical vector. TRUE if the taxon has
passed the sensitivity analysis.
res_pair
, a data.frame
containing ANCOM-BC2
pairwise directional test result for the variable specified in
group
:
columns started with lfc
: log fold changes.
columns started with se
: standard errors (SEs).
columns started with W
: test statistics.
columns started with p
: p-values.
columns started with q
: adjusted p-values.
columns started with diff
: TRUE if the
taxon is significant (has q
less than alpha
).
columns started with passed_ss
: TRUE if the
taxon has passed the sensitivity analysis.
res_dunn
, a data.frame
containing ANCOM-BC2
Dunnett's type of test result for the variable specified in
group
:
columns started with lfc
: log fold changes.
columns started with se
: standard errors (SEs).
columns started with W
: test statistics.
columns started with p
: p-values.
columns started with q
: adjusted p-values.
columns started with diff
: TRUE if the
taxon is significant (has q
less than alpha
).
columns started with passed_ss
: TRUE if the
taxon has passed the sensitivity analysis.
res_trend
, a data.frame
containing ANCOM-BC2
trend test result for the variable specified in
group
:
columns started with lfc
: log fold changes.
columns started with se
: standard errors (SEs).
W
: test statistics.
p_val
: p-values.
q_val
: adjusted p-values.
diff_abn
: TRUE if the
taxon is significant (has q
less than alpha
).
passed_ss
, A logical vector. TRUE if the taxon has
passed the sensitivity analysis.
Huang Lin
kaul2017analysisANCOMBC
\insertReflin2020analysisANCOMBC
\insertReftusher2001significanceANCOMBC
\insertRefcostea2014fairANCOMBC
\insertRefpaulson2014replyANCOMBC
\insertRefguo2010controllingANCOMBC
\insertRefgrandhi2016multipleANCOMBC
ancom
ancombc
library(ANCOMBC)
if (requireNamespace("microbiome", quietly = TRUE)) {
data(dietswap, package = "microbiome")
# run ancombc2 function
set.seed(123)
# Note that setting max_iter = 1 and B = 1 is only for the sake of speed
# Use default or larger values for max_iter and B for better performance
out = ancombc2(data = dietswap, tax_level = "Family",
fix_formula = "nationality + timepoint + bmi_group",
rand_formula = NULL,
p_adj_method = "holm", pseudo_sens = TRUE,
prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05,
group = "bmi_group", struc_zero = TRUE, neg_lb = TRUE,
alpha = 0.05, n_cl = 1, verbose = TRUE,
global = TRUE, pairwise = TRUE, dunnet = TRUE, trend = TRUE,
iter_control = list(tol = 1e-2, max_iter = 1, verbose = TRUE),
em_control = list(tol = 1e-5, max_iter = 1),
lme_control = lme4::lmerControl(),
mdfdr_control = list(fwer_ctrl_method = "holm", B = 1),
trend_control = list(contrast =
list(matrix(c(1, 0, -1, 1),
nrow = 2,
byrow = TRUE)),
node = list(2),
solver = "ECOS",
B = 1))
res_prim = out$res
res_global = out$res_global
res_pair = out$res_pair
res_dunn = out$res_dunn
res_trend = out$res_trend
} else {
message("The 'microbiome' package is not installed. Please install it to use this example.")
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.