ancombc  R Documentation 
Determine taxa whose absolute abundances, per unit volume, of
the ecosystem (e.g., gut) are significantly different with changes in the
covariate of interest (e.g., group). The current version of the
ancombc function implements Analysis of Compositions of Microbiomes
with Bias Correction (ANCOM-BC) in cross-sectional data while allowing
for covariate adjustment.
ancombc(
  data = NULL,
  assay_name = "counts",
  tax_level = NULL,
  phyloseq = NULL,
  formula,
  p_adj_method = "holm",
  prv_cut = 0.1,
  lib_cut = 0,
  group = NULL,
  struc_zero = FALSE,
  neg_lb = FALSE,
  tol = 1e-05,
  max_iter = 100,
  conserve = FALSE,
  alpha = 0.05,
  global = FALSE,
  n_cl = 1,
  verbose = FALSE
)
data 
the input data. A

assay_name 
character. Name of the count table in the data object
(only applicable if data object is a 
tax_level 
character. The taxonomic level of interest. The input data
can be agglomerated at different taxonomic levels based on your research
interest. Default is NULL, i.e., do not perform agglomeration, and the
ANCOM-BC analysis will be performed at the lowest taxonomic level of the
input 
phyloseq 
a 
formula 
a character string expressing how the microbial absolute abundances of each taxon depend on the variables in metadata.
p_adj_method 
character. Method to adjust p-values. Default is "holm".
Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
"fdr", "none". See 
prv_cut 
a numerical fraction between 0 and 1. Taxa with prevalences
less than 
lib_cut 
a numerical threshold for filtering samples based on library
sizes. Samples with library sizes less than 
group 
character. the name of the group variable in metadata.

struc_zero 
logical. whether to detect structural zeros based on

neg_lb 
logical. whether to classify a taxon as a structural zero using its asymptotic lower bound. Default is FALSE. 
tol 
numeric. the iteration convergence tolerance for the EM algorithm. Default is 1e-05.
max_iter 
numeric. the maximum number of iterations for the EM algorithm. Default is 100. 
conserve 
logical. whether to use a conservative variance estimator for the test statistic. It is recommended if the sample size is small and/or the number of differentially abundant taxa is believed to be large. Default is FALSE. 
alpha 
numeric. level of significance. Default is 0.05. 
global 
logical. whether to perform the global test. Default is FALSE. 
n_cl 
numeric. The number of nodes to be forked. For details, see

verbose 
logical. Whether to generate verbose output during the ANCOM-BC fitting process. Default is FALSE.
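The effect of prv_cut and lib_cut on the count table can be pictured with a small base R sketch. It is an illustration only, not the package's internal code: it assumes that taxa below the prevalence cutoff and samples below the library size cutoff are dropped, that taxa are filtered first, and that library sizes are computed on the prevalence-filtered table. The toy counts and all object names are hypothetical.

```r
# Toy count table: 4 taxa (rows) by 5 samples (columns); values are made up
counts <- matrix(
  c(0, 0, 0, 0, 0,    # taxon1: never observed
    5, 0, 9, 3, 0,    # taxon2: observed in 3 of 5 samples
    0, 0, 0, 2, 0,    # taxon3: observed in 1 of 5 samples
    8, 1, 7, 6, 0),   # taxon4: observed in 4 of 5 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:5))
)
prv_cut <- 0.25   # minimum fraction of samples with a non-zero count
lib_cut <- 2      # minimum library size (total count per sample)

# Step 1 (assumed order): drop taxa whose prevalence is below prv_cut
prevalence <- rowMeans(counts > 0)
kept_taxa <- counts[prevalence >= prv_cut, , drop = FALSE]

# Step 2: drop samples whose library size is below lib_cut
lib_size <- colSums(kept_taxa)
feature_table <- kept_taxa[, lib_size >= lib_cut, drop = FALSE]

feature_table
```

With these toy values, taxon1 and taxon3 fall below the prevalence cutoff and sample2 and sample5 fall below the library size cutoff, so a 2 by 3 table remains.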
A taxon is considered to have structural zeros in some (>=1)
groups if it is completely (or nearly completely) missing in these groups.
For instance, suppose there are three groups: g1, g2, and g3.
If the counts of taxon A in g1 are 0 but nonzero in g2 and g3,
then taxon A will be considered to contain structural zeros in g1.
In this example, taxon A is declared to be differentially abundant between
g1 and g2, g1 and g3, and consequently, it is globally differentially
abundant with respect to this group variable.
Such taxa are not further analyzed using ANCOM-BC, but the results are
summarized in the overall summary. For more details about structural
zeros, see the ANCOM-II paper.
Setting neg_lb = TRUE indicates that you are using both criteria
stated in section 3.2 of ANCOM-II to detect structural zeros; otherwise,
the algorithm will only use equation 1 in section 3.2 for declaring
structural zeros. Generally, it is recommended to set neg_lb = TRUE
when the sample size per group is relatively large (e.g. > 30).
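The idea of structural zeros can be sketched in base R as follows. This is a rough illustration, not the package's implementation. The first criterion flags a taxon that is completely missing from a group. The second criterion is written here as a normal-approximation lower bound on the proportion of non-zero counts; that form is an assumption about what the asymptotic lower bound looks like, and the ANCOM-II paper remains the authoritative source. All data and object names are hypothetical.

```r
# Toy counts: 3 taxa (rows) by 9 samples (columns), three groups of 3 samples
counts <- matrix(
  c(0, 0, 0,  4, 7, 2,  5, 1, 9,   # taxonA: absent in g1
    3, 6, 2,  8, 1, 4,  2, 5, 7,   # taxonB: present everywhere
    0, 1, 0,  3, 2, 6,  0, 0, 4),  # taxonC: nearly absent in g1 and g3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("s", 1:9))
)
group <- rep(c("g1", "g2", "g3"), each = 3)

# Proportion of non-zero counts per taxon within each group (taxa x groups)
p_hat <- sapply(split(seq_along(group), group), function(idx) {
  rowMeans(counts[, idx, drop = FALSE] > 0)
})

# Group sizes, expanded to the same shape as p_hat
n_grp <- as.numeric(table(group)[colnames(p_hat)])
n_mat <- matrix(n_grp, nrow = nrow(p_hat), ncol = ncol(p_hat), byrow = TRUE)

# Criterion 1: the taxon is completely missing from the group
zero_strict <- p_hat == 0

# Criterion 2 (assumed form): the lower bound of p_hat is not above zero
p_lower <- p_hat - 1.96 * sqrt(p_hat * (1 - p_hat) / n_mat)
zero_lb <- p_lower <= 0

zero_strict
zero_lb
```

In this toy data only taxonA in g1 is flagged by the first criterion, while the lower-bound criterion also flags taxonC in g1 and g3 on the basis of three samples per group. That sensitivity at small group sizes is consistent with the advice to enable neg_lb only when groups are relatively large.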
a list with components:
feature_table, a data.frame of the preprocessed (based on prv_cut and lib_cut) microbial count table.
zero_ind, a logical data.frame with TRUE indicating the taxon is detected to contain structural zeros in some specific groups.
samp_frac, a numeric vector of estimated sampling fractions in log scale (natural log).
delta_em, estimated sample-specific biases through the EM algorithm.
delta_wls, estimated sample-specific biases through the weighted least squares (WLS) algorithm.
res, a list containing the ANCOM-BC primary result, which consists of:
  lfc, a data.frame of log fold changes obtained from the ANCOM-BC log-linear (natural log) model.
  se, a data.frame of standard errors (SEs) of lfc.
  W, a data.frame of test statistics. W = lfc/se.
  p_val, a data.frame of p-values. P-values are obtained from a two-sided Z-test using the test statistic W.
  q_val, a data.frame of adjusted p-values. Adjusted p-values are obtained by applying p_adj_method to p_val.
  diff_abn, a logical data.frame. TRUE if the taxon has q_val less than alpha.
res_global, a data.frame containing the ANCOM-BC global test result for the variable specified in group; each column is:
  W, test statistics.
  p_val, p-values, which are obtained from a two-sided Chi-square test using W.
  q_val, adjusted p-values. Adjusted p-values are obtained by applying p_adj_method to p_val.
  diff_abn, a logical vector. TRUE if the taxon has q_val less than alpha.
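The relationships among the components of res can be reproduced by hand in base R. The sketch below uses made-up log fold changes and standard errors for a single covariate; it mirrors the definitions listed above (W = lfc/se, two-sided Z-test, p_adj_method, alpha) and is not the package's fitting code. In particular, it does not reproduce the conservative variance estimator selected by conserve = TRUE.

```r
# Hypothetical estimates for four taxa and one covariate
lfc <- c(taxon1 = 1.2, taxon2 = -0.3, taxon3 = 2.5, taxon4 = 0.1)
se  <- c(taxon1 = 0.4, taxon2 = 0.5,  taxon3 = 0.5, taxon4 = 0.2)
alpha <- 0.05
p_adj_method <- "holm"

# Test statistic: log fold change divided by its standard error
W <- lfc / se

# Two-sided Z-test p-values based on the standard normal distribution
p_val <- 2 * pnorm(abs(W), lower.tail = FALSE)

# Multiple-testing adjustment with stats::p.adjust
q_val <- p.adjust(p_val, method = p_adj_method)

# A taxon is flagged when its adjusted p-value is below alpha
diff_abn <- q_val < alpha

data.frame(lfc, se, W, p_val, q_val, diff_abn)
```

Any of the options listed for p_adj_method can be passed to p.adjust in the same way, which makes it easy to see how the choice of adjustment changes diff_abn for a given set of p-values.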
Huang Lin
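Kaul A, Mandal S, Davidov O, Peddada SD (2017). "Analysis of microbiome data in the presence of excess zeros." Frontiers in Microbiology, 8, 2114.
Lin H, Peddada SD (2020). "Analysis of compositions of microbiomes with bias correction." Nature Communications, 11(1), 1-11.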
ancom
ancombc2
#===========Build a TreeSummarizedExperiment Object from Scratch=============
library(mia)

# microbial count table
otu_mat = matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10)
rownames(otu_mat) = paste0("taxon", 1:nrow(otu_mat))
colnames(otu_mat) = paste0("sample", 1:ncol(otu_mat))
assays = SimpleList(counts = otu_mat)

# sample metadata
smd = data.frame(group = sample(LETTERS[1:4], size = 10, replace = TRUE),
                 row.names = paste0("sample", 1:ncol(otu_mat)),
                 stringsAsFactors = FALSE)
smd = DataFrame(smd)

# taxonomy table
tax_tab = matrix(sample(letters, 70, replace = TRUE),
                 nrow = nrow(otu_mat), ncol = 7)
rownames(tax_tab) = rownames(otu_mat)
colnames(tax_tab) = c("Kingdom", "Phylum", "Class", "Order",
                      "Family", "Genus", "Species")
tax_tab = DataFrame(tax_tab)

# create TSE
tse = TreeSummarizedExperiment(assays = assays,
                               colData = smd,
                               rowData = tax_tab)

# convert TSE to phyloseq
pseq = makePhyloseqFromTreeSummarizedExperiment(tse)

#========================Run ANCOMBC Using a Real Data=======================
library(ANCOMBC)
data(hitchip1006)

# subset to baseline
tse = hitchip1006[, hitchip1006$time == 0]

# run ancombc function
set.seed(123)
out = ancombc(data = tse, assay_name = "counts",
              tax_level = "Family", phyloseq = NULL,
              formula = "age + nationality + bmi_group",
              p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000,
              group = "bmi_group", struc_zero = TRUE, neg_lb = FALSE,
              tol = 1e-5, max_iter = 100, conserve = TRUE,
              alpha = 0.05, global = TRUE, n_cl = 1, verbose = TRUE)

res_prim = out$res
res_global = out$res_global

# to run ancombc using the phyloseq object
tse_alt = agglomerateByRank(tse, "Family")
pseq = makePhyloseqFromTreeSummarizedExperiment(tse_alt)
set.seed(123)
out = ancombc(data = NULL, assay_name = NULL,
              tax_level = "Family", phyloseq = pseq,
              formula = "age + nationality + bmi_group",
              p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000,
              group = "bmi_group", struc_zero = TRUE, neg_lb = FALSE,
              tol = 1e-5, max_iter = 100, conserve = TRUE,
              alpha = 0.05, global = TRUE, n_cl = 1, verbose = TRUE)