ancom: Analysis of Composition of Microbiomes (ANCOM)

View source: R/ancom.R

ancomR Documentation

Analysis of Composition of Microbiomes (ANCOM)

Description

Determine taxa whose absolute abundances, per unit volume, of the ecosystem (e.g. gut) are significantly different with changes in the covariate of interest (e.g. group). The current version of ancom function implements ANCOM in cross-sectional and repeated measurements data while allowing for covariate adjustment.

Usage

ancom(
  data = NULL,
  taxa_are_rows = TRUE,
  assay.type = NULL,
  assay_name = "counts",
  rank = NULL,
  tax_level = NULL,
  aggregate_data = NULL,
  meta_data = NULL,
  p_adj_method = "holm",
  prv_cut = 0.1,
  lib_cut = 0,
  main_var,
  adj_formula = NULL,
  rand_formula = NULL,
  lme_control = lme4::lmerControl(),
  struc_zero = FALSE,
  neg_lb = FALSE,
  alpha = 0.05,
  n_cl = 1,
  verbose = TRUE
)

Arguments

data

the input data. The data parameter should be either a matrix, data.frame, phyloseq or a TreeSummarizedExperiment object. Both phyloseq and TreeSummarizedExperiment objects consist of a feature table (microbial count table), a sample metadata table, a taxonomy table (optional), and a phylogenetic tree (optional). If a matrix or data.frame is provided, ensure that the row names of the metadata match the sample names (column names if taxa_are_rows is TRUE, and row names otherwise) in data. if a phyloseq or a TreeSummarizedExperiment is used, this standard has already been enforced. For detailed information, refer to ?phyloseq::phyloseq or ?TreeSummarizedExperiment::TreeSummarizedExperiment. It is recommended to use low taxonomic levels, such as OTU or species level, as the estimation of sampling fractions requires a large number of taxa.

taxa_are_rows

logical. Whether taxa are positioned in the rows of the feature table. Default is TRUE. It is recommended to use low taxonomic levels, such as OTU or species level, as the estimation of sampling fractions requires a large number of taxa.

assay.type

alias for assay_name.

assay_name

character. Name of the count table in the data object (only applicable if data object is a (Tree)SummarizedExperiment). Default is "counts". See ?SummarizedExperiment::assay for more details.

rank

alias for tax_level.

tax_level

character. The taxonomic or non taxonomic(rowData) level of interest. The input data can be analyzed at any taxonomic or rowData level without prior agglomeration. Note that tax_level must be a value from taxonomyRanks or rowData, which includes "Kingdom", "Phylum" "Class", "Order", "Family" "Genus" "Species" etc. See ?mia::taxonomyRanks for more details. Default is NULL, i.e., do not perform agglomeration, and the ANCOM-BC2 analysis will be performed at the lowest taxonomic level of the input data.

aggregate_data

The abundance data that has been aggregated to the desired taxonomic level. This parameter is required only when the input data is in matrix or data.frame format. For phyloseq or TreeSummarizedExperiment data, aggregation is performed by specifying the tax_level parameter.

meta_data

a data.frame containing sample metadata. This parameter is mandatory when the input data is a generic data.frame. Ensure that the row names of the metadata match the sample names (column names if taxa_are_rows is TRUE, and row names otherwise) in data.

p_adj_method

character. method to adjust p-values. Default is "holm". Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". See ?stats::p.adjust for more details.

prv_cut

a numerical fraction between 0 and 1. Taxa with prevalences (the proportion of samples in which the taxon is present) less than prv_cut will be excluded in the analysis. For example, if there are 100 samples, and a taxon has nonzero counts present in less than 100*prv_cut samples, it will not be considered in the analysis. Default is 0.10.

lib_cut

a numerical threshold for filtering samples based on library sizes. Samples with library sizes less than lib_cut will be excluded in the analysis. Default is 0, i.e. do not discard any sample.

main_var

character. The name of the main variable of interest.

adj_formula

character string representing the formula for covariate adjustment. Please note that you should NOT include the main_var in the formula. Default is NULL.

rand_formula

the character string expresses how the microbial absolute abundances for each taxon depend on the random effects in metadata. ANCOM follows the lmerTest package in formulating the random effects. See ?lmerTest::lmer for more details. Default is NULL.

lme_control

a list of control parameters for mixed model fitting. See ?lme4::lmerControl for details.

struc_zero

logical. whether to detect structural zeros based on main_var. main_var should be discrete. Default is FALSE.

neg_lb

logical. whether to classify a taxon as a structural zero using its asymptotic lower bound. Default is FALSE.

alpha

numeric. level of significance. Default is 0.05.

n_cl

numeric. The number of nodes to be forked. For details, see ?parallel::makeCluster. Default is 1 (no parallel computing).

verbose

logical. Whether to display detailed progress messages.

Details

A taxon is considered to have structural zeros in some (>=1) groups if it is completely (or nearly completely) missing in these groups. For instance, suppose there are three groups: g1, g2, and g3. If the counts of taxon A in g1 are 0 but nonzero in g2 and g3, then taxon A will be considered to contain structural zeros in g1. In this example, taxon A is declared to be differentially abundant between g1 and g2, g1 and g3, and consequently, it is globally differentially abundant with respect to this group variable. Such taxa are not further analyzed using ANCOM, but the results are summarized in the overall summary. For more details about the structural zeros, please go to the ANCOM-II paper. Setting neg_lb = TRUE indicates that you are using both criteria stated in section 3.2 of ANCOM-II to detect structural zeros; otherwise, the algorithm will only use the equation 1 in section 3.2 for declaring structural zeros. Generally, it is recommended to set neg_lb = TRUE when the sample size per group is relatively large (e.g. > 30).

Value

a list with components:

  • res, a data.frame containing ANCOM result for the variable specified in main_var, each column is:

    • W, test statistics.

    • detected_0.9, detected_0.8, detected_0.7, detected_0.6, logical vectors representing whether a taxon is differentially abundant under a series of cutoffs. For example, TRUE in detected_0.7 means the number of ALR transformed models where the taxon is differentially abundant with regard to the main variable outnumbers 0.7 * (n_tax - 1). detected_0.7 is commonly used. Choose detected_0.8 or detected_0.9 for more conservative results, or choose detected_0.6 for more liberal results.

  • zero_ind, a logical data.frame with TRUE indicating the taxon is detected to contain structural zeros in some specific groups.

  • beta_data, a numeric matrix containing pairwise coefficients for the main variable of interest in ALR transformed regression models.

  • p_data, a numeric matrix containing pairwise p-values for the main variable of interest in ALR transformed regression models.

  • q_data, a numeric matrix containing adjusted p-values by applying the p_adj_method to the p_data matrix.

Author(s)

Huang Lin

References

\insertRef

mandal2015analysisANCOMBC

\insertRef

kaul2017analysisANCOMBC

See Also

ancombc ancombc2

Examples

library(ANCOMBC)
if (requireNamespace("microbiome", quietly = TRUE)) {
    data(atlas1006, package = "microbiome")
    # subset to baseline
    pseq = phyloseq::subset_samples(atlas1006, time == 0)

    # run ancom function
    set.seed(123)
    out = ancom(data = pseq, tax_level = "Family",
                p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000,
                main_var = "bmi_group", adj_formula = "age + nationality",
                rand_formula = NULL, lme_control = NULL,
                struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 1)

    res = out$res
} else {
    message("The 'microbiome' package is not installed. Please install it to use this example.")
}


FrederickHuangLin/ANCOMBC documentation built on Oct. 22, 2024, 3:11 a.m.