run_ancom: Perform differential analysis using ANCOM

View source: R/da_ancom.R

run_ancomR Documentation

Perform differential analysis using ANCOM

Description

Perform significant test by comparing the pairwise log ratios between all features.

Usage

run_ancom(
  ps,
  group,
  confounders = character(0),
  taxa_rank = "all",
  transform = c("identity", "log10", "log10p", "SquareRoot", "CubicRoot", "logit"),
  norm = "TSS",
  norm_para = list(),
  p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"),
  pvalue_cutoff = 0.05,
  W_cutoff = 0.75
)

Arguments

ps

a phyloseq-class object.

group

character, the variable to set the group.

confounders

character vector, the confounding variables to be adjusted. default character(0), indicating no confounding variable.

taxa_rank

character to specify taxonomic rank to perform differential analysis on. Should be one of phyloseq::rank_names(phyloseq), or "all" means to summarize the taxa by the top taxa ranks (summarize_taxa(ps, level = rank_names(ps)[1])), or "none" means perform differential analysis on the original taxa (taxa_names(phyloseq), e.g., OTU or ASV).

transform

character, the methods used to transform the microbial abundance. See transform_abundances() for more details. The options include:

  • "identity", return the original data without any transformation.

  • "log10", the transformation is log10(object), and if the data contains zeros the transformation is log10(1 + object).

  • "log10p", the transformation is log10(1 + object).

  • "SquareRoot", the transformation is ⁠Square Root⁠.

  • "CubicRoot", the transformation is ⁠Cubic Root⁠.

  • "logit", the transformation is ⁠Zero-inflated Logit Transformation⁠ (Does not work well for microbiome data).

norm

the methods used to normalize the microbial abundance data. See normalize() for more details. Options include:

  • "none": do not normalize.

  • "rarefy": random subsampling counts to the smallest library size in the data set.

  • "TSS": total sum scaling, also referred to as "relative abundance", the abundances were normalized by dividing the corresponding sample library size.

  • "TMM": trimmed mean of m-values. First, a sample is chosen as reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold-change between the sample and the reference.

  • "RLE", relative log expression, RLE uses a pseudo-reference calculated using the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene counts ratios between the samples and the reference.

  • "CSS": cumulative sum scaling, calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.

  • "CLR": centered log-ratio normalization.

  • "CPM": pre-sample normalization of the sum of the values to 1e+06.

norm_para

named list. other arguments passed to specific normalization methods. Most users will not need to pass any additional arguments here.

p_adjust

method for multiple test correction, default none, for more details see stats::p.adjust.

pvalue_cutoff

significance level for each of the statistical tests, default 0.05.

W_cutoff

lower bound for the proportion for the W-statistic, default 0.7.

Details

In an experiment with only two treatments, this tests the following hypothesis for feature i:

H_{0i}: E(log(\mu_i^1)) = E(log(\mu_i^2))

where \mu_i^1 and \mu_i^2 are the mean abundances for feature i in the two groups.

The developers of this method recommend the following significance tests if there are 2 groups, use non-parametric Wilcoxon rank sum test stats::wilcox.test(). If there are more than 2 groups, use nonparametric stats::kruskal.test() or one-way ANOVA stats::aov().

Value

a microbiomeMarker object, in which the slot of marker_table contains four variables:

  • feature, significantly different features.

  • enrich_group, the class of the differential features enriched.

  • effect_size, differential means for two groups, or F statistic for more than two groups.

  • W, the W-statistic, number of features that a single feature is tested to be significantly different against.

Author(s)

Huang Lin, Yang Cao

References

Mandal et al. "Analysis of composition of microbiomes: a novel method for studying microbial composition", Microbial Ecology in Health & Disease, (2015), 26.

Examples



data(enterotypes_arumugam)
ps <- phyloseq::subset_samples(
    enterotypes_arumugam,
    Enterotype %in% c("Enterotype 3", "Enterotype 2")
)
run_ancom(ps, group = "Enterotype")



HuaZou/MicrobiomeAnalysis documentation built on May 13, 2024, 11:10 a.m.