run_metagenomeseq: metagenomeSeq differential analysis
In yiluheihei/microbiomeMarker: microbiome biomarker analysis toolkit

run_metagenomeseq

R Documentation

metagenomeSeq differential analysis

Description

Differential abundance analysis based on the zero-inflated log-normal mixture model or the zero-inflated Gaussian mixture model, using metagenomeSeq.

Usage

run_metagenomeseq(
  ps,
  group,
  confounders = character(0),
  contrast = NULL,
  taxa_rank = "all",
  transform = c("identity", "log10", "log10p"),
  norm = "CSS",
  norm_para = list(),
  method = c("ZILN", "ZIG"),
  p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"),
  pvalue_cutoff = 0.05,
  ...
)
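
Several defaults in the signature are character vectors (`transform`, `method`, `p_adjust`). In R this usually signals the `match.arg()` idiom, in which the first element is the effective default. The sketch below assumes that idiom applies here; `pick_method()` is a hypothetical stand-in and not part of microbiomeMarker.

```r
# Hypothetical stand-in mimicking a signature such as
# method = c("ZILN", "ZIG"); this is not a microbiomeMarker function.
pick_method <- function(method = c("ZILN", "ZIG")) {
  # match.arg() returns the first choice when the argument is not supplied
  # and raises an error for values outside the listed choices.
  match.arg(method)
}

default_method <- pick_method()
explicit_method <- pick_method("ZIG")
invalid_call <- tryCatch(pick_method("DESeq2"), error = function(e) "error")
```

Under this assumption, omitting `method` selects "ZILN", omitting `transform` selects "identity", and omitting `p_adjust` selects "none", which agrees with the defaults stated in the argument descriptions.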

Arguments

`ps`	a `phyloseq::phyloseq` object.
`group`	character, the variable that defines the groups; must be one of the variables in the sample metadata.
`confounders`	character vector, the confounding variables to adjust for. Default `character(0)`, indicating no confounding variables.
`contrast`	only used to compare two groups when the group variable has more than two levels. See Details below.
`taxa_rank`	character, the taxonomic rank on which to perform the differential analysis. Should be one of `phyloseq::rank_names(ps)`, or "all" to summarize the taxa by the top taxonomic rank (`summarize_taxa(ps, level = rank_names(ps)[1])`), or "none" to perform the differential analysis on the original taxa (`taxa_names(ps)`, e.g., OTU or ASV).
`transform`	character, the method used to transform the microbial abundance. See `transform_abundances()` for more details. The options include: "identity", return the original data without any transformation (default). "log10", the transformation is `log10(object)`, and if the data contains zeros the transformation is `log10(1 + object)`. "log10p", the transformation is `log10(1 + object)`.
`norm`	the method used to normalize the microbial abundance data. See `normalize()` for more details. Options include: "none": do not normalize. "rarefy": randomly subsample counts to the smallest library size in the data set. "TSS": total sum scaling, also referred to as "relative abundance"; abundances are normalized by dividing by the corresponding sample library size. "TMM": trimmed mean of M-values. First, a sample is chosen as the reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold changes between the sample and the reference. "RLE": relative log expression. RLE uses a pseudo-reference calculated using the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene-count ratios between the samples and the reference. "CSS": cumulative sum scaling, which calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold. "CLR": centered log-ratio normalization. "CPM": per-sample normalization of the sum of the values to 1e+06.
`norm_para`	arguments passed to specific normalization methods.
`method`	character, the model used for the differential analysis: "ZILN" (zero-inflated log-normal mixture model) or "ZIG" (zero-inflated Gaussian mixture model). The zero-inflated log-normal model is preferred because of its high sensitivity and low FDR.
`p_adjust`	method for multiple testing correction, default `none`. For more details see `stats::p.adjust()`.
`pvalue_cutoff`	numeric, p value cutoff, default 0.05.
`...`	extra arguments passed to the model. For more details see `metagenomeSeq::fitFeatureModel()` and `metagenomeSeq::fitZig()`, e.g. `control` (which can be set using `metagenomeSeq::zigControl()`) for `metagenomeSeq::fitZig()`.
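
To make the `transform`, `p_adjust` and `pvalue_cutoff` descriptions concrete, here is a minimal base R sketch. `transform_sketch()` is a hypothetical helper written from the option descriptions above, not the package's `transform_abundances()`, and the abundances and p values are toy data. Whether the package applies the cutoff with `<=` or `<` is an assumption.

```r
# Toy abundance vector containing a zero.
abund <- c(0, 9, 99)

# Hypothetical helper following the option descriptions above.
transform_sketch <- function(x, transform = c("identity", "log10", "log10p")) {
  transform <- match.arg(transform)
  switch(transform,
    identity = x,
    # "log10": plain log10, falling back to log10(1 + x) when zeros are present
    log10 = if (any(x == 0)) log10(1 + x) else log10(x),
    log10p = log10(1 + x)
  )
}

log10_vals <- transform_sketch(abund, "log10")

# Toy p values, adjusted with one of the methods accepted by stats::p.adjust().
pvals <- c(taxon_a = 0.001, taxon_b = 0.02, taxon_c = 0.04, taxon_d = 0.5)
padj <- stats::p.adjust(pvals, method = "BH")

# Keep the taxa whose adjusted p value passes the cutoff (assumed to be <=).
pvalue_cutoff <- 0.05
markers <- names(pvals)[padj <= pvalue_cutoff]
```

With the default `p_adjust = "none"` all four toy p values are compared to the cutoff unchanged, so `taxon_c` would also pass; with "BH" it does not.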

Details

metagenomeSeq provides two differential analysis methods: the zero-inflated log-normal mixture model (implemented in metagenomeSeq::fitFeatureModel()) and the zero-inflated Gaussian mixture model (implemented in metagenomeSeq::fitZig()). We recommend fitFeatureModel over fitZig because of its high sensitivity and low FDR. Both metagenomeSeq::fitFeatureModel() and metagenomeSeq::fitZig() require the abundance profiles before normalization.
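
The default `norm = "CSS"` is described above as a cumulative sum of abundances up to a data-derived threshold. The base R sketch below illustrates the idea on a toy count matrix. It is not the metagenomeSeq implementation: `css_factor()` is a hypothetical helper, the quantile `p = 0.5` is fixed by assumption rather than derived from the data, and the rescaling constant of 1000 is also an assumption.

```r
# Toy count matrix: rows are taxa, columns are samples.
counts <- matrix(
  c(0, 2, 4, 10,
    1, 0, 3, 20),
  nrow = 4,
  dimnames = list(paste0("taxon", 1:4), c("s1", "s2"))
)

# Scaling factor for one sample: the sum of the counts that do not exceed
# the p-th quantile of its non-zero counts. p is fixed here as an assumption.
css_factor <- function(x, p = 0.5) {
  q <- unname(stats::quantile(x[x > 0], probs = p))
  sum(x[x <= q])
}

factors <- apply(counts, 2, css_factor)

# Divide each sample (column) by its factor, rescaled by a constant of 1000.
css_counts <- sweep(counts, 2, factors / 1000, "/")
```

Because the scaling factor ignores counts above the threshold, a single dominant taxon (such as `taxon4` here) does not drive the normalization the way it would under total sum scaling.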

For metagenomeSeq::fitZig(), the output column is the coefficient of interest, and the logFC column in the output of metagenomeSeq::fitFeatureModel() is analogous to that coefficient. Thus, logFC is simply the estimate of the coefficient of interest in metagenomeSeq::fitFeatureModel(). For more details, see the question "Difference between fitFeatureModel and fitZIG in metagenomeSeq".

contrast must be a character vector of length two or NULL (default). It only needs to be set manually to compare two groups when there are more than two groups. The order determines the direction of the comparison: the first element specifies the reference group (control). That is, the first element is the denominator of the fold change and the second element is the numerator. Otherwise, users do not need to set this parameter (leave it as the default NULL): if there are two groups, the first level of the group variable is used as the reference group; if there are more than two groups, an ANOVA-like test is performed to find markers that differ in any of the groups.
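
The direction convention for `contrast` can be illustrated with a toy calculation in base R. This is only a sketch of the ordering, not what metagenomeSeq computes: the data and group labels are hypothetical, and a plain log2 ratio of group means stands in for the model coefficient.

```r
# Toy abundances of one taxon in three groups (hypothetical data).
abund <- c(10, 12, 14, 40, 48, 56, 5, 6, 7)
grp <- factor(rep(c("A", "B", "C"), each = 3))

# contrast = c(reference, comparison): the first element is the denominator.
contrast <- c("A", "B")

# Keep only the two groups named in the contrast.
keep <- grp %in% contrast
# Relevel so that the reference group is the first factor level.
grp2 <- factor(grp[keep], levels = contrast)
means <- tapply(abund[keep], grp2, mean)

# log2 fold change of the second element relative to the first.
log2fc <- log2(means[[contrast[2]]] / means[[contrast[1]]])
```

Swapping the two elements, `contrast = c("B", "A")`, flips the sign of the fold change, which is why the order matters.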

Of note, metagenomeSeq::fitFeatureModel() does not allow comparison of more than two groups.

Value

a microbiomeMarker object.

Author(s)

Yang Cao

References

Paulson, Joseph N., et al. "Differential abundance analysis for microbial marker-gene surveys." Nature Methods 10.12 (2013): 1200-1202.

Examples

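library(microbiomeMarker)

# Example data set shipped with the package
data(enterotypes_arumugam)
# Keep two enterotypes so that the comparison involves exactly two groups
ps <- phyloseq::subset_samples(
    enterotypes_arumugam,
    Enterotype %in% c("Enterotype 3", "Enterotype 2")
)
# Default settings: method = "ZILN", norm = "CSS", p_adjust = "none"
run_metagenomeseq(ps, group = "Enterotype")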

yiluheihei/microbiomeMarker documentation built on Feb. 21, 2025, 7:44 a.m.