run_ord: Ordination for microbiota data

View source: R/gv_ordination.R

run_ord R Documentation

Ordination for microbiota data

Description

Ordination was originally regarded as a purely “exploratory” technique (Gauch 1982a, b). With the introduction of canonical correspondence analysis (CCA), ordination has gone beyond mere “exploratory” analysis (ter Braak 1985) and can be used for hypothesis testing as well.

Usage

run_ord(
    object,
    level = NULL,
    variable,
    transform = c("identity", "log10", "log10p",
                  "SquareRoot", "CubicRoot", "logit"),
    norm = c("none", "rarefy", "TSS", "TMM",
             "RLE", "CSS", "CLR", "CPM"),
    method = c("PCA", "PCoA", "tSNE", "UMAP", "NMDS",
               "CA", "RDA", "CCA", "CAP"),
    distance = c("bray", "unifrac", "wunifrac",
                 "GUniFrac", "dpcoa", "jsd"),
    para = list(Perplexity = NULL,
                Y_vars = NULL,
                Z_vars = NULL,
                scale = TRUE,
                center = TRUE),
    ...)

Arguments

object

(Required). a phyloseq::phyloseq or SummarizedExperiment::SummarizedExperiment object.

level

(Optional). character. Summarization level (from rank_names(pseq), default: NULL).

variable

(Required). character. grouping variable for test.

transform

character. The method used to transform the microbial abundance. See transform_abundances() for more details. The options include:

  • "identity", return the original data without any transformation (default).

  • "log10", the transformation is log10(object), and if the data contains zeros the transformation is log10(1 + object).

  • "log10p", the transformation is log10(1 + object).

  • "SquareRoot", the transformation is the square root.

  • "CubicRoot", the transformation is the cubic root.

  • "logit", the transformation is the zero-inflated logit transformation (does not work well for microbiome data).
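
The transformations listed above can be sketched in base R. This is a minimal illustration on a toy count matrix with made-up values; it only mirrors the formulas stated in the list and is not necessarily how transform_abundances() is implemented internally.

```r
# Toy count matrix (made-up values): taxa in rows, samples in columns.
counts <- matrix(c(0, 9, 99, 4, 0, 999), nrow = 3,
                 dimnames = list(paste0("taxon", 1:3), c("s1", "s2")))

# "log10": log10(x), falling back to log10(1 + x) when zeros are present.
log10_counts <- if (any(counts == 0)) log10(1 + counts) else log10(counts)

# "log10p": always log10(1 + x), so zeros map to 0.
log10p_counts <- log10(1 + counts)

# "SquareRoot" and "CubicRoot".
sqrt_counts <- sqrt(counts)
cbrt_counts <- counts^(1 / 3)
```

Because the toy matrix contains zeros, the "log10" and "log10p" results coincide here.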

norm

character. The method used to normalize the microbial abundance data. See normalize() for more details. Options include:

  • "none": do not normalize.

  • "rarefy": randomly subsample counts to the smallest library size in the data set.

  • "TMM": trimmed mean of M-values. First, a sample is chosen as reference. The scaling factor is then derived as a weighted trimmed mean of the log-transformed gene-count fold-changes between the sample and the reference.

  • "RLE": relative log expression. RLE uses a pseudo-reference calculated as the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene-count ratios between each sample and the reference.

  • "CSS": cumulative sum scaling, calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.
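
As an illustration of the "RLE" description above, the scaling factors can be computed by hand in base R. This is a sketch under stated assumptions: the count matrix is made up and contains no zeros (zeros would need special handling because of the logarithm), and normalize() may delegate to other packages and differ in detail.

```r
# Toy count matrix (made-up values, no zeros): taxa in rows, samples in columns.
# Sample s2 is sequenced twice as deeply as s1, and s3 half as deeply.
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                    5, 10, 15, 20), nrow = 4,
                 dimnames = list(paste0("taxon", 1:4), c("s1", "s2", "s3")))

# Pseudo-reference: log of the geometric mean of each taxon over all samples.
log_ref <- rowMeans(log(counts))

# Scaling factor per sample: median ratio between the sample and the reference.
size_factors <- apply(counts, 2, function(x) exp(median(log(x) - log_ref)))

# Divide each sample (column) by its scaling factor.
normalized <- sweep(counts, 2, size_factors, "/")
```

In this toy case the scaling factors recover the relative sequencing depths, so all three normalized columns become identical.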

method

(Optional). character. Ordination method (default: "PCoA"), options include:

  • "PCA": Principal Component Analysis.

  • "PCoA": Principal Coordinate Analysis.

  • "tSNE": t-distributed stochastic neighbor embedding.

  • "UMAP": Uniform Manifold Approximation and Projection.

  • "NMDS": Non-metric Multidimensional Scaling.
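
To make the difference between "PCA" and "PCoA" concrete, both can be run with base R's stats package. This is only a sketch: the abundance matrix is simulated, and run_ord() may rely on other packages and return a differently structured result.

```r
# Simulated abundance matrix (made-up data): samples in rows, taxa in columns.
set.seed(1)
abund <- matrix(rpois(6 * 8, lambda = 20), nrow = 6,
                dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:8)))

# PCA works directly on the (centered) abundance table.
pca <- prcomp(abund, center = TRUE, scale. = FALSE)
pca_scores <- pca$x[, 1:2]

# PCoA (classical multidimensional scaling) works on a distance matrix instead.
pcoa <- cmdscale(dist(abund, method = "euclidean"), k = 2, eig = TRUE)
pcoa_scores <- pcoa$points
```

With Euclidean distances the two sets of scores agree up to the sign of each axis; PCoA becomes distinct from PCA once an ecological distance such as Bray-Curtis is used.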

distance

(Optional). character. Provide one of the currently supported options. See vegan::vegdist for a detailed list of the supported options and links to accompanying documentation (default: "bray"). Options include:

  • "bray": Bray-Curtis dissimilarity.

  • "unifrac": unweighted UniFrac distance.

  • "wunifrac": weighted UniFrac distance.

  • "GUniFrac": the variance-adjusted weighted UniFrac distances (default: alpha=0.5).

  • "dpcoa": sample-wise distance used in Double Principal Coordinate Analysis.

  • "jsd": Jensen-Shannon Divergence.

Alternatively, you can provide a character string that defines a custom distance method, if it has the form described in vegan::designdist.
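
For reference, the default "bray" option corresponds to the Bray-Curtis dissimilarity, which for two abundance vectors x and y is sum(|x - y|) / sum(x + y). The helper below is a hypothetical hand-written version for illustration only (the name bray_curtis is not part of the package).

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Two made-up samples that share only the third taxon.
s1 <- c(10, 0, 5)
s2 <- c(0, 10, 5)

d_different <- bray_curtis(s1, s2)  # 20 / 30
d_identical <- bray_curtis(s1, s1)  # 0 for identical samples
```

The value ranges from 0 (identical composition) to 1 (no shared taxa).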

para

(Optional). list. Additional parameters for specific methods.

  • "Perplexity": numeric; perplexity parameter for tSNE (must satisfy 3 * perplexity < nrow(X) - 1).

  • "Y_vars": Constraining matrix, typically of environmental variables.

  • "Z_vars": Conditioning matrix, the effect of which is removed ("partialled out") before the next step.

  • "scale": Scale features to unit variance (like correlations).

  • "center": Center features to zero mean.
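
The constraints behind these parameters can be checked in base R before calling run_ord(). This is a sketch, assuming the usual t-SNE rule 3 * perplexity < nrow(X) - 1 and the usual meaning of centering and scaling; the helper perplexity_ok and the matrix X are hypothetical and not part of the package.

```r
# Simulated feature matrix (made-up data): 10 samples in rows, 4 features in columns.
set.seed(42)
X <- matrix(rnorm(10 * 4, mean = 5, sd = 2), nrow = 10)

# Hypothetical helper: is a perplexity value admissible for n samples?
perplexity_ok <- function(perplexity, n) {
  3 * perplexity < n - 1
}

ok_small <- perplexity_ok(2, nrow(X))  # 6 < 9
ok_large <- perplexity_ok(3, nrow(X))  # 9 < 9 fails

# center = TRUE subtracts column means; scale = TRUE divides by column SDs.
X_std <- scale(X, center = TRUE, scale = TRUE)
col_means <- colMeans(X_std)
col_sds <- apply(X_std, 2, sd)
```

After standardization every column has mean 0 and standard deviation 1, which is what makes a PCA on the result behave like a PCA on the correlation matrix.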

...

(Optional). additional parameters.

Details

The primary aim of ordination is to represent multiple samples (subjects) on a reduced number of orthogonal (i.e., independent) axes, where the total number of axes is less than or equal to the number of samples.
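
Both properties, the bound on the number of axes and their orthogonality, can be verified on simulated data with stats::prcomp. The matrix below is made up, and PCA is used only as one representative ordination method.

```r
# Simulated data (made-up): 5 samples in rows, 20 features in columns.
set.seed(7)
X <- matrix(rnorm(5 * 20), nrow = 5)

pca <- prcomp(X, center = TRUE)

# Number of ordination axes returned; cannot exceed the number of samples.
n_axes <- ncol(pca$x)

# Orthogonality: cross-products between different axes are numerically zero.
gram <- crossprod(pca$x)
max_off_diagonal <- max(abs(gram[upper.tri(gram)]))
```

Even though there are 20 features, only 5 axes are returned, and because the data are centered the last of them carries essentially no variance.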

Value

A list of the ordination's results.

Author(s)

Created by Hua Zou (8/9/2023 Shenzhen China)

References

Xia, Y., Sun, J., & Chen, D. G. (2018). Statistical analysis of microbiome data with R (Vol. 847). Singapore: Springer.

Examples


## Not run: 

# phyloseq object
data("Zeybel_2022_gut")
ps_zeybel <- summarize_taxa(Zeybel_2022_gut, level = "Genus")
ord_result <- run_ord(
  object = ps_zeybel,
  variable = "LiverFatClass",
  method = "PCoA")

# SummarizedExperiment object
data("Zeybel_2022_protein")
Zeybel_2022_protein_imp <- impute_abundance(
  Zeybel_2022_protein,
  group = "LiverFatClass",
  ZerosAsNA = TRUE,
  RemoveNA = TRUE,
  cutoff = 20,
  method = "knn")
ord_result <- run_ord(
  object = Zeybel_2022_protein_imp,
  variable = "LiverFatClass",
  method = "PCA")


## End(Not run)


HuaZou/MicrobiomeAnalysis documentation built on May 13, 2024, 11:10 a.m.