plot_Ordination: Creates ordination plots based on PCA, tSNE or tUMAP

View source: R/plot_Ordination.R

Description

Creates ordination plots based on PCA, tSNE or tUMAP

Usage

plot_Ordination(
  ExpObj = NULL,
  glomby = NULL,
  subsetby = NULL,
  samplesToKeep = NULL,
  samplesToHighlight = NULL,
  featuresToKeep = NULL,
  ignoreunclassified = TRUE,
  applyfilters = NULL,
  featcutoff = NULL,
  GenomeCompletenessCutoff = NULL,
  discard_SDoverMean_below = NULL,
  asPPM = TRUE,
  normalization = "relabund",
  PPM_normalize_to_bases_sequenced = FALSE,
  assay_for_matrix = "BaseCounts",
  algorithm = "tUMAP",
  PCA_Components = c(1, 2),
  distmethod = "bray",
  compareby = NULL,
  colourby = NULL,
  colorby = NULL,
  shapeby = NULL,
  use_letters_as_shapes = FALSE,
  sizeby = NULL,
  connectby = NULL,
  connection_orderby = NULL,
  textby = NULL,
  ellipseby = NULL,
  dotsize = 2,
  log2tran = TRUE,
  tsne_perplx = NULL,
  max_neighbors = 15,
  permanova = TRUE,
  plotcentroids = FALSE,
  highlight_centroids = TRUE,
  show_centroid_distances = FALSE,
  calculate_centroid_distances_in_all_dimensions = FALSE,
  addtit = NULL,
  cdict = NULL,
  grid = TRUE,
  forceaspectratio = 1,
  numthreads = 8,
  return_coordinates_matrix = FALSE,
  permanova_permutations = 10000,
  include_components_variance_plot = FALSE,
  class_to_ignore = "N_A",
  ...
)

Arguments

ExpObj

JAMS-style SummarizedExperiment object

glomby

String giving the taxonomic level at which to agglomerate counts. This argument should only be used with taxonomic SummarizedExperiment objects. When NULL (the default), there is no agglomeration.

subsetby

String specifying the metadata variable name for subsetting samples. If passed, multiple plots will be drawn, one plot for samples within each different class contained within the variable. If NULL, data is not subset. Default is NULL.

samplesToKeep

Vector with sample names to keep. If NULL, all samples within the SummarizedExperiment object are kept. Default is NULL.

samplesToHighlight

Vector with sample names to highlight with respect to all other samples in the plot. If NULL, all samples within the SummarizedExperiment object are shown at the same luminosity. Default is NULL.

featuresToKeep

Vector with feature names to keep. If NULL, all features within the SummarizedExperiment object are kept. Default is NULL. Please note that when agglomerating features with the glomby argument (see above), feature names passed to featuresToKeep must be post-agglomeration feature names. For example, if glomby="Family", featuresToKeep must be family names, such as "f__Enterobacteriaceae", etc.
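To illustrate why featuresToKeep needs post-agglomeration names, the sketch below sums feature counts by a taxonomy column in base R. It is not JAMS code: `glom_counts`, the `counts` matrix, the `tax` table and the LKT names are hypothetical stand-ins, and treating agglomeration as a row sum of base counts per taxon is an assumption.

```r
# Hypothetical helper (not part of JAMS): agglomerate a feature-by-sample
# count matrix to a taxonomic level by summing rows that share a taxon.
glom_counts <- function(counts, tax, level) {
  # ASSUMPTION: `tax` has one row per feature (row names = feature names)
  rowsum(counts, group = tax[rownames(counts), level])
}

counts <- matrix(c(100, 50, 20,   # sample S1
                   0, 30, 70),    # sample S2
                 nrow = 3,
                 dimnames = list(c("LKT_a", "LKT_b", "LKT_c"), c("S1", "S2")))
tax <- data.frame(Family = c("f__Enterobacteriaceae", "f__Enterobacteriaceae",
                             "f__Bacteroidaceae"),
                  row.names = rownames(counts))

glom <- glom_counts(counts, tax, "Family")
# After agglomeration the features are family names, so subset by those
glom["f__Enterobacteriaceae", , drop = FALSE]
```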

ignoreunclassified

Requires a logical value. If set to TRUE, for taxonomical SummarizedExperiment objects, the feature "LKT__Unclassified" will be omitted from the plot. In the case of non-taxonomical SummarizedExperiment objects, the completely unannotated features will be omitted. For example, for an ECNumber SummarizedExperiment object, genes *without* an Enzyme Commission Number annotation (feature "EC_none") will not be shown. Statistics are, however, computed taking the completely unclassified feature into account, so p-values will not change.

applyfilters

Optional string specifying a filtration setting "combo", used as a shorthand for setting the featcutoff, GenomeCompletenessCutoff, minl2fc and minabscorrcoeff arguments in JAMS plotting functions. If NULL, no shorthand is applied and each of these arguments keeps its own value or default. Permissible values for applyfilters are "light", "moderate" or "stringent". The actual values depend on whether the SummarizedExperiment object is taxonomical (LKT) or not. For a taxonomical SummarizedExperiment object, using "light" will set featcutoff=c(50, 5), GenomeCompletenessCutoff=c(5, 5), minl2fc=1, minabscorrcoeff=0.4; using "moderate" will set featcutoff=c(250, 15), GenomeCompletenessCutoff=c(10, 5), minl2fc=1, minabscorrcoeff=0.6; and using "stringent" will set featcutoff=c(2000, 15), GenomeCompletenessCutoff=c(30, 10), minl2fc=2, minabscorrcoeff=0.8. For non-taxonomical (i.e. functional) SummarizedExperiment objects, using "light" will set featcutoff=c(0, 0), minl2fc=1, minabscorrcoeff=0.4; using "moderate" will set featcutoff=c(5, 5), minl2fc=1, minabscorrcoeff=0.6; and using "stringent" will set featcutoff=c(50, 15), minl2fc=2.5, minabscorrcoeff=0.8. When using applyfilters, one can still set one or more of featcutoff, GenomeCompletenessCutoff, minl2fc and minabscorrcoeff, which will then take the user-set value in lieu of the one set by the applyfilters shorthand. Default is NULL.
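The preset lookup and its override rule can be pictured as a small table merge. This is a minimal sketch, not JAMS code: `resolve_filters` is a hypothetical helper, the preset values are transcribed from the description above, and the precedence rule (explicit user values win over the preset) follows the text.

```r
# Hypothetical helper (not part of JAMS): look up an applyfilters preset and
# let any explicitly supplied (non-NULL) argument override the preset value.
resolve_filters <- function(applyfilters = NULL, taxonomic = TRUE, ...) {
  presets_tax <- list(
    light     = list(featcutoff = c(50, 5), GenomeCompletenessCutoff = c(5, 5),
                     minl2fc = 1, minabscorrcoeff = 0.4),
    moderate  = list(featcutoff = c(250, 15), GenomeCompletenessCutoff = c(10, 5),
                     minl2fc = 1, minabscorrcoeff = 0.6),
    stringent = list(featcutoff = c(2000, 15), GenomeCompletenessCutoff = c(30, 10),
                     minl2fc = 2, minabscorrcoeff = 0.8)
  )
  presets_func <- list(
    light     = list(featcutoff = c(0, 0), minl2fc = 1, minabscorrcoeff = 0.4),
    moderate  = list(featcutoff = c(5, 5), minl2fc = 1, minabscorrcoeff = 0.6),
    stringent = list(featcutoff = c(50, 15), minl2fc = 2.5, minabscorrcoeff = 0.8)
  )
  # Arguments set explicitly by the user (NULL means "not set")
  user <- Filter(Negate(is.null), list(...))
  if (is.null(applyfilters)) return(user)
  presets <- if (taxonomic) presets_tax else presets_func
  preset <- presets[[match.arg(applyfilters, names(presets))]]
  # User-set values replace the corresponding preset entries
  modifyList(preset, user)
}

# "moderate" preset for a taxonomic object, with featcutoff set by the user
resolve_filters("moderate", taxonomic = TRUE, featcutoff = c(100, 10))
```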

featcutoff

Requires a numeric vector of length 2 specifying how to filter out features by relative abundance. The first value of the vector is the minimum relative abundance in parts per million (PPM) and the second value is the percentage of samples that must have at least that relative abundance. Thus, passing c(250, 10) to featcutoff filters out any feature that does not have at least 250 PPM (= 0.025 percent) relative abundance in at least 10 percent of all samples being plotted. Please note that when using the subsetby option (q.v.) to automatically draw multiple plots of sample subsets, the featcutoff parameters are applied within each subset. The default, NULL, is equivalent to c(0, 0), meaning no feature is filtered. See also applyfilters for a shorthand way of applying multiple filtration settings.
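The filtering rule reads naturally as "keep a feature if the fraction of samples at or above the PPM threshold reaches the required percentage". A minimal base-R sketch of that rule follows; `filter_by_featcutoff` and the feature names are hypothetical, and when subsetby is used the same rule would be applied to each subset's matrix separately.

```r
# Sketch (not JAMS code): keep features with at least featcutoff[1] PPM in
# at least featcutoff[2] percent of samples.
# `ppm` is a feature-by-sample matrix already normalized to PPM.
filter_by_featcutoff <- function(ppm, featcutoff = c(0, 0)) {
  min_ppm <- featcutoff[1]
  min_pct <- featcutoff[2]
  # Percentage of samples in which each feature meets the PPM threshold
  pct_samples <- 100 * rowMeans(ppm >= min_ppm)
  ppm[pct_samples >= min_pct, , drop = FALSE]
}

ppm <- matrix(c(300,   0,  0,  0,   # featA: >= 250 PPM in 1 of 4 samples
                 10,  20, 30, 40,   # featB: never reaches 250 PPM
                500, 600,  0,  0),  # featC: >= 250 PPM in 2 of 4 samples
              nrow = 3, byrow = TRUE,
              dimnames = list(c("featA", "featB", "featC"), paste0("S", 1:4)))
filter_by_featcutoff(ppm, c(250, 50))  # keeps featC only
filter_by_featcutoff(ppm, c(250, 10))  # keeps featA and featC
```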

GenomeCompletenessCutoff

Requires a numeric vector of length 2 specifying how to filter out features by genome completeness. This is, of course, only applicable to taxonomic shotgun SummarizedExperiment objects; when passed to non-taxonomic shotgun SummarizedExperiment objects, GenomeCompletenessCutoff is ignored. The first value of the vector is the minimum genome completeness in percent and the second value is the percentage of samples that must have at least that genome completeness. Thus, passing c(50, 5) to GenomeCompletenessCutoff filters out any taxonomic feature that does not have at least 50 percent genome completeness in at least 5 percent of all samples being plotted. Please note that when using the subsetby option (q.v.) to automatically draw multiple plots of sample subsets, the GenomeCompletenessCutoff parameters are applied within each subset. The default, NULL, is equivalent to c(0, 0), meaning no feature is filtered. See also applyfilters for a shorthand way of applying multiple filtration settings.

asPPM

Requires a logical value. When set to TRUE, the base counts matrix will be normalized to relative abundance in parts per million (PPM). See also PPM_normalize_to_bases_sequenced. Default is TRUE. If assay_for_matrix is set to "GeneCounts", asPPM will default to FALSE.

PPM_normalize_to_bases_sequenced

Requires a logical value. Non-filtered JAMS feature counts tables (the BaseCounts assay within SummarizedExperiment objects) always include unclassified taxonomical features (for taxonomical SummarizedExperiment objects) or unknown/unattributed functional features (for non-taxonomical SummarizedExperiment objects), so the relative abundance of each feature (see normalization) is calculated in parts per million (PPM) by dividing the number of bases covering each feature by the sum of each sample column **prior to any filtration**. Relative abundances are thus representative of the entire genomic content for taxonomical objects, whereas for non-taxonomical objects, strictly speaking, they are the abundance of each feature relative only to the coding regions present in the metagenome, even if these are annotationally unattributed. In other words, intergenic regions are not taken into account. In order to normalize a **non-taxonomical** SummarizedExperiment object to relative abundance using the total genomic sequencing content, including non-coding regions, set PPM_normalize_to_bases_sequenced = TRUE. Default is FALSE.
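The two denominators described above can be contrasted in a few lines of base R. This is a sketch under stated assumptions, not JAMS code: `to_ppm` and its `bases_sequenced` argument are hypothetical, standing in for whatever per-sample sequencing totals JAMS uses when PPM_normalize_to_bases_sequenced = TRUE.

```r
# Sketch (not JAMS code): relative abundance in parts per million (PPM).
# `counts` is a feature-by-sample base-count matrix taken before any
# filtering, so it still contains the unclassified/unattributed feature.
to_ppm <- function(counts, bases_sequenced = NULL) {
  # Default denominator: each sample's column total (all features).
  # Hypothetical alternative: total bases sequenced per sample, which also
  # covers non-coding regions (cf. PPM_normalize_to_bases_sequenced = TRUE).
  denom <- if (is.null(bases_sequenced)) colSums(counts) else bases_sequenced
  sweep(counts, 2, denom, "/") * 1e6
}

counts <- matrix(c(750, 250,    # S1: 1000 bases on features
                   100, 900),   # S2: 1000 bases on features
                 nrow = 2,
                 dimnames = list(c("EC_1.1.1.1", "EC_none"), c("S1", "S2")))
to_ppm(counts)                                   # S1: 750000, 250000
to_ppm(counts, bases_sequenced = c(2000, 4000))  # S1: 375000, 125000
```

With the larger denominator, feature PPM values no longer sum to one million per sample, which reflects the share of sequencing that falls outside annotated coding regions.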

assay_for_matrix

String specifying the SummarizedExperiment assay to be used for the ordination. Permissible values are "BaseCounts" or "GeneCounts". "BaseCounts" (the default) uses the basepair counts for each feature (either taxonomical or functional). These values will be normalized into relative abundance in PPM unless specified otherwise by the normalization argument (see normalization and PPM_normalize_to_bases_sequenced). When using "GeneCounts" (only available in non-taxonomical SummarizedExperiment objects), the *number of genes* annotated as each feature is used instead. For instance, using "GeneCounts" with an ECNumber SummarizedExperiment will use the number of genes bearing each Enzyme Commission Number annotation within each sample. Default is "BaseCounts".

algorithm

String giving the algorithm to be used for dimensionality reduction. Permissible values are "tUMAP", "PCoA", "PCA" or "tSNE". For "tUMAP", the sample-by-feature matrix is processed using Uniform Manifold Approximation and Projection as implemented by the uwot package. For "PCA", the compositional dissimilarity (Bray-Curtis by default; see distmethod) of the sample-by-feature matrix is processed by Principal Component Analysis as implemented by the stats package. For "PCoA", the compositional dissimilarity (Bray-Curtis by default; see distmethod) of the sample-by-feature matrix is processed by Principal Coordinates Analysis as implemented by the stats package. For "tSNE", the sample-by-feature matrix is processed using t-Distributed Stochastic Neighbor Embedding as implemented by the Rtsne package.
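For the "PCoA" case, a rough base-R picture is a Bray-Curtis dissimilarity matrix fed to classical multidimensional scaling (stats::cmdscale). This is a sketch only: `bray_curtis` and `pcoa` are hypothetical helpers, JAMS may compute the dissimilarity with vegan instead, and the way the axis variance is derived here (share of the positive eigenvalues) is an assumption.

```r
# Sketch (not JAMS code): Principal Coordinates Analysis (PCoA) of a
# Bray-Curtis dissimilarity matrix, using base R only.
# `x` is a sample-by-feature matrix of non-negative abundances.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (j in seq_len(n)) {
    for (k in seq_len(n)) {
      d[j, k] <- sum(abs(x[j, ] - x[k, ])) / sum(x[j, ] + x[k, ])
    }
  }
  as.dist(d)
}

pcoa <- function(x, k = 2) {
  # Classical (metric) multidimensional scaling, i.e. PCoA
  fit <- cmdscale(bray_curtis(x), k = k, eig = TRUE)
  # Share of variance per retained axis, relative to the positive eigenvalues
  pct <- 100 * fit$eig[seq_len(k)] / sum(fit$eig[fit$eig > 0])
  list(points = fit$points, pct_variance = pct)
}

x <- rbind(S1 = c(6, 2, 2), S2 = c(2, 6, 2), S3 = c(1, 1, 8))
res <- pcoa(x, k = 2)
res$points        # 3 samples x 2 axes
res$pct_variance  # the first axis explains the larger share
```

The other algorithms go through their own packages; uwot::tumap() and Rtsne::Rtsne() are the relevant entry points there, though exactly how JAMS calls them is not shown in this documentation.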

PCA_Components

Numerical vector of length 2 specifying which two components to plot on the 2-D ordination plot when using "PCoA" or "PCA" (see algorithm). Default is c(1, 2), meaning the first two components. The variance explained by each component is shown on its axis.

distmethod

String giving the dissimilarity index method for calculating the compositional dissimilarity of the sample-by-feature matrix. Default is "bray" (Bray-Curtis dissimilarity). For permissible values, see the vegdist function of the vegan package ("euclidean", "bray", "jaccard", etc.).
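For reference, the Bray-Curtis dissimilarity that vegan's vegdist computes for method "bray", between samples j and k over features i with abundances x, is:

```latex
d_{jk} = \frac{\sum_i \lvert x_{ij} - x_{ik} \rvert}{\sum_i \left( x_{ij} + x_{ik} \right)}
```

For non-negative abundances it ranges from 0 (identical composition) to 1 (no features shared).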

compareby

String specifying the metadata variable name for grouping samples. This defines the grouping for which the PERMANOVA p-value is calculated. If not specified and permanova is TRUE (see permanova), compareby will be set from colourby or shapeby. If these are also NULL, permanova will be set to FALSE. Default is NULL.

colourby

String specifying the metadata variable name for colouring in samples. If NULL, all samples will be black. Default is NULL.

colorby

Alternative US spelling for the colourby argument. Use either, but not both. At some point, a side must be taken.

shapeby

String specifying the metadata variable name for attributing shapes to samples. If NULL, all samples will be a round dot (pch = 19). Default is NULL. If there are more than 27 classes within the variable, samples will be attributed letters (A-Z, then a-z) automatically. See also use_letters_as_shapes.

use_letters_as_shapes

Requires a logical value. If set to TRUE, then force sample point shapes as being letters (A-Z, then a-z) independent of how many classes there are within the variable passed to shapeby. Default is FALSE.

sizeby

String specifying the metadata variable name for attributing point size to samples. If NULL, all samples are plotted with the same size, specified by dotsize. Default is NULL.

connectby

String specifying the metadata variable name for drawing a line connecting samples belonging to the same class. If NULL, samples are not connected. Default is NULL.

connection_orderby

String specifying the metadata variable name for determining the order in which samples connected by the variable specified in connectby should be drawn. This is only applicable, of course, if connectby is not NULL. If NULL, the classes in the metadata variable specified in connectby will be sorted either numerically from low to high, if the variable contains numeric classes, or sorted alphabetically if the variable contains discrete classes.

textby

String specifying the metadata variable containing classes for annotating samples with text next to each sample point. Default is NULL.

ellipseby

String specifying the metadata variable containing classes for encircling with an ellipse samples belonging to each class. Default is NULL.

dotsize

Numeric value for attributing point size to all samples. Default is 2.

log2tran

Requires a logical value. When set to TRUE, distance and ordination calculations will be performed starting with the log2 transformed values of the count matrix, either normalized to relative abundance (when argument asPPM = TRUE, the default) or not. When set to FALSE, the raw values of the count matrix will be used. See asPPM. Default is TRUE, meaning ordinations are done on log2 transformed space. If assay_for_matrix is set to "GeneCounts", log2tran will default to FALSE.
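Zero counts are the practical issue with a log2 transform, since log2(0) is -Inf. The sketch below adds a pseudocount before transforming; the pseudocount of 1 and the helper `log2_transform` are assumptions for illustration, as this documentation does not say how JAMS handles zeros.

```r
# Sketch (not JAMS code): log2-transform a count or PPM matrix before
# computing distances. ASSUMPTION: a pseudocount of 1 keeps zeros finite;
# the pseudocount JAMS actually uses is not documented here.
log2_transform <- function(m, pseudocount = 1) {
  log2(m + pseudocount)
}

m <- matrix(c(0, 1, 3, 7), nrow = 2)
log2(m)            # the zero becomes -Inf without a pseudocount
log2_transform(m)  # column-major values: 0, 1, 2, 3
```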

tsne_perplx

Numerical value with perplexity for tSNE. Default is NULL.

max_neighbors

Numerical value specifying the maximum number of neighbors when using the "tUMAP" algorithm for dimensionality reduction (see algorithm). Default is 15.

permanova

Requires a logical value. If set to TRUE, the PERMANOVA statistics for the groups set with compareby will be included in the plot title. Default is TRUE.
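The PERMANOVA statistic can be sketched in base R as Anderson's pseudo-F with a label-permutation p-value. This is a swapped-in illustration, not the JAMS implementation (which may well rely on vegan::adonis2; that is not confirmed here): `permanova_1way` is hypothetical, and its `permutations` argument plays the role of permanova_permutations.

```r
# Sketch (not JAMS code): one-way PERMANOVA using Anderson's (2001)
# pseudo-F statistic and a permutation p-value, in base R only.
# `d` is a "dist" object; `groups` holds the compareby class of each sample,
# in the same order as the samples in `d`.
permanova_1way <- function(d, groups, permutations = 999) {
  D2 <- as.matrix(d)^2
  N <- nrow(D2)
  # Total sum of squares: (1/N) * sum of squared distances over all pairs
  ss_total <- sum(D2[upper.tri(D2)]) / N
  pseudo_F <- function(g) {
    g <- factor(g)
    a <- nlevels(g)
    # Within-group sum of squares, each group scaled by its own size
    ss_within <- sum(vapply(levels(g), function(lv) {
      idx <- which(g == lv)
      sub <- D2[idx, idx, drop = FALSE]
      sum(sub[upper.tri(sub)]) / length(idx)
    }, numeric(1)))
    ((ss_total - ss_within) / (a - 1)) / (ss_within / (N - a))
  }
  f_obs <- pseudo_F(groups)
  # Null distribution: recompute F after shuffling the group labels
  f_perm <- replicate(permutations, pseudo_F(sample(groups)))
  list(F = f_obs,
       p.value = (sum(f_perm >= f_obs) + 1) / (permutations + 1))
}

set.seed(1)
pts <- c(0, 0.1, 0.2, 0.3, 0.4, 5, 5.1, 5.2, 5.3, 5.4)
groups <- rep(c("A", "B"), each = 5)
res <- permanova_1way(dist(pts), groups, permutations = 999)
res
```

The smallest attainable p-value is 1 / (permutations + 1), which is one reason to use a large number of permutations such as the default of 10000.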

plotcentroids

Requires a logical value. If set to TRUE, the centroid of each group of samples (the classes within the variable specified with compareby) will be plotted, and lines will be drawn from each sample to its group centroid. Default is FALSE.

highlight_centroids

Requires a logical value. If set to TRUE, when using plotcentroids, the centroids will be highlighted and marked with a slightly larger relevant group shape. Default is TRUE.

show_centroid_distances

Requires a logical value. If set to TRUE and centroids are plotted (see plotcentroids), a matrix showing the Euclidean distances between the centroids of each group will be included at the bottom of the plot. Default is FALSE.
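Group centroids and the distances between them can be sketched as per-group coordinate means followed by stats::dist. This is a minimal illustration, not JAMS code: `centroid_distances` is hypothetical, and `coords` stands in for the sample coordinates after ordination.

```r
# Sketch (not JAMS code): group centroids of ordination coordinates and the
# Euclidean distances between them. `coords` is a sample-by-axis matrix;
# `groups` gives the compareby class of each sample (same row order).
centroid_distances <- function(coords, groups) {
  sums <- rowsum(coords, groups)                 # per-group column sums
  n <- as.vector(table(groups)[rownames(sums)])  # group sizes, same order
  centroids <- sums / n                          # per-group column means
  list(centroids = centroids,
       distances = dist(centroids, method = "euclidean"))
}

coords <- rbind(c(0, 0), c(2, 0), c(10, 0), c(10, 4))
groups <- c("A", "A", "B", "B")
res <- centroid_distances(coords, groups)
res$centroids  # A: (1, 0); B: (10, 2)
res$distances  # sqrt(9^2 + 2^2) = sqrt(85), about 9.22
```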

calculate_centroid_distances_in_all_dimensions

Requires a logical value. If set to TRUE, when plotcentroids and show_centroid_distances are both also set to TRUE, the Euclidean

addtit

Optional string with text to append to the plot main title. Default is NULL.

grid

Requires a logical value. If set to FALSE, background will be one solid color within the plot, rather than include a grid behind the plot. Default is TRUE, meaning that the background will display a grid.

forceaspectratio

Numeric value setting the desired plot aspect ratio. Default is 1, meaning the plot will be drawn as a square box.

numthreads

Numeric value setting the number of threads to use for any multi-threaded process within this function. The default is 8.

return_coordinates_matrix

Requires a logical value. If set to TRUE, the list of objects returned by plot_Ordination will include, in addition to the ggplot2 ordination plots, a matrix with the x,y plot positions of each sample after ordination. Default is FALSE.

permanova_permutations

Numerical value specifying the number of permutations for PERMANOVA. Default is 10000.

class_to_ignore

String or vector specifying any classes which should lead to samples being excluded from the comparison within the variable passed to compareby. Default is "N_A", meaning that any sample whose value for the compareby variable is "N_A" will be dropped from that comparison.
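The exclusion rule amounts to a membership test on the compareby column. A minimal sketch follows, assuming sample metadata held in a data.frame; `samples_for_comparison` and `meta` are hypothetical names, not part of JAMS.

```r
# Sketch (not JAMS code): select the samples that take part in the
# compareby comparison, dropping any whose class is in class_to_ignore.
# `meta` is a hypothetical sample metadata data.frame (row names = samples).
samples_for_comparison <- function(meta, compareby, class_to_ignore = "N_A") {
  keep <- !(as.character(meta[[compareby]]) %in% class_to_ignore)
  rownames(meta)[keep]
}

meta <- data.frame(Group = c("ctrl", "N_A", "case", "ctrl"),
                   row.names = c("S1", "S2", "S3", "S4"))
samples_for_comparison(meta, "Group")  # "S1" "S3" "S4"
```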


johnmcculloch/JAMS_BW documentation built on March 29, 2024, 7:56 p.m.