clrDR: DR plot on CLR of proportions


clrDR R Documentation

DR plot on CLR of proportions

Description

Computes centered log-ratios (CLR) on cluster/sample proportions across samples/clusters, and visualizes them in a lower-dimensional space, highlighting differences in composition between samples/clusters.

Usage

clrDR(
  x,
  dr = c("PCA", "MDS", "UMAP", "TSNE", "DiffusionMap"),
  by = c("sample_id", "cluster_id"),
  k = "meta20",
  dims = c(1, 2),
  base = 2,
  arrows = TRUE,
  point_col = switch(by, sample_id = "condition", "cluster_id"),
  arrow_col = switch(by, sample_id = "cluster_id", "condition"),
  arrow_len = 0.5,
  arrow_opa = 0.5,
  label_by = NULL,
  size_by = TRUE,
  point_pal = NULL,
  arrow_pal = NULL
)

Arguments

x

a SingleCellExperiment.

dr

character string specifying which dimension reduction to use.

by

character string specifying across which IDs to compute CLRs:

  • by = "sample_id": compute CLRs on each sample's proportions of cells across clusters; each point in the embedded space represents a sample.

  • by = "cluster_id": compute CLRs on each cluster's proportions of cells across samples; each point in the embedded space represents a cluster.
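To make the two orientations concrete, the sketch below computes both kinds of proportions from toy cell labels with base R. The labels stand in for the sample_id and cluster_id cell metadata; all variable names are hypothetical, and the sketch assumes, rather than reproduces, how clrDR tabulates cells internally.

```r
# Hypothetical toy labels for 18 cells (stand-ins for the sample_id and
# cluster_id columns of colData(x))
sample_id  <- factor(rep(c("s1", "s2", "s3"), each = 6))
cluster_id <- factor(c("c1", "c1", "c1", "c2", "c2", "c3",
                       "c1", "c2", "c2", "c2", "c3", "c3",
                       "c1", "c1", "c2", "c3", "c3", "c3"))

# Cell counts: rows = samples, columns = clusters
ns <- table(sample_id, cluster_id)

# by = "sample_id": each sample's cells distributed across clusters
# (one row, i.e. one embedded point, per sample; rows sum to 1)
p_by_sample <- prop.table(ns, margin = 1)

# by = "cluster_id": each cluster's cells distributed across samples
# (transposed so that rows = clusters; rows sum to 1)
p_by_cluster <- t(prop.table(ns, margin = 2))
```

For instance, half of sample s1's cells fall into cluster c1, so p_by_sample["s1", "c1"] is 0.5.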

k

character string specifying which clustering to use; valid values are names(cluster_codes(x)).

dims

two numeric scalars indicating which dimensions to plot.

base

integer scalar specifying the logarithm base to use.
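Because log_b(x) = log(x) / log(b), changing base multiplies all CLRs by one common constant, which rescales the embedding without changing its shape. A minimal base R sketch (toy proportions; the helpers clr() and pve() are hypothetical, not part of CATALYST) checks that the percentage of variance per PC does not depend on the base:

```r
# Toy proportions (rows = samples, columns = clusters); values are made up
p <- rbind(s1 = c(0.50, 0.30, 0.20),
           s2 = c(0.10, 0.60, 0.30),
           s3 = c(0.30, 0.20, 0.50),
           s4 = c(0.25, 0.25, 0.50))

# CLR in a given log base: subtract each row's mean log-proportion
clr <- function(p, base = 2) {
  lp <- log(p, base = base)
  lp - rowMeans(lp)
}

# Percentage of variance explained by each PC
pve <- function(m) {
  v <- prcomp(m)$sdev^2
  100 * v / sum(v)
}

clr2  <- clr(p, base = 2)
clr10 <- clr(p, base = 10)
# clr2 equals clr10 * log(10) / log(2), and pve(clr2) equals pve(clr10)
```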

arrows

logical specifying whether to include arrows for PC loadings.

point_col, arrow_col

character string specifying a non-numeric cell metadata column to color points and PC loading arrows by; valid values are names(colData(x)).

arrow_len

single non-zero numeric specifying the length of the loading vectors relative to the largest xy-coordinate in the embedded space; NULL for no re-sizing (see Details).

arrow_opa

single numeric in [0,1] specifying the opacity (alpha) of PC loading arrows when they are grouped; 0 will hide individual arrows.

label_by

character string specifying a non-numeric sample metadata variable to label points by; valid values are names(colData(x)).

size_by

logical specifying whether to scale point sizes by the number of cells in a given sample (by = "sample_id") or cluster (by = "cluster_id").

point_pal, arrow_pal

character vector of colors to use for points and PC loading arrows, respectively. Both default to .cluster_cols (defined internally) for clusters, and to brewer.pal's "Set3" palette for samples.

Details

The centered log-ratio (CLR)

Let s be one of S samples, k one of K clusters, and p(s,k) the proportion of cells from s in k. The centered log-ratio (CLR) is defined as

\mathrm{clr}(s,k) = \log p(s,k) - \frac{1}{K} \sum_{k'=1}^{K} \log p(s,k')

and analogously for clusters, replacing s by k and K by S; the logarithm is taken to the base given by argument base. Thus, each sample (cluster) yields a CLR vector of length K (S) with mean 0, and the CLRs of all samples (clusters) form an S x K (K x S) matrix that is embedded into a lower-dimensional space.
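The formula can be traced in a few lines of base R. This is a sketch on made-up counts, not clrDR's source; in particular, the CLR is undefined when p(s,k) = 0, and the pseudocount used below to avoid log(0) is an assumption that may differ from how clrDR handles empty sample/cluster combinations.

```r
# Hypothetical cell counts: S = 3 samples (rows) x K = 4 clusters (columns)
counts <- matrix(c(30, 10,  0, 60,
                   20, 25, 15, 40,
                    5, 50, 25, 20),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("s", 1:3), paste0("k", 1:4)))

# Assumption: pseudocount of 1 so that no proportion is exactly 0
pseudo <- counts + 1

# by = "sample_id": p(s, k), each sample's row sums to 1
p <- pseudo / rowSums(pseudo)
# clr(s, k) = log2 p(s, k) - mean over k' of log2 p(s, k')   (base = 2)
clr_samples <- log2(p) - rowMeans(log2(p))     # S x K matrix

# by = "cluster_id": swap roles (s <-> k, K <-> S), giving a K x S matrix
q <- t(pseudo) / colSums(pseudo)
clr_clusters <- log2(q) - rowMeans(log2(q))
```

Every row of clr_samples and clr_clusters has mean 0, matching the statement above.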

Dimensionality reduction

clrDR can apply any of the supported dimensionality reduction methods (see argument dr) to the CLRs. With the default method (dr = "PCA"), the axis labels include the percentage of variance explained by each principal component (PC).
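The variance percentages shown in the axis labels can be sketched with stats::prcomp() on a toy CLR matrix. Centering without scaling is an assumption here, and the label format is only illustrative of what such labels could look like.

```r
# Toy CLR matrix: 5 samples x 6 clusters from random counts (+1 avoids zeros)
set.seed(42)
p <- prop.table(matrix(rpois(5 * 6, lambda = 20) + 1, nrow = 5), margin = 1)
clr <- log2(p) - rowMeans(log2(p))

# Assumption: CLRs are centered but not scaled before PCA
pca <- prcomp(clr, center = TRUE, scale. = FALSE)
pve <- 100 * pca$sdev^2 / sum(pca$sdev^2)   # % variance per PC, decreasing

# Axis labels for the plotted dimensions (cf. argument dims)
dims <- c(1L, 2L)
axis_labels <- sprintf("PC%d (%.1f%%)", dims, pve[dims])
```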

Note that distances between points in the lower-dimensional space are meaningful only for linear DR methods (PCA and MDS); results obtained with other methods should be interpreted with caution. Accordingly, the output plot's aspect ratio should be kept as is for PCA and MDS, while for non-linear DR methods, aspect.ratio = 1 may be used to render a square plot.
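One way to see why PCA and MDS behave alike here: classical (metric) MDS on Euclidean distances between CLR vectors reproduces the PCA scores up to a sign flip per axis. The sketch below checks this with stats::cmdscale() on toy data; whether clrDR's dr = "MDS" uses exactly this classical variant is an assumption.

```r
# Toy CLR matrix: 6 samples x 5 clusters (natural log)
set.seed(7)
p <- prop.table(matrix(rpois(6 * 5, lambda = 30) + 1, nrow = 6), margin = 1)
clr <- log(p) - rowMeans(log(p))

# PCA scores on the first two PCs
pc <- prcomp(clr)$x[, 1:2]

# Classical (Torgerson) MDS on Euclidean distances between CLR vectors
md <- cmdscale(dist(clr), k = 2)

# Per-axis sign (+1 or -1) that aligns the PCA axes with the MDS axes
signs <- sign(colSums(pc * md))
pc_aligned <- pc %*% diag(signs)   # matches md up to numerical error
```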

Interpreting PC loadings

For dr = "PCA", PC loadings are represented as arrows that may be interpreted as follows: an angle of 0° (180°) between two vectors indicates a strong positive (negative) relation between them, while vectors that are orthogonal to each other (90°) are roughly independent.

When a vector points towards a given quadrant, the variability in proportions for the points within this quadrant is largely driven by the corresponding variable. Only the orientation of vectors relative to one another and to the PC axes is meaningful, since the sign of the loadings (i.e., whether an arrow points left or right) can flip when PCs are re-computed.
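The angles between arrows can be computed from the cosine of the loading vectors in the plotted PC plane. This base R sketch on toy data (the helper arrow_angles() is hypothetical) also shows that flipping the sign of a PC mirrors every arrow but leaves all pairwise angles unchanged, which is why only relative orientation matters:

```r
# Toy CLR matrix: 8 samples x 4 clusters k1..k4
set.seed(3)
counts <- matrix(rpois(8 * 4, lambda = 25) + 1, nrow = 8,
                 dimnames = list(NULL, paste0("k", 1:4)))
p <- prop.table(counts, margin = 1)
clr <- log2(p) - rowMeans(log2(p))

# Loadings on PC1/PC2: one arrow per cluster (column of the CLR matrix)
load2 <- prcomp(clr)$rotation[, 1:2]

# Pairwise angles (degrees) between arrows via the cosine of their dot products
arrow_angles <- function(L) {
  norms <- sqrt(rowSums(L^2))
  cosines <- tcrossprod(L) / outer(norms, norms)
  acos(pmin(pmax(cosines, -1), 1)) * 180 / pi   # clamp against rounding
}
angles <- arrow_angles(load2)

# Flipping the sign of PC1 mirrors all arrows left/right ...
flipped <- load2 %*% diag(c(-1, 1))
# ... but the angles between them stay the same
angles_flipped <- arrow_angles(flipped)
```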

When arrow_len is specified, PC loading vectors are re-scaled to improve their visibility; a value of 1 stretches them such that the largest loading reaches the outermost point. Importantly, while absolute arrow lengths are not interpretable, their relative lengths are.
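A minimal sketch of such a re-scaling, assuming a single common factor chosen so that the longest arrow reaches arrow_len times the Euclidean distance of the outermost point from the origin. clrDR may instead use the largest absolute coordinate; rescale_loadings() is a hypothetical helper. Because all arrows share one factor, their relative lengths are preserved.

```r
# Toy CLR matrix: 10 samples x 5 clusters
set.seed(11)
counts <- matrix(rpois(10 * 5, lambda = 40) + 1, nrow = 10)
p <- prop.table(counts, margin = 1)
clr <- log2(p) - rowMeans(log2(p))

pca <- prcomp(clr)
xy <- pca$x[, 1:2]          # point coordinates in the plotted plane
L  <- pca$rotation[, 1:2]   # loading vectors (arrows)

# Hypothetical re-scaling: one common factor for all arrows
rescale_loadings <- function(L, xy, arrow_len = 0.5) {
  if (is.null(arrow_len)) return(L)          # NULL: no re-sizing
  max_point <- max(sqrt(rowSums(xy^2)))      # outermost point
  max_arrow <- max(sqrt(rowSums(L^2)))       # longest arrow
  L * arrow_len * max_point / max_arrow
}

# arrow_len = 1: the longest arrow reaches the outermost point
L_scaled <- rescale_loadings(L, xy, arrow_len = 1)
```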

Value

a ggplot object.

Author(s)

Helena L Crowell helena.crowell@uzh.ch

Examples

library(CATALYST)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# CLR on sample proportions across clusters
# (1st vs. 3rd PC; include sample labels)
clrDR(sce, by = "sample_id", k = "meta12",
  dims = c(1, 3), label_by = "sample_id")

# CLR on cluster proportions across samples
# (use custom colors for both points & loadings;
# 10 point colors to match the 10 clusters of k = "meta10")
clrDR(sce, by = "cluster_id", k = "meta10",
  point_pal = hcl.colors(10, "Spectral"),
  arrow_pal = c("royalblue", "orange"))


HelenaLC/CATALYST documentation built on Nov. 30, 2024, 4:04 a.m.