clrDR: DR plot on CLR of proportions
In CATALYST: Cytometry dATa anALYSis Tools


Description

Computes centered log-ratios (CLRs) of cluster/sample proportions across samples/clusters and visualizes them in a lower-dimensional space, highlighting differences in composition between samples/clusters.

Usage

clrDR(
  x,
  dr = c("PCA", "MDS", "UMAP", "TSNE", "DiffusionMap"),
  by = c("sample_id", "cluster_id"),
  k = "meta20",
  dims = c(1, 2),
  base = 2,
  arrows = TRUE,
  point_col = switch(by, sample_id = "condition", "cluster_id"),
  arrow_col = switch(by, sample_id = "cluster_id", "condition"),
  arrow_len = 0.5,
  arrow_opa = 0.5,
  label_by = NULL,
  size_by = TRUE,
  point_pal = NULL,
  arrow_pal = NULL
)

Arguments

`x`	a `SingleCellExperiment`.
`dr`	character string specifying which dimension reduction to use.
`by`	character string specifying across which IDs to compute CLRs. `by = "sample_id"` computes CLRs on the relative abundances of samples across clusters; each point in the embedded space represents a sample. `by = "cluster_id"` computes CLRs on the relative abundances of clusters across samples; each point in the embedded space represents a cluster.
`k`	character string specifying which clustering to use; valid values are `names(cluster_codes(x))`.
`dims`	length 2 numeric specifying which dimensions to plot.
`base`	integer scalar specifying the logarithm base to use.
`arrows`	logical specifying whether to include arrows for PC loadings.
`point_col, arrow_col`	character string specifying a non-numeric cell metadata column to color points and PC loading arrows by; valid values are `names(colData(x))`.
`arrow_len`	non-zero single numeric specifying the length of loading vectors relative to the largest xy-coordinate in the embedded space; NULL for no re-sizing (see details).
`arrow_opa`	single numeric in [0,1] specifying the opacity (alpha) of PC loading arrows when they are grouped; 0 will hide individual arrows.
`label_by`	character string specifying a non-numeric sample metadata variable to label points by; valid values are `names(colData(x))`.
`size_by`	logical specifying whether to scale point sizes by the number of cells in a given sample (for `by = "sample_id"`) or cluster (for `by = "cluster_id"`).
`point_pal, arrow_pal`	character vector of colors to use for points and PC loading arrows. These default to `CATALYST:::.cluster_cols` for clusters and to `brewer.pal`'s `"Set3"` for samples.
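To make the two orientations of `by` concrete, here is a minimal base-R sketch of the proportion matrix each choice implies, assuming proportions are taken row-wise from per-cell sample and cluster labels. The helper `props_by` and the toy labels are hypothetical and are not part of CATALYST.

```r
# Hypothetical helper (not a CATALYST function): build the proportion matrix
# implied by `by` from per-cell sample and cluster labels.
props_by <- function(sample_id, cluster_id, by = c("sample_id", "cluster_id")) {
  by <- match.arg(by)
  # rows are the instances that become points in the embedding
  tbl <- if (by == "sample_id") {
    table(sample_id, cluster_id)   # S x K cell counts
  } else {
    table(cluster_id, sample_id)   # K x S cell counts
  }
  # normalize each row so it sums to 1, then drop the "table" class
  unclass(prop.table(tbl, margin = 1))
}

# toy data: 6 cells from 2 samples falling into 3 clusters
sid <- c("s1", "s1", "s1", "s2", "s2", "s2")
kid <- c("c1", "c1", "c2", "c2", "c3", "c3")
p_s <- props_by(sid, kid, "sample_id")   # 2 x 3, one row per sample
p_k <- props_by(sid, kid, "cluster_id")  # 3 x 2, one row per cluster
```

Normalizing by row means each point's composition vector sums to 1, which is what the CLR below operates on.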

Details

The centered log-ratio (CLR)

Let s be one of S samples, k one of K clusters, and p(s,k) the proportion of cells from s in k. The centered log-ratio (CLR) is defined as

$$\mathrm{clr}(s,k) = \log p(s,k) - \frac{1}{K} \sum_{k'=1}^{K} \log p(s,k')$$

and analogously for clusters, replacing s by k and K by S, where log denotes the logarithm to base `base`. Thus, each sample (cluster) yields a vector of length K (S) with mean 0, and the CLRs computed across all instances form an S x K (or, for clusters, K x S) matrix that is embedded into a lower-dimensional space.
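The CLR above can be sketched in a few lines of base R: take logs in the chosen base, then subtract each row's mean log-proportion. This is an illustrative sketch, not clrDR's implementation; the function name `clr_rows` and its `pseudo` argument are hypothetical, and the pseudocount is an assumption added only so that zero proportions do not produce `-Inf`.

```r
# Sketch of the CLR: for each row (sample or cluster), subtract the mean
# log-proportion from every log-proportion in that row.
# `pseudo` is a hypothetical pseudocount (NOT a documented clrDR argument)
# that keeps zero proportions from producing -Inf.
clr_rows <- function(p, base = 2, pseudo = 0) {
  lp <- log(p + pseudo, base = base)  # log p(s,k) in the chosen base
  lp - rowMeans(lp)                   # row means recycle down each column
}

p <- rbind(s1 = c(0.5, 0.2, 0.3),
           s2 = c(0.2, 0.3, 0.5))
p[1, ] <- c(0.5, 0.25, 0.25)          # make row s1 easy to check by hand
clr <- clr_rows(p, base = 2)
rowMeans(clr)                          # both rows have (numerically) zero mean
```

For row `s1`, the base-2 logs are -1, -2, -2 with mean -5/3, so the CLR vector is (2/3, -1/3, -1/3), illustrating the mean-0 property stated above.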

Dimensionality reduction

In principle, clrDR allows any dimension reduction to be applied to the CLRs (see `dr` for the supported methods). The default method (dr = "PCA") includes the percentage of variance explained by each principal component (PC) in the axis labels.

Note that distances between points in the lower-dimensional space are meaningful only for linear DR methods (PCA and MDS); results obtained with other methods should be interpreted with caution. Accordingly, the output plot's aspect ratio should be kept as is for PCA and MDS, whereas non-linear DR methods can use aspect.ratio = 1 to render a square plot.
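As an illustration of the default PCA case, the sketch below runs `stats::prcomp` on a toy CLR matrix and formats the percentage of variance explained per PC as axis labels. It assumes a centered, unscaled PCA and simulated counts; it is not the code clrDR runs internally.

```r
# Sketch: PCA on a CLR matrix, with percent variance explained per PC
# formatted as axis labels. Assumes a centered, unscaled PCA (an assumption).
set.seed(1)
# toy counts: 8 samples x 5 clusters (+1 avoids zero proportions)
counts <- matrix(rpois(40, lambda = 50) + 1, nrow = 8,
                 dimnames = list(paste0("s", 1:8), paste0("c", 1:5)))
p   <- counts / rowSums(counts)     # row-wise proportions
lp  <- log2(p)
clr <- lp - rowMeans(lp)            # CLR, one row per sample

pca <- prcomp(clr, center = TRUE, scale. = FALSE)
var_pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# labels for dims = c(1, 2)
axis_labels <- sprintf("PC%d (%.1f%%)", 1:2, var_pct[1:2])
scores      <- pca$x[, 1:2]         # point coordinates, one row per sample
pc_loadings <- pca$rotation[, 1:2]  # loading vectors, one row per cluster
```

Because every CLR row sums to zero, the CLR matrix has rank at most K - 1, so the last PC explains essentially no variance.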

Interpreting PC loadings

For dr = "PCA", PC loadings are represented as arrows that may be interpreted as follows: an angle of 0° (180°) between two vectors indicates a strong positive (negative) relation between them, while vectors that are orthogonal to each other (90°) are roughly independent.

When a vector points towards a given quadrant, the variability in proportions for the points within that quadrant is largely driven by the corresponding variable. Only the relative orientation of vectors to one another and to the PC axes is meaningful, since the sign of loadings (i.e., whether an arrow points left or right) can flip when PCs are re-computed.
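The angle rule above can be checked numerically. The hypothetical helper `loading_angle` (not part of CATALYST) computes the angle in degrees between two loading vectors from their cosine similarity; it also shows why sign flips are harmless, since flipping both vectors leaves their relative angle unchanged.

```r
# Hypothetical helper: angle (degrees) between two loading vectors in the
# plotted PC plane. ~0 suggests a positive relation, ~180 a negative one,
# ~90 roughly independent.
loading_angle <- function(u, v) {
  cosang <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))
  acos(max(-1, min(1, cosang))) * 180 / pi  # clamp against rounding error
}

loading_angle(c(1, 0), c(2, 0))   # parallel
loading_angle(c(1, 0), c(-1, 0))  # opposite
loading_angle(c(1, 0), c(0, 1))   # orthogonal

# a PC sign flip negates both vectors but preserves their relative angle
u <- c(0.6, 0.2)
v <- c(-0.1, 0.5)
loading_angle(u, v) == loading_angle(-u, -v)
```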

When arrow_len is specified, PC loading vectors are re-scaled to improve their visibility; a value of 1 stretches the vectors such that the largest loading reaches the outermost point. Importantly, while absolute arrow lengths are not interpretable, their relative lengths are.
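One plausible reading of this re-scaling is sketched below: all loadings are multiplied by a single common factor so that the longest one has length `arrow_len` times the largest absolute xy-coordinate of the points. The helper `rescale_loadings` is hypothetical and clrDR's exact rule may differ; the point of the sketch is that a common factor preserves relative arrow lengths.

```r
# Hypothetical re-scaling (clrDR's exact rule may differ): stretch all
# loadings by one common factor so the longest vector has length
# arrow_len * (largest absolute xy-coordinate among the points).
rescale_loadings <- function(loadings, scores, arrow_len = 0.5) {
  if (is.null(arrow_len)) return(loadings)    # NULL: no re-sizing
  max_coord <- max(abs(scores))                # extent of the outermost point
  max_load  <- max(sqrt(rowSums(loadings^2)))  # length of the longest loading
  loadings * arrow_len * max_coord / max_load  # common factor for all arrows
}

L <- matrix(c(1, 0, 0, 0.5), nrow = 2)   # rows: loadings (1, 0) and (0, 0.5)
S <- matrix(c(4, -2, 1, 3), nrow = 2)    # point coordinates, max |coord| = 4
R <- rescale_loadings(L, S, arrow_len = 1)
sqrt(rowSums(R^2))                        # lengths 4 and 2: ratio kept at 2:1
```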

Value

A ggplot object.

Author(s)

Helena L Crowell helena.crowell@uzh.ch

Examples

library(CATALYST)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# CLR on sample proportions across clusters
# (PC1 vs. PC3; include sample labels)
clrDR(sce, by = "sample_id", k = "meta12",
  dims = c(1, 3), label_by = "sample_id")

# CLR on cluster proportions across samples
# (use custom colors for both points & loadings)
clrDR(sce, by = "cluster_id", k = "meta10",
  point_pal = hcl.colors(10, "Spectral"),
  arrow_pal = c("royalblue", "orange"))