plot_drivers: Plot Drivers of Omic Variation

View source: R/plot_drivers.R

plot_drivers    R Documentation


Description

This function visualizes the strength of associations between the principal components of an omic data matrix and a set of biological and/or technical features.

Usage

plot_drivers(
  dat,
  clin,
  stat = "p",
  bivariate = TRUE,
  block = NULL,
  unblock = NULL,
  parametric = TRUE,
  kernel = NULL,
  kpar = NULL,
  top = NULL,
  n_pc = 5L,
  alpha = NULL,
  p_adj = NULL,
  r_adj = FALSE,
  label = FALSE,
  pal_tiles = "PiRdBr",
  lim = NULL,
  coord_equal = FALSE,
  title = "Variation By Feature",
  legend = "right",
  hover = FALSE
)

Arguments

dat

Omic data matrix or matrix-like object with rows corresponding to probes and columns to samples. It is strongly recommended that data be filtered and normalized prior to plotting. Raw counts stored in DGEList or DESeqDataSet objects are automatically extracted and transformed to the log2-CPM scale, with a warning.

clin

Data frame or matrix with rows corresponding to samples and columns to technical and/or biological features to test for associations with omic data.

stat

Association statistic of choice. Currently supports "p" (-log p-values) and "r2" (R-squared). Interpretations vary depending on whether covariates are included. See Details.

bivariate

Test associations in isolation, or after adjusting for all remaining covariates? If FALSE, then clin is treated as a design matrix against which each PC is sequentially regressed. See Details.

block

Optional string giving the name of the column in clin that holds a blocking variable, if one should be accounted for. See Details.

unblock

Column name(s) of one or more features in clin for which the blocking variable should not be regressed out, if block is supplied. See Details.

parametric

Compute statistics using parametric association tests? If FALSE, rank-based alternatives are used instead. Either a single logical value, in which case it applies to all tests, or a logical vector of length equal to ncol(clin). See Details.

kernel

The kernel-generating function, if using KPCA. Options include "rbfdot", "polydot", "tanhdot", "vanilladot", "laplacedot", "besseldot", "anovadot", and "splinedot". To run standard (linear) PCA, set to NULL.

kpar

A named list of arguments setting parameters for the kernel function. Only relevant if kernel is not NULL.

top

Optional number (if > 1) or proportion (if < 1) of most variable probes to be used for PCA.

n_pc

Number of principal components to include in the figure.

alpha

Optional significance threshold to impose on associations. Those with p-values (optionally adjusted) less than or equal to alpha are outlined in black.

p_adj

Optional p-value adjustment for multiple testing. Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", and "fdr". See p.adjust.

r_adj

Adjust partial R-squared? Only relevant if stat = "r2" and either bivariate = FALSE or block is non-NULL.

label

Print association statistics over tiles?

pal_tiles

String specifying the color palette to use for heatmap tiles. Options include the complete collection of viridis palettes, as well as all sequential and diverging color schemes available in RColorBrewer. Alternatively, a character vector of at least two colors.

lim

Optional vector of length two defining lower and upper bounds for the scale range. Default is observed extrema for stat = "p" and the unit interval for stat = "r2".

coord_equal

Plot tiles of equal width and height?

title

Optional plot title.

legend

Legend position. Must be one of "bottom", "left", "top", "right", "bottomright", "bottomleft", "topleft", or "topright".

hover

Show association statistics by hovering the mouse over tiles? If TRUE, the plot is rendered in HTML and will either open in your web browser or appear in the RStudio viewer.
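
The top argument filters probes before PCA. The sketch below shows one plausible reading of that behavior in base R. The helper name select_top is hypothetical, and ranking probes by row variance is an assumption about what "most variable" means here; values below 1 are read as a proportion and anything else as a count.

```r
# Hypothetical helper, not part of bioplotr: keeps the `top` most variable
# probes. Assumes variability is ranked by per-probe (row) variance.
select_top <- function(dat, top) {
  n <- nrow(dat)
  # top < 1 is read as a proportion of probes, otherwise as a count
  k <- if (top < 1) round(top * n) else min(as.integer(top), n)
  v <- apply(dat, 1, var)                        # per-probe variance
  idx <- order(v, decreasing = TRUE)[seq_len(k)] # indices of the k largest
  dat[idx, , drop = FALSE]
}

set.seed(1)
mat <- matrix(rnorm(1000 * 8), nrow = 1000, ncol = 8)  # probes x samples
dim(select_top(mat, 100))   # 100 probes, 8 samples
dim(select_top(mat, 0.25))  # 250 probes, 8 samples
```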

Details

Strength of association may be measured either by -log p-values (if stat = "p") or R-squared (if stat = "r2"). The former may be adjusted for multiple testing, while the latter can be adjusted for covariates.
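
A minimal base-R sketch of how raw p-values might be turned into tile statistics with p.adjust. The p-values are made up, and taking the -log scale to be base 10 is an assumption; the settings mirror p_adj = "BH" and alpha = 0.05 from the Arguments section.

```r
# Made-up p-values for five PC/feature pairs
p_raw <- c(0.001, 0.008, 0.02, 0.03, 0.4)

# Multiple-testing correction, analogous to p_adj = "BH"
p_bh <- p.adjust(p_raw, method = "BH")

# Association strength (assumed -log10): larger means stronger evidence
stat_p <- -log10(p_bh)

# Tiles whose adjusted p-value is <= alpha would be outlined
alpha <- 0.05
outlined <- p_bh <= alpha

round(p_bh, 4)  # 0.0050 0.0200 0.0333 0.0375 0.4000
outlined        # TRUE TRUE TRUE TRUE FALSE
```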

If bivariate = TRUE, then association tests are performed between each PC and each clinical covariate, optionally adjusting for a blocking variable (if block is non-NULL). If bivariate = FALSE, then all tests are partial association tests, in the sense that they control for all remaining covariates.

When bivariate = TRUE, block = NULL, and parametric = TRUE, significance is computed from Pearson correlation tests (for continuous features) or ANOVA F-tests (for categorical features). When parametric = FALSE, significance is computed from rank-based alternatives, i.e. Spearman correlation tests (for continuous features) or Kruskal-Wallis tests (for categorical features).
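
The unblocked bivariate tests described above can be reproduced in base R with prcomp, cor.test, lm/anova, and kruskal.test. This is a sketch on simulated data, not bioplotr's own code; the helper pc_assoc_p is hypothetical.

```r
set.seed(42)
n <- 12
dat <- matrix(rnorm(200 * n), nrow = 200)           # probes x samples
clin <- data.frame(
  age = rnorm(n, 50, 10),                           # continuous feature
  batch = factor(rep(c("a", "b", "c"), each = 4))   # categorical feature
)

# PCA on samples: transpose so that rows are samples
pcs <- prcomp(t(dat), center = TRUE)$x[, 1:3]

# Hypothetical helper: p-value for one PC against one feature
pc_assoc_p <- function(pc, feature, parametric = TRUE) {
  if (is.numeric(feature)) {
    # Pearson or Spearman correlation test
    method <- if (parametric) "pearson" else "spearman"
    cor.test(pc, feature, method = method)$p.value
  } else if (parametric) {
    # One-way ANOVA F-test
    anova(lm(pc ~ feature))[["Pr(>F)"]][1]
  } else {
    # Rank-based alternative for categorical features
    kruskal.test(pc, feature)$p.value
  }
}

# p-values with features in rows and PCs in columns
p_mat <- sapply(seq_len(ncol(pcs)), function(j) {
  sapply(clin, function(f) pc_assoc_p(pcs[, j], f))
})
colnames(p_mat) <- colnames(pcs)
-log10(p_mat)
```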

When bivariate = FALSE or block is non-NULL, significance is computed from partial correlation tests for continuous features (Pearson if parametric = TRUE, Spearman if parametric = FALSE) or repeated-measures ANOVA F-tests for categorical features (under rank transformation if parametric = FALSE). In all cases, the alternative hypothesis assumes a monotonic relationship between variables.
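
One standard way to run a partial correlation test is to residualize both the PC and the feature on the remaining covariates, then correlate the residuals; the squared partial correlation is the partial R-squared for that feature. Whether plot_drivers computes it exactly this way is an assumption. The helper partial_cor_test is hypothetical and the data are simulated.

```r
# Hypothetical helper, not part of bioplotr: partial correlation test of a
# PC score (y) against one continuous feature (x), controlling for `covars`.
partial_cor_test <- function(y, x, covars) {
  # Residualize both variables on the remaining covariates
  ry <- resid(lm(y ~ ., data = covars))
  rx <- resid(lm(x ~ ., data = covars))
  r <- cor(ry, rx)
  # One degree of freedom is lost per covariate column (dummies included)
  k <- ncol(model.matrix(~ ., data = covars)) - 1
  df <- length(y) - 2 - k
  t_stat <- r * sqrt(df / (1 - r^2))
  list(r = r, r2 = r^2, p = 2 * pt(-abs(t_stat), df))
}

set.seed(7)
n <- 30
covars <- data.frame(
  age = rnorm(n),
  sex = factor(rep(c("F", "M"), length.out = n))
)
x <- rnorm(n)                          # feature of interest
y <- 0.5 * x + covars$age + rnorm(n)   # simulated PC score
partial_cor_test(y, x, covars)
```

The resulting p-value matches the t-test for x in lm(y ~ x + age + sex), which is a convenient check on the degrees of freedom.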

A blocking variable may be provided if samples violate the assumption of independence, e.g. for studies in which subjects are observed at multiple time points. If a blocking variable is identified, it will be regressed out prior to testing for all variables except those explicitly exempted by the unblock argument. When supplying a blocking variable, be careful to consider potential collinearities in the data. For instance, clinical features may be invariant with respect to subject, while subject may be nested within other variables like batch or treatment group. The block and unblock arguments are intended to help parse out these relationships.
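
To illustrate the blocking logic, this sketch simulates repeated measures on five subjects, regresses subject out of both a PC and a time-varying feature, and tests the residual correlation. It also shows the collinearity problem mentioned above: a feature that is constant within subject has nothing left once the block is removed, which is the situation unblock is meant for. The data are simulated and the code is not the package's internal implementation.

```r
set.seed(3)
subject <- factor(rep(paste0("s", 1:5), each = 4))  # 5 subjects x 4 visits
timepoint <- rep(1:4, times = 5)
subj_eff <- rnorm(5, sd = 3)[as.integer(subject)]   # subject-specific offsets
pc <- subj_eff + 0.8 * timepoint + rnorm(20, sd = 0.5)  # simulated PC score

# Regress the block out of both variables, then correlate the residuals
pc_res <- resid(lm(pc ~ subject))
time_res <- resid(lm(timepoint ~ subject))
r <- cor(pc_res, time_res)
df <- length(pc) - nlevels(subject) - 1             # n - block levels - 1
p_block <- 2 * pt(-abs(r * sqrt(df / (1 - r^2))), df)
p_block

# A subject-level feature is constant within subject, so regressing out the
# block leaves only numerical noise: such features belong in `unblock`
sex <- rep(c(0, 1, 0, 1, 0), each = 4)
max(abs(resid(lm(sex ~ subject))))
```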

Numeric and categorical features are tested differently. To protect against potential mistakes (e.g., one-hot encoding a Boolean variable), plot_drivers automatically prints a data frame listing the class of each feature.
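
Because the choice of test depends on column class, it can help to check classes before plotting. This sketch builds a small hypothetical clin with a Boolean stored as 0/1, tabulates classes in base R, and recodes that column as a factor. It assumes plot_drivers dispatches on column class as described above; the table plot_drivers itself prints may be formatted differently.

```r
clin <- data.frame(
  age = c(34, 51, 47, 62),
  treated = c(1, 0, 1, 0),            # Boolean stored as 0/1 numeric
  batch = c("b1", "b1", "b2", "b2"),
  stringsAsFactors = FALSE
)

# One row per feature with its (first) class
feature_classes <- data.frame(
  feature = names(clin),
  class = unname(vapply(clin, function(v) class(v)[1], character(1))),
  stringsAsFactors = FALSE
)
feature_classes

# "treated" would be tested as a continuous feature; recode it as categorical
clin$treated <- factor(clin$treated, levels = c(0, 1), labels = c("no", "yes"))
```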

If kernel is non-NULL, then KPCA is used instead of PCA; see plot_kpca for more information. Details on kernel functions and their input parameters can be found in kernlab::dots.
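
For readers unfamiliar with KPCA, the following base-R sketch computes kernel principal components with a Gaussian kernel, using the kernlab rbfdot parameterization k(x, y) = exp(-sigma * ||x - y||^2). It illustrates the technique only and is not the package's own code path; the function name kpca_rbf is hypothetical.

```r
# Hypothetical helper: kernel PCA with an RBF kernel. X has samples in rows.
kpca_rbf <- function(X, sigma = 0.1, n_pc = 2) {
  D2 <- as.matrix(dist(X))^2            # squared Euclidean distances
  K <- exp(-sigma * D2)                 # RBF kernel matrix (rbfdot convention)
  n <- nrow(K)
  J <- diag(n) - matrix(1 / n, n, n)    # centering matrix
  Kc <- J %*% K %*% J                   # center the data in feature space
  e <- eigen(Kc, symmetric = TRUE)
  # Sample scores: eigenvectors scaled by the square roots of their eigenvalues
  scores <- e$vectors[, 1:n_pc, drop = FALSE] %*%
    diag(sqrt(e$values[1:n_pc]), n_pc)
  colnames(scores) <- paste0("KPC", 1:n_pc)
  scores
}

set.seed(5)
X <- matrix(rnorm(20 * 10), nrow = 20)  # 20 samples x 10 probes
scores <- kpca_rbf(X, sigma = 0.05, n_pc = 3)
head(scores)
```

The columns of scores could then stand in for ordinary PCs in the association tests described above.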

See Also

plot_pca, plot_kpca

Examples

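library(SummarizedExperiment)
library(edgeR)
library(dplyr)
library(airway)  # provides the airway dataset
data(airway)
# Raw counts; keep genes with CPM > 1 in at least 4 samples
cnts <- assay(airway)
keep <- rowSums(cpm(cnts) > 1) >= 4
# Filtered counts on the log2-CPM scale
mat <- cpm(cnts[keep, ], log = TRUE)
# Sample-level features: cell line and dexamethasone treatment
clin <- colData(airway) %>%
  as_tibble() %>%
  select(cell, dex)
plot_drivers(mat, clin)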


dswatson/bioplotr documentation built on March 3, 2023, 9:43 p.m.