bio_drivers: Plot Drivers of Omic Variation

bio_drivers    R Documentation

Plot Drivers of Omic Variation

Description

This function, adapted from dswatson's function, visualizes the strength of association between the principal components of an omic data matrix and a set of biological and/or technical features.

Usage

bio_drivers(
  pcs,
  clin,
  block = NULL,
  unblock = NULL,
  kernel = NULL,
  kpar = NULL,
  top = NULL,
  n.pc = 5L,
  label = FALSE,
  alpha = 0.05,
  p.adj = NULL,
  title = "Variation By Feature",
  legend = "right",
  hover = FALSE
)

Arguments

pcs

Result of a principal component analysis (PCA) on the expression data.

clin

Data frame or matrix with rows corresponding to samples and columns to technical and/or biological features to test for associations with omic data.

block

String specifying the name of the column in which to find the blocking variable, should one be accounted for. See Details.

unblock

Column name(s) of one or more features for which the block covariate should not be applied, if one was supplied. See Details.

kernel

The kernel generating function, if using KPCA. Options include "rbfdot", "polydot", "tanhdot", "vanilladot", "laplacedot", "besseldot", "anovadot", and "splinedot". To run normal PCA, set to NULL. See Details.

kpar

A named list of arguments setting parameters for the kernel function. Only relevant if kernel is not NULL. See Details.

top

Optional number (if > 1) or proportion (if < 1) of most variable probes to be used for PCA.

label

Print association statistics over tiles?

alpha

Optional significance threshold to impose on associations. Those with p-values (optionally adjusted) less than or equal to alpha are outlined in black.

title

Optional plot title.

legend

Legend position. Must be one of "bottom", "left", "top", "right", "bottomright", "bottomleft", "topleft", or "topright".

hover

Show p-values by hovering mouse over tiles? If TRUE, the plot is rendered in HTML and will either open in your browser's graphic display or appear in the RStudio viewer.

parametric

Compute p-values using parametric association tests? If FALSE, rank-based alternatives are used instead. See Details.

n.pc

Number of principal components to include in the figure.

p.adj

Optional p-value adjustment for multiple testing. Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", and "fdr". See p.adjust.
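As an illustration of how the significance threshold and the multiple-testing adjustment interact, the following minimal sketch uses only base R's p.adjust. The p-values are made up for the example, and the code is not taken from this package's internals.

```r
# Hypothetical raw p-values for five feature/PC associations
p_raw <- c(0.001, 0.012, 0.025, 0.049, 0.200)
alpha <- 0.05

# Benjamini-Hochberg adjustment, one of the methods accepted by p.adjust
p_bh <- p.adjust(p_raw, method = "BH")

# Associations at or below alpha would be the ones outlined in the plot
signif_raw <- p_raw <= alpha
signif_bh  <- p_bh  <= alpha

print(data.frame(p_raw, p_bh, signif_raw, signif_bh))
```

In this toy example the fourth association passes the threshold before adjustment but not after it, which shows how an adjustment can change which tiles are outlined.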

Details

Strength of association is measured by -log p-values, optionally adjusted for multiple testing. When parametric = TRUE, significance is computed from Pearson correlation tests (for continuous features) or ANOVA F-tests (for categorical features). When parametric = FALSE, significance is computed from rank-based alternatives, i.e. Spearman correlation tests (for continuous features) or Kruskal-Wallis tests (for categorical features).
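The four tests named above can be sketched with base R alone. This is a minimal illustration on simulated data, not the package's internal code: pc1, age, and group are hypothetical stand-ins for one principal component, a continuous feature, and a categorical feature, and the base-10 logarithm is an assumption, since the text says only "-log".

```r
set.seed(1)
n     <- 40
pc1   <- rnorm(n)                                  # hypothetical scores on one PC
age   <- pc1 + rnorm(n)                            # simulated continuous feature
group <- factor(rep(c("a", "b"), each = n / 2))    # simulated categorical feature

# Parametric tests: Pearson correlation and one-way ANOVA F-test
p_pearson <- cor.test(pc1, age, method = "pearson")$p.value
p_anova   <- anova(lm(pc1 ~ group))[["Pr(>F)"]][1]

# Rank-based alternatives: Spearman correlation and Kruskal-Wallis test
p_spearman <- cor.test(pc1, age, method = "spearman")$p.value
p_kruskal  <- kruskal.test(pc1 ~ group)$p.value

p_vals <- c(pearson = p_pearson, anova = p_anova,
            spearman = p_spearman, kruskal = p_kruskal)

# Strength of association as a negative log p-value (base 10 assumed)
strength <- -log10(p_vals)
print(strength)
```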

An optional blocking variable may be provided if samples violate the assumption of independence, e.g. for studies in which subjects are observed at multiple time points. If a blocking variable is identified, it will be regressed out prior to testing for all variables except those explicitly exempted by the unblock argument. Significance is then computed from partial correlation tests for continuous data (Pearson if parametric = TRUE, Spearman if parametric = FALSE) or repeated measures ANOVA F-tests (under rank-transformation if parametric = FALSE).
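One way to picture "regressing out" a blocking variable for a continuous feature is a partial correlation computed from residuals. The sketch below uses simulated repeated-measures data and base R only. The names subject, pc1, and dose are hypothetical, and the function's actual implementation may differ.

```r
set.seed(2)
subject <- factor(rep(1:10, each = 4))         # block: 10 subjects, 4 time points each
subj_fx <- rnorm(10)[as.integer(subject)]      # shift shared by a subject's samples
pc1     <- subj_fx + rnorm(40)                 # hypothetical PC scores
dose    <- subj_fx + rnorm(40)                 # simulated continuous feature

# Remove the block effect from both variables, then correlate the residuals
pc1_res   <- resid(lm(pc1 ~ subject))
dose_res  <- resid(lm(dose ~ subject))
r_partial <- cor(pc1_res, dose_res)

# t-test for the partial correlation; the block uses nlevels - 1 degrees of freedom
df        <- length(pc1) - 2 - (nlevels(subject) - 1)
t_stat    <- r_partial * sqrt(df / (1 - r_partial^2))
p_partial <- 2 * pt(-abs(t_stat), df)

print(c(r = r_partial, df = df, p = p_partial))
```

For a rank-based version, one option would be to apply rank() to pc1 and dose before fitting the two linear models.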

When supplying a blocking variable, be careful to consider potential confounding effects. For instance, features like sex and age are usually nested within subject, while subject may be nested within other variables like batch or treatment group. The block and unblock arguments are designed to help parse out these relationships.

Numeric and categorical features are tested differently. To protect against potential mistakes (e.g., one-hot encoding a Boolean variable), plot_drivers automatically prints a data frame listing the class of each feature.
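The kind of class listing described above can be reproduced by hand before calling the function. The sketch below is an assumption about what such a check might look like, not the function's own output. The data frame and its columns are invented for the example.

```r
# Hypothetical clinical data; 'treated' is a Boolean stored as 0/1
clin <- data.frame(
  age     = c(34, 51, 47, 29),
  sex     = c("F", "M", "F", "M"),
  treated = c(1, 0, 1, 0),
  stringsAsFactors = TRUE
)

# List the class of each feature, one row per column of clin
classes <- data.frame(
  Feature   = names(clin),
  Class     = vapply(clin, function(x) class(x)[1], character(1)),
  row.names = NULL
)
print(classes)

# 'treated' shows up as numeric, so it would be tested as continuous;
# convert it to a factor if it should be treated as categorical
clin$treated <- factor(clin$treated)
```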

If kernel is non-NULL, then KPCA is used instead of PCA. See plot_kpca for more info. Details on kernel functions and their input parameters can be found in kernlab::dots.

See Also

plot_pca, plot_kpca

Examples

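library(SummarizedExperiment)
library(edgeR)
library(dplyr)

# Load the example RNA-seq dataset from the airway package
data(airway, package = "airway")
cnts <- assay(airway)

# Keep genes with more than 1 count per million in at least 4 samples
keep <- rowSums(cpm(cnts) > 1) >= 4
mat <- cpm(cnts[keep, ], log = TRUE)

# Select the sample-level features to test
clin <- colData(airway) %>%
  as_tibble() %>%
  select(Run, cell, dex)
plot_drivers(mat, clin)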


KatrionaGoldmann/BioOutputs documentation built on May 21, 2022, 1:24 p.m.