bio_drivers: Plot Drivers of Omic Variation
In KatrionaGoldmann/BioOutputs: Visualisation of Biological Data

bio_drivers

R Documentation

Plot Drivers of Omic Variation

Description

This function was adapted from dswatson's function. It visualizes the strength of associations between the principal components of an omic data matrix and a set of biological and/or technical features.

Usage

bio_drivers(
  pcs,
  clin,
  block = NULL,
  unblock = NULL,
  kernel = NULL,
  kpar = NULL,
  top = NULL,
  n.pc = 5L,
  label = FALSE,
  alpha = 0.05,
  p.adj = NULL,
  title = "Variation By Feature",
  legend = "right",
  hover = FALSE
)

Arguments

`pcs`	PCA on expression data
`clin`	Data frame or matrix with rows corresponding to samples and columns to technical and/or biological features to test for associations with omic data.
`block`	String specifying the name of the column in which to find the blocking variable, should one be accounted for. See Details.
`unblock`	Column name(s) of one or more features for which the `block` covariate should not be applied, if one was supplied. See Details.
`kernel`	The kernel generating function, if using KPCA. Options include `"rbfdot"`, `"polydot"`, `"tanhdot"`, `"vanilladot"`, `"laplacedot"`, `"besseldot"`, `"anovadot"`, and `"splinedot"`. To run normal PCA, set to `NULL`. See Details.
`kpar`	A named list of arguments setting parameters for the kernel function. Only relevant if `kernel` is not `NULL`. See Details.
`top`	Optional number (if > 1) or proportion (if < 1) of most variable probes to be used for PCA.
`label`	Print association statistics over tiles?
`alpha`	Optional significance threshold to impose on associations. Those with p-values (optionally adjusted) less than or equal to `alpha` are outlined in black.
`title`	Optional plot title.
`legend`	Legend position. Must be one of `"bottom"`, `"left"`, `"top"`, `"right"`, `"bottomright"`, `"bottomleft"`, `"topleft"`, or `"topright"`.
`hover`	Show p-values by hovering mouse over tiles? If `TRUE`, the plot is rendered in HTML and will either open in your browser's graphic display or appear in the RStudio viewer.
`parametric`	Compute p-values using parametric association tests? If `FALSE`, rank-based alternatives are used instead. See Details.
`n.pc`	Number of principal components to include in the figure.
`p.adj`	Optional p-value adjustment for multiple testing. Options include `"holm"`, `"hochberg"`, `"hommel"`, `"bonferroni"`, `"BH"`, `"BY"`, and `"fdr"`. See `p.adjust`.
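
The `top` argument accepts either a count or a proportion. The sketch below shows one way such a filter could work, in base R on simulated data. The helper name `filter_top` and the choice of per-probe variance as the measure of variability are assumptions made for illustration, not the package's internals.

```r
# Hypothetical helper: keep the most variable rows (probes) of a matrix.
# Assumption: variability is measured by per-row variance.
filter_top <- function(mat, top = NULL) {
  if (is.null(top)) return(mat)
  # top > 1 is read as a count, otherwise as a proportion of rows
  n <- if (top > 1) min(top, nrow(mat)) else round(top * nrow(mat))
  vars <- apply(mat, 1, var)
  keep <- order(vars, decreasing = TRUE)[seq_len(n)]
  mat[keep, , drop = FALSE]
}

set.seed(1)
mat <- matrix(rnorm(200 * 8), nrow = 200,
              dimnames = list(paste0("probe", 1:200), paste0("s", 1:8)))
by_count <- filter_top(mat, top = 20)    # keep the 20 most variable probes
by_prop  <- filter_top(mat, top = 0.25)  # keep the top 25% of probes
```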

Details

Strength of association is measured by -log p-values, optionally adjusted for multiple testing. When parametric = TRUE, significance is computed from Pearson correlation tests (for continuous features) or ANOVA F-tests (for categorical features). When parametric = FALSE, significance is computed from rank-based alternatives, i.e. Spearman correlation tests (for continuous features) or Kruskal-Wallis tests (for categorical features).
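
The choice of test described above can be sketched in base R. This is an illustration on simulated data, not the package's code: the helper `assoc_p`, the simulated `expr` and `clin` objects, and the use of base-10 logarithms are all assumptions.

```r
# Hypothetical helper: p-value for the association between one PC and one feature.
assoc_p <- function(pc, feature, parametric = TRUE) {
  if (is.numeric(feature)) {
    # Continuous feature: Pearson (parametric) or Spearman (rank-based)
    method <- if (parametric) "pearson" else "spearman"
    cor.test(pc, feature, method = method, exact = FALSE)$p.value
  } else {
    # Categorical feature: ANOVA F-test (parametric) or Kruskal-Wallis (rank-based)
    feature <- as.factor(feature)
    if (parametric) {
      anova(lm(pc ~ feature))[1, "Pr(>F)"]
    } else {
      kruskal.test(pc, feature)$p.value
    }
  }
}

set.seed(42)
n <- 40
clin <- data.frame(
  age   = rnorm(n, 50, 10),
  batch = factor(rep(c("a", "b"), each = n / 2))
)
# Simulated expression matrix (samples in rows) with a batch effect on 30 genes
expr <- matrix(rnorm(n * 100), nrow = n)
expr[clin$batch == "b", 1:30] <- expr[clin$batch == "b", 1:30] + 2

pca <- prcomp(expr)
n_pc <- 5
# Rows are PCs, columns are features
pvals <- sapply(clin, function(f) {
  sapply(seq_len(n_pc), function(j) assoc_p(pca$x[, j], f))
})
rownames(pvals) <- paste0("PC", seq_len(n_pc))

# Optional multiple-testing adjustment, then the -log transform (base 10 assumed)
padj <- matrix(p.adjust(pvals, method = "BH"),
               nrow = nrow(pvals), dimnames = dimnames(pvals))
strength <- -log10(padj)
is_sig <- padj <= 0.05   # tiles that would be outlined at alpha = 0.05
```

Passing `parametric = FALSE` to `assoc_p` switches every test to its rank-based alternative.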

An optional blocking variable may be provided if samples violate the assumption of independence, e.g. for studies in which subjects are observed at multiple time points. If a blocking variable is identified, it will be regressed out prior to testing for all variables except those explicitly exempted by the unblock argument. Significance is then computed from partial correlation tests for continuous data (Pearson if parametric = TRUE, Spearman if parametric = FALSE) or repeated measures ANOVA F-tests (under rank-transformation if parametric = FALSE).
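
To see why blocking matters, consider a simulated repeated-measures design. The sketch below is a simplified stand-in for the tests described above, not the package's implementation: it compares nested linear models with and without the feature, which is closely related to a partial correlation test for a continuous feature. All object names are hypothetical.

```r
set.seed(7)
# 10 subjects observed at 3 time points each
subject <- factor(rep(paste0("s", 1:10), each = 3))
time <- rep(1:3, times = 10)

# Simulated PC score: subject-specific offset, a time trend, and noise
subj_eff <- rnorm(10, sd = 3)[as.integer(subject)]
pc <- subj_eff + 0.8 * time + rnorm(30, sd = 0.5)

# Without blocking: between-subject variation is treated as noise
p_naive <- cor.test(pc, time)$p.value

# With blocking: F-test for adding the feature to a model that already
# contains the blocking variable
fit0 <- lm(pc ~ subject)
fit1 <- lm(pc ~ subject + time)
p_block <- anova(fit0, fit1)[2, "Pr(>F)"]

# Related view: the partial correlation is the correlation between residuals
# after regressing the block out of both variables
partial_r <- cor(resid(lm(pc ~ subject)), resid(lm(time ~ subject)))
```

In this simulation the time trend is hard to detect against between-subject variation, and becomes clear once the subject effect is accounted for.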

When supplying a blocking variable, be careful to consider potential confounding effects. For instance, features like sex and age are usually nested within subject, while subject may be nested within other variables like batch or treatment group. The block and unblock arguments are designed to help parse out these relationships.
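
One way to spot features that are nested within the blocking variable is to check whether each block level maps to a single value of the feature. The sketch below uses a small hypothetical `clin` table. Treating such features as candidates for `unblock` is a reading of the paragraph above, not documented behaviour.

```r
# Hypothetical clinical table: 4 subjects, each observed at 2 time points
clin <- data.frame(
  subject = rep(c("s1", "s2", "s3", "s4"), each = 2),
  sex     = rep(c("F", "M", "F", "M"), each = 2),
  time    = rep(c("t0", "t1"), times = 4)
)
block <- "subject"

# A feature is constant within the block if every block level has one unique value
nested <- sapply(setdiff(names(clin), block), function(f) {
  all(tapply(clin[[f]], clin[[block]], function(x) length(unique(x))) == 1)
})
nested
```

Here `sex` is constant within `subject`, so it has no within-subject variation left to test once the block is regressed out, whereas `time` varies within each subject.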

Numeric and categorical features are tested differently. To protect against potential mistakes (e.g., one-hot encoding a Boolean variable), bio_drivers automatically prints a data frame listing the class of each feature.
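
The same check can be run by hand before calling the function. A minimal sketch, assuming a small hypothetical `clin` data frame in which a Boolean feature has been stored as 0/1:

```r
clin <- data.frame(
  age     = c(34, 51, 47, 62),
  treated = c(0, 1, 1, 0),        # Boolean stored as 0/1, so it reads as numeric
  batch   = c("a", "a", "b", "b")
)

# List the class of each feature
classes <- data.frame(
  Feature = names(clin),
  Class   = vapply(clin, function(x) class(x)[1], character(1)),
  row.names = NULL
)
print(classes)

# Recode so that categorical features are tested as categorical
clin$treated <- factor(clin$treated, levels = c(0, 1), labels = c("no", "yes"))
clin$batch   <- factor(clin$batch)
```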

If kernel is non-NULL, then KPCA is used instead of PCA. See plot_kpca for more info. Details on kernel functions and their input parameters can be found in kernlab::dots.
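
For intuition about what KPCA computes, the following base-R sketch builds a radial basis kernel matrix, centres it, and extracts kernel principal component scores. It is an illustration only: it does not call kernlab, the data are simulated, and the bandwidth `sigma` is an arbitrary assumed value. The kernel form matches what `"rbfdot"` denotes in kernlab, with `sigma` being the kind of parameter that would be passed through `kpar`.

```r
set.seed(3)
x <- matrix(rnorm(30 * 50), nrow = 30)   # 30 samples, 50 features
sigma <- 1 / ncol(x)                      # assumed bandwidth, for illustration only

# RBF kernel matrix: k(x, y) = exp(-sigma * ||x - y||^2)
d2 <- as.matrix(dist(x))^2
K <- exp(-sigma * d2)

# Double-centre the kernel matrix (equivalent to centring in feature space)
n <- nrow(K)
H <- diag(n) - matrix(1 / n, n, n)
Kc <- H %*% K %*% H

# Eigendecomposition of the centred kernel matrix
eig <- eigen(Kc, symmetric = TRUE)

# Sample scores on the leading kernel PCs: eigenvectors scaled by sqrt(eigenvalue)
n_pc <- 5
idx <- seq_len(n_pc)
scores <- eig$vectors[, idx] %*% diag(sqrt(eig$values[idx]))
colnames(scores) <- paste0("KPC", idx)
```

The columns of `scores` play the same role as ordinary principal component scores when testing associations with clinical features.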

Examples

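library(SummarizedExperiment)
library(edgeR)
library(dplyr)
# Load the airway RNA-seq dataset from the airway package
data(airway, package = "airway")
cnts <- assay(airway)
# Keep genes with more than 1 count per million in at least 4 samples
keep <- rowSums(cpm(cnts) > 1) >= 4
# Log2 counts per million for the retained genes
mat <- cpm(cnts[keep, ], log = TRUE)
# Sample-level features to test against the principal components
clin <- colData(airway) %>%
  as_tibble(.) %>%
  select(Run, cell, dex)
plot_drivers(mat, clin)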

KatrionaGoldmann/BioOutputs documentation built on May 21, 2022, 1:24 p.m.