Statistical Test Plotting with 'ggasym
In ggasym: Asymmetric Matrix Plotting in 'ggplot2'

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center"
)
set.seed(0)

Purpose

One of the great uses of 'ggasym' is to plot two values from the results of a multi-way statistical test. Each comparison is a cell, and two values can be used for the fills. Below I give brief examples and plot the differences in mean and the p-value on the symmetric matrix.

library(dplyr)
library(ggplot2)
library(tibble)
library(purrr)
library(broom)
library(ggasym)

Data

The data will be modeled as expression values of 6 genes, each with 10 measurements. I will then as if any of them have different levels of expression.

n_reps <- 10 # number of measurements per gene
expt_std_dev <- 1.5 # std. dev. of measurements
genes <- c("FAK", "talin", "paxillin", "vinculin", "B1integrin", "kindlin")
# "real" expression levels to be used as the mean in `rnorm`
real_expression_levels <- sample(seq(1, 5, 0.1), length(genes), replace = TRUE)
# create a tibble
expr_data <- tibble(
  gene = rep(genes, n_reps),
  real_expr = rep(real_expression_levels, n_reps),
  rep_num = sort(rep(1:n_reps, length(genes)))
)
# add in the measured expression values as a normal distribution around the mean
expr_data <- expr_data %>%
  mutate(expt_expr = rnorm(nrow(expr_data),
    mean = real_expr,
    sd = expt_std_dev
  ))
head(expr_data)

Statistical Tests

I then use an ANOVA to test if there are any differences between any of the comparisons of gene expression.

res_aov <- aov(expt_expr ~ gene, data = expr_data)
broom::tidy(res_aov)

With a very low p-value from the ANOVA, I run the Tukey post-hoc test to find which genes are at different levels.

tukey_res <- TukeyHSD(res_aov)
tukey_res

Plotting

Now I want to plot the estimate in the top-left and p adj in the bottom right. First, I must prepare the data for use with geom_asymmat() by passing the results of the Tukey post-hoc test to asymmetrise_stats(). You can see that it returns the data in a tibble with new columns x and y that are the result of splitting comparison.

asymmat_tib <- asymmetrise_stats(tukey_res)
head(asymmat_tib)

Finally, I can plot the data using geom_asymmat().

ggplot(asymmat_tib, aes(x = x, y = y)) +
  geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) +
  scale_fill_tl_gradient2(low = "dodgerblue", high = "tomato") +
  scale_fill_br_distiller(type = "seq", palette = "Greens", direction = 1)

And add a few styling changes with normal 'ggplot2' semantics.

ggplot(asymmat_tib, aes(x = x, y = y)) +
  geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) +
  scale_fill_tl_gradient2(
    low = "dodgerblue", high = "tomato",
    guide = guide_colourbar(order = 1)
  ) +
  scale_fill_br_distiller(
    type = "seq", palette = "Greens", direction = 1,
    guide = guide_colourbar(order = 2)
  ) +
  theme_bw() +
  theme(
    panel.background = element_rect(fill = "grey50"),
    panel.grid = element_blank(),
    axis.title = element_blank(),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(
    title = "Differential Gene Expression",
    fill_tl = "diff. in\nmean expr.",
    fill_br = "-log( adj. p-value )"
  ) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))

One of the conclusions that can be drawn here is that the difference in expression of kindlin and FAK is the greatest and has a very low adjusted p-value. Thus, one of the conclusion is that kindlin is expressed at a lower level than FAK.

For more information, see the complete documentation at the 'ggasym' site.