Statistical Test Plotting with 'ggasym

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center"
)
set.seed(0)

Purpose

One of the great uses of 'ggasym' is to plot two values from the results of a multi-way statistical test. Each comparison is a cell, and two values can be used for the fills. Below I give brief examples and plot the differences in mean and the p-value on the symmetric matrix.

library(dplyr)
library(ggplot2)
library(tibble)
library(purrr)
library(broom)
library(ggasym)

Data

The data will be modeled as expression values of 6 genes, each with 10 measurements. I will then as if any of them have different levels of expression.

n_reps <- 10 # number of measurements per gene
expt_std_dev <- 1.5 # std. dev. of measurements
genes <- c("FAK", "talin", "paxillin", "vinculin", "B1integrin", "kindlin")
# "real" expression levels to be used as the mean in `rnorm`
real_expression_levels <- sample(seq(1, 5, 0.1), length(genes), replace = TRUE)
# create a tibble
expr_data <- tibble(
  gene = rep(genes, n_reps),
  real_expr = rep(real_expression_levels, n_reps),
  rep_num = sort(rep(1:n_reps, length(genes)))
)
# add in the measured expression values as a normal distribution around the mean
expr_data <- expr_data %>%
  mutate(expt_expr = rnorm(nrow(expr_data),
    mean = real_expr,
    sd = expt_std_dev
  ))
head(expr_data)

Statistical Tests

I then use an ANOVA to test if there are any differences between any of the comparisons of gene expression.

res_aov <- aov(expt_expr ~ gene, data = expr_data)
broom::tidy(res_aov)

With a very low p-value from the ANOVA, I run the Tukey post-hoc test to find which genes are at different levels.

tukey_res <- TukeyHSD(res_aov)
tukey_res

Plotting

Now I want to plot the estimate in the top-left and p adj in the bottom right. First, I must prepare the data for use with geom_asymmat() by passing the results of the Tukey post-hoc test to asymmetrise_stats(). You can see that it returns the data in a tibble with new columns x and y that are the result of splitting comparison.

asymmat_tib <- asymmetrise_stats(tukey_res)
head(asymmat_tib)

Finally, I can plot the data using geom_asymmat().

ggplot(asymmat_tib, aes(x = x, y = y)) +
  geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) +
  scale_fill_tl_gradient2(low = "dodgerblue", high = "tomato") +
  scale_fill_br_distiller(type = "seq", palette = "Greens", direction = 1)

And add a few styling changes with normal 'ggplot2' semantics.

ggplot(asymmat_tib, aes(x = x, y = y)) +
  geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) +
  scale_fill_tl_gradient2(
    low = "dodgerblue", high = "tomato",
    guide = guide_colourbar(order = 1)
  ) +
  scale_fill_br_distiller(
    type = "seq", palette = "Greens", direction = 1,
    guide = guide_colourbar(order = 2)
  ) +
  theme_bw() +
  theme(
    panel.background = element_rect(fill = "grey50"),
    panel.grid = element_blank(),
    axis.title = element_blank(),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(
    title = "Differential Gene Expression",
    fill_tl = "diff. in\nmean expr.",
    fill_br = "-log( adj. p-value )"
  ) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))

One of the conclusions that can be drawn here is that the difference in expression of kindlin and FAK is the greatest and has a very low adjusted p-value. Thus, one of the conclusion is that kindlin is expressed at a lower level than FAK.


For more information, see the complete documentation at the 'ggasym' site.



Try the ggasym package in your browser

Any scripts or data that you put into this service are public.

ggasym documentation built on May 16, 2021, 1:07 a.m.