knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, fig.align = "center" ) set.seed(0)
One of the great uses of 'ggasym' is to plot two values from the results of a multi-way statistical test. Each comparison is a cell, and two values can be used for the fills. Below I give brief examples and plot the differences in mean and the p-value on the symmetric matrix.
library(dplyr) library(ggplot2) library(tibble) library(purrr) library(broom) library(ggasym)
The data will be modeled as expression values of 6 genes, each with 10 measurements. I will then as if any of them have different levels of expression.
n_reps <- 10 # number of measurements per gene expt_std_dev <- 1.5 # std. dev. of measurements genes <- c("FAK", "talin", "paxillin", "vinculin", "B1integrin", "kindlin") # "real" expression levels to be used as the mean in `rnorm` real_expression_levels <- sample(seq(1, 5, 0.1), length(genes), replace = TRUE) # create a tibble expr_data <- tibble( gene = rep(genes, n_reps), real_expr = rep(real_expression_levels, n_reps), rep_num = sort(rep(1:n_reps, length(genes))) ) # add in the measured expression values as a normal distribution around the mean expr_data <- expr_data %>% mutate(expt_expr = rnorm(nrow(expr_data), mean = real_expr, sd = expt_std_dev )) head(expr_data)
I then use an ANOVA to test if there are any differences between any of the comparisons of gene expression.
res_aov <- aov(expt_expr ~ gene, data = expr_data) broom::tidy(res_aov)
With a very low p-value from the ANOVA, I run the Tukey post-hoc test to find which genes are at different levels.
tukey_res <- TukeyHSD(res_aov) tukey_res
Now I want to plot the estimate
in the top-left and p adj
in the bottom right. First, I must prepare the data for use with geom_asymmat()
by passing the results of the Tukey post-hoc test to asymmetrise_stats()
. You can see that it returns the data in a tibble with new columns x
and y
that are the result of splitting comparison
.
asymmat_tib <- asymmetrise_stats(tukey_res) head(asymmat_tib)
Finally, I can plot the data using geom_asymmat()
.
ggplot(asymmat_tib, aes(x = x, y = y)) + geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) + scale_fill_tl_gradient2(low = "dodgerblue", high = "tomato") + scale_fill_br_distiller(type = "seq", palette = "Greens", direction = 1)
And add a few styling changes with normal 'ggplot2' semantics.
ggplot(asymmat_tib, aes(x = x, y = y)) + geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) + scale_fill_tl_gradient2( low = "dodgerblue", high = "tomato", guide = guide_colourbar(order = 1) ) + scale_fill_br_distiller( type = "seq", palette = "Greens", direction = 1, guide = guide_colourbar(order = 2) ) + theme_bw() + theme( panel.background = element_rect(fill = "grey50"), panel.grid = element_blank(), axis.title = element_blank(), plot.title = element_text(hjust = 0.5) ) + labs( title = "Differential Gene Expression", fill_tl = "diff. in\nmean expr.", fill_br = "-log( adj. p-value )" ) + scale_x_discrete(expand = c(0, 0)) + scale_y_discrete(expand = c(0, 0))
One of the conclusions that can be drawn here is that the difference in expression of kindlin and FAK is the greatest and has a very low adjusted p-value. Thus, one of the conclusion is that kindlin is expressed at a lower level than FAK.
For more information, see the complete documentation at the 'ggasym' site.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.