knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=7, fig.height=4.5 )
psiplot is an R package for generating plots of percent spliced-in (PSI) values of
alternatively-spliced exons that were computed by
vast-tools, an RNA-Seq pipeline for
alternative splicing analysis. The plots are generated using ggplot2
.
See Releases for the latest stable release or get the most up-to-date development version via devtools:
if (!require("devtools")) install.packages("devtools") devtools::install_github("kcha/psiplot")
psiplot takes as input the PSI and/or cRPKM results generated by vast-tools
(e.g. after running vast-tools combine
or vast-tools diff
). For example,
psi <- read.table("INCLUSION_LEVELS_FULL-Mmu53.tab", header = TRUE, sep = "\t", stringsAsFactors = FALSE) crpkm <- read.table("cRPKM-Mmu53.tab", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
This README uses the provided sample datasets, psi
, crpkm
and crpkm_counts
,
as example input data. We also provide an optional sample configuration table, config
,
which is used for customizing psiplots.
Events can be plotted individually as a scatterplot (a.k.a "psiplots") or collectively as a heatmap. A function to produce psiplots for multiple events is also provided.
The function plot_event()
generates a plot of a single event:
library(psiplot) # Plot an event using provided sample dataset plot_event(psi[1,], config = config)
Alternatively, to plot an event by gene name (for example, MTA1):
plot_event(psi[psi$GENE == "MTA1",], config = config)
Several events can be plotted together with the function plot_multievent()
. This
allows users to compare the inclusion patterns of small groups of events:
plot_multievent(psi[c(1,2),], config = config, event_col = c("black", "red"))
In addition, cRPKM expression values generated by vast-tools can also be plotted
in a similar manner using the function plot_expr()
:
plot_expr(crpkm[1,], config = config)
The function plot_multi
generates a heatmap of multiple events using geom_tile
. Alternatively,
if you have the R package pheatmap
or gplots
installed, you can set usepkgs = 'pheatmap'
or usepkgs = 'gplots'
to generate a heatmap. By default, rows and columns
will be ordered by hierarchical clustering.
One benefit of using pheatmap
or gplots
is the ability to include
group annotations from the config file:
# example using pheatmap plot_multi(psi, config=config, usepkg = 'pheatmap')
Other use cases:
## Not run # For cRPKM, use expr = TRUE plot_multi(crpkm, expr = TRUE) # To disable clustering of events plot_multi(psi, cluster_rows = FALSE) # To disable clustering of samples plot_multi(psi, cluster_cols = FALSE) # To generate a pheatmap-based heatmap plot_multi(psi, usepkg = "pheatmap")
There are two ways to customize the plots: using a configuration file or using R arguments.
config
fileIn vast-tools plot
, an optional config file can be used to customize the
plots' visual appearance. The same config file can be supplied here as well.
plot_event(psi[1,], config = "/path/to/config") # config can also be pre-loaded into a data frame cfg <- read.table("/path/to/config", header = TRUE, sep = "\t", stringsAsFactor = FALSE) plot_event(psi[1,], config = cfg) plot_multi(psi, config = cfg) plot_expr(crpkm[1,], config = cfg)
This file is tab-delimited and must be manually created. For example:
# sample config data
config
The header is required, but the order of the columns is flexible. There are two required columns and three optional:
Required:
groupmean=TRUE
).Optional:
subg=TRUE
). See [Subgrouping samples] for more details.colors()
).Tips:
col
option.NA
,
then a default ggplot2 color palette will be used for annotating sample groups.
For example:config2 <- config config2$RColorCode <- NA head(config2) plot_event(psi[1,], config2)
plot_multi
, specifying a config file will disable clustering of samples
and instead use the config order.subg
is set to TRUE
. Similarly, if the config file includes subgroups,
but subg
is set to FALSE
, they will not be used.The colors and other graphical parameters can also be configured in R via arguments.
The plot_*
family of methods provide a limited set of graphical arguments and can be
used in conjunction with a config file. See the help pages for each method
(e.g. ?plot_event
) for more details on the available options.
For example, the following command uses config
to customize the colors and groupings, sets the
point symbol, and restricts the y-axis to (20, 80):
plot_event(psi[1,], config = config, pch = 9, ylim = c(20, 80))
Note: Samples 6, 7, 8 fall outside of this range and thus are not shown.
Because the plot_*
methods return ggplot2 objects, you can further customize
the plots by appending additional aesthetics or themes (if the option is not supported
within the method itself). For example, to increase the text size of the legend:
library(ggplot2) plot_event(psi[1,], config = config) + theme(legend.text = element_text(size = 20))
Certain options may interfere or overwite those already
set within the plot_*
methods, so please use at your discretion.
The qual
argument can be used to indicate the minimum value of the first vast-tools
quality score that a PSI value needs to have in order to be accepted. PSI values
with lower quality scores will be coerced to NA.
The possible values for this score are, from worse to better: N
, VLOW
, LOW
,
OK
, and SOK
. By default, the minimum quality accepted is VLOW
.
For more information on vast-tools quality scores, please see
the description of vast-tools output
(column 8).
The subg
argument allows pooling samples into subgroups if there is a SubgroupName
column included in the supplied config
file (see [Customizing plots]). The PSI
of a subgroup is the mean of the PSIs of its individual samples, after filtering
out the samples with quality score below the threshold indicated by qual
(see [Filtering samples by quality score]).
plot_event(psi[1,], config = config, subg = TRUE)
Alternatively, rather than plotting the average PSI, the individual values can
be plotted by setting the option subj.show
to "all" or "beeswarm" (default is
"mean"). The latter
uses ggbeeswarm to make beeswarm
scatterplots. One limitation of beeswarm plots is that error bars cannot be
plotted, at least to my knowledge. If error bars are needed, choose "all":
library(dplyr) config2 <- config %>% mutate( SubgroupName = ifelse(Order <= 5, "Subgroup1", "Subgroup2") ) plot_event(psi[1,], config = config2, subg = TRUE, subg.show = "all")
To illustrate a subgrouped plot using beeswarm plots, we'll constrat a larger PSI table with 100 samples:
set.seed(416) large_psi <- tibble( GENE = "TEST", EVENT = "HsaEx0000", COORD = "chr1:100-200", LENGTH = 100, FullCO = "chr1:1,100-200,210", COMPLEX = "S", SampleName = paste0("Sample", sprintf("%0.3d", 1:100)), PSI = c(rbeta(30, 10, 90), rbeta(30, 10, 11), rbeta(40, 9,2))*100, Q = "OK,OK,OK,OK,S@50,50" # dummy quality score - not used ) psi_temp <- tidyr::spread(large_psi, SampleName, PSI) %>% select(-Q) qual_temp <- large_psi %>% mutate(SampleName = paste0(SampleName, "-Q")) %>% select(-PSI) %>% tidyr::spread(SampleName, Q) # Put everything together large_config <- tibble( SampleName = large_psi$SampleName, GroupName = c(rep("ESC", 30), rep("Muscle", 30), rep("Brain", 40)), SubgroupName = c(rep("Subgroup1", 10), rep("Subgroup2", 20), rep("Subgroup3", 30), rep("Subgroup4", 40)) ) large_psi <- inner_join(psi_temp, qual_temp) large_psi <- large_psi[, c(names(large_psi)[1:6], sort(names(large_psi)[7:ncol(large_psi)]))] plot_event(large_psi[1,], large_config, subg = T, subg.show = "beeswarm")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.