knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=7, fig.height=4.5
)

psiplot is an R package for generating plots of percent spliced-in (PSI) values of alternatively-spliced exons that were computed by vast-tools, an RNA-Seq pipeline for alternative splicing analysis. The plots are generated using ggplot2.

Installation

See Releases for the latest stable release or get the most up-to-date development version via devtools:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("kcha/psiplot")

Usage

Input

psiplot takes as input the PSI and/or cRPKM results generated by vast-tools (e.g. after running vast-tools combine or vast-tools diff). For example,

psi <- read.table("INCLUSION_LEVELS_FULL-Mmu53.tab", header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
crpkm <- read.table("cRPKM-Mmu53.tab", header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)

This README uses the provided sample datasets, psi, crpkm and crpkm_counts, as example input data. We also provide an optional sample configuration table, config, which is used for customizing psiplots.

Plotting

Events can be plotted individually as a scatterplot (a.k.a "psiplots") or collectively as a heatmap. A function to produce psiplots for multiple events is also provided.

Individual events

The function plot_event() generates a plot of a single event:

library(psiplot)

# Plot an event using provided sample dataset
plot_event(psi[1,], config = config)

Alternatively, to plot an event by gene name (for example, MTA1):

plot_event(psi[psi$GENE == "MTA1",], config = config)


Multi-event PSI plots

Several events can be plotted together with the function plot_multievent(). This allows users to compare the inclusion patterns of small groups of events:

plot_multievent(psi[c(1,2),], config = config, event_col = c("black", "red"))

Plotting gene expression values

In addition, cRPKM expression values generated by vast-tools can also be plotted in a similar manner using the function plot_expr():

plot_expr(crpkm[1,], config = config)

Plotting heatmaps

The function plot_multi generates a heatmap of multiple events using geom_tile. Alternatively, if you have the R package pheatmap or gplots installed, you can set usepkgs = 'pheatmap' or usepkgs = 'gplots' to generate a heatmap. By default, rows and columns will be ordered by hierarchical clustering. One benefit of using pheatmap or gplots is the ability to include group annotations from the config file:

# example using pheatmap
plot_multi(psi, config=config, usepkg = 'pheatmap')

Other use cases:

## Not run
# For cRPKM, use expr = TRUE
plot_multi(crpkm, expr = TRUE)

# To disable clustering of events
plot_multi(psi, cluster_rows = FALSE)

# To disable clustering of samples
plot_multi(psi, cluster_cols = FALSE)

# To generate a pheatmap-based heatmap 
plot_multi(psi, usepkg = "pheatmap")

Customizing plots

There are two ways to customize the plots: using a configuration file or using R arguments.

Using a custom config file

In vast-tools plot, an optional config file can be used to customize the plots' visual appearance. The same config file can be supplied here as well.

plot_event(psi[1,], config = "/path/to/config")

# config can also be pre-loaded into a data frame
cfg <- read.table("/path/to/config", header = TRUE, sep = "\t", stringsAsFactor = FALSE)
plot_event(psi[1,], config = cfg)
plot_multi(psi, config = cfg)
plot_expr(crpkm[1,], config = cfg)

This file is tab-delimited and must be manually created. For example:

# sample config data
config

The header is required, but the order of the columns is flexible. There are two required columns and three optional:

Required:

  1. SampleName: Name of the sample. MUST match sample name in input table.
  2. GroupName: Group name. Use for plotting the average PSI of samples belonging to the same group (enable by setting groupmean=TRUE).

Optional:

  1. Order: A specific ordering of the samples from left to right of the plot. When not specified, then the natural order of the config file will be used.
  2. SubgroupName: Subgroup name. Use for plotting the average PSI of samples belonging to the same subgroup INSTEAD of the individual samples (enable by setting subg=TRUE). See [Subgrouping samples] for more details.
  3. RColorCode: An R color specification:
    1. color name (as specified by colors()).
    2. hex color code (#rrggbb).

Tips:

config2 <- config
config2$RColorCode <- NA
head(config2)
plot_event(psi[1,], config2)

Using R

The colors and other graphical parameters can also be configured in R via arguments. The plot_* family of methods provide a limited set of graphical arguments and can be used in conjunction with a config file. See the help pages for each method (e.g. ?plot_event) for more details on the available options.

For example, the following command uses config to customize the colors and groupings, sets the point symbol, and restricts the y-axis to (20, 80):

plot_event(psi[1,], config = config, pch = 9, ylim = c(20, 80))

Note: Samples 6, 7, 8 fall outside of this range and thus are not shown.

Because the plot_* methods return ggplot2 objects, you can further customize the plots by appending additional aesthetics or themes (if the option is not supported within the method itself). For example, to increase the text size of the legend:

library(ggplot2)
plot_event(psi[1,], config = config) + theme(legend.text = element_text(size = 20))

Certain options may interfere or overwite those already set within the plot_* methods, so please use at your discretion.

Options available for all plotting functions

Filtering samples by quality score

The qual argument can be used to indicate the minimum value of the first vast-tools quality score that a PSI value needs to have in order to be accepted. PSI values with lower quality scores will be coerced to NA.

The possible values for this score are, from worse to better: N, VLOW, LOW, OK, and SOK. By default, the minimum quality accepted is VLOW. For more information on vast-tools quality scores, please see the description of vast-tools output (column 8).

Subgrouping samples

The subg argument allows pooling samples into subgroups if there is a SubgroupName column included in the supplied config file (see [Customizing plots]). The PSI of a subgroup is the mean of the PSIs of its individual samples, after filtering out the samples with quality score below the threshold indicated by qual (see [Filtering samples by quality score]).

plot_event(psi[1,], config = config, subg = TRUE)

Alternatively, rather than plotting the average PSI, the individual values can be plotted by setting the option subj.show to "all" or "beeswarm" (default is "mean"). The latter uses ggbeeswarm to make beeswarm scatterplots. One limitation of beeswarm plots is that error bars cannot be plotted, at least to my knowledge. If error bars are needed, choose "all":

library(dplyr)
config2 <- config %>% 
  mutate(
    SubgroupName = ifelse(Order <= 5, "Subgroup1", "Subgroup2")
  )
plot_event(psi[1,], config = config2, subg = TRUE, subg.show = "all")

To illustrate a subgrouped plot using beeswarm plots, we'll constrat a larger PSI table with 100 samples:

set.seed(416)
large_psi <- tibble(
  GENE = "TEST",
  EVENT = "HsaEx0000",
  COORD = "chr1:100-200",
  LENGTH = 100,
  FullCO = "chr1:1,100-200,210",
  COMPLEX = "S",
  SampleName = paste0("Sample", sprintf("%0.3d", 1:100)),
  PSI = c(rbeta(30, 10, 90), rbeta(30, 10, 11), rbeta(40, 9,2))*100,
  Q = "OK,OK,OK,OK,S@50,50"  # dummy quality score - not used
)
psi_temp <- tidyr::spread(large_psi, SampleName, PSI) %>% 
  select(-Q)
qual_temp <- large_psi %>% 
  mutate(SampleName = paste0(SampleName, "-Q")) %>% 
  select(-PSI) %>% 
  tidyr::spread(SampleName, Q)

# Put everything together
large_config <- tibble(
  SampleName = large_psi$SampleName,
  GroupName = c(rep("ESC", 30), rep("Muscle", 30), rep("Brain", 40)),
  SubgroupName = c(rep("Subgroup1", 10), 
                   rep("Subgroup2", 20), 
                   rep("Subgroup3", 30),
                   rep("Subgroup4", 40))
)
large_psi <- inner_join(psi_temp, qual_temp)
large_psi <- large_psi[, c(names(large_psi)[1:6], 
                           sort(names(large_psi)[7:ncol(large_psi)]))]

plot_event(large_psi[1,], large_config, subg = T, subg.show = "beeswarm")


kcha/psiplot documentation built on March 27, 2022, 4:20 a.m.