In jrboyd/ssvRecipes: Recipes using seqsetvis

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(cache = TRUE)

Intro

ssvHeatmap2() is meant to replace my heatmap.3 series of functions.

It works in a friendly way with seqsetvis. In fact the data in heatmap_demo_matrix is a slice of seqsetvis::CTCF_in_10a_profiles_dt which was made with seqsetvis::ssvFetchBam(). ssvHeatmap2() uses multiple functions from seqsetvis, particularly the clustering.

Installation

#you need seqsetvis
source("https://bioconductor.org/biocLite.R")
biocLite("seqsetvis")
#if still on R 3.4 :
#devtools::install("jrboyd/seqsetvis", ref = "R3.4")
devtools::install_github("jrboyd/ssvRecipes")

Setup

library(ggplot2)
library(cowplot)
library(seqsetvis)
library(ssvRecipes)
library(data.table)

Quick demos of output before getting into details.

ChIP-seq:

dt = seqsetvis::CTCF_in_10a_profiles_dt
#trimming the existing "_" is critical for ssvHeatmap2 to work
dt[, sample := tstrsplit(sample, split = "_", keep = 1)]
dt[, FE := y]
resChIP = ssvHeatmap2(dt, fill_ = "FE", main_title = "ChIP-seq demo")
plot(resChIP)

RNA-seq

dt_rna = melt(ssvRecipes::dt_rna, id.vars = "Id")
#Fix SF style colnames
dt_rna[, sample := tstrsplit(variable, "\\.", keep = 2)]
dt_rna[, sample := sub("dF4", "", sample)]
dt_rna[, sample := sub("r", "_r", sample)]
dt_rna[, log2Value := log2(value+1)]

resRna = ssvHeatmap2(dt_rna, main_title = "RNA-seq demo", treatment_space_size = 0,
                     row_ = "Id", 
                     treatment_ = "treat", 
                     column_ = "sample", 
                     fill_ = "log2Value", 
                     heatmap_colors = c("slategray", safeBrew(5, "reds")[4:5]))
plot(resRna)

20 random genes from cluster 1

sample(h.cluster_members(resRna)[[1]], 20)

matrix as input

mat = ssvRecipes::heatmap_demo_matrix
head(mat)

Some colname adjustments to make plots look more "real"

colnames(mat) = sub("A", "ctrl", colnames(mat))
colnames(mat) = sub("B", "drug1", colnames(mat))
colnames(mat) = sub("C", "drug2", colnames(mat))
rownames(mat) = paste0("gene_", rownames(mat))
head(mat)

Note that mat is a matrix with rownames and colnames set. Format for colnames is very specific - they must have a single underscore "_".

The contents preceding the "_" define how samples are grouped.

The contents following the "_" are mostly for tracking but should be unique per sample group.

res = ssvHeatmap2(mat)
res

Reusing cluster results.

This avoids potentially slow and unnecessary clustering if you're tweaking graphical paramters.

You could have also set the "cluster_id" variable manually or used ssvSignalClustering() to set it before calling ssvHeatmap2() at all.

clust = h.cluster_data(resChIP)
resChIP_replot = ssvHeatmap2(clust[sample == "MCF10A"], fill_ = "FE")
plot(resChIP_replot)

Thanks for the help! TLDR.

But where is my heatmap!?

You just need to call plot()

plot(res)

Parts of Output

But I want to know what's in cluster 3!

h.cluster_members(res)[[3]]

But I want to rearrange things!

The intermediate figure assembly objects are accessible.

p_groups = h.plot_parts_grouped(res)
pg_top = plot_grid(p_groups$cluster_bars, p_groups$blank, p_groups$heatmap, rel_widths = c(.2, .1, 1), nrow = 1)
pg_bottom = plot_grid(p_groups$cluster_bars, p_groups$cluster_connector, p_groups$cluster_aggregation, rel_widths = c(.2, .1, 1), nrow = 1)

plot_grid(pg_top, pg_bottom, ncol = 1, scale = .8)

But I just want aggregate plots for cluster 3!

You can extract the individual ggplots before any assembly.

pp = h.plot_parts_individual(res)
pp$clust_agg_3

I want to do my own thing!

You can extract the data used to make these plots and do whatever.

Here's a weird example.

cdata = h.cluster_data(res)
cdata = cdata[order(cluster_id)]
grps = cdata[, paste(column, cluster_id)]
grps = factor(grps, levels = unique(grps))
boxplot(cdata$y ~ grps)

Side Plot Methods

There are different built in side plot methods.

The default:

plot(ssvHeatmap2(mat, side_plot_type = "lines1"))

A line alternative:

plot(ssvHeatmap2(mat, side_plot_type = "lines2", 
            side_plot_colors = safeBrew(8, "blues")[c(4,6,8)],
            heatmap_colors = c("white", "black", "purple")))

Barplots:

plot(ssvHeatmap2(mat, side_plot_type = "bars1",
            heatmap_colors = c("darkgreen", "white", "darkorange"),
            side_plot_colors = safeBrew(3, "Set1")))

Another barplot style:

plot(ssvHeatmap2(mat, side_plot_type = "bars2",
            heatmap_colors = c("darkblue", "white", "darkred"),
            side_plot_colors = safeBrew(3, "Set2")))

Tweaking The Figure

The basic, again

plot(ssvHeatmap2(mat))

This figure is composed of 4 rows and 5 columns. The relative size of each is controlled by 2 numberic vectors main_heights and main_widths.

Here's how it breaks down:

Vertically Smoosh the plot to make more room for legends

plot(ssvHeatmap2(mat, main_heights = c(.6, .1, .2, .8)))

Horizontally smoosh heatmap to increase side plot size.

plot(ssvHeatmap2(mat, main_widths = c(.6, .1, .2, .8, 2)))

Ordering Columns

Two paramters, treatment_ordering and replicate_ordering allow manual specification of column order.

resRna = ssvHeatmap2(dt_rna, 
                     treatment_ordering = c("M", "U", "M40"),
                     replicate_ordering = c("r2", "r1", "r3"),
                     main_title = "RNA-seq demo - manual order", 
                     treatment_space_size = 0,
                     row_ = "Id", 
                     treatment_ = "treat", 
                     column_ = "sample", 
                     fill_ = "log2Value", 
                     heatmap_colors = c("slategray", safeBrew(5, "reds")[4:5]))
plot(resRna)

When these ordering parameters are omitted, the default is to use the order in which data appears in the input. For instance, reversing the order of data.table rows reverses the heatmap and sideplot columns.

resRna = ssvHeatmap2(dt_rna[rev(seq_len(nrow(dt_rna)))], 
                     main_title = "RNA-seq demo - flipped order", 
                     treatment_space_size = 0,
                     row_ = "Id", 
                     treatment_ = "treat", 
                     column_ = "sample", 
                     fill_ = "log2Value", 
                     heatmap_colors = c("slategray", safeBrew(5, "reds")[4:5]))
plot(resRna)