In talgalili/heatmaplyExamples: 'Heatmaply' Examples

library(knitr)
knitr::opts_chunk$set(
   # cache = TRUE,
   dpi = 60,
  comment = '#>',
  tidy = FALSE)

Author: Alan O'Callaghan (alan.b.ocallaghan@gmail.com)

Introduction

Due to the size of the objects, this file is seperated to three fils. You can view this series in the following links:

Using heatmaply with gene expression data - Visualization of raw and voom-transformed data (all genes)
Using heatmaply with gene expression data - Visualization of raw data (median-centered data, PAM50 genes only)
Using heatmaply with gene expression data - Visualization of voom-transformed data (median-centered data, PAM50 genes only)

This is file 2 in the series.

# Let's load the packages
library(heatmaply)
library(heatmaplyExamples)

Visualization of raw data (median-centered data, PAM50 genes only)

Breast cancer has been studied extensively using gene expression profiling methods (see Breast cancer gene expression). This has lead to the identification of a number of gene sets which stratify patients into molecular subgroups. One such gene set is commonly known as the PAM50 gene set. In this example, we will visualize gene expression patterns using heatmaply.

Centering data before performing clustering tends to result in more meaningful cluster assignment. This is particularly true when the measure of interest is the similarity in patterns across features, rather than the total distance between values. Furthermore, it is typically the difference between samples which is of interest, rather than the difference between measures. Non-centered data may show that all samples measure high for one variable, and low for another, while centered data shows relative differences. Alternatively, one could use a distance measure which is invariant to total distance, such as correlation. Heatmaps of non-centered are shown in another vignette within this package. It can be seen that the concordance between the centered and non-centered heatmaps is mediocre, and the clustering of triple negative samples is not as definite.

In the heatmaps shown below, it is clear that samples appear to cluster loosely based on PAM50 subtype more than the previous examples. Concordance with the assigned labels shown in the row annotation is not complete, however this may be expected, given that a different clustering method was used here (hierarchical clustering, rather than k-medioids).

pam50_genes <- intersect(pam50_genes, rownames(raw_expression))
raw_pam50_expression <- raw_expression[pam50_genes, ]
voomed_pam50_expression <- voomed_expression[pam50_genes, ]

center_raw_mat <- cor_mat_raw_logged - 
    apply(cor_mat_raw_logged, 1, median)

raw_max <- max(abs(center_raw_mat), na.rm=TRUE)
raw_limits <- c(-raw_max, raw_max)

heatmaply(t(center_raw_mat), 
    row_side_colors = tcga_brca_clinical,
    showticklabels = c(FALSE, FALSE),
    fontsize_col = 7.5,
    col = cool_warm(100),
    main = 'Centred log2 read counts, PAM50 genes',
    limits = raw_limits,
    plot_method = 'plotly')

Note that heatmaply_cor is just like heatmaply but with defaults that are better suited for correlation matrix (limits from -1 to 1, and a cold-warm color scheme).

heatmaply_cor(cor(center_raw_mat), 
    row_side_colors = tcga_brca_clinical,
    showticklabels = c(FALSE, FALSE),
    main = 'Sample-sample correlation based on centred, log2 PAM50 read counts',
    plot_method = 'plotly')

Discussion

It may be useful when examining expression heatmaps to identify particularly high or low measures for a single gene in a group of patients, or a gene which shows unusually high or low variance. The mouse-over text available in the heatmaply package allows visual assessment of measures of interest and quick identification of samples or genes with unusual gene expression patterns. Similarly, visualizing correlation heatmaps with heatmaply allows the user to rapidly identify samples with unusually high or low pairwise correlation.