semiPheatmap: A function to draw clustered heatmaps.

View source: R/semi_pheatmap.R

semiPheatmap R Documentation

A function to draw clustered heatmaps.

Description

A function to draw clustered heatmaps where one has better control over some graphical parameters such as cell size, etc.

The function can also aggregate the rows using k-means clustering. This is advisable when the number of rows is too large for R to cluster hierarchically, roughly more than 1000. Instead of showing every row separately, one can cluster the rows in advance and show only the cluster centers. The number of clusters is set with the parameter kmeansK.
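The aggregation idea can be sketched with base R's stats::kmeans. This is an illustration only and not necessarily how semiPheatmap computes it internally; the matrix bigMat and the value of k are made up for the example.

```r
# Hypothetical data: 2000 rows, more than hierarchical clustering handles comfortably
set.seed(1)
bigMat <- matrix(rnorm(2000 * 10), nrow = 2000, ncol = 10)

# Aggregate the rows into k clusters and keep only the cluster centers
k <- 8
km <- stats::kmeans(bigMat, centers = k, iter.max = 100)
centers <- km$centers

# A heatmap of the centers has k rows instead of 2000
dim(centers)
```

Each row of centers is the mean profile of one cluster, and km$cluster records which cluster every original row was assigned to.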

Usage

semiPheatmap(
  mat,
  color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100),
  kmeansK = NA,
  breaks = NA,
  borderColor = "grey60",
  cellWidth = NA,
  cellHeight = NA,
  scale = "none",
  clusterRows = TRUE,
  clusterCols = TRUE,
  clusteringDistanceRows = "euclidean",
  clusteringDistanceCols = "euclidean",
  clusteringMethod = "complete",
  clusteringCallback = .identity2,
  cutreeRows = NA,
  cutreeCols = NA,
  treeHeightRow = ifelse(clusterRows, 50, 0),
  treeHeightCol = ifelse(clusterCols, 50, 0),
  legend = TRUE,
  legendBreaks = NA,
  legendLabels = NA,
  annotationRow = NA,
  annotationCol = NA,
  annotation = NA,
  annotationColors = NA,
  annotationLegend = TRUE,
  annotationNamesRow = TRUE,
  annotationNamesCol = TRUE,
  dropLevels = TRUE,
  showRownames = TRUE,
  showColnames = TRUE,
  main = NA,
  fontSize = 10,
  fontSizeRow = fontSize,
  fontSizeCol = fontSize,
  displayNumbers = FALSE,
  numberFormat = "%.2f",
  numberColor = "grey30",
  fontSizeNumber = 0.8 * fontSize,
  gapsRow = NULL,
  gapsCol = NULL,
  labelsRow = NULL,
  labelsCol = NULL,
  fileName = NA,
  width = NA,
  height = NA,
  silent = FALSE,
  rowLabel,
  colLabel,
  rowGroupOrder = NULL,
  colGroupOrder = NULL,
  ...
)

Arguments

mat

numeric matrix of the values to be plotted.

color

vector of colors used in heatmap.

kmeansK

the number of k-means clusters to make if the rows should be aggregated before drawing the heatmap. If NA, the rows are not aggregated.

breaks

Numeric vector. A sequence of numbers that covers the range of values in 'mat'. Values in the matrix are assigned to the bins defined by 'breaks', and each bin is assigned a color from 'color'. If NA, breaks are calculated automatically. Default NA.

borderColor

color of cell borders on heatmap, use NA if no border should be drawn.

cellWidth

individual cell width in points. If left as NA, then the values depend on the size of plotting window.

cellHeight

individual cell height in points. If left as NA, then the values depend on the size of plotting window.

scale

character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. Corresponding values are "row", "column" and "none".
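What centering and scaling in the row direction means can be sketched with base R's scale(). This shows the general technique under the assumption that rows are standardized to mean 0 and standard deviation 1; the package may compute it differently. The matrix m is invented for the example.

```r
# Two rows with very different magnitudes but the same shape
m <- matrix(c(1, 2, 3,
              10, 20, 30), nrow = 2, byrow = TRUE)

# scale() standardizes columns, so transpose before and after to standardize rows
rowScaled <- t(scale(t(m)))
rowScaled
```

After row scaling both rows become -1, 0, 1, so the heatmap colors reflect the pattern within each row rather than its absolute level.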

clusterRows

boolean value determining if rows should be clustered, or an hclust object.

clusterCols

boolean value determining if columns should be clustered, or an hclust object.

clusteringDistanceRows

distance measure used in clustering rows. Possible values are "correlation" for Pearson correlation and all the distances supported by dist, such as "euclidean", etc. If the value is none of the above it is assumed that a distance matrix is provided.
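A correlation-based distance between rows can be sketched as follows. Using 1 minus the Pearson correlation as the dissimilarity is a common convention and is assumed here; it is not confirmed by this page to be the exact formula used by the function. The matrix m is random example data.

```r
set.seed(1)
m <- matrix(rnorm(50), nrow = 5)

# cor() correlates columns, so transpose to get row-by-row correlations,
# then turn similarity into a dissimilarity
corDist <- stats::as.dist(1 - stats::cor(t(m)))

# The resulting dist object can be passed to hclust like any other distance
hc <- stats::hclust(corDist, method = "complete")
hc$order
```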

clusteringDistanceCols

distance measure used in clustering columns. Possible values are the same as for clusteringDistanceRows.

clusteringMethod

clustering method used. Accepts the same values as hclust.

clusteringCallback

callback function to modify the clustering. It is called with two parameters: the original hclust object and the matrix used for clustering. It must return an hclust object.

cutreeRows

number of clusters the rows are divided into, based on the hierarchical clustering (using cutree). If rows are not clustered, the argument is ignored.
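The effect of cutting a hierarchical clustering into a fixed number of groups can be sketched with base R alone. The data below are invented so that two groups are clearly separated; the variable names are hypothetical.

```r
set.seed(1)
# Five rows centered at 0 and five rows centered at 10
m <- rbind(matrix(rnorm(20, mean = 0), nrow = 5),
           matrix(rnorm(20, mean = 10), nrow = 5))

hc <- stats::hclust(stats::dist(m), method = "complete")

# Divide the rows into two clusters based on the tree
groups <- stats::cutree(hc, k = 2)
table(groups)
```

In a heatmap, the boundaries between such groups are where gaps would be drawn.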

cutreeCols

similar to cutreeRows, but for columns

treeHeightRow

the height of a tree for rows, if these are clustered. Default value 50 points.

treeHeightCol

the height of a tree for columns, if these are clustered. Default value 50 points.

legend

logical to determine if legend should be drawn or not.

legendBreaks

vector of breakpoints for the legend.

legendLabels

vector of labels for the legendBreaks.

annotationRow

data frame that specifies the annotations shown on the left side of the heatmap. Each row defines the features for a specific row. The rows in the data and in the annotation are matched using corresponding row names. Note that the color schemes take into account whether a variable is continuous or discrete.

annotationCol

similar to annotationRow, but for columns.

annotation

deprecated parameter that currently sets the annotationCol if it is missing.

annotationColors

list for specifying annotationRow and annotationCol track colors manually. It is possible to define the colors for only some of the features. Check examples for details.

annotationLegend

boolean value showing if the legend for annotation tracks should be drawn.

annotationNamesRow

boolean value showing if the names for row annotation tracks should be drawn.

annotationNamesCol

boolean value showing if the names for column annotation tracks should be drawn.

dropLevels

logical to determine if unused levels are also shown in the legend.

showRownames

boolean specifying if row names are shown.

showColnames

boolean specifying if column names are shown.

main

the title of the plot

fontSize

base fontsize for the plot

fontSizeRow

font size for row names (Default: fontSize)

fontSizeCol

font size for column names (Default: fontSize)

displayNumbers

logical determining if the numeric values are also printed to the cells. If this is a matrix (with same dimensions as original matrix), the contents of the matrix are shown instead of original values.

numberFormat

format strings (C printf style) of the numbers shown in cells. For example "%.2f" shows 2 decimal places and "%.1e" shows exponential notation (see more in sprintf).
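The two format strings mentioned above behave as follows with base R's sprintf; the value x is arbitrary.

```r
x <- 1234.5678

fixed <- sprintf("%.2f", x)   # two decimal places
expo  <- sprintf("%.1e", x)   # exponential notation, one decimal place

fixed  # "1234.57"
expo   # "1.2e+03"
```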

numberColor

color of the text

fontSizeNumber

fontsize of the numbers displayed in cells

gapsRow

vector of row indices that show where to put gaps into the heatmap. Used only if the rows are not clustered. See cutreeRows to see how to introduce gaps to clustered rows.

gapsCol

similar to gapsRow, but for columns.

labelsRow

custom labels for rows that are used instead of rownames.

labelsCol

similar to labelsRow, but for columns.

fileName

file path where to save the picture. The file type is decided by the extension in the path. Currently the following formats are supported: png, pdf, tiff, bmp, jpeg. Even if the plot does not fit into the plotting window, the file size is calculated so that the plot would fit there, unless specified otherwise.

width

manual option for determining the output file width in inches.

height

manual option for determining the output file height in inches.

silent

do not draw the plot (useful when using the gtable output)

rowLabel

row cluster labels for semi-clustering

colLabel

column cluster labels for semi-clustering

rowGroupOrder

Vector. Specifies the order of feature clusters when semisupervised clustering is performed on the y labels.

colGroupOrder

Vector. Specifies the order of cell clusters when semisupervised clustering is performed on the z labels.

...

graphical parameters for the text used in plot. Parameters passed to grid.text, see gpar.

Value

Invisibly a list of components

  • treeRow the clustering of rows as hclust object

  • treeCol the clustering of columns as hclust object

  • kmeans the kmeans clustering of rows if parameter kmeansK was specified

Author(s)

Raivo Kolde <rkolde@gmail.com>

Examples

# Create test matrix
test = matrix(rnorm(200), 20, 10)
test[seq(10), seq(1, 10, 2)] = test[seq(10), seq(1, 10, 2)] + 3
test[seq(11, 20), seq(2, 10, 2)] = test[seq(11, 20), seq(2, 10, 2)] + 2
test[seq(15, 20), seq(2, 10, 2)] = test[seq(15, 20), seq(2, 10, 2)] + 4
colnames(test) = paste("Test", seq(10), sep = "")
rownames(test) = paste("Gene", seq(20), sep = "")

# Draw heatmaps
pheatmap(test)
pheatmap(test, kmeansK = 2)
pheatmap(test, scale = "row", clusteringDistanceRows = "correlation")
pheatmap(test, color = colorRampPalette(c("navy", "white", "firebrick3"))(50))
pheatmap(test, clusterRows = FALSE)
pheatmap(test, legend = FALSE)

# Show text within cells
pheatmap(test, displayNumbers = TRUE)
pheatmap(test, displayNumbers = TRUE, numberFormat = "%.1e")
pheatmap(test, displayNumbers = matrix(ifelse(test > 5, "*", ""), nrow(test)))
pheatmap(test, clusterRows = FALSE, legendBreaks = seq(-1, 4),
  legendLabels = c("0", "1e-4", "1e-3", "1e-2", "1e-1", "1"))

# Fix cell sizes and save to file with correct size
pheatmap(test, cellWidth = 15, cellHeight = 12, main = "Example heatmap")
pheatmap(test, cellWidth = 15, cellHeight = 12, fontSize = 8, fileName = "test.pdf")

# Generate annotations for rows and columns
annotationCol = data.frame(CellType = factor(rep(c("CT1", "CT2"), 5)), Time = seq(5))
rownames(annotationCol) = paste("Test", seq(10), sep = "")

annotationRow = data.frame(GeneClass = factor(rep(c("Path1", "Path2", "Path3"), c(10, 4, 6))))
rownames(annotationRow) = paste("Gene", seq(20), sep = "")

# Display row and color annotations
pheatmap(test, annotationCol = annotationCol)
pheatmap(test, annotationCol = annotationCol, annotationLegend = FALSE)
pheatmap(test, annotationCol = annotationCol, annotationRow = annotationRow)

# Specify colors
ann_colors = list(Time = c("white", "firebrick"),
  CellType = c(CT1 = "#1B9E77", CT2 = "#D95F02"),
  GeneClass = c(Path1 = "#7570B3", Path2 = "#E7298A", Path3 = "#66A61E"))

pheatmap(test, annotationCol = annotationCol, annotationColors = ann_colors, main = "Title")
pheatmap(test, annotationCol = annotationCol, annotationRow = annotationRow,
  annotationColors = ann_colors)
pheatmap(test, annotationCol = annotationCol, annotationColors = ann_colors[2])

# Gaps in heatmaps
pheatmap(test, annotationCol = annotationCol, clusterRows = FALSE, gapsRow = c(10, 14))
pheatmap(test, annotationCol = annotationCol, clusterRows = FALSE, gapsRow = c(10, 14),
  cutreeCols = 2)

# Show custom strings as row/col names
labelsRow = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
  "Il10", "Il15", "Il1b")

pheatmap(test, annotationCol = annotationCol, labelsRow = labelsRow)

# Specifying clustering from distance matrix
drows = stats::dist(test, method = "minkowski")
dcols = stats::dist(t(test), method = "minkowski")
pheatmap(test, clusteringDistanceRows = drows, clusteringDistanceCols = dcols)

# Modify ordering of the clusters using clustering callback option
callback = function(hc, mat) {
  sv = svd(t(mat))$v[, 1]
  dend = reorder(as.dendrogram(hc), wts = sv)
  as.hclust(dend)
}

pheatmap(test, clusteringCallback = callback)


campbio/celda documentation built on April 5, 2024, 11:47 a.m.