Heatmap for showing clustering results and more

Description

Make a heatmap with the color scale from one matrix and the hierarchical clustering of samples/features from another. Also has built-in functionality for showing the clusterings with the heatmap. Builds on the aheatmap function of the NMF package.

Usage

## S4 method for signature 'SummarizedExperiment'
plotHeatmap(data, isCount = FALSE,
  transFun = NULL, ...)

## S4 method for signature 'ClusterExperiment'
plotHeatmap(data,
  clusterSamplesData = c("hclust", "dendrogramValue", "orderSamplesValue",
  "primaryCluster"), clusterFeaturesData = c("var", "all", "PCA"),
  nFeatures = NULL, visualizeData = c("transformed", "centeredAndScaled",
  "original"), whichClusters = c("primary", "workflow", "all", "none"),
  sampleData = NULL, clusterFeatures = TRUE, colorScale, ...)

## S4 method for signature 'matrix'
plotHeatmap(data, sampleData = NULL,
  clusterSamplesData = NULL, clusterFeaturesData = NULL,
  whSampleDataCont = NULL, clusterSamples = TRUE, showSampleNames = FALSE,
  clusterFeatures = TRUE, showFeatureNames = FALSE, colorScale = seqPal5,
  clusterLegend = NULL, alignSampleData = FALSE,
  unassignedColor = "white", missingColor = "grey", breaks = NA,
  isSymmetric = FALSE, overRideClusterLimit = FALSE, ...)

## S4 method for signature 'ClusterExperiment'
plotCoClustering(data,
  invert = ifelse(!is.null(data@coClustering) && all(diag(data@coClustering)
  == 0), TRUE, FALSE), ...)

Arguments

data

data to use to determine the heatmap. Can be a matrix, ClusterExperiment or SummarizedExperiment object. The interpretation of parameters depends on the type of the input.

isCount

logical. Whether the data are in counts, in which case the default transFun argument is set as log2(x+1). This is simply a convenience to the user, and can be overridden by giving an explicit function to transFun.

transFun

function. A function to use to transform the input data matrix before clustering.

...

For signature matrix, arguments passed to aheatmap. For the other signatures, arguments passed to the method for signature matrix. Not all arguments can be passed to aheatmap effectively; see details.

clusterSamplesData

If data is a matrix, either a matrix that will be used in hclust to define the hierarchical clustering of samples (e.g. normalized data) or a pre-existing dendrogram that clusters the samples. If data is a ClusterExperiment object, the input should be either character or integers, indicating how (and whether) the samples should be clustered, or giving indices of the order for the samples. See details.

clusterFeaturesData

If data is a matrix, either a matrix that will be used in hclust to define the hierarchical clustering of features (e.g. normalized data) or a pre-existing dendrogram that clusters the features. If data is a ClusterExperiment object, the input should be either character or integers indicating which features should be used (see details).

nFeatures

integer indicating how many features should be used (if clusterFeaturesData is 'var' or 'PCA').

visualizeData

either a character string, indicating what form of the data should be used for visualizing the data (i.e. for making the color scale), or a data.frame/matrix with the same dimensions as assay(data).

whichClusters

character string, or vector of characters or integers, indicating what clusters should be visualized with the heatmap.

sampleData

If input is either a ClusterExperiment or SummarizedExperiment object, then sampleData must index the sample data stored as a DataFrame in the colData slot of the object. Whether that data is continuous or not will be determined by the properties of colData (no user input is needed). If input is a matrix, sampleData is a matrix of additional data on the samples to show above the heatmap. Unless indicated by whSampleDataCont, sampleData will be converted into factors, even if numeric. A value of '-1' indicates the sample was not assigned to a cluster and gets the color given by unassignedColor; a value of '-2' gets the color given by missingColor.

clusterFeatures

Logical as to whether to do hierarchical clustering of features (if FALSE, any input to clusterFeaturesData is ignored).

colorScale

palette of colors for the color scale of the heatmap.

whSampleDataCont

Which of the sampleData columns are continuous and should not be converted to factors. NULL indicates that no sampleData columns are continuous.

clusterSamples

Logical as to whether to do hierarchical clustering of samples (if FALSE, any input to clusterSamplesData is ignored).

showSampleNames

Logical as to whether to show sample names.

showFeatureNames

Logical as to whether to show feature names.

clusterLegend

Assignment of colors to the clusters. If NULL, sampleData columns will be assigned colors internally. clusterLegend should be a list of length equal to ncol(sampleData) with names equal to the colnames of sampleData. Each element of the list should be either in the format requested by aheatmap (a vector of colors with names corresponding to the levels of the column of sampleData), or in the format of the clusterLegend slot of a ClusterExperiment object.

alignSampleData

Logical as to whether to align the colors of the sampleData (only if clusterLegend is not given and sampleData is not NULL).

unassignedColor

color assigned to cluster values of '-1' ("unassigned").

missingColor

color assigned to cluster values of '-2' ("missing").

breaks

Either a vector of breaks (should be of length 52), or a number between 0 and 1, indicating that the breaks should be equally spaced (based on the range in the data) up to the 'breaks' quantile; see setBreaks.

isSymmetric

logical. If TRUE, indicates that the input matrix is symmetric. Useful when plotting a co-clustering matrix or other sample-by-sample matrices (e.g., correlation).

overRideClusterLimit

logical. Whether to override the internal limit that only allows 10 clusterings/annotations. If overridden, may result in incomprehensible errors from aheatmap. Only override this if you have a very large plotting device and want to see if aheatmap can render it.

invert

logical determining whether the coClustering matrix should be inverted to be 1-coClustering for plotting. By default, if the diagonal elements are all zero, invert=TRUE, and otherwise invert=FALSE. If the coClustering matrix is not a 0-1 matrix (e.g. if it is a distance matrix output from clusterSingle), then the user should manually set this parameter to FALSE.
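The default rule for invert can be illustrated in base R on a toy matrix. This is only a sketch of the rule stated in the Usage section; coClust and defaultInvert are hypothetical names introduced for illustration and are not part of the package.

```r
# Toy co-clustering-style matrix (hypothetical data): proportion of times
# each pair of samples was clustered together.
coClust <- matrix(c(1.0, 0.8, 0.1,
                    0.8, 1.0, 0.3,
                    0.1, 0.3, 1.0), nrow = 3, byrow = TRUE)

# Same rule as the documented default: invert only when the matrix is
# present and every diagonal entry is zero (it looks like a dissimilarity).
defaultInvert <- function(m) !is.null(m) && all(diag(m) == 0)

defaultInvert(coClust)      # diagonal is 1: similarity, plotted as is
defaultInvert(1 - coClust)  # diagonal is 0: would be plotted as 1 - matrix
```

A distance matrix that is not on a 0-1 scale also has a zero diagonal, which is why the text above recommends setting invert=FALSE by hand in that case.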

Details

The plotHeatmap function calls aheatmap to draw the heatmap. The main points of plotHeatmap are 1) to allow for different matrix inputs, separating the color-scale visualization from the clustering of the samples/features, and 2) to visualize the clusters and metadata with the heatmap. The intended use case is to allow the user to visualize the original count scale of the data (on the log scale), but create the hierarchical clustering on another dataset that is more appropriate for clustering, such as normalized data. Similarly, some of the palettes in the package were developed assuming that the visualization might be on unscaled/uncentered data, rather than the residual from the mean of the gene, and thus the palettes need to cover a greater range of relevant values so as to show meaningful comparisons between genes on very different scales.
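The idea of clustering on one matrix while coloring from another can be sketched with base R alone. The example below uses stats::heatmap rather than aheatmap or plotHeatmap, and the objects counts, normalized and sampleDendro are hypothetical toy data, so it illustrates the concept rather than the package's implementation.

```r
set.seed(1)
# Hypothetical toy data: 20 features x 12 samples of counts, plus a
# "normalized" version that is used only for clustering.
counts <- matrix(rpois(240, lambda = 10), nrow = 20,
                 dimnames = list(paste0("f", 1:20), paste0("s", 1:12)))
normalized <- scale(log2(counts + 1))

# Cluster the samples (columns) on the normalized matrix ...
sampleDendro <- as.dendrogram(hclust(dist(t(normalized))))

# ... but take the colors from the log-counts, reusing that dendrogram.
heatmap(log2(counts + 1), Colv = sampleDendro, Rowv = NA, scale = "none")
```

With plotHeatmap itself, the matrix method plays the same role through data (for the colors) and clusterSamplesData (for the clustering), as in the Examples section.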

If data is a ClusterExperiment object, visualizeData indicates what kind of transformation should be done to assay(data) for calculating the color scale. The features will be clustered based on these data as well. A different data.frame or matrix can be given for the visualization. For example, if the ClusterExperiment object contains normalized data, but the user wishes that the color scale be based on the log-counts for easier interpretation, visualizeData could be set to be the log2(counts + 1).

If data is a ClusterExperiment object, clusterSamplesData can be used to indicate the type of clustering for the samples. If equal to 'dendrogramValue', the dendrogram stored in data will be used; if missing, a new one will be created based on the primaryCluster of data. If equal to 'hclust', then standard hierarchical clustering of the transformed data will be used. If equal to 'orderSamplesValue', no clustering of the samples will be done, and instead the samples will be ordered as in the slot orderSamples of data. If equal to 'primaryCluster', again no clustering will be done, and instead the samples will be ordered based on grouping the samples to match the primaryCluster of data. If not one of these values, clusterSamplesData can be a character vector matching the clusterLabels (colnames of clusterMatrix).
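As a rough base R illustration of ordering samples by cluster membership instead of clustering them, consider the sketch below. The vector primary is hypothetical, and how the package orders samples within a cluster is not stated here, so the use of order() is an assumption.

```r
# Hypothetical cluster assignments for 8 samples (-1 = unassigned).
primary <- c(2, 1, 3, 1, -1, 2, 3, 1)
names(primary) <- paste0("s", seq_along(primary))

# Group samples by cluster label instead of running hclust; order() keeps
# the original relative order of samples within each cluster.
sampleOrder <- order(primary)
primary[sampleOrder]
```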

If data is a matrix, then sampleData is a data.frame of annotation data to be plotted above the heatmap and whSampleDataCont gives the index of the column(s) of this dataset that should be considered continuous. Otherwise the annotation data for sampleData will be forced into a factor (which will be nonsensical for continuous data). If data is a ClusterExperiment object, sampleData should refer to an index or column name of the colData slot of data. In this case sampleData will be added to any choices of clusterings chosen by the whichClusters argument (if any). If both clusterings and sample data are chosen, the clusterings will be shown closest to the data (i.e. on the bottom).
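The effect of forcing annotation columns into factors can be seen in a small base R sketch. The objects anno, whCont and isCont are hypothetical stand-ins (whCont plays the role of whSampleDataCont), and the conversion shown is an assumption about the general idea, not the package's code.

```r
set.seed(2)
# Hypothetical annotation: one cluster label column and one continuous column.
anno <- data.frame(cluster = rep(1:3, each = 4),
                   Cont = rnorm(12))
whCont <- 2  # index of the continuous column

# Convert every column that is not flagged as continuous into a factor.
isCont <- seq_along(anno) %in% whCont
anno[!isCont] <- lapply(anno[!isCont], factor)
str(anno)

# Forcing the continuous column into a factor gives one level (and hence one
# color) per distinct value, which is why continuous columns must be flagged.
nlevels(factor(anno$Cont))
```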

If data is a ClusterExperiment object, clusterFeaturesData is not a dataset, but instead indicates which features should be shown in the heatmap. "var" selects the nFeatures most variable genes (based on transformation(assay(data))); "PCA" results in a heatmap of the top nFeatures principal components of transformation(assay(data)). clusterFeaturesData can also be a vector of characters or integers, indicating the rownames or indices, respectively, of assay(data) that should be shown. For all of these options, the features are clustered based on the visualizeData data. Finally, in the ClusterExperiment version of plotHeatmap, clusterFeaturesData can be a list of indices or rownames, indicating that the features should be grouped according to the elements of the list, with blank (white) space between them (see makeBlankData for more details). In this case, no clustering of the features is done.
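The "var" and "PCA" choices can be approximated in base R as follows. This is a sketch under assumptions: x is hypothetical data standing in for transformation(assay(data)), and the exact variance measure and PCA settings (centering, scaling) used by the package are not given here.

```r
set.seed(3)
# Hypothetical transformed data: 50 features x 10 samples.
x <- matrix(rnorm(500), nrow = 50,
            dimnames = list(paste0("gene", 1:50), paste0("s", 1:10)))
nFeatures <- 5

# "var"-style selection: rank features by variance across samples and keep
# the names of the top nFeatures.
featureVar <- apply(x, 1, var)
topVar <- names(sort(featureVar, decreasing = TRUE))[seq_len(nFeatures)]
xTopVar <- x[topVar, ]

# "PCA"-style selection: treat samples as observations and keep the scores
# on the first nFeatures principal components (one row per component).
pcScores <- t(prcomp(t(x))$x[, seq_len(nFeatures)])

dim(xTopVar)
dim(pcScores)
```

Both results have nFeatures rows and one column per sample, which is the shape that would then be drawn as the heatmap.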

If breaks is a numeric value between 0 and 1, then breaks is assumed to indicate the upper quantile (on the log scale) at which the heatmap color scale should stop. For example, if breaks=0.9, then the breaks will be evenly spaced up until the 0.9 upper quantile of data, and then all values after the 0.9 quantile will be absorbed by the upper-most color bin. This can help to reduce the visual impact of a few highly expressed genes (features).
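The quantile-based breaks can be sketched in base R. This is not the package's setBreaks function; x, q, nBreaks, upper and brks are hypothetical names, and the construction below is one plausible reading of the description above.

```r
set.seed(4)
# Hypothetical skewed data: most values small, a few very large.
x <- c(rexp(990, rate = 1), rexp(10, rate = 0.01))
nBreaks <- 52  # the length that the breaks argument is documented to expect
q <- 0.9

# Evenly spaced breaks from the minimum up to the q-th quantile, followed by
# one final break at the maximum so the top bin absorbs the whole upper tail.
upper <- quantile(x, probs = q, names = FALSE)
brks <- c(seq(min(x), upper, length.out = nBreaks - 1), max(x))

length(brks)
# Number of values falling in the upper-most bin.
sum(x > upper)
```

Lowering q moves more of the tail into the top bin and leaves more of the color range for the bulk of the data.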

Note that plotHeatmap calls aheatmap under the hood. This allows you to plot multiple heatmaps via par(mfrow=c(2,2)), etc. However, the dendrograms do not resize if you change the size of your plot window in an interactive session of R (this might be a problem for RStudio if you want to pop it out into a large window...).

Many arguments can be passed on to aheatmap; however, some are set internally by plotHeatmap. In particular, setting the values of Rowv or Colv will cause errors. The color argument of aheatmap is replaced by colorScale in plotHeatmap. The annCol argument, which gives annotation to the samples, is replaced by sampleData; moreover, the annColors option in aheatmap will also be set internally to give more vibrant colors than the default in aheatmap (for ClusterExperiment objects, these values can also be set in the clusterLegend slot). Other options should be passed on to aheatmap, though not all of them have been tested.
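The aheatmap-style format for annotation colors described under the clusterLegend argument (a list of named color vectors, one per sampleData column) can be built and checked in base R. The objects anno, legend1, legend2, annoColors and covered are hypothetical illustrations.

```r
# Hypothetical annotation with two clusterings; -1 marks an unassigned sample.
anno <- data.frame(cluster1 = rep(1:3, each = 2),
                   cluster2 = c(1, 1, 2, 3, 3, -1))

# One named color vector per annotation column; the names are the levels of
# that column, and the list names match colnames(anno).
legend1 <- c("1" = "black", "2" = "red", "3" = "green")
legend2 <- c("-1" = "white", "1" = "blue", "2" = "purple", "3" = "yellow")
annoColors <- list(cluster1 = legend1, cluster2 = legend2)

# Check that every level of every column has a color assigned.
covered <- mapply(function(values, cols) {
  all(levels(factor(values)) %in% names(cols))
}, anno, annoColors)
covered
```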

plotCoClustering is a convenience function to plot the heatmap of the co-clustering matrix stored in the coClustering slot of a ClusterExperiment object.

Value

Returns (invisibly) a list with elements

  • aheatmapOut The output from the final call of aheatmap.

  • sampleData the annotation data.frame given to the argument annCol in aheatmap.

  • clusterLegend the annotation colors given to the argument annColors in aheatmap.

  • breaks The breaks used for aheatmap, after adjusting for quantile.

Author(s)

Elizabeth Purdom

Examples

data(simData)

cl <- rep(1:3,each=100)
cl2 <- cl
changeAssign <- sample(1:length(cl), 80)
cl2[changeAssign] <- sample(cl[changeAssign])
ce <- clusterExperiment(simCount, cl2, transformation=function(x){log2(x+1)})

# simple, minimal example. Show counts, but cluster on underlying means
plotHeatmap(ce)

# assign cluster colors (three colors, one per cluster label)
colors <- bigPalette[20:22]
names(colors) <- 1:3
plotHeatmap(data=simCount, clusterSamplesData=simData,
  sampleData=data.frame(cl), clusterLegend=list(colors))

#show two different clusters
anno <- data.frame(cluster1=cl, cluster2=cl2)
out <- plotHeatmap(simData, sampleData=anno)

#return the values to see format for giving colors to the annotations
out$clusterLegend

#assign colors to the clusters based on plotClusters algorithm
plotHeatmap(simData, sampleData=anno, alignSampleData=TRUE)

#assign colors manually
annoColors <- list(cluster1=c("black", "red", "green"),
cluster2=c("blue","purple","yellow"))

plotHeatmap(simData, sampleData=anno, clusterLegend=annoColors)

#give a continuous valued -- need to indicate columns
anno2 <- cbind(anno, Cont=c(rnorm(100, 0), rnorm(100, 2), rnorm(100, 3)))
plotHeatmap(simData, sampleData=anno2, whSampleDataCont=3)

#compare changing breaks quantile on visual effect
## Not run: 
par(mfrow=c(2,2))
plotHeatmap(simData, colorScale=seqPal1, breaks=1, main="Full length")
plotHeatmap(simData, colorScale=seqPal1, breaks=.99,
  main="0.99 Quantile Upper Limit")
plotHeatmap(simData, colorScale=seqPal1, breaks=.95,
  main="0.95 Quantile Upper Limit")
plotHeatmap(simData, colorScale=seqPal1, breaks=.90,
  main="0.90 Quantile Upper Limit")

## End(Not run)