aheatmap: A function to draw clustered heatmaps.


View source: R/aheatmap.R

Description

Draws clustered heatmaps with finer control over graphical parameters such as cell size, cell borders, font sizes and annotation tracks.

Usage

aheatmap(mat, color = bluered(100), kmeans_k = NA, breaks = NA,
  midpoint = NA, border_color = "grey60", cellwidth = NA,
  cellheight = NA, shrink = 0.92, scale = "row",
  clusterWithScaledData = FALSE, cluster_rows = TRUE, cluster_cols = TRUE,
  clustering_distance_rows = "correlation",
  clustering_distance_cols = "correlation", clustering_method = "ward",
  treeheight_row = ifelse(cluster_rows, 50, 0),
  treeheight_col = ifelse(cluster_cols, 50, 0), legend = TRUE,
  legend_breaks = NA, legend_labels = NA, annotation = NA,
  annotation_colors = NA, annoHeaderPosition = NA,
  annotation_legend = TRUE, drop_levels = TRUE, show_rownames = T,
  show_colnames = T, main = "", fontsize = 10, fontsize_row = fontsize,
  fontsize_col = fontsize, display_numbers = F, number_format = "%.2f",
  number_rotation = 0, fontsize_number = 0.8 * fontsize, filename = NA,
  width = NA, height = NA, cexRow, cexCol, labRow = NA, labCol = NA,
  truncate = NA, q1 = 0.01, q2 = 0.99, Lower, Upper, returnTree = FALSE,
  ...)

Arguments

mat

numeric matrix of the values to be plotted.

color

vector of colors used in the heatmap.

kmeans_k

the number of k-means clusters to make if the rows should be aggregated before drawing the heatmap. If NA, the rows are not aggregated.

breaks

a sequence of numbers that covers the range of values in mat and is one element longer than the color vector. Used for mapping values to colors; useful when specific values must be mapped to specific colors. If NA, the breaks are calculated automatically.
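A minimal base-R sketch of the breaks convention described above: n colors need n + 1 breaks, and each value takes the color of the interval it falls into. It uses findInterval() for the lookup, which is our choice for illustration; this page does not say how aheatmap maps values internally.

```r
# Sketch: map values to colors with a breaks vector one element longer
# than the color vector (the convention described for `breaks`).
cols <- c("blue", "white", "red")        # 3 colors
brks <- c(-3, -1, 1, 3)                  # 4 breaks -> 3 intervals
stopifnot(length(brks) == length(cols) + 1)

vals <- c(-2.5, 0, 0.5, 2)
# findInterval() returns the interval index of each value;
# all.inside = TRUE keeps values on the outer breaks within 1..length(cols)
idx <- findInterval(vals, brks, all.inside = TRUE)
mapped <- cols[idx]
print(mapped)   # "blue" "white" "white" "red"
```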

midpoint

a value to be explicitly matched to the middle color of the palette (usually white) in the main image. This is useful for de-emphasizing values around an uninteresting level, e.g. 0. The midpoint refers to the matrix used for coloring; therefore, if the data are scaled, it is on the scaled scale.
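One common way to pin a midpoint to the middle color is to build breaks that are symmetric around it. The helper below, centered_breaks(), is hypothetical (it is not part of aheatmap) and only illustrates the idea: with an even number of colors, the midpoint becomes the middle break, i.e. the boundary between the two central colors.

```r
# Hypothetical helper (not part of aheatmap): breaks symmetric around `midpoint`
centered_breaks <- function(x, midpoint = 0, n_colors = 100) {
  # distance from the midpoint to the farther end of the data range
  half <- max(abs(range(x, na.rm = TRUE) - midpoint))
  seq(midpoint - half, midpoint + half, length.out = n_colors + 1)
}

x <- c(-1, 0.2, 3)
b <- centered_breaks(x, midpoint = 0, n_colors = 4)
print(b)
```

Because the range is widened on the shorter side, part of the palette may go unused; that is the trade-off for keeping the midpoint on the central color.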

border_color

color of cell borders on heatmap, use NA if no border should be drawn.

cellwidth

individual cell width in points. If left as NA, then the values depend on the size of plotting window.

cellheight

individual cell height in points. If left as NA, then the values depend on the size of plotting window.

shrink

fraction of the current figure used for the image (default 0.92). Lowering it shrinks the image, which helps when no space is left for the margins.

scale

character indicating whether the values should be centered and scaled (the mean is subtracted, then the result is divided by the standard deviation) in the row direction, in the column direction, or not at all. The corresponding values are "row", "column" and "none". By default, rows are scaled, which is the usual choice for microarray data (each gene is scaled).
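The row scaling described above (subtract the mean, divide by the standard deviation) corresponds to the standard base-R idiom below. We assume aheatmap's internal scaling is equivalent; the page describes the operation but not the code.

```r
set.seed(1)
m <- matrix(rnorm(12, mean = 5), nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# scale() centers and scales columns, so transpose, scale, and transpose back
row_scaled <- t(scale(t(m)))
# scale = "column" would simply be scale(m)

rowMeans(row_scaled)       # approximately 0 for every row
apply(row_scaled, 1, sd)   # 1 for every row
```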

clusterWithScaledData

controls whether the scaling of the input matrix affects clustering. The default, FALSE, means that scaling only affects the colors, not the dendrogram or the numbers shown; if TRUE, the scaled data are used for clustering and coloring (as well as for the numbers shown, if any). FALSE is recommended, since scaling can attenuate the differences among samples and hide important patterns.
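The sketch below illustrates the decoupling described for clusterWithScaledData = FALSE: the tree is built from the raw matrix, while the colors come from the scaled matrix laid out in the tree's order. Euclidean distance and complete linkage are used only for illustration. Note that row scaling leaves the Pearson correlation between any two rows unchanged, so with the default correlation distance this choice matters mainly for column clustering.

```r
set.seed(2)
m <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("g", 1:5), paste0("s", 1:4)))
m[1, ] <- m[1, ] * 10                  # one row with a much larger spread

scaled <- t(scale(t(m)))               # row scaling, used for coloring

cluster_with_scaled <- FALSE           # mirrors clusterWithScaledData = FALSE
data_for_tree <- if (cluster_with_scaled) scaled else m

# Column tree (Euclidean distance, complete linkage; illustration only)
hc_cols <- hclust(dist(t(data_for_tree)), method = "complete")

# Colors always come from the scaled matrix, in the tree's column order
color_matrix <- scaled[, hc_cols$order]
```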

cluster_rows

boolean value determining if rows should be clustered.

cluster_cols

boolean value determining if columns should be clustered.

clustering_distance_rows

distance measure used in clustering rows. Possible values are "correlation" for Pearson correlation and all the distances supported by dist, such as "euclidean", etc. If the value is none of the above it is assumed that a distance matrix is provided.
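A base-R sketch of the two kinds of distance mentioned above. We assume that "correlation" means 1 minus the Pearson correlation, which is a common convention in heatmap functions but is not spelled out on this page.

```r
set.seed(3)
m <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("g", 1:6), paste0("s", 1:4)))

# Assumed meaning of "correlation": 1 - Pearson correlation between rows.
# cor() correlates columns, so transpose to correlate rows.
d_rows_cor <- as.dist(1 - cor(t(m)))

# Any method understood by dist() can be used instead, e.g. "euclidean"
d_rows_euc <- dist(m, method = "euclidean")

# Columns: same idea, without the transpose inside cor()
d_cols_cor <- as.dist(1 - cor(m))
```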

clustering_distance_cols

distance measure used in clustering columns. Possible values are the same as for clustering_distance_rows.

clustering_method

clustering method used. Accepts the same values as hclust.
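Since the default here is "ward", it is worth knowing how current versions of hclust treat that name: "ward" is still accepted but is mapped to "ward.D" with a message, and "ward.D2" is a separate variant that squares the dissimilarities before clustering. The sketch uses "ward.D" explicitly on simulated data.

```r
set.seed(4)
m <- matrix(rnorm(40), nrow = 10)
d <- dist(m)

# "ward" is accepted by hclust() but renamed to "ward.D" (with a message)
hc <- hclust(d, method = "ward.D")

# Cut the tree into 3 groups of rows
groups <- cutree(hc, k = 3)
```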

treeheight_row

the height of the row dendrogram in points. Defaults to 50 if rows are clustered, otherwise 0.

treeheight_col

the height of the column dendrogram in points. Defaults to 50 if columns are clustered, otherwise 0.

legend

logical to determine if legend should be drawn or not.

legend_breaks

vector of breakpoints for the legend.

legend_labels

vector of labels for the legend_breaks.

annotation

data frame that specifies the annotations shown on top of the columns. Each row defines the features of a specific column. Columns of the data and rows of the annotation are matched by their column and row names. Note that the color scheme takes into account whether a variable is continuous or discrete.
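A sketch of preparing an annotation data frame that follows the rules above: one row per heatmap column, row names equal to the matrix column names, and categorical variables stored as factors. The explicit reordering step is our own safeguard and not necessarily something aheatmap requires.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))

annotation <- data.frame(
  Group = factor(c("ctrl", "trt", "trt")),   # factor -> categorical bar
  Age   = c(34, 51, 47),                     # numeric -> continuous bar
  row.names = c("s3", "s1", "s2")            # may be in a different order
)

# Align annotation rows to the matrix columns by name
annotation <- annotation[colnames(m), , drop = FALSE]
stopifnot(identical(rownames(annotation), colnames(m)))
```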

annotation_colors

list for specifying annotation track colors manually. It is possible to define colors for only some of the features. Check examples for details. For a categorical column bar, supply a named vector so that each color is mapped to the correct category; for a continuous column bar, the supplied color vector is used to interpolate colors for intermediate values.
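A sketch of an annotation_colors list in the shape described above, with a named vector for a categorical track and a plain color vector for a continuous one. The interpolation shown with colorRampPalette() and cut() is only an illustration of the idea; this page does not state how aheatmap interpolates.

```r
annotation_colors <- list(
  # categorical: names must match the factor levels
  Group = c(ctrl = "grey70", trt = "firebrick"),
  # continuous: colors to interpolate between (low -> high)
  Age   = c("white", "darkgreen")
)

# Illustration only: one way to interpolate colors for a continuous track
age  <- c(34, 51, 47)
ramp <- colorRampPalette(annotation_colors$Age)(100)
age_cols <- ramp[cut(age, breaks = 100, labels = FALSE)]
```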

annoHeaderPosition

position of the annotation bar names. If left (the old behavior, c(3, 1)), the names are placed to the left of the bars; the newer right option places them to the right. The right position is needed when there is no row dendrogram, since otherwise the header is not shown completely.

annotation_legend

boolean value showing if the legend for annotation tracks should be drawn.

drop_levels

logical determining whether unused factor levels are dropped from the legend (if FALSE, they are also shown).

show_rownames

boolean specifying if row names are shown.

show_colnames

boolean specifying if column names are shown.

main

the title of the plot. The default "" leaves one blank line at the top for better visual spacing.

fontsize

base fontsize for the plot

fontsize_row

fontsize for rownames (Default: fontsize)

fontsize_col

fontsize for colnames (Default: fontsize)

display_numbers

logical determining if the numeric values are also printed in the cells.

number_format

format string (C printf style) for the numbers shown in cells. For example "%.2f" shows 2 decimal places and "%.1e" shows exponential notation (see more in sprintf).

number_rotation

rotation of the numbers displayed in cells; default is 0.
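The number_format strings follow sprintf() conventions, so their effect can be previewed directly in R before drawing the heatmap:

```r
vals  <- c(3.14159, 12345.678, 0.000123)
fixed <- sprintf("%.2f", vals)   # "3.14" "12345.68" "0.00"
sci   <- sprintf("%.1e", vals)   # exponential notation, one decimal place
```

Small values such as 0.000123 collapse to "0.00" under "%.2f", which is one reason to prefer exponential notation for data spanning several orders of magnitude.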

fontsize_number

fontsize of the numbers displayed in cells

filename

file path where the picture is saved. The file type is determined by the extension in the path; the following formats are currently supported: png, pdf, tiff, bmp, jpeg. Even if the plot does not fit into the plotting window, the file size is calculated so that the plot fits, unless specified otherwise.
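The extension-based choice of output format can be sketched with the standard grDevices functions. The helper open_device_for() is hypothetical (not part of aheatmap); the 300 dpi resolution for raster formats and the extra "jpg" alias are our own assumptions.

```r
# Hypothetical helper (not part of aheatmap): open a device by file extension.
# width and height are in inches, as for aheatmap's width/height arguments.
open_device_for <- function(filename, width = 7, height = 7) {
  ext <- tolower(tools::file_ext(filename))
  switch(ext,
    pdf  = pdf(filename, width = width, height = height),
    png  = png(filename, width = width, height = height, units = "in", res = 300),
    jpeg = ,                                  # "jpg" alias is our addition
    jpg  = jpeg(filename, width = width, height = height, units = "in", res = 300),
    tiff = tiff(filename, width = width, height = height, units = "in", res = 300),
    bmp  = bmp(filename, width = width, height = height, units = "in", res = 300),
    stop("Unsupported file type: ", ext)
  )
  invisible(filename)
}

out <- tempfile(fileext = ".pdf")
open_device_for(out, width = 5, height = 4)
plot(1:10)
invisible(dev.off())
```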

width

manual option for determining the output file width in inches.

height

manual option for determining the output file height in inches.

cexRow

scale rownames by this factor.

cexCol

scale colnames by this factor.

labRow

row labels, as in heatmap.2. Supply them in the order of the rows of the input matrix, not in the clustered order; they are reordered internally.

labCol

column labels, as in heatmap.2. Supply them in the order of the columns of the input matrix, not in the clustered order; they are reordered internally.
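Why labels are given in the original order: after clustering, the drawn order is simply the original labels permuted by the tree's order. A small sketch with simulated data and example label names:

```r
set.seed(5)
m <- matrix(rnorm(20), nrow = 5)
labRow <- c("TP53", "MYC", "EGFR", "KRAS", "BRCA1")   # input-matrix order

hc <- hclust(dist(m))
# The labels as they would appear along the clustered rows
labels_as_drawn <- labRow[hc$order]
```

Supplying labels already in the clustered order would apply this permutation twice and mislabel the rows.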

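truncate

logical; whether to truncate extreme values to save color space. If FALSE, no truncation is done; if NA (the default), it is set internally to TRUE when scale != 'none'.

q1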

parameter q1 to truncByQuantile()

q2

parameter q2 to truncByQuantile()

Lower

parameter Lower to truncByLimit()

Upper

parameter Upper to truncByLimit()

returnTree

whether to return the clustering tree back. Default is FALSE

...

graphical parameters for the text used in the plot, passed to grid.text; see gpar.

Details

By default, correlation distance is used for both rows and columns, with Ward linkage, and scaling is applied to rows. The function has been modified so that scaling affects only how the matrix is colored; the original (unscaled) matrix is used for clustering. For an annotation variable to be shown as a categorical column bar, it must be a factor; default colors are calculated automatically. Do not forget to set the row names of the annotation data frame to the column names of the matrix!

Added features: (1) truncate=NA, q1=0.01, q2=0.99, Lower, Upper: these arguments truncate extreme values to save color space. If truncate=FALSE, no truncation is done; if NA, it is set internally to TRUE when scale!='none'. (2) ...
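A sketch of what truncation to save color space can look like, plus the stated rule for truncate = NA. The two truncation helpers are hypothetical stand-ins written for illustration; the package's own truncByQuantile() and truncByLimit() may differ. We assume q1/q2 are quantile probabilities and Lower/Upper are fixed limits, as their names and defaults suggest.

```r
# Hypothetical stand-ins, not the package's truncByQuantile()/truncByLimit()
trunc_by_quantile_sketch <- function(x, q1 = 0.01, q2 = 0.99) {
  lims <- quantile(x, probs = c(q1, q2), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, lims[1]), lims[2])   # clamp values outside the quantile range
}
trunc_by_limit_sketch <- function(x, Lower, Upper) {
  pmin(pmax(x, Lower), Upper)       # clamp values to fixed limits
}

# Rule from this page: truncate = NA becomes TRUE when scale != "none"
resolve_truncate <- function(truncate, scale) {
  if (is.na(truncate)) scale != "none" else truncate
}

x <- c(-10, seq(-2, 2, by = 0.5), 10)   # two outliers at -10 and 10
limited  <- trunc_by_limit_sketch(x, Lower = -3, Upper = 3)
quantd   <- trunc_by_quantile_sketch(x, q1 = 0.1, q2 = 0.9)
resolve_truncate(NA, "row")
```

Clamping the outliers keeps the color scale from being stretched by a few extreme cells, so the bulk of the data uses more of the palette.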

The function can also aggregate the rows using k-means clustering. This is advisable when the number of rows is too large for R to handle their hierarchical clustering, roughly more than 1000. Instead of showing every row separately, the rows are clustered in advance and only the cluster centers are shown. The number of clusters is set with kmeans_k.
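A base-R sketch of the aggregation idea: run kmeans() on the rows, then treat the cluster centers as the rows to cluster hierarchically and draw. The data are simulated, and labelling each center with its cluster size is our own choice; this is not aheatmap's internal code.

```r
set.seed(6)
m <- matrix(rnorm(1500 * 6), ncol = 6,
            dimnames = list(NULL, paste0("s", 1:6)))

k  <- 10                                    # plays the role of kmeans_k
km <- kmeans(m, centers = k, iter.max = 100)

# Each heatmap row becomes a cluster center (k x ncol(m) matrix)
centers <- km$centers
rownames(centers) <- sprintf("cluster %d (n = %d)", seq_len(k), km$size)

# Hierarchical clustering now runs on k rows instead of 1500
hc <- hclust(dist(centers))
```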

Value

If returnTree is TRUE, the function returns a list containing the clustering tree(s).

Author(s)

Pan Tong nickytong@gmail.com


nickytong/GenAnalysis documentation built on July 20, 2019, 8:57 a.m.