clusterData: Hierarchical cluster analysis
In Huang-lab/oppti: Outlier Protein and Phosphosite Target Identifier

View source: R/analyze.R

clusterData

R Documentation

Hierarchical cluster analysis

Description

Displays the hierarchically clustered data by the "pheatmap" package. The numbers of clusters along the markers/samples can be set by the user, then the cluster structures are estimated by pair-wise analysis.

Usage

clusterData(data, annotation_row = NULL, annotation_col = NULL,
annotation_colors = NULL, main = NA, legend = TRUE,
clustering_distance_rows = "euclidean",
clustering_distance_cols = "euclidean", display_numbers = FALSE,
number_format = "%.0f", num_clusters_row = NULL,
num_clusters_col = NULL, cluster_rows = TRUE, cluster_cols = TRUE,
border_color = "gray60", annotate_new_clusters_col = FALSE,
zero_white = FALSE, color_low = '#006699', color_mid = 'white',
color_high = 'red',color_palette = NULL, show_rownames = FALSE,
show_colnames = FALSE, min_data = min(data, na.rm = TRUE),
max_data = max(data, na.rm = TRUE),
treeheight_row = ifelse(methods::is(cluster_rows, "hclust") ||
cluster_rows, 50, 0), treeheight_col = ifelse(methods::is(cluster_cols,
"hclust") || cluster_cols, 50, 0))

Arguments

`data`	an object of log2-normalized protein (or gene) expressions, containing markers in rows and samples in columns.
`annotation_row`	data frame that specifies the annotations shown on left side of the heat map. Each row defines the features for a specific row. The rows in the data and in the annotation are matched using corresponding row names. Note that color schemes takes into account if variable is continuous or discrete.
`annotation_col`	similar to annotation_row, but for columns.
`annotation_colors`	list for specifying annotation_row and annotation_col track colors manually. It is possible to define the colors for only some of the features.
`main`	character string, an overall title for the plot.
`legend`	logical, to determine if legend should be drawn or not.
`clustering_distance_rows`	distance measure used in clustering rows. Possible values are "correlation" for Pearson correlation and all the distances supported by dist, such as "euclidean", etc. If the value is none of the above it is assumed that a distance matrix is provided.
`clustering_distance_cols`	distance measure used in clustering columns. Possible values the same as for clustering_distance_rows.
`display_numbers`	logical, determining if the numeric values are also printed to the cells. If this is a matrix (with same dimensions as original matrix), the contents of the matrix are shown instead of original values.
`number_format`	format strings (C printf style) of the numbers shown in cells. For example "%.2f" shows 2 decimal places and "%.1e" shows exponential notation (see more in sprintf).
`num_clusters_row`	number of clusters the rows are divided into, based on the hierarchical clustering (using cutree), if rows are not clustered, the argument is ignored.
`num_clusters_col`	similar to num_clusters_row, but for columns.
`cluster_rows`	logical, determining if the rows should be clustered; or a hclust object.
`cluster_cols`	similar to cluster_rows, but for columns.
`border_color`	color of cell borders on heatmap, use NA if no border should be drawn.
`annotate_new_clusters_col`	logical, to annotate cluster IDs (column) that will be identified.
`zero_white`	logical, to display 0 values as white in the colormap.
`color_low`	color code for the low intensity values in the colormap.
`color_mid`	color code for the medium intensity values in the colormap.
`color_high`	color code for the high intensity values in the colormap.
`color_palette`	vector of colors used in heatmap.
`show_rownames`	boolean, specifying if row names are be shown.
`show_colnames`	boolean, specifying if column names are be shown.
`min_data`	numeric, data value corresponding to minimum intensity in the color_palette
`max_data`	numeric, data value corresponding to maximum intensity in the color_palette
`treeheight_row`	the height of a tree for rows, if these are clustered. Default value is 50 points.
`treeheight_col`	the height of a tree for columns, if these are clustered. Default value is 50 points.

Value

tree, the hierarchical tree structure.

cluster_IDs_row, the (row) cluster identities of the markers.

cluster_IDs_col, the (column) cluster identities of the samples.

Examples

set.seed(1)
dat = setNames(as.data.frame(matrix(runif(10*10),10,10),
row.names = paste('marker',1:10,sep='')), paste('sample',1:10,sep=''))
result = clusterData(dat)

Huang-lab/oppti documentation built on March 26, 2023, 12:52 p.m.