sp_pheatmap: Generating pheatmap plot

View source: R/sp_pheatmap.R

sp_pheatmap    R Documentation

Generating pheatmap plot

Description

Generating pheatmap plot

Usage

sp_pheatmap(
  data,
  filename = NA,
  renameDuplicateRowNames = F,
  top_n = 1,
  statistical_value_type = mad,
  logv = NULL,
  log_add = 0,
  scale = "none",
  annotation_row = NULL,
  annotation_col = NULL,
  cluster_rows = FALSE,
  cluster_cols = FALSE,
  display_numbers = F,
  cluster_cols_variable = NULL,
  cluster_rows_variable = NULL,
  remove_cluster_cols_variable_in_annocol = FALSE,
  remove_cluster_rows_variable_in_annorow = FALSE,
  clustering_method = "complete",
  clustering_distance_rows = "pearson",
  clustering_distance_cols = "pearson",
  label_row_cluster_boundary = FALSE,
  label_col_cluster_boundary = FALSE,
  label_every_n_rowitems = 1,
  label_every_n_colitems = 1,
  breaks = NA,
  breaks_mid = NULL,
  breaks_digits = 2,
  correlation_plot = "None",
  maximum = Inf,
  minimum = -Inf,
  xtics_angle = 0,
  manual_color_vector = NULL,
  fontsize = 14,
  manual_annotation_colors_sidebar = NULL,
  cutree_cols = NA,
  cutree_rows = NA,
  anno_cutree_cols = F,
  anno_cutree_rows = F,
  kclu = NA,
  ytics = TRUE,
  xtics = TRUE,
  width = 0,
  height = 0,
  title = "",
  debug = FALSE,
  saveppt = FALSE,
  ...
)

Arguments

data

Data file or data frame. A file must be tab-separated, with a header line and row names in the first column. Column names should normally be unique unless you know what you are doing.

filename

Filename for output files.

renameDuplicateRowNames

How to handle duplicate row names. Default FALSE: duplicated row names are not allowed. Accept TRUE: duplicated row names are made unique by appending <.1>, <.2> to the second and third appearances, and so on.

top_n

Number or fraction of top-ranked rows (for example genes) to keep, ranked by statistical_value_type. An integer larger than 1 keeps that many rows (like the top 5000). A number less than 1 keeps that fraction of rows (like the top 0.7 of all genes). Default 1.

statistical_value_type

Statistic used to rank rows when selecting with top_n. Default mad, accept mean, var, sum, median.
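
To make the interplay of top_n and statistical_value_type concrete, here is a minimal base R sketch of ranking rows by a statistic and keeping the top ones. It is not the package's internal code: select_top_rows is a hypothetical helper, and treating values up to and including 1 as a fraction of all rows is an assumption.

```r
# Hypothetical helper (not part of ImageGP): keep the rows with the largest
# value of a summary statistic such as mad, var, mean, sum or median.
select_top_rows <- function(mat, top_n, stat_fun = mad) {
  score <- apply(mat, 1, stat_fun)            # one score per row
  # Assumption: values <= 1 are read as a fraction of all rows.
  n_keep <- if (top_n <= 1) ceiling(nrow(mat) * top_n) else min(top_n, nrow(mat))
  keep <- order(score, decreasing = TRUE)[seq_len(n_keep)]
  mat[keep, , drop = FALSE]
}

set.seed(1)
expr <- matrix(rnorm(50), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("s", 1:5)))
top3 <- select_top_rows(expr, top_n = 3)         # three most variable rows
top_half <- select_top_rows(expr, top_n = 0.5)   # top 50% of rows
```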

logv

Log-transform the data before any other analysis. Accept an R function such as log2 or log10. Default NULL (no log transformation).

log_add

A value to add before the log transformation to avoid taking the log of zero. Default 0, in which case the program automatically chooses a value to add.
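
A log transformation with a pseudo-count can be sketched as follows. The helper log_transform is hypothetical, and the pseudo-count of 1 is chosen only for illustration; the value sp_pheatmap picks automatically when log_add is 0 is not documented here.

```r
# Hypothetical helper: add a pseudo-count, then apply the log function.
log_transform <- function(mat, logv = log2, log_add = 1) {
  logv(mat + log_add)
}

counts <- matrix(c(0, 1, 3, 7), nrow = 2)
logged <- log_transform(counts)   # log2(counts + 1), so zeros stay finite
```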

scale

Whether to scale the data for clustering and visualization. Default 'none' means no scaling; accept 'row' or 'column' to scale by row or by column.
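
Row scaling is commonly done as a per-row z-score (subtract the row mean, divide by the row standard deviation). The base R sketch below shows that convention; whether sp_pheatmap scales in exactly this way is an assumption.

```r
mat <- rbind(a = c(1, 2, 3),
             b = c(10, 20, 30))

# scale() works on columns, so transpose, scale, and transpose back.
row_scaled <- t(scale(t(mat)))
# Both rows now have mean 0 and standard deviation 1, so rows with very
# different magnitudes become comparable in the heatmap.
```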

annotation_row

A file or data frame specifying row annotations, whose first column matches the first column (row names) of data. Default NULL.

annotation_col

A file or data frame specifying column annotations, whose first column matches the first row (column names) of data. Default NULL.

cluster_rows

Hierarchical clustering of rows. Default FALSE, accept TRUE. When there are fewer than 3 rows or more than 5000 rows, this parameter is always set to FALSE.

cluster_cols

Hierarchical clustering of columns. Default FALSE, accept TRUE. When there are fewer than 3 columns or more than 5000 columns, this parameter is always set to FALSE.

display_numbers

Logical determining whether the numeric values are also printed in the cells. If this is a matrix (with the same dimensions as the original matrix), the contents of that matrix are shown instead of the original values.

cluster_cols_variable

Reorder branch order of clustered columns by given variable. (Test only)

cluster_rows_variable

Reorder branch order of clustered rows by given variable. (Test only)

remove_cluster_cols_variable_in_annocol

Do not show cluster_cols_variable in column annotation.

remove_cluster_rows_variable_in_annorow

Do not show cluster_rows_variable in row annotation.

clustering_method

Clustering method. Default "complete". Accept "ward.D", "ward.D2", "single", "average" (=UPGMA), "mcquitty" (=WPGMA), "median" (=WPGMC) or "centroid" (=UPGMC).

clustering_distance_rows

Clustering distance method for rows. Default "pearson", accept "spearman", "euclidean", "manhattan", "maximum", "canberra", "binary", "minkowski", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao", "mahalanobis". (Some require the vegan package.)

clustering_distance_cols

Clustering distance method for columns. Default "pearson", accept "spearman", "euclidean", "manhattan", "maximum", "canberra", "binary", "minkowski", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao", "mahalanobis". (Some require the vegan package.)
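
For orientation, a correlation-based distance is often defined as 1 minus the correlation coefficient and then passed to hclust with the chosen linkage. The sketch below uses only base R and assumes that convention; the exact distance computed inside sp_pheatmap may differ.

```r
mat <- rbind(g1 = c(1, 2, 3, 4),
             g2 = c(2, 4, 6, 9),
             g3 = c(4, 3, 2, 1),
             g4 = c(9, 6, 4, 2))

# Assumed convention: distance = 1 - Pearson correlation between rows.
row_dist <- as.dist(1 - cor(t(mat), method = "pearson"))
row_tree <- hclust(row_dist, method = "complete")   # clustering_method
```

Rows with the same trend (g1 and g2, g3 and g4) end up close together regardless of their absolute magnitude, which is the usual reason to prefer a correlation distance for expression data.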

label_row_cluster_boundary

Only display labels at row cluster boundaries (the first item of each cluster).

label_col_cluster_boundary

Only display labels at column cluster boundaries (the first item of each cluster).

label_every_n_rowitems

Label every n-th row item (n >= 1). Default 1 labels all row items. Supply a larger number to label only a few rows when there are many. For a data matrix with 100 rows, giving 10 here labels only 10 rows: the 1st, 11th, 21st, ..., 91st.

label_every_n_colitems

Label every n-th column item (n >= 1). Default 1 labels all column items. Supply a larger number to label only a few columns when there are many. For a data matrix with 100 columns, giving 10 here labels only 10 columns: the 1st, 11th, 21st, ..., 91st.
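
The effect of labeling every n-th item can be sketched in base R as below. This is an illustration of the idea under the assumption that unlabeled items receive an empty label, not the package's own code.

```r
n_rows <- 100
n <- 10

row_names <- paste0("gene", seq_len(n_rows))
labelled <- seq(1, n_rows, by = n)   # positions 1, 11, 21, ..., 91

# Keep the name at labelled positions, blank out everything else.
shown <- ifelse(seq_len(n_rows) %in% labelled, row_names, "")
```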

breaks

A sequence of numbers that covers the range of values in the data matrix and is one element longer than the color vector. Used for mapping values to colors, which is useful when certain values must map to certain colors. If the value is NA, the breaks are calculated automatically. If the value is "quantile", the breaks are computed from the quantiles of the data.

breaks_mid

Middle value used for generating breaks when "quantile" is assigned to breaks.

breaks_digits

Number of digits kept for breaks. Default 2.
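
Quantile-based breaks place color boundaries so that each color covers roughly the same number of values, which helps when a few extreme values would otherwise dominate the color scale. A rough base R sketch, assuming four colors and rounding to two digits as with breaks_digits (the package's exact procedure, including its use of breaks_mid, is not reproduced here):

```r
vals <- c(1, 2, 3, 4, 100)
n_colors <- 4

# One more break than colors, taken at evenly spaced quantiles.
brks <- quantile(vals, probs = seq(0, 1, length.out = n_colors + 1))
brks <- unique(round(brks, digits = 2))
```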

correlation_plot

First compute the correlation matrix of given data, then heatmap correlation data instead of raw data. Default "None", accept "row" or "col" for row correlation or column correlation.
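
What gets plotted in each mode can be sketched with base R's cor(). The Pearson method used here is cor()'s default and is an assumption about what sp_pheatmap computes.

```r
mat <- matrix(c(1, 2, 3, 4,
                2, 4, 6, 8,
                4, 3, 2, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("r1", "r2", "r3"), paste0("s", 1:4)))

row_cor <- cor(t(mat))   # "row": 3 x 3 matrix of row-to-row correlations
col_cor <- cor(mat)      # "col": 4 x 4 matrix of column-to-column correlations
```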

maximum

The maximum value to keep. Any number larger than this is replaced by the given maximum. Default Inf. Optional.

minimum

The minimum value to keep. Any number smaller than this is replaced by the given minimum. Default -Inf. Optional.
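
Clamping values to a [minimum, maximum] range, as described for these two arguments, can be sketched in base R like this (clamp_values is a hypothetical helper, not a function of the package):

```r
# Hypothetical helper: replace values outside [minimum, maximum] by the bounds.
clamp_values <- function(mat, minimum = -Inf, maximum = Inf) {
  mat[mat > maximum] <- maximum
  mat[mat < minimum] <- minimum
  mat
}

m <- matrix(c(-5, 0, 3, 12), nrow = 2)
clamped <- clamp_values(m, minimum = -2, maximum = 10)
```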

xtics_angle

Rotation angle for x-axis value. Default 0.

manual_color_vector

Manually set colors for each geom. Default NULL, meaning the ggplot2 default is used. Give either a vector of colors like c('red', 'blue', '#6181BD') (the number of colors does not matter) or the name of an RColorBrewer color set: "BrBG", "PiYG", "PRGn", "PuOr", "RdBu", "RdGy", "RdYlBu", "RdYlGn", "Spectral", "Accent", "Dark2", "Paired", "Pastel1", "Pastel2", "Set1", "Set2", "Set3", "Blues", "BuGn", "BuPu", "GnBu", "Greens", "Greys", "Oranges", "OrRd", "PuBu", "PuBuGn", "PuRd", "Purples", "RdPu", "Reds", "YlGn", "YlGnBu", "YlOrBr", "YlOrRd" (check http://www.sthda.com/english/wiki/colors-in-r for more).

fontsize

Font size. Default 14.

manual_annotation_colors_sidebar

Annotation colors. Colors can only be specified per column of the row annotation or column annotation. For example, 'class' (two values: C1, C2) and 'group' (two values: G1, G2) are two row annotations; 'type' (three values: T1, T2, T3) and 'size' (four values: 1, 2, 3, 4) are two column annotations. Colors can be given as a string such as 'class=c(C1="blue", C2="yellow"), size=c("white", "green"), type=c(T1="pink", T2="black", T3="cyan")' or as a list such as list(class=c(C1="blue", C2="yellow"), size=c("white", "green")). In R, the colors() function returns the names of all available colors.

cutree_cols

Similar to cutree_rows, but for columns.

cutree_rows

Number of clusters the rows are divided into, based on the hierarchical clustering (using cutree). If rows are not clustered, the argument is ignored.

anno_cutree_cols

Add column tree-cut results as column annotation.

anno_cutree_rows

Add row tree-cut results as row annotation.
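
Cutting a tree into groups and turning the result into an annotation table can be sketched with base R as follows. The Euclidean distance and the column name cluster are choices made for this illustration only.

```r
mat <- rbind(g1 = c(1, 1, 1),
             g2 = c(1.2, 0.9, 1.1),
             g3 = c(8, 9, 8),
             g4 = c(8.5, 9.2, 7.9))

tree <- hclust(dist(mat), method = "complete")
row_cluster <- cutree(tree, k = 2)   # like cutree_rows = 2

# One row per item, usable as a row annotation data frame.
anno <- data.frame(cluster = factor(row_cluster),
                   row.names = names(row_cluster))
```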

kclu

Aggregate the rows using k-means clustering. This is advisable if the number of rows is so large that R cannot handle their hierarchical clustering anymore, roughly more than 1000. Instead of showing all rows separately, one can cluster the rows in advance and show only the cluster centers. Default NA means no k-means clustering. Any positive integer triggers k-means clustering and is used as the expected number of clusters.
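
The idea of replacing many rows by their k-means cluster centers can be sketched with stats::kmeans. The simulated data and nstart = 5 are assumptions of this example, not settings taken from the package.

```r
set.seed(42)
mat <- rbind(matrix(rnorm(40, mean = 0), ncol = 4),    # 10 rows around 0
             matrix(rnorm(40, mean = 10), ncol = 4))   # 10 rows around 10

km <- kmeans(mat, centers = 2, nstart = 5)   # like kclu = 2
centers <- km$centers   # 2 x 4 matrix: one row per cluster, drawn instead of 20 rows
```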

ytics

Display ytics.

xtics

Display xtics.

width

Picture width

height

Picture height

title

Title of picture. Default empty title

saveppt

Whether to also output a PPT file. Default FALSE (no PPT output). Accept TRUE to write a PPT file.

...

Other parameters given to pheatmap.

Value

Generate a PDF and TXT file.

Examples

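# Build a small 5 x 5 numeric data frame with columns a to e
a = c(12,14,17,11,16)
b = c(4,20,15,11,9)
c = c(5,7,19,8,18)
d = c(15,13,11,17,16)
e = c(12,19,16,7,9)
pheatmap_data = as.data.frame(cbind(a,b,c,d,e))
# Draw the heatmap with default settings
sp_pheatmap(data = pheatmap_data)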

## Not run:
pheatmap_data = "pheatmap.data"
sp_pheatmap(data = pheatmap_data)
## End(Not run)



Tong-Chen/ImageGP documentation built on April 14, 2025, 12:54 p.m.