draw.emb.kmeans: Visualize the K-means Clustering Result by several dimension...

View source: R/pipeline_functions.R

draw.emb.kmeans    R Documentation

Visualize the K-means Clustering Result by several dimension reduction and embedding methods.

Description

draw.emb.kmeans is a data visualization function to show the K-means clustering result of a data matrix. A PCA/MDS/UMAP biplot is generated to visualize the clustering. Two biplots side-by-side will show the comparison between real observation labels (left) and the K-means predicted labels (right).
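The side-by-side layout described above can be approximated in base R. This is only a rough sketch, not the function's implementation: it uses the built-in iris data, a PCA embedding from prcomp, and a fixed K of 3, all of which are assumptions chosen for illustration.

```r
# Built-in iris data, transposed so that samples are columns and features
# are rows, matching the orientation this page describes for `mat`.
mat <- t(as.matrix(iris[, 1:4]))
colnames(mat) <- paste0("S", seq_len(ncol(mat)))
obs_label <- setNames(as.character(iris$Species), colnames(mat))

# Two-dimensional PCA embedding of the samples.
emb <- prcomp(t(mat), scale. = TRUE)$x[, 1:2]

# K-means on the embedding; K = 3 is fixed here for illustration only.
set.seed(1)
pred_label <- kmeans(emb, centers = 3, nstart = 20)$cluster

# Left panel: observed labels. Right panel: K-means predicted labels.
op <- par(mfrow = c(1, 2))
plot(emb, col = as.integer(factor(obs_label)), pch = 16,
     main = "Observed labels")
legend("topleft", legend = levels(factor(obs_label)),
       col = 1:3, pch = 16, cex = 0.8)
plot(emb, col = pred_label, pch = 16, main = "K-means labels")
par(op)
```

Cluster numbers from kmeans are arbitrary, so the colors in the two panels need not correspond to each other.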

Usage

draw.emb.kmeans(
  mat = NULL,
  embedding_method = "pca",
  all_k = NULL,
  obs_label = NULL,
  legend_pos = "topleft",
  legend_cex = 0.8,
  plot_type = "2D.ellipse",
  point_cex = 1,
  kmeans_strategy = "basic",
  choose_k_strategy = "ARI",
  return_type = "optimal",
  main = "",
  verbose = TRUE,
  use_color = NULL,
  pre_define = NULL
)

Arguments

mat

a numeric data matrix. The columns (e.g. samples) are clustered using the rows (e.g. genes) as features.

embedding_method

character, embedding method, choose from pca, mds and umap. Default is pca.

all_k

a vector of integers, the pre-defined K value(s) to try. K is the number of final clusters. If NULL, the function will try all possible K values. Default is NULL.

obs_label

a vector of characters describing the selected phenotype information of each sample, with sample names as the vector names. Can be obtained by calling get_obs_label.

legend_pos

character, position of the plot legend. Default is 'topleft'.

legend_cex

numeric, text size of the plot legend. Default is 0.8.

plot_type

character, plot type. Users can choose from "2D", "2D.ellipse", "2D.interactive", "2D.text" and "3D". Default is "2D.ellipse".

point_cex

numeric, size of the point in the plot. Default is 1.

kmeans_strategy

character, K-means clustering algorithm. Users can choose "basic" or "consensus". "consensus" is performed by ConsensusClusterPlus. Default is "basic".

choose_k_strategy

character, method to choose the K-value. Users can choose from "ARI" (adjusted Rand index), "NMI" (normalized mutual information) and "Jaccard". Default is "ARI".

return_type

character, the type of result returned. Users can choose "optimal" or "all". If "all", the results for all the K-values in all_k will be returned. If "optimal", only the result for the K-value yielding the optimal classification will be returned. Default is "optimal".

main

character, title for the plot.

verbose

logical, if TRUE, print out detailed information during calculation. Default is TRUE.

use_color

a vector of color codes, colors to be assigned to each member of the display label. Default is NULL, in which case brewer.pal(9, 'Set1') is used.

pre_define

a vector of characters, pre-defined color codes for a certain input (e.g. c("blue", "red") with names c("A", "B")). Default is NULL.
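As a rough illustration of the input shapes described in the arguments above, the following sketch builds toy values for mat, obs_label and pre_define. All names and values are made up, and the consistency checks at the end are a cautious habit rather than a documented requirement of draw.emb.kmeans.

```r
# Toy inputs in the shapes the argument descriptions call for.
# Every name and value here is hypothetical.
set.seed(1)
n_genes <- 50
sample_names <- paste0("S", 1:12)

# Features (genes) in rows, samples in columns.
mat <- matrix(rnorm(n_genes * length(sample_names)),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)), sample_names))

# Observed labels: a character vector named by sample.
obs_label <- setNames(rep(c("A", "B", "C"), each = 4), sample_names)

# Pre-defined colors: a character vector named by label value.
pre_define <- c(A = "blue", B = "red", C = "green")

# Consistency checks before passing the inputs to a plotting function.
stopifnot(identical(colnames(mat), names(obs_label)))
stopifnot(all(unique(obs_label) %in% names(pre_define)))
```

With NetBID2 installed, these objects could then be passed as draw.emb.kmeans(mat = mat, obs_label = obs_label, pre_define = pre_define).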

Details

This function is mainly used to check the sample clustering result, with the aim of detecting whether any abnormal (outlier) samples exist. The input is a high-throughput expression matrix. Each row is a gene/transcript/probe and each column is a sample. Users need to provide the real observation label for each sample. The K-value yielding the optimal classification result will be used to generate the predicted labels. A comparison score (chosen from ARI, NMI and Jaccard) will be calculated and shown in the figure.
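The idea of choosing K by comparing predicted labels with observed labels can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the helper adjusted_rand_index is hypothetical, the data are simulated, and clustering on the first two principal components is a choice made here.

```r
# Adjusted Rand index from the contingency table of two labelings.
# Hypothetical helper, written out so the sketch needs no extra packages.
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))
  sum_rows <- sum(choose(rowSums(tab), 2))
  sum_cols <- sum(choose(colSums(tab), 2))
  expected <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  if (max_index == expected) return(1)  # degenerate case, e.g. one cluster each
  (sum_cells - expected) / (max_index - expected)
}

set.seed(42)
# Simulated matrix: 20 features x 30 samples, three shifted groups of 10.
group <- rep(c("A", "B", "C"), each = 10)
shift <- c(A = 0, B = 4, C = 8)
mat <- sapply(group, function(g) rnorm(20, mean = shift[[g]]))
colnames(mat) <- paste0("S", seq_along(group))
obs_label <- setNames(group, colnames(mat))

# Samples are columns, so transpose before PCA and clustering.
emb <- prcomp(t(mat))$x[, 1:2]

# Score each candidate K against the observed labels and keep the best one.
all_k <- 2:6
scores <- sapply(all_k, function(k) {
  pred <- kmeans(emb, centers = k, nstart = 50)$cluster
  adjusted_rand_index(obs_label, pred)
})
names(scores) <- all_k
best_k <- all_k[which.max(scores)]
print(round(scores, 3))
print(best_k)
```

A sample whose predicted cluster disagrees with its observed label under the best K is a candidate outlier worth inspecting.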

Value

Returns a vector of predicted labels if return_type is set to "optimal", or a list of results for all possible K-values if return_type is set to "all". If plot_type='2D.interactive', a plotly class object is returned for interactive display.

Examples

network.par <- list()
network.par$out.dir.DATA <- system.file('demo1','network/DATA/',package = "NetBID2")
NetBID.loadRData(network.par=network.par,step='exp-QC')
mat <- Biobase::exprs(network.par$net.eset)
phe <- Biobase::pData(network.par$net.eset)
intgroup <- get_int_group(network.par$net.eset)
for(i in base::seq_along(intgroup)){
 print(intgroup[i])
 pred_label <- draw.emb.kmeans(mat=mat,all_k = NULL,obs_label=get_obs_label(phe,intgroup[i]))
 print(base::table(list(pred_label=pred_label,obs_label=get_obs_label(phe,intgroup[i]))))
}
pred_label <- draw.emb.kmeans(mat=mat,all_k = NULL,
                             obs_label=get_obs_label(phe,'subgroup'),
                             kmeans_strategy='consensus')
## interactive display
draw.emb.kmeans(mat=mat,all_k = NULL,
               obs_label=get_obs_label(phe,'subgroup'),
               plot_type='2D.interactive',
               pre_define=c('WNT'='blue','SHH'='red','G4'='green'))

jyyulab/NetBID documentation built on Dec. 23, 2024, 6:34 a.m.