draw.emb.kmeans: Visualize the K-means Clustering Result by several dimension...
In jyyulab/NetBID: Network-based Bayesian Inference of Drivers, version 2

draw.emb.kmeans

R Documentation

Visualize the K-means Clustering Result by several dimension reduction and embedding methods.

Description

draw.emb.kmeans is a data visualization function to show the K-means clustering result of a data matrix. A PCA/MDS/UMAP biplot is generated to visualize the clustering. Two biplots side-by-side will show the comparison between real observation labels (left) and the K-means predicted labels (right).

Usage

draw.emb.kmeans(
  mat = NULL,
  embedding_method = "pca",
  all_k = NULL,
  obs_label = NULL,
  legend_pos = "topleft",
  legend_cex = 0.8,
  plot_type = "2D.ellipse",
  point_cex = 1,
  kmeans_strategy = "basic",
  choose_k_strategy = "ARI",
  return_type = "optimal",
  main = "",
  verbose = TRUE,
  use_color = NULL,
  pre_define = NULL
)

Arguments

`mat`	a numeric data matrix. The columns (e.g. samples) are clustered using the rows (e.g. genes) as features.
`embedding_method`	character, embedding method. Users can choose from "pca", "mds" and "umap". Default is "pca".
`all_k`	a vector of integers, the pre-defined K value(s) to try. K is the number of final clusters. If NULL, the function will try all possible K values. Default is NULL.
`obs_label`	a vector of characters describing each sample's selected phenotype information, with sample names as the vector names. Can be obtained by calling `get_obs_label`.
`legend_pos`	character, position of the plot legend. Default is 'topleft'.
`legend_cex`	numeric, text size of the plot legend. Default is 0.8.
`plot_type`	character, plot type. Users can choose from "2D", "2D.ellipse", "2D.interactive", "2D.text" and "3D". Default is "2D.ellipse".
`point_cex`	numeric, size of the point in the plot. Default is 1.
`kmeans_strategy`	character, K-means clustering algorithm. Users can choose "basic" or "consensus". "consensus" is performed by `ConsensusClusterPlus`. Default is "basic".
`choose_k_strategy`	character, method to choose the K-value. Users can choose from "ARI" (adjusted Rand index), "NMI" (normalized mutual information) and "Jaccard". Default is "ARI".
`return_type`	character, the type of result returned. Users can choose "optimal" or "all". With "all", results for all the K-values in `all_k` are returned. With "optimal", only the result for the K-value yielding the optimal classification is returned. Default is "optimal".
`main`	character, title for the plot.
`verbose`	logical, if TRUE, print out detailed information during calculation. Default is TRUE.
`use_color`	a vector of color codes, colors to be assigned to each member of the display label. If NULL (the default shown in Usage), brewer.pal(9, 'Set1') is used.
`pre_define`	a vector of characters, pre-defined color codes for a certain input (e.g. c("blue", "red") with names c("A", "B")). Default is NULL.
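
The shapes expected for `obs_label` and `pre_define` can be illustrated with a small base-R sketch. The sample names (S1 to S4) and the lookup step are hypothetical and only show how a named color vector can be matched to named labels; this is not NetBID2's internal code.

```r
# Observed labels: one entry per sample, named by sample (hypothetical names).
obs_label <- c(S1 = "WNT", S2 = "SHH", S3 = "G4", S4 = "SHH")

# Pre-defined colors: one entry per label value, named by label.
pre_define <- c(WNT = "blue", SHH = "red", G4 = "green")

# Illustrative lookup: index the color vector by label value,
# then restore the sample names on the result.
point_col <- pre_define[obs_label]
names(point_col) <- names(obs_label)
print(point_col)
```

Naming both vectors keeps the mapping explicit: labels are matched to samples by name, and colors are matched to labels by name, so the order of either vector does not matter.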

Details

This function is mainly used to check the sample clustering result, with the aim of detecting whether any abnormal (outlier) samples exist. The input is a high-throughput expression matrix. Each row is a gene/transcript/probe and each column is a sample. Users need to provide the real observation label for each sample. The K-value yielding the optimal classification result will be used to generate the predicted labels. A comparison score (chosen from ARI, NMI and Jaccard) will be calculated and shown in the figure.
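
The workflow described above can be sketched in base R under stated assumptions. The simulated matrix, the candidate range `2:5`, and the helper `adjusted_rand_index` are all illustrative and are not taken from NetBID2; the sketch uses `stats::kmeans` and `stats::prcomp` rather than the package's own code, and only shows the idea of scoring each K against the observed labels and plotting the two labelings side by side.

```r
# Hypothetical helper: adjusted Rand index between two labelings.
adjusted_rand_index <- function(x, y) {
  tab <- table(x, y)
  comb2 <- function(n) n * (n - 1) / 2          # number of pairs
  sum_cells <- sum(comb2(tab))
  sum_rows <- sum(comb2(rowSums(tab)))
  sum_cols <- sum(comb2(colSums(tab)))
  expected <- sum_rows * sum_cols / comb2(sum(tab))
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Simulated expression matrix: 50 genes (rows) x 30 samples (columns),
# three groups of samples with shifted means.
set.seed(1)
groups <- rep(c("A", "B", "C"), each = 10)
shift <- c(A = 0, B = 3, C = 6)[groups]
mat <- matrix(rnorm(50 * length(groups)), nrow = 50)
mat <- sweep(mat, 2, shift, "+")                 # add one shift per column
colnames(mat) <- paste0("S", seq_along(groups))
obs_label <- setNames(groups, colnames(mat))

# Cluster the samples (columns) for each candidate K and score against obs_label.
all_k <- 2:5
fits <- lapply(all_k, function(k) kmeans(t(mat), centers = k, nstart = 25))
scores <- sapply(fits, function(fit) adjusted_rand_index(obs_label, fit$cluster))
names(scores) <- all_k

# Keep the K with the highest score and its predicted labels.
best <- which.max(scores)
best_k <- all_k[best]
pred_label <- fits[[best]]$cluster

# Side-by-side PCA plots: observed labels (left), predicted labels (right).
pc <- prcomp(t(mat))$x[, 1:2]
op <- par(mfrow = c(1, 2))
plot(pc, col = as.integer(factor(obs_label)), pch = 16, main = "Observed labels")
plot(pc, col = pred_label, pch = 16,
     main = sprintf("K-means (K = %d, ARI = %.2f)", best_k, scores[best]))
par(op)
```

A sample whose predicted cluster disagrees with its observed label, or that sits far from its group in the embedding, is a candidate outlier worth inspecting.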

Value

Returns a vector of predicted labels if return_type is set to "optimal", or a list of results for all the K-values tried if return_type is set to "all". If plot_type='2D.interactive', a plotly class object is returned for interactive display.

Examples

library(NetBID2)
## load the demo dataset shipped with the package
network.par <- list()
network.par$out.dir.DATA <- system.file('demo1','network/DATA/',package = "NetBID2")
NetBID.loadRData(network.par=network.par,step='exp-QC')
## expression matrix (genes x samples) and phenotype table
mat <- Biobase::exprs(network.par$net.eset)
phe <- Biobase::pData(network.par$net.eset)
intgroup <- get_int_group(network.par$net.eset)
## cluster the samples once per phenotype column and compare with the observed labels
for(i in base::seq_along(intgroup)){
 print(intgroup[i])
 pred_label <- draw.emb.kmeans(mat=mat,all_k = NULL,obs_label=get_obs_label(phe,intgroup[i]))
 print(base::table(list(pred_label=pred_label,obs_label=get_obs_label(phe,intgroup[i]))))
}
## consensus K-means
pred_label <- draw.emb.kmeans(mat=mat,all_k = NULL,
                             obs_label=get_obs_label(phe,'subgroup'),
                             kmeans_strategy='consensus')
## interactive display
draw.emb.kmeans(mat=mat,all_k = NULL,
               obs_label=get_obs_label(phe,'subgroup'),
               plot_type='2D.interactive',
               pre_define=c('WNT'='blue','SHH'='red','G4'='green'))

jyyulab/NetBID documentation built on July 16, 2025, 4:05 p.m.