draw.clustComp: Visualize Each Sample's Observed Label vs. Predicted Label in...

View source: R/pipeline_functions.R

draw.clustComp    R Documentation

Visualize Each Sample's Observed Label vs. Predicted Label in Table

Description

draw.clustComp draws a table comparing each sample's observed label with its predicted label. Each row represents an observed label (e.g., a disease subgroup), and each column represents a predicted label produced by a clustering algorithm (e.g., K-means).

Usage

draw.clustComp(
  pred_label,
  obs_label,
  strategy = "ARI",
  use_col = TRUE,
  low_K = 5,
  highlight_clust = NULL,
  main = NULL,
  clust_cex = 1,
  outlier_cex = 0.3
)

Arguments

pred_label

a vector of characters, the predicted labels created by clustering (e.g., K-means).

obs_label

a vector of characters, the observed labels annotated by phenotype data.

strategy

character, the method used to quantify the similarity between predicted labels and observed labels. Users can choose from "ARI" (adjusted Rand index), "NMI" (normalized mutual information) and "Jaccard". Default is "ARI".

use_col

logical, if TRUE, the table will be colored. The more samples fall into a table cell, the darker its shade. Default is TRUE.

low_K

integer, the maximum number of samples whose names are listed in a single table cell. Listing many sample names in one cell makes the table hard to read, so if the number of samples in a cell exceeds this threshold, only the count is shown; otherwise, the names of all samples in the cell are listed. Default is 5.

highlight_clust

a vector of characters, the predicted labels to be highlighted in the figure.

main

character, an overall title for the plot.

clust_cex

numeric, text size for the predicted label (column names). Default is 1.

outlier_cex

numeric, text size for the observed label (row names). Default is 0.3.
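
As a rough illustration of what the "ARI" strategy measures, the adjusted Rand index can be computed from the contingency table of two label vectors using base R alone. This is a minimal sketch, not NetBID2's own implementation; the helper name adjusted_rand_index and the toy labels are assumptions made for this example.

```r
# Hypothetical helper (not part of NetBID2): adjusted Rand index of two labelings
adjusted_rand_index <- function(pred_label, obs_label) {
  tab <- table(pred_label, obs_label)          # contingency table of sample counts
  n <- sum(tab)                                # total number of samples
  sum_cells <- sum(choose(tab, 2))             # sample pairs sharing both labels
  sum_rows  <- sum(choose(rowSums(tab), 2))    # sample pairs sharing a predicted label
  sum_cols  <- sum(choose(colSums(tab), 2))    # sample pairs sharing an observed label
  expected  <- sum_rows * sum_cols / choose(n, 2)   # agreement expected by chance
  max_index <- (sum_rows + sum_cols) / 2            # largest attainable agreement
  (sum_cells - expected) / (max_index - expected)
}

# Toy labels, made up for illustration
pred <- c("1", "1", "1", "2", "2", "2")
obs  <- c("A", "A", "A", "B", "B", "A")

ari_same <- adjusted_rand_index(pred, pred)  # identical labelings give 1
ari_toy  <- adjusted_rand_index(pred, obs)   # one mismatched sample lowers the score
```

A value of 1 means the two labelings agree perfectly, while values near 0 indicate agreement no better than chance. The sketch does not handle the degenerate case where every sample carries the same label, in which the denominator is zero.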

Details

The table complements the side-by-side PCA biplot created by draw.emb.kmeans with per-sample detail. Its purpose is to reveal any abnormal samples (outliers). The darker a table cell is, the more samples fall into the corresponding combination of labels.

Value

Returns a matrix of integers and draws a table for visualization. In the matrix, rows are predicted labels and columns are observed labels. Each integer is the number of samples with the corresponding pair of labels.
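
The shape of the returned count matrix and the effect of low_K can be sketched with base R on toy data. This is an assumption-laden illustration, not the function's source: the sample names, the labels, and the helper cell_text are all hypothetical.

```r
# Toy labels, made up for illustration; vector names are sample IDs
pred_label <- c(S1 = "1", S2 = "1", S3 = "1", S4 = "2", S5 = "2", S6 = "2")
obs_label  <- c(S1 = "A", S2 = "A", S3 = "A", S4 = "B", S5 = "B", S6 = "A")

# Count matrix: rows are predicted labels, columns are observed labels
count_mat <- table(pred_label, obs_label)

# Hypothetical re-creation of the low_K rule described under Arguments:
# show only the count when a cell holds more than low_K samples,
# otherwise list the names of the samples in that cell
low_K <- 2
cell_text <- function(p, o) {
  samples <- names(pred_label)[pred_label == p & obs_label == o]
  if (length(samples) > low_K) {
    as.character(length(samples))
  } else {
    paste(samples, collapse = ", ")
  }
}

big_cell   <- cell_text("1", "A")  # three samples exceed low_K, so only the count
small_cell <- cell_text("2", "B")  # two samples, so their names are listed
```

With low_K set to 2, the cell for predicted label "1" and observed label "A" collapses to a count, while the smaller cell for "2" and "B" keeps its sample names.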

Examples

# Load the demo expression set shipped with NetBID2
network.par <- list()
network.par$out.dir.DATA <- system.file('demo1','network/DATA/',package = "NetBID2")
NetBID.loadRData(network.par=network.par,step='exp-QC')
mat <- Biobase::exprs(network.par$net.eset)
phe <- Biobase::pData(network.par$net.eset)
intgroup <- 'subgroup'
# Cluster the samples and keep the predicted labels
pred_label <- draw.emb.kmeans(mat=mat,all_k = NULL,
                             obs_label=get_obs_label(phe,intgroup),
                             kmeans_strategy='consensus')
# Compare predicted vs. observed labels, with and without cell coloring
draw.clustComp(pred_label,get_obs_label(phe,intgroup),outlier_cex=1,low_K=2,use_col=TRUE)
draw.clustComp(pred_label,get_obs_label(phe,intgroup),outlier_cex=1,low_K=2,use_col=FALSE)

jyyulab/NetBID documentation built on Dec. 23, 2024, 6:34 a.m.