kbranches.global: Clustering on K-Branches
In theislab/kbranches: K-Branches clustering


View source: R/kbranches-global.R

Clusters data on K branches (halflines) that share a common center and, optionally, calculates the corresponding GAP statistic.
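To make the geometry concrete, the following base-R sketch computes the distance from each sample to a halfline and assigns every sample to its nearest halfline. It only illustrates the idea of halflines with a common center; `halfline_dist` and `assign_halflines` are hypothetical helpers written for this illustration, not functions of the kbranches package, and the package's own fitting procedure may differ.

```r
# Distance from each row of X to the halfline {c0 + t * v, t >= 0}.
# Hypothetical helper for illustration, not part of kbranches.
halfline_dist <- function(X, c0, v) {
  X <- as.matrix(X)
  v <- v / sqrt(sum(v^2))             # unit direction vector
  D <- sweep(X, 2, c0)                # shift so the center is the origin
  t <- pmax(0, as.vector(D %*% v))    # projection length, clamped at the center
  proj <- outer(t, v)                 # closest point on the halfline (relative to c0)
  sqrt(rowSums((D - proj)^2))
}

# Assign each sample to the nearest of the K halflines given by the rows of Vmat.
assign_halflines <- function(X, c0, Vmat) {
  n <- nrow(as.matrix(X))
  dists <- apply(Vmat, 1, function(v) halfline_dist(X, c0, v))
  dists <- matrix(dists, nrow = n)    # keep an n x K matrix even when n == 1
  max.col(-dists, ties.method = "first")
}

# Toy data: three points, each lying close to one of three halflines.
X    <- rbind(c(2, 0.1), c(-1, 1.9), c(-1.5, -1.4))
c0   <- c(0, 0)
Vmat <- rbind(c(1, 0), c(-1, 2), c(-1, -1))
assign_halflines(X, c0, Vmat)
```

Because the projection is clamped at zero, a point lying "behind" the center is measured against the center itself rather than against the full line.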

kbranches.global(input_dat, Kappa, Dmat = NULL, init_Kmeans = TRUE,
  c0 = NULL, Vmat = NULL, nstart = 20, nstart_GAP = 20,
  nstart_kmeans = 20, B_GAP = NULL, fixed_center = NULL,
  medoids = FALSE, silent = TRUE, silent_internal = TRUE,
  show_plots = FALSE, show_lines = TRUE, show_plots_GAP = FALSE)

`input_dat:`	data frame of input data with rows = samples and columns = dimensions
`Kappa:`	number of clusters (halflines)
`Dmat:`	matrix containing sample distances
`init_Kmeans:`	if TRUE, initialize the directions v1,...,vk using K-Means; if FALSE, use the directions of randomly selected samples
`c0:`	initial value for the center of all half-lines
`Vmat:`	matrix whose K rows are the direction vectors
`nstart:`	number of initializations for the clustering
`nstart_GAP:`	number of initializations for clustering when calculating the GAP statistic
`nstart_kmeans:`	number of initializations for K-Means (when using K-Means to initialize khalflines)
`B_GAP:`	number of bootstrap datasets used to compute the GAP statistic; if NULL (default), the GAP statistic is not computed
`fixed_center:`	if not NULL, then K-halflines will run with the given center fixed
`medoids:`	if TRUE, the medoids version of khalflines will be used (slower)
`silent:`	set to FALSE to display messages (for debugging)
`silent_internal:`	set to FALSE to display messages and plots of internal clustering functions (for debugging)
`show_plots:`	if TRUE, the clustering result will be plotted
`show_lines:`	if TRUE, show the halflines in the plot
`show_plots_GAP:`	if TRUE, show the plots when performing clustering under the null distribution to calculate the GAP statistic (for debugging)

a list with elements:

- cluster: cluster assignment for each sample (numeric)
- Kappa: number of clusters (halflines)
- err: total clustering cost
- iters: total iterations of the algorithm
- c0: position (row index in input_dat) of the center sample
- Vmat: positions (row indices in input_dat) of the direction samples
- clust_counts: number (count) of samples in each of the clusters
- all_clustering_errors: vector of total clustering error for each of the nstart different initializations
- all_clusterings: total results for each of the nstart different initializations
- GAP: value of the modified GAP statistic for the given Kappa
- GAPl: value of the modified GAP statistic for the given Kappa using the logarithm of the expected dispersion
- GAP_orig: value of the original GAP statistic for the given Kappa (using the logarithm of the expected dispersion)
- GAP_orig_no_log: value of the original GAP statistic for the given Kappa (without using the logarithm of the expected dispersion)
- GAP.sd: standard deviation of GAP
- GAPl.sd: standard deviation of GAPl
- GAP_orig.sd: standard deviation of GAP_orig
- GAP_orig_no_log.sd: standard deviation of GAP_orig_no_log
- call: function call
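For orientation, the original GAP statistic (Tibshirani, Walther and Hastie, 2001) compares the observed clustering dispersion with its expectation under a null reference distribution, estimated from B reference datasets (presumably the role of B_GAP here). The notation below is an assumption made for this note: W_K is the dispersion for K clusters on the data, and W*_{K,b} is the dispersion on the b-th reference dataset. How the package's modified variants (GAP, GAPl) differ from this definition is not stated on this page.

```latex
\mathrm{Gap}(K) = \frac{1}{B}\sum_{b=1}^{B} \log W^{*}_{K,b} - \log W_{K},
\qquad
\mathrm{sd}(K) = \sqrt{\frac{1}{B}\sum_{b=1}^{B}\left(\log W^{*}_{K,b} - \bar{l}_{K}\right)^{2}},
\qquad
\bar{l}_{K} = \frac{1}{B}\sum_{b=1}^{B} \log W^{*}_{K,b}
```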

#cluster the 2D data on three halflines
set.seed(1)

#load the data
data(scdata.3lines.simulated6genes_subsampled)
raw_dat <- scdata.3lines.simulated6genes_subsampled

#perform diffusion map dimensionality reduction
dmap <- destiny::DiffusionMap(raw_dat, sigma = 1000)

#keep the first 2 diffusion components
input_dat <- destiny::as.data.frame(dmap)[, 1:2]

#cluster with K=3
clust <- kbranches.global(input_dat, Kappa = 3)

#plot the clustering results
plot(input_dat, pch=21, col=clust$cluster, bg=clust$cluster, main = 'K-Branch clustering')
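Since the GAP elements are only filled in when B_GAP is set, comparing several values of Kappa takes one call per value. The sketch below wraps that loop in `scan_kappa`, a hypothetical helper that is not part of kbranches. It assumes that the `GAP` element of the returned list is a single number, as the Value list suggests, and takes the clustering function as an argument so that it can be tried out with a stand-in.

```r
# Hypothetical helper (not part of kbranches): run a clustering function for
# several values of Kappa and collect the GAP element of each result.
scan_kappa <- function(input_dat, kappas, cluster_fun, ...) {
  gaps <- vapply(
    kappas,
    function(k) as.numeric(cluster_fun(input_dat, Kappa = k, ...)$GAP),
    numeric(1)
  )
  data.frame(Kappa = kappas, GAP = gaps)
}

# Intended use with the objects from the example above. This can be slow,
# because every call also clusters B_GAP reference datasets; the value 10
# is an arbitrary choice for illustration.
# gap_table <- scan_kappa(input_dat, 2:5, kbranches.global, B_GAP = 10)
# print(gap_table)
```

This page does not say which rule should be used to pick Kappa from the resulting values, so the helper only tabulates them.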