View source: R/kbranches-global.R
Clusters data on K-Branches (halflines) with a common center and calculates the corresponding GAP statistic
- input_dat: data frame of input data with rows=samples and cols=dimensions
- Kappa: number of clusters (halflines)
- Dmat: matrix containing sample distances
- init_Kmeans: if TRUE, initialize the directions v1,...,vk using K-Means; if FALSE, use the directions of randomly selected samples
- c0: initial value for the center of all halflines
- Vmat: matrix whose K rows are the direction vectors
- nstart_GAP: number of initializations for clustering when calculating the GAP statistic
- nstart_kmeans: number of initializations for K-Means (when using K-Means to initialize khalflines)
- B_GAP: number of bootstrap datasets used to compute the GAP statistic; if NULL (default), the GAP statistic is not computed
- fixed_center: if not NULL, K-halflines will run with the given center fixed
- medoids: if TRUE, the medoids version of khalflines will be used (slower)
- silent: set to FALSE to display messages (for debugging)
- silent_internal: set to TRUE to display messages and plots of internal clustering functions (for debugging)
- show_plots: if TRUE, the clustering result will be plotted
- show_lines: if TRUE, show the halflines in the plot
- show_plots_GAP: if TRUE, show the plots when performing clustering under the null distribution to calculate the GAP statistic (for debugging)
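To make the roles of c0 and Vmat concrete, here is a minimal base-R sketch of how samples could be assigned to the nearest of K halflines that share a center. It is an illustration under stated assumptions, not the package's implementation: the helper names halfline_sqdist and assign_halflines are hypothetical, squared Euclidean distance is assumed as the cost, and no center or direction updates are performed.

```r
# Illustrative sketch only; not the kbranches implementation.
# Squared distance from each row of X to the halfline {c0 + t * v, t >= 0}.
halfline_sqdist <- function(X, c0, v) {
  v <- v / sqrt(sum(v^2))                        # unit direction vector
  centered <- sweep(as.matrix(X), 2, c0)         # x - c0 for every sample
  t_proj <- pmax(as.vector(centered %*% v), 0)   # projections behind the center are clamped to 0
  resid <- centered - outer(t_proj, v)           # offset from the nearest point on the halfline
  rowSums(resid^2)
}

# Assign each sample to the closest of the K halflines given by the rows of Vmat.
assign_halflines <- function(X, c0, Vmat) {
  d <- sapply(seq_len(nrow(Vmat)), function(k) halfline_sqdist(X, c0, Vmat[k, ]))
  d <- matrix(d, nrow = nrow(as.matrix(X)))      # keep matrix shape even for a single sample
  list(cluster = max.col(-d, ties.method = "first"),  # index of the smallest distance per row
       err = sum(apply(d, 1, min)))                   # total cost of this assignment
}

# Toy data (assumed for illustration): three samples near three halflines from the origin
X <- rbind(c(2, 0.1), c(-0.1, 3), c(-2, -2))
Vmat <- rbind(c(1, 0), c(0, 1), c(-1, -1))
res <- assign_halflines(X, c0 = c(0, 0), Vmat = Vmat)
res$cluster
```

Clamping the projection at zero is what distinguishes a halfline from a full line: a sample lying "behind" the center is measured against the center itself rather than against the line's extension.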
a list with elements:
- cluster: cluster assignment for each sample (numeric)
- Kappa: number of clusters (halflines)
- err: total clustering cost
- iters: total iterations of the algorithm
- c0: position (row index in input_dat) of the center sample
- Vmat: positions (row indices in input_dat) of the direction samples
- clust_counts: number (count) of samples in each of the clusters
- all_clustering_errors: vector of total clustering error for each of the nstart different initializations
- all_clusterings: total results for each of the nstart different initializations
- GAP: value of the modified GAP statistic for the given Kappa
- GAPl: value of the modified GAP statistic for the given Kappa using the logarithm of the expected dispersion
- GAP_orig: value of the original GAP statistic for the given Kappa (using the logarithm of the expected dispersion)
- GAP_orig_no_log: value of the original GAP statistic for the given Kappa (without using the logarithm of the expected dispersion)
- GAP.sd: standard deviation of GAP
- GAPl.sd: standard deviation of GAPl
- GAP_orig.sd: standard deviation of GAP_orig
- GAP_orig_no_log.sd: standard deviation of GAP_orig_no_log
- call: function call
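For readers unfamiliar with the GAP fields above, the following base-R sketch shows the general idea behind the original GAP statistic: compare the log dispersion of the observed clustering with its average over reference datasets drawn from a null distribution. It is a rough sketch under assumptions, not the package's code: it uses stats::kmeans instead of halflines, a uniform reference over the bounding box of the data, and the hypothetical helper names dispersion and gap_statistic. The modified variants (GAP, GAPl) are not reproduced here.

```r
# Illustrative sketch of the original GAP statistic with k-means; not the kbranches code.
# Pooled within-cluster sum of squares for a k-means clustering with k clusters.
dispersion <- function(X, k, nstart = 10) {
  kmeans(X, centers = k, nstart = nstart)$tot.withinss
}

# Gap(k) = mean_b log(W*_kb) - log(W_k), using B uniform reference datasets
# drawn over the bounding box of the data.
gap_statistic <- function(X, k, B = 20) {
  X <- as.matrix(X)
  log_w <- log(dispersion(X, k))
  log_w_ref <- replicate(B, {
    ref <- apply(X, 2, function(col) runif(length(col), min(col), max(col)))
    log(dispersion(ref, k))
  })
  # spread of the reference log dispersions, inflated for simulation error
  s <- sqrt(mean((log_w_ref - mean(log_w_ref))^2)) * sqrt(1 + 1 / B)
  list(gap = mean(log_w_ref) - log_w, sd = s)
}

set.seed(1)
# Toy data (assumed for illustration): three well-separated 2D blobs
X <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
           matrix(rnorm(100, 5, 0.3), ncol = 2),
           cbind(rnorm(50, 0, 0.3), rnorm(50, 5, 0.3)))
gaps <- sapply(1:4, function(k) gap_statistic(X, k)$gap)
which.max(gaps)
```

A larger gap means the observed data cluster much more tightly than structureless reference data would for the same number of clusters, which is why the statistic is typically compared across several values of Kappa.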
#cluster the 2D data on three halflines
set.seed(1)
#load the data
data(scdata.3lines.simulated6genes_subsampled)
raw_dat <- scdata.3lines.simulated6genes_subsampled
#perform diffusion map dimensionality reduction
dmap <- destiny::DiffusionMap(raw_dat, sigma = 1000)
#keep the first 2 diffusion components
input_dat <- destiny::as.data.frame(dmap)[, 1:2]
#cluster with K=3
clust <- kbranches.global(input_dat, Kappa = 3)
#plot the clustering results
plot(input_dat, pch=21, col=clust$cluster, bg=clust$cluster, main = 'K-Branch clustering')