OptDimClusterStability: Determine optimal projection dimension (PCA or random...


View source: R/OptDimClusterStability.R

Description

Find the optimal projection dimension for PCA or random projections based on cluster stability by performing a line search over the target dimension q.
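As a rough illustration of what projecting to a target dimension q means, the sketch below reduces a data matrix to q columns, once with PCA and once with a Gaussian random projection, using base R only. The variable names and the 1/sqrt(q) scaling of the random matrix are assumptions made for this example; the package's internal implementation may differ.

```r
set.seed(1)
n <- 100; p <- 20; q <- 5
xx <- matrix(rnorm(n * p), n, p)

# PCA: keep the scores on the first q principal components
xx_pca <- prcomp(xx)$x[, seq_len(q), drop = FALSE]

# Gaussian random projection: multiply by a p x q matrix with N(0, 1/q) entries
# (the scaling is an illustrative choice, not taken from the package)
R_gauss <- matrix(rnorm(p * q, sd = 1 / sqrt(q)), p, q)
xx_rp <- xx %*% R_gauss

# Both projected matrices have n rows and q columns
dim(xx_pca)
dim(xx_rp)
```

Clustering would then be run on the projected matrix rather than on xx, and the line search repeats this for several values of q.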

Usage

OptDimClusterStability(xx, k, method = "PCA", n_grid = 5,
  q_max = min(ncol(xx), sqrt(10 * nrow(xx)/k)), true_labels = NULL,
  parallel = FALSE, verbose = FALSE)

Arguments

xx

The data matrix (n x p).

k

The number of clusters.

method

Projection method ("PCA" or random projections: "gaussian", "achlioptas" or "li"). Default: "PCA".

n_grid

Number of values to be used in the line search for optimal projection dimension. Default: 5.

q_max

Maximum target dimension to be used in the line search. The smallest target dimension is always k, and the maximum may not exceed the total dimensionality p. Default: min(p, sqrt(10n/k)).

true_labels

Vector of true cluster assignments. If provided, it is used to compute the adjusted Rand index and q_star. Default: NULL.

parallel

Logical; if TRUE, perform the line search over q in parallel. Default: FALSE.

verbose

Logical; if TRUE, print progress information. Default: FALSE.
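To make the default for q_max concrete, the sketch below evaluates the expression from the Usage signature for a 50 x 10 matrix with k = 2, and then builds a candidate grid of target dimensions between k and q_max. The evenly spaced, rounded grid is an assumption for illustration only; the spacing actually used by the package's line search is not documented here.

```r
n <- 50; p <- 10; k <- 2
xx <- matrix(rnorm(n * p), n, p)

# Default upper bound, copied from the Usage signature:
# sqrt(10 * 50 / 2) is about 15.8, so the bound is capped at p = 10
q_max <- min(ncol(xx), sqrt(10 * nrow(xx) / k))

# Hypothetical grid: n_grid evenly spaced values from k to q_max, rounded
n_grid <- 5
q_grid <- unique(round(seq(k, q_max, length.out = n_grid)))
q_grid  # 2 4 6 8 10
```

For wide data (p much larger than sqrt(10n/k)), the square-root term becomes the binding limit instead of p.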

Value

q_opt

Optimal target dimension (maximises cluster stability).

stab_score

Stability measure for q_opt.

q_star

Optimal ("oracle") target dimension (maximises the adjusted Rand index). Only available if true labels have been provided.
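For readers unfamiliar with the criterion behind q_star, the following is a minimal base R sketch of the adjusted Rand index between two label vectors, following the standard contingency-table definition. The helper name adj_rand_index is hypothetical and not part of the package, which may compute the index through a different routine.

```r
# Adjusted Rand index between two label vectors (illustrative helper)
adj_rand_index <- function(a, b) {
  tab <- table(a, b)                      # contingency table of the two labelings
  comb2 <- function(x) x * (x - 1) / 2    # number of pairs, choose(x, 2)
  sum_ij <- sum(comb2(tab))               # pairs placed together in both labelings
  sum_a <- sum(comb2(rowSums(tab)))       # pairs placed together in a
  sum_b <- sum(comb2(colSums(tab)))       # pairs placed together in b
  expected <- sum_a * sum_b / comb2(length(a))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Identical partitions with swapped label names give the maximum value
adj_rand_index(c(0, 0, 1, 1), c(1, 1, 0, 0))  # 1
```

The index is invariant to relabelling of the clusters, which is why it suits comparing an estimated clustering with true_labels.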

Author(s)

Bernd Taschler <bernd.taschler@dzne.de>

Sach Mukherjee <sach.mukherjee@dzne.de>

See Also

MCAPfit, GMMwrapper, ClusterStability

Examples

## Default settings, 50x10 standard Normal input matrix:
OptDimClusterStability(xx = matrix(rnorm(500), 50), k = 2)

## Finer search over q:
OptDimClusterStability(xx = matrix(rnorm(2e4), 100, 200), k = 2, n_grid = 10)

## Set max. q, provide class labels, run in parallel:
## Not run: 
OptDimClusterStability(xx = rbind(matrix(rnorm(2e4), 100, 200),
                                  matrix(rnorm(2e4, mean = 2), 100, 200)),
                       k = 2, q_max = 15,
                       true_labels = c(rep(0, 100), rep(1, 100)),
                       parallel = TRUE)

## End(Not run)

btaschler/mcap documentation built on May 26, 2019, 1:31 a.m.