CKMSelVar: The function, combined with CardKMeans, selects the number of...


CKMSelVar    R Documentation

The function, combined with CardKMeans, selects the number of masking variables, given a fixed number of clusters

Description

Used together with CardKMeans, this function selects the number of masking variables for a fixed, pre-specified number of clusters.

Usage

CKMSelVar(
  dataset,
  n.cluster,
  search = "dep",
  maxnum = 10,
  n.rep = 20,
  kmeans_starts = 10
)

Arguments

dataset

the original dataset on which CKM and its model selection procedure operate

n.cluster

the total number of clusters

search

the search mode over the grid of candidate values. "all" = evaluate every point of the grid; this maximizes accuracy but is very slow when the number of variables is large. "sub" = the "grid search with a zoom" strategy; it is less accurate than searching the full grid but remains efficient even with a large number of variables. "dep" (the default) automatically chooses between the two methods above based on the number of variables: with fewer than 25 variables, the search covers every possible value of the grid; otherwise, the zoom strategy is used.

maxnum

only used when the "grid search with a zoom" strategy is applied. It caps the number of values searched in any single iteration. The default is 10.

n.rep

the number of permuted datasets used when calculating the gap statistic

kmeans_starts

the number of starts used in the kmeans algorithm
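
As a minimal sketch of how the "dep" setting of search could resolve to one of the two concrete modes, consider the helper below. The function resolve_search and its threshold argument are hypothetical names introduced for illustration; they are not part of the CKM package, and the behaviour for 25 or more variables is inferred from the description of "dep" rather than taken from the package source.

```r
# Hypothetical helper (not part of CKM): map the `search` argument to the
# concrete strategy described above, given the number of variables.
resolve_search <- function(search = "dep", n.var, threshold = 25) {
  search <- match.arg(search, c("dep", "all", "sub"))
  if (search == "dep") {
    # Fewer than `threshold` variables: search the full grid.
    # Otherwise (assumed): fall back to the "grid search with a zoom".
    if (n.var < threshold) "all" else "sub"
  } else {
    search
  }
}

resolve_search("dep", n.var = 10)   # "all"
resolve_search("dep", n.var = 200)  # "sub"
resolve_search("sub", n.var = 10)   # "sub"
```

An explicit "all" or "sub" is passed through unchanged, so only the default setting depends on the size of the dataset.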

Value

The function returns a ckm object, which is a list of five elements. The first is the selected number of masking variables; the second contains the indices of all signaling variables; the third is a vector of cluster assignments; the fourth is the pre-determined or selected "optimal" number of clusters; the fifth is the original dataset.
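
The sketch below shows one way to read the five elements by position. The list ckm.fit is a hand-built stand-in with made-up values, not real output of CKMSelVar, and the element names of an actual ckm object are not documented here, so positional indexing is used. The final check assumes that every variable is either masking or signaling, which is an assumption rather than something stated above.

```r
# Stand-in for a value returned by CKMSelVar(); the values are invented and
# only mirror the five-element layout described above.
ckm.fit <- list(
  2,                              # 1: selected number of masking variables
  c(1, 3, 4),                     # 2: indices of the signaling variables
  c(1, 1, 2, 3, 2, 3),            # 3: cluster assignment of each observation
  3,                              # 4: number of clusters
  matrix(0, nrow = 6, ncol = 5)   # 5: original dataset
)

n.masking <- ckm.fit[[1]]
signaling <- ckm.fit[[2]]
clusters  <- ckm.fit[[3]]

# Cluster sizes
cluster.sizes <- table(clusters)

# Assumed consistency check: masking + signaling variables = all variables
all.accounted <- (n.masking + length(signaling)) == ncol(ckm.fit[[5]])
```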

Examples

# `dataset` must already exist: the data to be clustered (see Arguments)
ncluster <- 3
ckm.sel.var <- CKMSelVar(dataset, ncluster)


syuanuvt/CKM documentation built on Dec. 1, 2022, 9:06 p.m.