CKMSelAll | R Documentation |
An advanced version of CKMSelVar that selects the number of clusters together with the number of masking variables.
CKMSelAll( dataset, maxclust, search = "dep", maxnum = 10, n.rep = 20, method = "globalmax", kmeans_starts = 10 )
dataset |
the original dataset on which CKM and its model selection procedure operate |
maxclust |
the maximal possible number of clusters |
search |
the mode of searching over the grid. "all" = search every point of the grid; this maximizes accuracy but is very slow when the number of variables is large. "sub" = the "grid search with a zoom" strategy; it is less accurate than searching the full grid but remains efficient even with a large number of variables. "dep" (the default) automatically chooses between the two strategies above based on the number of variables: when there are fewer than 25 variables, the search covers every point of the grid. |
maxnum |
the parameter is only useful when the "grid search with a zoom" strategy is applied. It restricts the maximal number of values searched over in any iteration. The default value is set at 10. |
n.rep |
the number of permuted datasets used when calculating the gap statistic |
method |
several criteria exist for determining the number of clusters from the gap statistic; users can choose among them through the "method" argument. The default is "globalmax": select the largest gap over all candidate numbers of clusters (i.e., the global maximum). Other options are "firstSEmax": select the first gap that falls within the range of the largest gap minus one SE (i.e., the one-SE rule); "firstmax": select the first local maximum of the gap; and "Tibs2001SEmax": the guideline recommended by Tibshirani et al. (2001), which also applies a one-SE rule. |
kmeans_starts |
the number of starts used in the kmeans algorithm |
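The option names accepted by "method" coincide with those of maxSE() in the cluster package, so that function can be used to see how the rules differ on a given gap curve. The sketch below assumes, without confirmation from this page, that CKMSelAll applies the same definitions; the gap values and standard errors are made up for illustration only.

```r
library(cluster)  # provides maxSE(); a recommended package shipped with R

# Hypothetical gap statistics and standard errors for k = 1, ..., 5 clusters
gap    <- c(0.32, 0.35, 0.30, 0.50, 0.48)
gap_se <- rep(0.05, length(gap))

# Apply each selection rule to the same gap curve
methods <- c("globalmax", "firstmax", "firstSEmax", "Tibs2001SEmax")
picks <- sapply(methods, function(m) maxSE(gap, gap_se, method = m))

# Named vector: the number of clusters chosen by each rule
print(picks)
```

On a curve with an early local peak followed by a higher global peak, the rules disagree, which is why it can be worth rerunning the selection with more than one "method" value.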
The function returns a ckm object, which is a list of five elements. The first is the selected number of masking variables; the second contains the indices of all signaling variables; the third is a vector giving the cluster assignment; the fourth is the pre-determined or selected "optimal" number of clusters; the fifth is the original dataset.
maxcluster <- 10
ckm.sel.all <- CKMSelAll(dataset, maxcluster)
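The example above assumes that an object named dataset already exists. A minimal sketch of a self-contained run follows. It simulates two signaling variables and eight masking (pure noise) variables, then calls CKMSelAll only if the function is available in the session. The simulated data, the variable names, and the assumption that a numeric matrix is an acceptable input are illustrative and are not taken from the package. The result is read by position, following the order of the five returned elements.

```r
set.seed(1)
n_per_cluster <- 30

# Two signaling variables whose means differ across three clusters
centers <- rbind(c(0, 0), c(4, 4), c(-4, 4))
signal <- do.call(rbind, lapply(1:3, function(k) {
  noise_k <- matrix(rnorm(n_per_cluster * 2), ncol = 2)
  shift_k <- matrix(centers[k, ], n_per_cluster, 2, byrow = TRUE)
  noise_k + shift_k
}))

# Eight masking variables that carry no cluster information
masking <- matrix(rnorm(nrow(signal) * 8), ncol = 8)
dataset <- cbind(signal, masking)

# Run the selection only if the package providing CKMSelAll is loaded
if (exists("CKMSelAll", mode = "function")) {
  fit <- CKMSelAll(dataset, maxclust = 6)
  n_masking   <- fit[[1]]  # selected number of masking variables
  signal_idx  <- fit[[2]]  # indices of signaling variables
  assignment  <- fit[[3]]  # cluster assignment vector
  n_clusters  <- fit[[4]]  # selected number of clusters
}
```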