gapSelect | R Documentation |
This function selects the optimal number of clusters for the Random Covariance Clustering Model (RCCM) based on a Gap statistic as proposed by Tibshirani et al. (2001).
gapSelect(x, gMax, B = 100, zs, optLambdas, ncores = 1)
x |
List of K data matrices each of dimension n_k x p |
gMax |
Maximum number of clusters or groups to consider. Must be at least 2. |
B |
Number of reference data sets to generate. |
zs |
K x gMax - 1 matrix with estimated cluster memberships for each number of clusters considered. |
optLambdas |
Data frame with 4 columns (lambda1, lambda2, lambda3, and G) and gMax - 1 rows. The first 3 columns are the tuning parameter values to implement the RCCM for a given number of clusters, and the G column is the number of clusters that must range from 2 to gMax |
ncores |
Number of computing cores to use if desired to run in parallel. Optional. |
A list of length 3 containing:
The optimally selected number of clusters (nclusts).
The gMax observed Gap statistics (gaps).
The gMax adjusted standard deviations of the simulated gap statistics (sigmas).
Andrew DiLernia
Tibshirani, Robert, et al. "Estimating the Number of Clusters in a Data Set via the Gap Statistic." Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 63, no. 2, 2001, pp. 411-423., doi:10.1111/1467-9868.00293.
# Generate data with 2 clusters with 12 and 10 subjects respectively, # 15 variables for each subject, 100 observations for each variable for each subject, # the groups sharing about 50% of network connections, and 10% of differential connections # within each group set.seed(1994) myData <- rccSim(G = 2, clustSize = 10, p = 10, n = 177, overlap = 0.20, rho = 0.10) # Analyze simulated data with RCCM optLambdas <- data.frame(lambda1 = 10, lambda2 = 50, lambda3 = 0.10, G = 2:3) result2 <- rccm(x = myData$simDat, lambda1 = optLambdas$lambda1[1], lambda2 = optLambdas$lambda2[1], lambda3 = optLambdas$lambda3[1], nclusts = 2) result3 <- rccm(x = myData$simDat, lambda1 = optLambdas$lambda1[2], lambda2 = optLambdas$lambda2[2], lambda3 = optLambdas$lambda3[2], nclusts = 3) # Estimated cluster memberships zHats <- cbind(apply(result2$weights, MARGIN = 2, FUN = which.max), apply(result3$weights, MARGIN = 2, FUN = which.max)) # Selecting number of clusters clustRes <- gapSelect(x = myData$simDat, gMax = 3, B = 50, zs = zHats, optLambdas = optLambdas)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.