Random Covariance Clustering Model
This R package implements methods for clustering and joint estimation of multiple sparse precision matrices. Methods implemented include the Random Covariance Model and Random Covariance Clustering Model.
rccm
Implements the Random Covariance Clustering Model (RCCM) for joint estimation of sparse precision matrices belonging to multiple clusters or groups. Optimization is conducted using block coordinate descent.
randCov
Implements the Random Covariance Model (RCM) for joint estimation of multiple sparse precision matrices. Optimization is conducted using block coordinate descent.
gapSelect
Selects the optimal number of clusters for the Random Covariance Clustering Model (RCCM) based on a Gap statistic as proposed by Tibshirani et al. (2001).
rccSim
Simulates data based on the Random Covariance Clustering Model (RCCM). Data is generated in a hierarchical manner, beginning with group-level networks and precision matrices and then subject-level networks and matrices.
starsRccm
Implements a modified stability approach for
regularization selection (stARS) method for tuning parameter
selection. Methods available to implement include the
fused graphical lasso (Danaher et al., 2014), group graphical lasso (Danaher et al., 2014),
graphical lasso (Friedman et al., 2008), random covariance clustering
model (RCCM), and the random covariance model (Zhang et al., 2019).
cvTune
Implements k-fold cross-validation for tuning parameter
selection. Methods available to implement include the
fused graphical lasso (Danaher et al., 2014), group graphical lasso (Danaher et al., 2014),
graphical lasso (Friedman et al., 2008), random covariance clustering
model (RCCM), and the Random Covariance Model (Zhang et al., 2019).
dwishart
Probability density function for the Wishart distribution.
randCalc
Calculates the rand index describing the amount of agreement between two integer vectors of cluster memberships.
Install the latest version of the package from GitHub with the following R code:
if("devtools" %in% installed.packages() == FALSE) {
install.packages("devtools")
}
devtools::install_github("dilernia/rccm")
Here we walk through brief examples of using some key functions. For a more detailed simulation example for implementing the RCCM, see this link for a simulation with precision matrices with higher magnitude entries, and this link for a simulation with low magnitude entries.
Below we generate a single data set with 2 clusters with 12 and 10 subjects respectively, 15 variables for each subject, 100 observations for each variable for each subject, the groups sharing about 50% of network connections, and 10% of differential connections within each group:
library(rccm)
# Simulate data
set.seed(1994)
myData <- rccSim(G = 2, clustSize = c(12, 10), p = 15, n = 100, overlap = 0.50, rho = 0.10)
# Analyze simulated data with RCCM
result <- rccm(x = myData$simDat, lambda1 = 10, lambda2 = 50, lambda3 = 2, nclusts = 2)
# Check clustering performance
zHats <- apply(result$weights, MARGIN = 2, FUN = which.max)
randCalc(zHats, myData$zgks)
Below we generate a single data set and then obtain the optimal estimation results using stARS for tuning parameter selection with 10 bootstrap samples for demonstration. A higher number of bootstrap samples (30 or 50) should be used in practice for more reliable results. We also include examples for using Ward clustering & GGL and K-means clustering and GLasso, and note that the magnitude of the tuning parameters that work well for each method vary, so we adjust / scale them accordingly to achieve the best results for each respective method.
# Simulate data
set.seed(1994)
G <- 2
myData <- rccSim(G = G, clustSize = 10, p = 5, n = 100, overlap = 0.50, rho = 0.10)
# Standardizing data
simData$simDat <- lapply(simData$simDat, FUN = scale)
# Grid of tuning parameters to search over
lambdas <- expand.grid(lambda1 = c(25, 35, 45), lambda2 = c(300, 325), lambda3 = 0.01)
# Find optimal tuning parameter set using modified stARS with 10 bootstrap samples
optTune <- starsRccm(datf = myData$simDat, lambs = lambdas, G = G, N = 10, method = "RCCM")
optWardggl <- starsRccm(datf = myData$simDat, lambs = lambdas, G = G, N = 10, method = "GGL")
optGL <- starsRccm(datf = myData$simDat, lambs = lambdas, G = G, N = 10, method = "GLasso")
# Analyze with optimally selected tuning parameters for RCCM
resultRccm <- rccm(x = myData$simDat, lambda1 = optTune$lambda1[1],
lambda2 = optTune$lambda2[1], lambda3 = optTune$lambda3[1], nclusts = G)
# Using Ward & GGL approach
# Function for calculating pair-wise Frob norm differences and then clustering
MatClust <- function(mats, G) {
K <- dim(mats)[3]
combos <- expand.grid(s1 = 1:K, s2 = 1:K)
distMat <- matrix(NA, nrow = K, ncol = K)
for(r in 1:K) {
for(s in 1:K) {
distMat[r, s] <- norm(mats[, , r] - mats[, , s], type = 'F')
}
}
cl0 <- cutree(hclust(d = as.dist(distMat), method = "ward.D"), k = G)
wgk <- matrix(NA, nrow = G, ncol = K)
for(i in 1:G) {
for(j in 1:K) {
wgk[i, j] <- ifelse(cl0[j] == i, 1, 0)
}
}
return(wgk)
}
K <- length(simData$simDat)
Sl <- sapply(simData$simDat, cov, simplify = "array")
gl <- sapply(1:K, simplify = "array", FUN = function(x){glasso::glasso(Sl[, , x], rho = 0.001,
penalize.diagonal = FALSE)$wi})
# Estimating cluster memberships using Ward clustering based on dissimilarity matrix of Frob norm differences
GGLres <- list()
GGLres$weights <- MatClust(gl, G = G)
# Analyzing using GGL within each estimated cluster
GGLres$res <- unlist(lapply(FUN = function(g) {
prec <- JGL::JGL(Y = simData$simDat[which(as.logical(GGLres$weights[g, ]))], penalty = "group",
penalize.diagonal = FALSE, lambda1 = optWardggl$lambda1[1] / 100,
lambda2 = optWardggl$lambda2[1] / 50000, return.whole.theta = TRUE)$theta
return(setNames(prec, c(which(as.logical(GGLres$weights[g, ])))))
}, X = 1:settings[i, ]$G), recursive = F)
# Using GLasso & K-means clustering
GLassores <- list()
GLassores$res <- lapply(myData$simDat, FUN = function(x) {
glasso::glasso(cov(x), rho = optGL$lambda1[1] / 100, penalize.diagonal = FALSE)$wi})
# Creating matrix of vectorized precision matrix estimates
vMat <- do.call(rbind, lapply(X = GLassores$res, FUN = as.numeric))
# Finding estimated cluster memberships using k-means clustering
zHat <- as.integer(factor(kmeans(x = vMat, centers = G)$cluster, levels = c(1:G)))
Below we generate a single data set with 2 clusters with 10 subjects in each group, 10 variables, 150 observations, with the groups sharing about 50% of network connections, and 10% of differential connections within each group:
# Simulate data
set.seed(1994)
myData <- rccSim(G = 2, clustSize = 10, p = 10, n = 150, overlap = 0.50, rho = 0.10)
# Analyze simulated data with RCCM
optLambdas <- data.frame(lambda1 = 25, lambda2 = 100, lambda3 = 0.1, G = 2:3)
result2 <- rccm(x = myData$simDat, lambda1 = optLambdas$lambda1[1],
lambda2 = optLambdas$lambda2[1], lambda3 = optLambdas$lambda3[1],
nclusts = 2)
result3 <- rccm(x = myData$simDat, lambda1 = optLambdas$lambda1[2],
lambda2 = optLambdas$lambda2[2], lambda3 = optLambdas$lambda3[2],
nclusts = 3)
# Estimated cluster memberships
zHats <- cbind(apply(result2$weights, MARGIN = 2, FUN = which.max),
apply(result3$weights, MARGIN = 2, FUN = which.max))
# Selecting number of clusters (takes ~ 2 minutes)
clustRes <- gapSelect(x = myData$simDat, gMax = 3, B = 30, zs = zHats,
optLambdas = optLambdas)
clustRes
Danaher, P., Wang, P., & Witten, D. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2), 373-397.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.
Zhang, L., DiLernia, A., Quevedo, K., Camchong, J., Lim, K., & Pan, W. (2020). A random covariance model for bi‐level graphical modeling with application to resting‐state fMRI data. Biometrics.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.