# R/groupsparsewkm.R In vimpclust: Variable Importance in Clustering

#### Documented in groupsparsewkm

#' @title Group-sparse weighted k-means
#' @export
#'
#' @description This function performs group-sparse weighted k-means on a set
#' of observations described by numerical variables organized in groups.
#' It generalizes the sparse clustering algorithm introduced by
#' Witten & Tibshirani (2010) to groups. While the algorithm clusters the observations, the groups of variables are supposed priorly known.
#'  The algorithm computes a series of weights associated to the groups of variables, the weights
#' indicating the importance of each group in the clustering process.
#'
#' @param X a numerical matrix or a dataframe of dimension \code{n} (observations) by \code{p}
#' (variables).
#' @param centers an integer representing the number of clusters.
#' @param lambda a vector of numerical values (or a single value) providing
#' a grid of values for the regularization parameter. If NULL (by default), the function computes its
#' own lambda sequence of length \code{nlambda} (see details).
#' @param nlambda an integer indicating the number of values for the regularization parameter.
#' By default, \code{nlambda=20}.
#' @param index a vector of integers of size \code{p} providing the group membership
#'  for each variable. By default, \code{index=1:ncol(X)} i.e. no groups or groups of size 1.
#' @param sizegroup a boolean. If TRUE, the group sizes (number of variables in each group) are taken into account in the penalty term (see details).
#'  By default, \code{sizegroup=TRUE}.
#' @param nstart an integer representing the number of random starts in the k-means algorithm.
#'  By default, \code{nstart=10}.
#' @param itermaxw an integer indicating the maximum number of iterations for the inside
#' loop over the weights \code{w}. By default, \code{itermaxw=20}.
#' @param itermaxkm an integer representing the maximum number of iterations in the k-means
#' algorithm. By default, \code{itermaxkm=10}.
#' @param scaling a boolean. If TRUE, variables are scaled to zero mean and unit variance. By default, \code{scaling=TRUE}.
#' @param epsilonw a positive numerical value. It provides the precision of the stopping criterion over \code{w}. By default, \code{epsilonw =1e-04}.
#' @param verbose an integer value. If \code{verbose=0}, the function stays silent, if \code{verbose=1} (default option), it  prints
#'  whether the stopping criterion over the weights \code{w} is satisfied.
#'
#' @return \item{lambda}{a numerical vector containing the regularization parameters (a grid of values).}
#' @return \item{W}{a \code{p} by \code{length(lambda)} numerical matrix. It contains the weights associated to each variable.}
#' @return \item{Wg}{a \code{L} by \code{length(lambda)} numerical matrix, where \code{L} is the number of groups. It contains the weights associated to each group.}
#' @return \item{cluster}{a \code{n} by \code{length(lambda)} integer matrix. It contains the cluster memberships, for each value of the regularization parameter.}
#' @return \item{sel.feat}{a numerical vector of the same length as \code{lambda}, giving the number of selected variables for each value of the regularization parameter.}
#' @return \item{sel.groups}{a numerical vector of the same length as \code{lambda}, giving the number of selected groups of variables for each value of the regularization parameter.}
#' @return \item{Z}{a matrix of size \code{n} by \code{p} containing the scaled data if \code{scaling=TRUE}, and a copy of \code{X} otherwise.}
#' @return \item{bss.per.feature}{a matrix of size \code{p} by \code{length(lambda)}. It contains the between-class variance computed for each variable.}
#'
#'
#' @details
#' Group-sparse weighted k-means performs clustering on data described by numerical variables priorly partitionned into groups, and automatically selects the most discriminant groups by
#' setting to zero the weights of the non-discriminant ones.
#'
#' The algorithm is based on the optimization of a cost function which is the weighted between-class variance penalized by a group L1-norm. The groups must be priorly defined through
#' expert knowledge. If there is no group structure (each group contains one variable only), the algorithm reduces to the sparse weighted k-means introduced in Witten & Tibshirani (2010).
#' The penalty term may take into account the size of the groups by setting \code{sizegroup=TRUE} (see Chavent et al. (2020) for further details on the mathematical expression of the
#' optimized criterion). The importance of the penalty term may be adjusted through the regularization parameter \code{lambda}. If \code{lambda=0}, there is no penalty applied to the
#' weighted between-class variance. The larger \code{lambda}, the larger the penalty term and the number of groups with null weights.
#'
#' The output of the algorithm is three-folded: one gets a partitioning of the data, a vector of weights associated to each group, and a vector of weights associated to each
#' variable. Weights equal to zero imply that the associated variables or the associated groups do not participate in the clustering process.
#'
#' Since it is difficult to chose the regularization parameter \code{lambda} without prior knowledge, the function builds automatically a grid of parameters and finds the partitioning
#' and the vectors of weights associated to each value in the grid.
#'
#' Note that when the regularization parameter is equal to 0 (no penalty applied), the output is different from that of a regular k-means, since the optimized criterion is a weighted
#' between-class variance and not the between-class variance only.
#'
#' @references Witten, D. M., & Tibshirani, R. (2010). A framework for feature
#' selection in clustering. Journal of the American Statistical Association,
#' 105(490), p.713-726.
#' @references Chavent, M. & Lacaille, J. & Mourer, A. & Olteanu, M. (2020).
#' Sparse k-means for mixed data via group-sparse clustering, ESANN proceedings.
#'
#'
#'@examples
#' data(iris)
#' # define two groups of variables:
#' # "Sepal.Length" and "Sepal.Width" in group 1
#' # "Petal.Length" and "Petal.Width"  in group 2
#' index <- c(1, 2, 1, 2)
#' # group-sparse k-means
#' \donttest{
#' out <- groupsparsewkm(X = iris[,-5], centers = 3, index = index)
#' # grid of regularization parameters
#' out$lambda #' k <- 10 #' # weights of the variables for the k-th regularization parameter #' out$W[,k]
#' # weights of the groups for the k-th regularization parameter
#' out$Wg[,k] #' # partition obtained with for the k-th regularization parameter #' out$cluster[,k]
#' # between-class variance on each variable
#' out$bss.per.feature[,k] #' # between-class variance #' sum(out$bss.per.feature[,k])/length(index)
#'
#' # one variable per group (equivalent to sparse k-means)
#' index <- 1:4 # default option in groupsparsewkm
#' # sparse k-means
#' out <- groupsparsewkm(X = iris[,-5], centers = 3, index = index)
#' # or
#' out <- groupsparsewkm(X = iris[,-5], centers = 3)
#' # group weights and variable weights are identical in this case
#' out$Wg #' out$W
#' }

groupsparsewkm <- function(X, centers, lambda = NULL, nlambda = 20,
index = 1:ncol(X), sizegroup = TRUE,
nstart = 10, itermaxw = 20, itermaxkm = 10,
scaling = TRUE, verbose = 1, epsilonw = 1e-04)
{
call <- match.call()
check_fun_groupsparsw(X, lambda, nlambda, index, sizegroup, itermaxw, scaling, verbose)

X <- as.matrix(scale(X, center = scaling, scale = scaling))

X <- X[,order(index)]
index <- sort(index)

km <- stats::kmeans(X, centers, nstart, itermaxkm)
clusterini <- km$cluster bss.per.featureini <- weightedss(X, clusterini)$bss.per.feature/nrow(X)
maxlam <-  max(stats::aggregate(bss.per.featureini, by=list(index), function(x){norm.vect(x)/sqrt(length(x))})[,2])
check_maxlambda(lambda, maxlam)

if (is.null(lambda))
lambda <- seq(from = 0, to = (maxlam - 0.001), length.out = nlambda)

# loop over lambda
ls <- sapply(lambda, FUN = function(i) {

# loop over w
cluster1 <- clusterini
bss.per.feature <- bss.per.featureini
w1 <- rep(1/ncol(X), ncol(X))  # initialization of w1 / w1 gives the weight at the i+1 iterations
w0 <- abs(stats::rnorm(ncol(X)))
W_cl <- list()
niter <- 1
while (sum(abs(w1 - w0))/sum(abs(w0)) > epsilonw && niter < itermaxw) {
if (niter != 1) {
Xw <- t(t(X)*sqrt(w1))
km <- stats::kmeans(Xw, centers, nstart = nstart, iter.max = itermaxkm)
cluster1 <- km$cluster bss.per.feature <- weightedss(X, cluster1)$bss.per.feature/nrow(X)
}
w0 <- w1
cluster0 <- cluster1
# Soft thresholding operator
w1 <- groupsoft(bss.per.feature, i, index, sizegroup)
if (norm.vect(w1)==0)
niter <- itermaxw
niter = niter + 1
}  # end loop over w
if (verbose == 1)
if (niter >= itermaxw)
print(paste0("the stopping criterion over w is not satisfied for lambda = ", i)) else
print(paste0("the stopping criterion over w is satisfied for lambda = ", i))

W_cl[[1]] <- w0
W_cl[[2]] <- cluster0
W_cl[[3]] <- bss.per.feature
W_cl
})  # end loop over lambda

W <- do.call(cbind, (ls[1, ]))
cluster <- do.call(cbind, (ls[2, ]))
bss.per.feature <- do.call(cbind, (ls[3, ]))

rownames(W) <- colnames(X)
colnames(W) <- paste("Lambda", 1:length(lambda))

Wg <- stats::aggregate(W, by=list(sort(index)), norm.vect)[,-1]
Wg <- as.matrix(Wg)
rownames(Wg) <- paste("Group", unique(sort(index)))
colnames(Wg) <- paste("Lambda", 1:length(lambda))

selected.features <- apply(W,2,function(x){sum(x>0)})
selected.groups <- apply(Wg,2,function(x){sum(x>0)})

rownames(bss.per.feature) <- colnames(X)
colnames(bss.per.feature) <- paste("Lambda", 1:length(lambda))

rownames(cluster) <- rownames(X)
colnames(cluster)  <- paste("Lambda", 1:length(lambda))

out <- list(call = call, type="NumGroupSparse", lambda = lambda, W = W, Wg = Wg,
cluster = cluster,   sel.feat= selected.features,
sel.groups=selected.groups,
Z = X, index = index, bss.per.feature = bss.per.feature)
class(out) <- "spwkm"
return(out)
}


## Try the vimpclust package in your browser

Any scripts or data that you put into this service are public.

vimpclust documentation built on Jan. 8, 2021, 5:34 p.m.