fastnmf: Consensus clustering using non-negative matrix factorization
In clusterMI: Cluster Analysis with Missing Values by Multiple Imputation

fastnmf

R Documentation

Consensus clustering using non-negative matrix factorization

Description

From a list of partitions fastnmf pools partition as proposed in Li and Ding (2007) <doi:10.1109/ICDM.2007.98>.

Usage

fastnmf(
  listpart,
  nb.clust,
  method.init = c("BOK", "kmeans"),
  threshold = 10^(-5),
  printflag = TRUE,
  parameter.kmeans = list(nstart = 100, iter.max = 50, algorithm = c("Hartigan-Wong",
    "Lloyd", "Forgy", "MacQueen"), trace = FALSE),
  parameter.minibatchkmeans = list(batch_size = 10, num_init = 1, max_iters = 50,
    init_fraction = 1, initializer = "kmeans++", early_stop_iter = 10, verbose = FALSE,
    CENTROIDS = NULL, tol = 1e-04, tol_optimal_init = 0.3, seed = 1)
)

Arguments

`listpart`	a list of partitions
`nb.clust`	an integer specifying the number of clusters
`method.init`	a vector giving initialisation methods used among "BOK", "kmeans", "minibatchkmeans" "sample". See details.
`threshold`	a real specifying when the NMF algorithm is stoped. Default value is 10^(-5)
`printflag`	a boolean. If TRUE, nmf will print messages on console. Default value is TRUE
`parameter.kmeans`	a list of arguments for kmeans function. See keans help page.
`parameter.minibatchkmeans`	list of arguments for MiniBatchKmeans function. See MiniBatchKmeans help page.

Details

fastnmf performs consensus clustering using non-negative matrix factorization following Li and Ding (2007) <doi:10.1109/ICDM.2007.98>. The set of partitions that are aggregated needs to be given as a list where each element is a vector of numeric values. Note that the number of classes for each partition can vary. The number of classes for the consensus partition should be given using the nb.clust argument. The NMF algorithm is iterative and required an initial partition. This latter is specified by method.init. method.init = "BOK" means the partition considered is a partition from listpart which minimizes the NMF criterion. Alternative methods are "kmeans", "minibathckmeans" or "sample". If method.init = "kmeans" (or "minibatchkmeans"), then clustering on the average of connectivity matrices is performed by kmeans (or "minibatchkmeans"). Mini Batch Kmeans could be faster than kmeans if the number of individuals is large. If method.init = "sample", then a random partition is drawn. If method.init is a vector of several characters, then several initialization methods are considered and the best method is returned. By default, method.init = c("BOK", "kmeans").

Value

For each initialisation method, a list of 5 objets is returned

`Htilde`	A fuzzy disjunctive table
`S`	A positive matrix
`Mtilde`	The average of connectivity matrices
`crit`	A vector with the optimized criterion at each iteration
`cluster`	the consensus partition in nb.clust classes

In addition, the best initialisation method is returned

References

T. Li, C. Ding, and M. I. Jordan (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, ICDM'07, page 577-582, USA. IEEE Computer Society. <doi:10.1109/ICDM.2007.98>

Examples


data(wine, package = "clusterMI")
require(clustrd)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
m <- 3 # number of imputed data sets. Should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

#imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

#analysis using reduced kmeans

## apply the cluspca function on each imputed data set
res.ana.rkm <- lapply(res.imp$res.imp,
                      FUN = cluspca,
                      nclus = nb.clust,
                      ndim = 2,
                      method= "RKM")
## extract the set of partitions (under "list" format)
res.ana.rkm <-lapply(res.ana.rkm,"[[","cluster")

# pooling by NMF
res.pool.rkm <- fastnmf(res.ana.rkm, nb.clust = nb.clust)
## extract the partition corresponding to the best initialisation
part <- res.pool.rkm$best$clust

clusterMI documentation built on April 4, 2025, 12:55 a.m.