fastnmf: Consensus clustering using non-negative matrix factorization

View source: R/fastnmf.R

fastnmf    R Documentation

Consensus clustering using non-negative matrix factorization

Description

From a list of partitions, fastnmf pools the partitions into a single consensus partition, as proposed in Li, Ding and Jordan (2007) <doi:10.1109/ICDM.2007.98>.

Usage

fastnmf(
  listpart,
  nb.clust,
  method.init = c("BOK", "kmeans"),
  threshold = 10^(-5),
  printflag = TRUE,
  parameter.kmeans = list(nstart = 100, iter.max = 50, algorithm = c("Hartigan-Wong",
    "Lloyd", "Forgy", "MacQueen"), trace = FALSE),
  parameter.minibatchkmeans = list(batch_size = 10, num_init = 1, max_iters = 50,
    init_fraction = 1, initializer = "kmeans++", early_stop_iter = 10, verbose = FALSE,
    CENTROIDS = NULL, tol = 1e-04, tol_optimal_init = 0.3, seed = 1)
)

Arguments

listpart

a list of partitions of the same individuals, each given as a numeric vector of cluster labels

nb.clust

an integer specifying the number of clusters of the consensus partition

method.init

a character vector giving the initialisation methods to use, among "BOK", "kmeans", "minibatchkmeans" and "sample". See Details.

threshold

a positive real giving the convergence threshold used to stop the NMF algorithm. Default value is 10^(-5).

printflag

a boolean. If TRUE, messages are printed on the console. Default value is TRUE.

parameter.kmeans

a list of arguments passed to the kmeans function. See the kmeans help page.

parameter.minibatchkmeans

a list of arguments passed to the MiniBatchKmeans function (package ClusterR). See the MiniBatchKmeans help page.
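The expected shape of listpart can be illustrated with a small, hypothetical example: partitions of the same individuals obtained here with stats::kmeans on the iris measurements, each with a different number of clusters (any clustering method could produce them). The fastnmf call is guarded so the snippet still runs without clusterMI installed, and the values passed in method.init and parameter.kmeans are illustrative choices, not recommendations.

```r
# Hypothetical listpart: five k-means partitions of the same 150 individuals,
# with 2 to 6 clusters (the number of classes may differ between partitions).
set.seed(1)
X <- as.matrix(iris[, 1:4])
listpart <- lapply(2:6, function(k) kmeans(X, centers = k, nstart = 5)$cluster)

# Each element is a numeric vector of cluster labels of the same length
str(listpart)

# Illustrative call with customised initialisation settings (assumed values)
if (requireNamespace("clusterMI", quietly = TRUE)) {
  res <- clusterMI::fastnmf(listpart, nb.clust = 3,
                            method.init = c("kmeans", "sample"),
                            printflag = FALSE,
                            parameter.kmeans = list(nstart = 10, iter.max = 20,
                                                    algorithm = "Hartigan-Wong",
                                                    trace = FALSE))
}
```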

Details

fastnmf performs consensus clustering using non-negative matrix factorization (NMF), following Li, Ding and Jordan (2007) <doi:10.1109/ICDM.2007.98>. The partitions to aggregate must be given as a list in which each element is a numeric vector of cluster labels. The number of classes may vary from one partition to another. The number of classes of the consensus partition is set by the nb.clust argument.

The NMF algorithm is iterative and requires an initial partition, which is specified by method.init. With method.init = "BOK", the initial partition is the partition from listpart that minimizes the NMF criterion. The alternatives are "kmeans", "minibatchkmeans" and "sample". With method.init = "kmeans" (or "minibatchkmeans"), the average of the connectivity matrices is clustered by kmeans (or MiniBatchKmeans); mini-batch k-means can be faster than kmeans when the number of individuals is large. With method.init = "sample", a random partition is drawn. If method.init contains several methods, each of them is used as an initialisation and the best one is returned. By default, method.init = c("BOK", "kmeans").
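The procedure described above can be sketched in base R. This is not the clusterMI implementation: it assumes the criterion is the squared Frobenius norm ||Mtilde - H S t(H)||^2, borrows multiplicative update rules of the kind used for orthogonal nonnegative matrix tri-factorization, and adds a 0.2 offset to the initial indicator matrix (multiplicative updates cannot move entries that start at zero). The stopping rule on threshold (relative change of the criterion) and the helper names connectivity_mean, init_H and nmf_consensus are hypothetical; only the "kmeans" and "sample" initialisations are sketched.

```r
# Average connectivity matrix: entry (i, j) is the proportion of partitions
# in which individuals i and j are in the same cluster.
connectivity_mean <- function(listpart) {
  mats <- lapply(listpart, function(p) outer(p, p, "==") * 1)
  Reduce(`+`, mats) / length(mats)
}

# Disjunctive table of an initial partition (labels 1..K), columns scaled by
# 1/sqrt(cluster size), plus an assumed positive offset of 0.2.
init_H <- function(cl, K) {
  H <- outer(cl, seq_len(K), "==") * 1
  H <- sweep(H, 2, sqrt(pmax(colSums(H), 1)), "/")
  H + 0.2
}

# Hypothetical NMF loop minimising ||M - H S t(H)||^2 by multiplicative updates
nmf_consensus <- function(M, cl0, K, threshold = 1e-5, max_iter = 500) {
  H <- init_H(cl0, K)
  S <- t(H) %*% M %*% H
  crit <- numeric(0)
  eps <- 1e-12  # avoids division by zero
  for (it in seq_len(max_iter)) {
    MH <- M %*% H
    HtH <- crossprod(H)
    S <- S * sqrt((t(H) %*% MH) / (HtH %*% S %*% HtH + eps))
    MHS <- MH %*% S
    H <- H * sqrt(MHS / (H %*% crossprod(H, MHS) + eps))
    crit <- c(crit, sum((M - H %*% S %*% t(H))^2))
    n <- length(crit)
    # assumed stopping rule: relative decrease of the criterion below threshold
    if (n > 1 && abs(crit[n - 1] - crit[n]) < threshold * crit[n - 1]) break
  }
  list(Htilde = H, S = S, crit = crit,
       cluster = max.col(H, ties.method = "first"))
}

# Usage: pool ten k-means partitions of iris, trying two initialisations
set.seed(1)
X <- as.matrix(iris[, 1:4])
listpart <- lapply(1:10, function(b) kmeans(X, centers = 3)$cluster)
M <- connectivity_mean(listpart)
K <- 3
inits <- list(kmeans = kmeans(M, centers = K, nstart = 10)$cluster,
              sample = sample(seq_len(K), nrow(M), replace = TRUE))
fits <- lapply(inits, function(cl0) nmf_consensus(M, cl0, K))
# keep the initialisation with the lowest final criterion
best <- names(which.min(sapply(fits, function(f) tail(f$crit, 1))))
table(fits[[best]]$cluster)
```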

Value

For each initialisation method, a list of 5 objects is returned:

Htilde

A fuzzy disjunctive table

S

A positive matrix

Mtilde

The average of connectivity matrices

crit

A vector with the optimized criterion at each iteration

cluster

the consensus partition in nb.clust classes

In addition, the best initialisation method is returned.
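To illustrate how a fuzzy disjunctive table such as Htilde relates to a hard partition, here is a toy sketch with made-up values. It assumes each individual is assigned to the column with the largest weight, which is a common convention but is not confirmed by this page for fastnmf; the row normalisation into membership degrees is likewise only an interpretation aid.

```r
# Toy fuzzy disjunctive table: 4 individuals, 2 clusters (hypothetical values)
Htilde <- matrix(c(0.9, 0.1,
                   0.7, 0.2,
                   0.1, 0.8,
                   0.3, 0.6), ncol = 2, byrow = TRUE)

# Row-normalised membership degrees (each row sums to 1)
membership <- Htilde / rowSums(Htilde)

# Hard partition: each individual goes to its column of largest weight
cluster <- max.col(Htilde, ties.method = "first")
cluster
# [1] 1 1 2 2
```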

References

T. Li, C. Ding, and M. I. Jordan (2007). Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (ICDM'07), pp. 577-582. IEEE Computer Society, USA. <doi:10.1109/ICDM.2007.98>

See Also

kmeans, MiniBatchKmeans

Examples

library(clusterMI)
library(clustrd) # provides cluspca
data(wine)
set.seed(123456)
ref <- wine$cult # reference partition (cultivar)
nb.clust <- 3
m <- 3 # number of imputed data sets; should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na) # introduce missing values

# imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

# analysis using reduced k-means

## apply the cluspca function to each imputed data set
res.ana.rkm <- lapply(res.imp$res.imp,
                      FUN = cluspca,
                      nclus = nb.clust,
                      ndim = 2,
                      method = "RKM")
## extract the set of partitions (as a list)
res.ana.rkm <- lapply(res.ana.rkm, "[[", "cluster")

# pooling by NMF
res.pool.rkm <- fastnmf(res.ana.rkm, nb.clust = nb.clust)
## extract the partition corresponding to the best initialisation
part <- res.pool.rkm$best$clust
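The example defines ref but does not use it. One common way to compare the pooled partition with the reference is the adjusted Rand index; packages such as mclust provide adjustedRandIndex, and the hypothetical helper below is a minimal base-R sketch of the same index, shown on toy partitions.

```r
# Adjusted Rand index between two partitions (hypothetical helper)
adjusted_rand <- function(a, b) {
  tab <- table(a, b)
  comb2 <- function(x) x * (x - 1) / 2   # number of pairs
  index <- sum(comb2(tab))               # pairs grouped together in both
  sum_a <- sum(comb2(rowSums(tab)))
  sum_b <- sum(comb2(colSums(tab)))
  expected <- sum_a * sum_b / comb2(sum(tab))
  maximum <- (sum_a + sum_b) / 2
  (index - expected) / (maximum - expected)
}

# Identical partitions up to relabelling give an index of 1
adjusted_rand(c(1, 1, 2, 2, 3, 3), c(2, 2, 1, 1, 3, 3))
# With the objects from the example above: adjusted_rand(part, ref)
```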


clusterMI documentation built on Oct. 23, 2024, 5:07 p.m.