fastnmf | R Documentation |
From a list of partitions, fastnmf
pools the partitions as proposed in Li and Ding (2007) <doi:10.1109/ICDM.2007.98>.
fastnmf(
listpart,
nb.clust,
method.init = c("BOK", "kmeans"),
threshold = 10^(-5),
printflag = TRUE,
parameter.kmeans = list(nstart = 100, iter.max = 50, algorithm = c("Hartigan-Wong",
"Lloyd", "Forgy", "MacQueen"), trace = FALSE),
parameter.minibatchkmeans = list(batch_size = 10, num_init = 1, max_iters = 50,
init_fraction = 1, initializer = "kmeans++", early_stop_iter = 10, verbose = FALSE,
CENTROIDS = NULL, tol = 1e-04, tol_optimal_init = 0.3, seed = 1)
)
listpart |
a list of partitions |
nb.clust |
an integer specifying the number of clusters |
method.init |
a vector giving the initialisation methods to use, among "BOK", "kmeans", "minibatchkmeans" and "sample". See details. |
threshold |
a real number specifying when the NMF algorithm is stopped. Default value is 10^(-5) |
printflag |
a boolean. If TRUE, nmf will print messages on the console. Default value is TRUE |
parameter.kmeans |
a list of arguments for the kmeans function. See the kmeans help page. |
parameter.minibatchkmeans |
a list of arguments for the MiniBatchKmeans function. See the MiniBatchKmeans help page. |
fastnmf performs consensus clustering using non-negative matrix factorization following Li and Ding (2007) <doi:10.1109/ICDM.2007.98>. The set of partitions to be aggregated must be given as a list in which each element is a vector of numeric values. Note that the number of classes can vary from one partition to another. The number of classes for the consensus partition is given by the nb.clust argument. The NMF algorithm is iterative and requires an initial partition, which is specified by method.init. method.init = "BOK" means the initial partition is the partition from listpart which minimizes the NMF criterion. Alternative methods are "kmeans", "minibatchkmeans" or "sample". If method.init = "kmeans" (or "minibatchkmeans"), then clustering on the average of connectivity matrices is performed by kmeans (or MiniBatchKmeans). Mini Batch Kmeans can be faster than kmeans if the number of individuals is large. If method.init = "sample", then a random partition is drawn. If method.init is a vector of several character strings, then several initialization methods are considered and the best one is returned. By default, method.init = c("BOK", "kmeans").
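The quantities mentioned above can be illustrated with a minimal base R sketch. It builds the connectivity matrix of each partition (entry (i, j) equals 1 when individuals i and j share a class), averages them, and then picks the input partition closest to that average. The helper name connectivity and the toy listpart are hypothetical, and the criterion used here (squared Frobenius distance between the average connectivity matrix and a candidate's connectivity matrix) is an assumption based on the Li and Ding formulation for hard partitions. It is not taken from the fastnmf source code.

```r
# Three toy partitions of six individuals (class labels are arbitrary)
listpart <- list(
  c(1, 1, 1, 2, 2, 2),
  c(1, 1, 2, 2, 3, 3),
  c(2, 2, 2, 1, 1, 1)
)

# Hypothetical helper: connectivity matrix of a hard partition,
# entry (i, j) is 1 when i and j belong to the same class, 0 otherwise
connectivity <- function(part) {
  outer(part, part, FUN = "==") * 1
}

# Average of the connectivity matrices over all partitions
Mtilde <- Reduce(`+`, lapply(listpart, connectivity)) / length(listpart)

# Assumed criterion: squared Frobenius distance between Mtilde and the
# connectivity matrix of each candidate partition
crit <- vapply(
  listpart,
  function(part) sum((Mtilde - connectivity(part))^2),
  numeric(1)
)

# Index of the input partition with the smallest criterion
best <- which.min(crit)
```

Note that the first and third partitions differ only by a relabelling of the classes, so they share the same connectivity matrix and the same criterion value. Working with connectivity matrices is what makes the pooling insensitive to label switching.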
For each initialisation method, a list of 5 objects is returned:
Htilde |
A fuzzy disjunctive table |
S |
A positive matrix |
Mtilde |
The average of connectivity matrices |
crit |
A vector with the optimized criterion at each iteration |
cluster |
the consensus partition in nb.clust classes |
In addition, the best initialisation method is returned
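As an illustration of how a fuzzy disjunctive table relates to a hard partition, the sketch below assigns each individual to the class with the largest membership value. The toy matrix is made up, and this row-wise maximum rule is an assumption for illustration only. The documentation does not state how fastnmf derives cluster from Htilde.

```r
# Toy fuzzy disjunctive table: 4 individuals (rows), 3 classes (columns)
Htilde <- matrix(
  c(0.9, 0.1, 0.0,
    0.2, 0.7, 0.1,
    0.1, 0.2, 0.7,
    0.6, 0.3, 0.1),
  ncol = 3, byrow = TRUE
)

# Assumed rule: each individual goes to the class with the largest membership
cluster <- max.col(Htilde, ties.method = "first")
```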
T. Li, C. Ding, and M. I. Jordan (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, ICDM'07, pages 577-582, USA. IEEE Computer Society. <doi:10.1109/ICDM.2007.98>
kmeans
MiniBatchKmeans
data(wine)
require(clustrd)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
m <- 3 # number of imputed data sets. Should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)
#imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)
#analysis using reduced kmeans
## apply the cluspca function on each imputed data set
res.ana.rkm <- lapply(res.imp$res.imp,
                      FUN = cluspca,
                      nclus = nb.clust,
                      ndim = 2,
                      method = "RKM")
## extract the set of partitions (under "list" format)
res.ana.rkm <- lapply(res.ana.rkm, "[[", "cluster")
# pooling by NMF
res.pool.rkm <- fastnmf(res.ana.rkm, nb.clust = nb.clust)
## extract the partition corresponding to the best initialisation
part <- res.pool.rkm$best$cluster