adproclus: Additive profile clustering

Description

Perform ADditive PROfile CLUStering (ADPROCLUS) on object-by-variable data.

Usage

adproclus(data, centers, nstart = 1L, algorithm = "ALS1",
  SaveAllStarts = FALSE)

Arguments

data

object-by-variable data matrix of class matrix or data.frame.

centers

either the number of clusters k, or a matrix of initial (distinct) cluster centres. If a number k, a random set of k rows in data is chosen as initial centres.

nstart

if centers is a number, a vector of length 1 or 2 denoting the number of random and rational starts to be performed.

algorithm

character string "ALS1" (default) or "ALS2", denoting the type of alternating least squares algorithm.

SaveAllStarts

logical. If TRUE, the results of all algorithm starts are returned. By default, only the best solution is retained.

Details

This function uses Mirkin's (1987, 1990) Additive Profile Clustering (ADPROCLUS) method to obtain an unrestricted overlapping clustering model of the object-by-variable data provided in data.

The ADPROCLUS model approximates an I x J object-by-variable data matrix X by an I x J model matrix M that can be decomposed into an I x K binary cluster membership matrix A and a K x J real-valued cluster profile matrix P, where K is the number of overlapping clusters. Given a number of clusters k, the aim of an ADPROCLUS analysis is to estimate a model matrix M = AP that reconstructs the data matrix X as closely as possible in a least squares sense (i.e. by minimizing the sum of squared residuals). For a detailed illustration of the ADPROCLUS model and the associated loss function, see Wilderjans et al. (2011).
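The decomposition M = AP and the least squares loss can be illustrated with a small base R sketch. The matrices below are made-up toy values, not package data, and the code does not call the package; it only shows how overlapping membership makes an object's model row the sum of its clusters' profiles.

```r
# Toy illustration of the ADPROCLUS model M = A %*% P (base R only).
# All values are invented for illustration.
A <- matrix(c(1, 0,
              0, 1,
              1, 1,
              0, 0), nrow = 4, byrow = TRUE)        # I = 4 objects, K = 2 clusters
P <- matrix(c( 1,  2,  3,
              10, 20, 30), nrow = 2, byrow = TRUE)  # K = 2 profiles, J = 3 variables

# Model matrix: object 3 belongs to both clusters, so its row is
# P[1, ] + P[2, ]; object 4 belongs to no cluster, so its row is all zeros.
M <- A %*% P

# Noisy "observed" data and the least squares loss (sum of squared residuals).
set.seed(1)
X <- M + matrix(rnorm(12, sd = 0.1), nrow = 4)
loss <- sum((X - M)^2)
```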

The alternating least squares algorithms ("ALS1" and "ALS2") that can be used to minimize the loss function were proposed by Depril et al. (2008). In "ALS2", starting from an initial random or rational estimate of A (see getRandom and getRational), A and P are alternately re-estimated conditionally upon each other until convergence. The "ALS1" algorithm differs from "ALS2" in that each row of A is updated independently and the conditionally optimal P is recalculated after each row update, instead of once after all rows of A have been updated. For a discussion and comparison of the different algorithms, see Depril et al. (2008).
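The difference between the two update schemes can be sketched in base R as follows. This is one possible reading of the description above, not the package's internal code: the helper names (pinv, update_P, all_patterns, update_row, als2_step, als1_step) are hypothetical, and the row update is assumed to pick the best of all 2^K binary membership patterns for a fixed P.

```r
# Moore-Penrose pseudoinverse via SVD (hypothetical helper).
pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Conditionally optimal profiles P for a given membership matrix A.
update_P <- function(X, A) pinv(A) %*% X

# All 2^K binary membership patterns, one per row.
all_patterns <- function(K) unname(as.matrix(expand.grid(rep(list(0:1), K))))

# Best membership pattern for one data row x, given P (ASSUMED update rule).
update_row <- function(x, P, patterns) {
  fitted <- patterns %*% P
  target <- matrix(x, nrow(fitted), length(x), byrow = TRUE)
  patterns[which.min(rowSums((fitted - target)^2)), ]
}

# "ALS2"-style sweep: update every row of A, then recompute P once.
als2_step <- function(X, A, P, patterns) {
  for (i in seq_len(nrow(X))) A[i, ] <- update_row(X[i, ], P, patterns)
  list(A = A, P = update_P(X, A))
}

# "ALS1"-style sweep: recompute P after each single row update.
als1_step <- function(X, A, P, patterns) {
  for (i in seq_len(nrow(X))) {
    A[i, ] <- update_row(X[i, ], P, patterns)
    P <- update_P(X, A)
  }
  list(A = A, P = P)
}

# Toy data (invented values) and one sweep of each scheme.
set.seed(42)
A_true <- matrix(rbinom(40, 1, 0.5), 20, 2)
P_true <- matrix(c(1, 0, 2, 0, 3, 1), 2, 3)
X <- A_true %*% P_true + matrix(rnorm(60, sd = 0.05), 20, 3)

patterns <- all_patterns(2)
A0 <- matrix(rbinom(40, 1, 0.5), 20, 2)   # random start for A
P0 <- update_P(X, A0)
loss0 <- sum((X - A0 %*% P0)^2)

fit2 <- als2_step(X, A0, P0, patterns)
fit1 <- als1_step(X, A0, P0, patterns)
loss2 <- sum((X - fit2$A %*% fit2$P)^2)
loss1 <- sum((X - fit1$A %*% fit1$P)^2)
```

In this sketch neither sweep can increase the loss, because each row update considers the current pattern among its candidates and each P update is a least squares solution. A full algorithm would repeat the sweep until the loss stops decreasing.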

Warning: Computation time increases exponentially with the number of clusters k. We recommend measuring the computation time of a single start for each specific dataset and k before running a multistart procedure.
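A minimal way to follow this recommendation is to time one start with system.time and extrapolate to the planned number of starts. The helper time_one_start and the stand-in dummy_fit below are hypothetical and exist only so the sketch runs without the package; the estimate is rough, since individual starts may need different numbers of iterations.

```r
# Time a single call of an arbitrary fitting function (hypothetical helper).
time_one_start <- function(fit_fun, ...) {
  unname(system.time(fit_fun(...))["elapsed"])
}

# Stand-in for a single-start fit, used here instead of the real function.
dummy_fit <- function(x, k) {
  Sys.sleep(0.05)
  invisible(NULL)
}

secs <- time_one_start(dummy_fit, matrix(0, 10, 3), 2)

# Rough extrapolation to a multistart run with n_starts starts.
n_starts <- 10
estimated_total <- secs * n_starts

# With the package loaded, the same helper could wrap a real single start,
# for example: time_one_start(adproclus, x, 4)
```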

Value

By default, adproclus returns a list with the following components. If SaveAllStarts is TRUE, one such list is returned for each start of the algorithm.

Model

matrix. The obtained overlapping clustering model M of the same size as data.

Membs

matrix. The membership matrix A of the clustering model.

Profs

matrix. The profile matrix P of the clustering model.

sse

numeric. The residual sum of squares of the clustering model, which is minimised by the ALS algorithm.

totvar

numeric. The total sum of squares of data.

explvar

numeric. The proportion of variance in data that is accounted for by the clustering model.

alg_iter

numeric. The number of iterations of the algorithm.

timer

numeric. The amount of time (in seconds) the algorithm ran for.

initialStart

list. A list containing the initial membership and profile matrices, as well as the type of start that was used to obtain the clustering solution (as returned by getRandom or getRational).
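How the numeric components relate to the matrices can be sketched with a stand-in list that reuses the documented component names. The data are invented, and two points are assumptions rather than documented behaviour: that totvar is the sum of squared deviations from the grand mean of data, and that explvar equals 1 - sse / totvar. The package source should be consulted for the exact definitions.

```r
# Invented toy matrices: 3 objects, 2 clusters, 2 variables.
A <- matrix(c(1, 0,
              1, 1,
              0, 1), nrow = 3, byrow = TRUE)
P <- matrix(c(2, 4,
              1, 1), nrow = 2, byrow = TRUE)
X <- matrix(c(2.1, 3.9,
              3.0, 5.2,
              0.8, 1.1), nrow = 3, byrow = TRUE)

# Stand-in list with the same component names as the documented return value.
fit <- list(Membs = A, Profs = P, Model = A %*% P)

# Residual sum of squares of the model.
fit$sse <- sum((X - fit$Model)^2)

# ASSUMPTION: total sum of squares taken around the grand mean of the data.
fit$totvar <- sum((X - mean(X))^2)

# ASSUMPTION: proportion of variance accounted for as 1 - sse / totvar.
fit$explvar <- 1 - fit$sse / fit$totvar
```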

References

Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Depril, D. (2011). ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices. Behavior Research Methods, 43(1), 56-65.

Depril, D., Van Mechelen, I., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics and Data Analysis, 52, 4923-4938.

Mirkin, B. G. (1987). The method of principal clusters. Automation and Remote Control, 10, 131-143.

Mirkin, B. G. (1990). A sequential fitting procedure for linear data analysis models. Journal of Classification, 7(2), 167-195.

See Also

getRandom and getRational for generating random and rational starts for ADPROCLUS.

Examples

# Loading a test dataset into the global environment
x <- ADPROCLUS::CGdata

# Quick clustering with K = 3 clusters
clust <- adproclus(x, 3)

# Clustering with K = 4 clusters,
# using the ALS2 algorithm,
# with 5 random and 5 rational starts
clust <- adproclus(x, 4, c(5,5), "ALS2")

# Saving the results of all starts
clust <- adproclus(x, 3, c(2,2), SaveAllStarts = TRUE)

# Clustering using a user-defined rational start
start <- getRational(x,3)
clust <- adproclus(x, start$P)

JRBCH/ADPROCLUS documentation built on Oct. 30, 2019, 7:33 p.m.