dpcakm: R Documentation
Simultaneously performs k-means partitioning of the units and disjoint PCA of the variables, computing each principal component from a different (non-overlapping) subset of variables. The result is a simplified, more easily interpretable loading matrix A, together with the principal components and the partition of the units. The reduced subspace is identified by the centroids.
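The structure described above can be illustrated with a small base-R sketch. It is not the package's algorithm: dpcakm alternates between updating the unit partition and the variable partition over several random starts, whereas here both partitions are held fixed (U taken from a plain kmeans() run, V chosen by hand) and only the loadings are computed. Following the statement that the reduced subspace is identified by the centroids, the sketch assumes that each column of A is the leading eigenvector of the between-cluster scatter of its own block of variables. disjoint_loadings() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): disjoint loadings for a fixed
# unit partition U (n x K, binary) and variable partition V (J x Q, binary).
# Assumption: column q of A is the leading eigenvector of the between-cluster
# scatter of the variables assigned to component q.
disjoint_loadings <- function(X, U, V) {
  # Orthogonal projector onto the cluster indicators: P %*% X replaces each
  # row of X with the centroid of its cluster
  P <- U %*% solve(crossprod(U)) %*% t(U)
  A <- matrix(0, nrow = ncol(X), ncol = ncol(V))
  for (q in seq_len(ncol(V))) {
    idx <- which(V[, q] == 1)
    Xq <- X[, idx, drop = FALSE]
    # Between-cluster scatter of the variables in block q
    B <- crossprod(Xq, P %*% Xq)
    # Its leading (unit-norm) eigenvector gives the loadings of component q;
    # all other rows of column q stay zero, so the loadings are disjoint
    A[idx, q] <- eigen(B, symmetric = TRUE)$vectors[, 1]
  }
  A
}

set.seed(1)
X <- scale(as.matrix(iris[, -5]))          # z-scored iris measurements
km <- kmeans(X, centers = 3, nstart = 5)   # stand-in unit partition
U <- diag(3)[km$cluster, ]                 # 150 x 3 binary membership matrix
V <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))   # assumed variable partition (J = 4, Q = 2)
A <- disjoint_loadings(X, U, V)
scores <- X %*% A                          # 150 x 2 component scores
# K x Q centroids in the reduced space
centers <- solve(crossprod(U), crossprod(U, scores))
```

Because every variable belongs to exactly one block, each row of A has a single non-zero entry and the columns of A are orthonormal, which is what makes the loadings easy to read.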
dpcakm(X, K, Q, Rndstart, verbose, maxiter, tol, constr, print, prep)
X: Units x variables numeric data matrix.
K: Number of clusters for the units.
Q: Number of principal components.
Rndstart: Number of runs to be performed (default is 20).
verbose: Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option).
maxiter: Maximum number of iterations allowed if convergence has not yet been reached (default is 100).
tol: Tolerance threshold, i.e. the maximum difference between the values of the objective function in two consecutive iterations below which convergence is assumed (default is 1e-6).
constr: Vector of length J = number of variables, pre-specifying the variable cluster to which some of the variables must be assigned. Each entry takes an integer value from 1 to Q = number of variable clusters / principal components (see the examples for more details), or 0 if no constraint is imposed on that variable (i.e., it is assigned by the plain algorithm).
print: Prints summary statistics of the results (1 = enabled; 0 = disabled, default option).
prep: Pre-processing of the data: 1 performs the z-score transform (default choice); 2 performs the min-max transform; 0 leaves the data unprocessed.
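As a sketch of what the three prep options mean, the hypothetical function preprocess() below re-implements them in base R. It is an assumption that dpcakm uses the same formulas; in particular, scale() divides by the sample standard deviation (denominator n - 1), and the package may use a different convention.

```r
# Hypothetical re-implementation of the prep options (not the package's code)
preprocess <- function(X, prep = 1) {
  X <- as.matrix(X)
  if (prep == 1) {
    # z-score: centre each column and divide by its sample sd (n - 1 denominator)
    scale(X)
  } else if (prep == 2) {
    # min-max: rescale each column to the interval [0, 1]
    mins <- apply(X, 2, min)
    maxs <- apply(X, 2, max)
    sweep(sweep(X, 2, mins, "-"), 2, maxs - mins, "/")
  } else {
    # prep = 0: data returned unchanged
    X
  }
}

X <- as.matrix(iris[, -5])
z <- preprocess(X, prep = 1)   # columns with mean 0 and sd 1
m <- preprocess(X, prep = 2)   # columns with min 0 and max 1
```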
Returns a list of estimates and some descriptive quantities of the final results:
V: Variables x factors membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster (component) to which the variable has been assigned.
U: Units x clusters membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster to which the unit has been assigned.
A: Variables x components loading matrix.
centers: K x Q matrix of centers containing the means of the units in each cluster, expressed in the reduced space of the Q principal components.
totss: Total sum of squares (scalar).
withinss: Vector of within-cluster sums of squares, one entry per cluster.
betweenss: Amount of deviance captured by the model (scalar).
K-size: Number of units assigned to each row-cluster (vector).
Q-size: Number of variables assigned to each column-cluster (vector).
pseudoF: Calinski-Harabasz index of the resulting partition (scalar).
loop: Index of the (best) run from which the results have been taken.
it: Number of iterations performed during the (best) run.
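The sums of squares, cluster sizes and pseudoF listed above follow the usual k-means definitions, with pseudoF = (betweenss / (K - 1)) / (sum(withinss) / (n - K)). The hypothetical helper partition_stats() computes them from a unit membership matrix U built from cluster labels, so that its totss, withinss and betweenss can be checked against those returned by stats::kmeans(). Which data matrix dpcakm uses for these quantities (the pre-processed data or the reduced space of Q components) is not stated here, so the helper simply works on whatever matrix it is given.

```r
# Hypothetical helper (not part of the package): partition statistics
# computed from a label vector cl with values 1..K
partition_stats <- function(Y, cl) {
  Y <- as.matrix(Y)
  K <- max(cl)
  n <- nrow(Y)
  U <- diag(K)[cl, , drop = FALSE]                 # n x K binary, row-stochastic
  centers <- solve(crossprod(U), crossprod(U, Y))  # K x p cluster means
  totss <- sum(scale(Y, scale = FALSE)^2)          # deviance around the grand mean
  resid <- Y - U %*% centers                       # deviations from own centroid
  withinss <- as.vector(crossprod(U, rowSums(resid^2)))  # one entry per cluster
  betweenss <- totss - sum(withinss)               # deviance explained by the partition
  list(U = U, centers = centers, totss = totss, withinss = withinss,
       betweenss = betweenss, size = colSums(U),
       pseudoF = (betweenss / (K - 1)) / (sum(withinss) / (n - K)))
}

set.seed(1)
Y <- scale(as.matrix(iris[, -5]))
km <- kmeans(Y, centers = 3, nstart = 5)
s <- partition_stats(Y, km$cluster)
```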
Ionel Prunila, Maurizio Vichi
Vichi M., Saporta G. (2009) "Clustering and disjoint principal component analysis" <doi:10.1016/j.csda.2008.05.028>
# Iris data
library(drclust)
# Loading the numeric variables of iris data
iris <- as.matrix(iris[, -5])
# No constraint on variables
out <- dpcakm(iris, K = 3, Q = 2, Rndstart = 5)
# Constraint: the first two variables must be assigned to the same component (component 1);
# the last two are left unconstrained.
outc <- dpcakm(iris, K = 3, Q = 2, Rndstart = 5, constr = c(1, 1, 0, 0))
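The constr vector in the last example pre-assigns the first two variables to component 1 and leaves the other two free. The hypothetical helper constr_to_V() shows one way to read such a vector as a partial variable membership matrix V: each constrained variable gets a 1 in its prescribed column, while free variables keep an all-zero row until the algorithm assigns them. This illustrates the meaning of the argument only; it is not how dpcakm stores constraints internally.

```r
# Hypothetical helper (not part of the package): expand a constr vector
# into a partial J x Q variable membership matrix
constr_to_V <- function(constr, Q) {
  # Each entry must be 0 (free) or a component index in 1..Q
  stopifnot(all(constr %in% 0:Q))
  V <- matrix(0L, nrow = length(constr), ncol = Q)
  fixed <- which(constr > 0)
  # Put a 1 in column constr[j] of row j for every constrained variable j
  V[cbind(fixed, constr[fixed])] <- 1L
  V  # rows of free variables stay all-zero
}

V0 <- constr_to_V(c(1, 1, 0, 0), Q = 2)
```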