# cluspca: Joint dimension reduction and clustering of continuous data. In clustrd: Methods for Joint Dimension Reduction and Clustering

## Description

This function implements Factorial K-means (Vichi and Kiers, 2001) and Reduced K-means (De Soete and Carroll, 1994), as well as a compromise version of these two methods. The methods combine Principal Component Analysis for dimension reduction with K-means for clustering.

## Usage

```
cluspca(data, nclus, ndim, alpha = NULL, method = c("RKM","FKM"),
        center = TRUE, scale = TRUE, rotation = "none", nstart = 100,
        smartStart = NULL, seed = NULL)

## S3 method for class 'cluspca'
print(x, ...)

## S3 method for class 'cluspca'
summary(object, ...)

## S3 method for class 'cluspca'
fitted(object, mth = c("centers", "classes"), ...)
```

## Arguments

| Argument | Description |
|---|---|
| `data` | Dataset with metric variables |
| `nclus` | Number of clusters (`nclus = 1` returns the PCA solution) |
| `ndim` | Dimensionality of the solution |
| `method` | Specifies the method. Options are `"RKM"` for reduced K-means and `"FKM"` for factorial K-means (default = `"RKM"`) |
| `alpha` | Adjusts the relative importance of RKM and FKM in the objective function; `alpha = 0.5` leads to reduced K-means, `alpha = 0` to factorial K-means, and `alpha = 1` reduces to the tandem approach (PCA followed by K-means) |
| `center` | A logical value indicating whether the variables should be shifted to be zero centered (default = `TRUE`) |
| `scale` | A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (default = `TRUE`) |
| `rotation` | Specifies the method used to rotate the factors. Options are `"none"` for no rotation, `"varimax"` for varimax rotation with Kaiser normalization and `"promax"` for promax rotation (default = `"none"`) |
| `nstart` | Number of starts (default = 100) |
| `smartStart` | If `NULL`, a random cluster membership vector is generated. Alternatively, a cluster membership vector can be provided as a starting solution |
| `seed` | An integer passed to `set.seed()` to seed the random number generator when `smartStart = NULL` (default = `NULL`) |
| `x` | For the `print` method, an object of class `cluspca` |
| `object` | For the `summary` method, an object of class `cluspca` |
| `mth` | For the `fitted` method, a character string that specifies the type of fitted value to return: `"centers"` for the cluster center vector of each observation, or `"classes"` for the cluster membership of each observation |
| `...` | Not used |
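The role of `alpha` can be made concrete with a short derivation. The notation is an assumption introduced here for illustration and does not come from this page: X is the centered (and optionally scaled) n × p data matrix, B a p × d loadings matrix with orthonormal columns, Z an n × K binary cluster membership matrix, and G the K × d matrix of cluster centroids in the reduced space. Under these assumptions, one formulation consistent with the behavior described for `alpha` is:

```latex
% Reduced K-means (RKM) and factorial K-means (FKM) criteria,
% both minimized subject to B^T B = I_d
\phi_{\mathrm{RKM}}(B, Z, G) = \lVert X - Z G B^{\top} \rVert^{2}, \qquad
\phi_{\mathrm{FKM}}(B, Z, G) = \lVert X B - Z G \rVert^{2}

% Because B^T B = I_d, the cross term vanishes and the RKM criterion
% splits into a PCA residual plus the FKM criterion
\lVert X - Z G B^{\top} \rVert^{2}
  = \lVert X - X B B^{\top} \rVert^{2} + \lVert X B - Z G \rVert^{2}

% Weighted compromise between the two parts
\phi_{\alpha}(B, Z, G)
  = \alpha \, \lVert X - X B B^{\top} \rVert^{2}
  + (1 - \alpha) \, \lVert X B - Z G \rVert^{2}
```

Read this way, `alpha = 0.5` gives half the RKM criterion (same minimizer), `alpha = 0` leaves only the FKM term, and `alpha = 1` leaves only the PCA residual, so the subspace is the PCA solution and clustering takes place afterwards on the scores, which is the tandem approach.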

## Details

For the K-means part, the algorithm of Hartigan-Wong is used by default.
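As a point of comparison, the tandem approach (PCA followed by K-means) can be sketched with base R alone. This is only an illustration of the idea, not the code `cluspca` runs internally; the built-in `USArrests` data and the object names `X`, `scores` and `km` are assumptions chosen for the example. Base R's `kmeans()` also uses the Hartigan-Wong algorithm by default.

```r
# Tandem approach sketch in base R: PCA first, then K-means on the scores.
# NOT the clustrd implementation; it only illustrates PCA followed by K-means.
set.seed(1234)                                       # reproducible random starts

X     <- scale(USArrests, center = TRUE, scale = TRUE)  # zero mean, unit variance
ndim  <- 2                                           # dimensionality of the solution
nclus <- 3                                           # number of clusters

pca    <- prcomp(X, center = FALSE, scale. = FALSE)  # X is already standardized
scores <- pca$x[, 1:ndim]                            # object scores in reduced space

# K-means on the reduced scores; algorithm = "Hartigan-Wong" is the default
km <- kmeans(scores, centers = nclus, nstart = 100)

km$size                                              # objects per cluster
head(km$cluster)                                     # membership of first objects
```

Because the dimension reduction here ignores the cluster structure, the retained components are not guaranteed to be the ones that separate the clusters best; the joint methods on this page estimate both parts together.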

The hidden `print` and `summary` methods print out some key components of an object of class `cluspca`.

The hidden `fitted` method returns cluster fitted values. If `mth` is `"classes"`, this is a vector of cluster memberships (the `cluster` component of the `cluspca` object). If `mth` is `"centers"`, this is a matrix in which each row is the center of the cluster the corresponding observation belongs to. The row names of the matrix are the cluster membership values.
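The two kinds of fitted values have a direct analogue in base R, where `fitted()` for `kmeans` objects offers the same `"centers"` and `"classes"` choices. The sketch below uses that base R method, not the `cluspca` one, purely to show the shape of each result; note that base R names the argument `method`, whereas `cluspca` names it `mth`. The `USArrests` data and the object names are assumptions for the example.

```r
# Base R analogue of the two fitted-value types (kmeans, not cluspca).
set.seed(1)
X  <- scale(USArrests)                       # standardized example data
km <- kmeans(X, centers = 3, nstart = 10)

classes <- fitted(km, method = "classes")    # one cluster label per observation
centers <- fitted(km, method = "centers")    # one cluster center per observation

length(classes)                              # number of observations
dim(centers)                                 # observations x variables
head(rownames(centers))                      # row names are the cluster labels
```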

When `nclus` = 1 the function returns the PCA solution and `plot(object)` shows the corresponding biplot.

## Value

| Component | Description |
|---|---|
| `obscoord` | Object scores |
| `attcoord` | Variable scores |
| `centroid` | Cluster centroids |
| `cluster` | Cluster membership |
| `criterion` | Optimal value of the objective function |
| `size` | The number of objects in each cluster |
| `scale` | A copy of `scale` in the return object |
| `center` | A copy of `center` in the return object |
| `nstart` | A copy of `nstart` in the return object |
| `odata` | A copy of `data` in the return object |

## References

De Soete, G., and Carroll, J. D. (1994). K-means clustering in a low-dimensional Euclidean space. In Diday E. et al. (Eds.), New Approaches in Classification and Data Analysis, Heidelberg: Springer, 212-219.

Vichi, M., and Kiers, H.A.L. (2001). Factorial K-means analysis for two-way data. Computational Statistics and Data Analysis, 37, 49-64.

## See Also

`clusmca`, `cluspcamix`, `tuneclus`

## Examples

```
library(clustrd)

# Reduced K-means with 3 clusters in 2 dimensions after 10 random starts
data(macro)
outRKM = cluspca(macro, 3, 2, method = "RKM", rotation = "varimax",
                 scale = FALSE, nstart = 10)
summary(outRKM)
# Scatterplot (dimensions 1 and 2) and cluster description plot
plot(outRKM, cludesc = TRUE)

# Factorial K-means with 3 clusters in 2 dimensions
# with a Reduced K-means starting solution
data(macro)
outFKM = cluspca(macro, 3, 2, method = "FKM", rotation = "varimax",
                 scale = FALSE, smartStart = outRKM$cluster)
outFKM
# Scatterplot (dimensions 1 and 2) and cluster description plot
plot(outFKM, cludesc = TRUE)

# To get the Tandem approach (PCA(SVD) + K-means)
outTandem = cluspca(macro, 3, 2, alpha = 1, seed = 1234)
plot(outTandem)

# nclus = 1 just gives the PCA solution
#outPCA = cluspca(macro, 1, 2)
#outPCA
# Scatterplot (dimensions 1 and 2)
#plot(outPCA)
```

### Example output

```
Loading required package: ggplot2
dummies-1.5.6 provided by Decision Patterns

Solution with 3 clusters of sizes 12 (60%), 5 (25%), 3 (15%) in 2 dimensions. Variables were mean centered and unstandardized.

Cluster centroids:
            Dim.1   Dim.2
Cluster 1 -1.1627 -2.9713
Cluster 2 -3.5997  5.9900
Cluster 3 10.6502  1.9020

Variable scores:
      Dim.1   Dim.2
GDP  0.0638 -0.1169
LI  -0.1734 -0.0140
UR  -0.0610 -0.4849
IR   0.6662 -0.0344
TB  -0.7179  0.0678
NNS  0.0544  0.8633

Within cluster sum of squares by cluster:
[1] 113.4856  23.2023  45.8149
(between_SS / total_SS =  79.72 %)

Clustering vector:
  Australia      Canada     Finland      France       Spain      Sweden
          1           1           1           1           1           1
        USA Netherlands      Greece      Mexico    Portugal     Austria
          1           2           3           3           3           1
    Belgium     Denmark     Germany       Italy       Japan      Norway
          2           1           1           1           2           2
Switzerland          UK
          2           1

Objective criterion value: 431.7131

Available output:

[1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion" "size"
[7] "odata"     "scale"     "center"    "nstart"
$map

$parcoord

Solution with 3 clusters of sizes 12 (60%), 5 (25%), 3 (15%) in 2 dimensions. Variables were mean centered and unstandardized.

Cluster centroids:
            Dim.1   Dim.2
Cluster 1 -0.2945 -0.8344
Cluster 2 -3.9747  1.7404
Cluster 3  7.8024  0.4367

Variable scores:
      Dim.1   Dim.2
GDP  0.2272  0.9209
LI  -0.6554  0.1850
UR   0.0504 -0.1255
IR   0.6648 -0.1139
TB  -0.2666 -0.0412
NNS -0.0574  0.2956

Within cluster sum of squares by cluster:
[1] 26.6997 12.7474  1.0522
(between_SS / total_SS =  87.62 %)

Objective criterion value: 40.4992

Available output:

[1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion" "size"
[7] "odata"     "scale"     "center"    "nstart"
$map

$parcoord

Warning messages:
1: Removed 1 rows containing missing values (geom_segment).
2: Removed 1 rows containing missing values (geom_text_repel).
```

clustrd documentation built on May 8, 2019, 5:03 p.m.