sjc.cluster: Compute hierarchical or kmeans cluster analysis

In sjPlot: Data Visualization for Statistics in Social Science

Description

Compute a hierarchical or kmeans cluster analysis and return the group association for each observation as a vector.

Usage

```r
sjc.cluster(data, groupcount = NULL, method = c("hclust", "kmeans"),
  distance = c("euclidean", "maximum", "manhattan", "canberra", "binary",
    "minkowski"),
  agglomeration = c("ward", "ward.D", "ward.D2", "single", "complete",
    "average", "mcquitty", "median", "centroid"),
  iter.max = 20, algorithm = c("Hartigan-Wong", "Lloyd", "MacQueen"))
```

Arguments

`data`
A data frame with the variables that should be used for the cluster analysis.

`groupcount`
Number of groups (clusters) used for the cluster solution. May also be a set of initial (distinct) cluster centres if `method = "kmeans"` (see `kmeans` for details on the `centers` argument). If `groupcount = NULL` and `method = "kmeans"`, the optimal number of clusters is calculated using the gap statistic (see `sjc.kgap`). For `method = "hclust"`, `groupcount` must be specified. The following functions may be helpful for estimating the number of clusters:

- Use `sjc.elbow` to determine the group count according to the elbow criterion.
- If `method = "kmeans"`, use `sjc.kgap` to determine the group count according to the gap statistic.
- If `method = "hclust"` (hierarchical clustering, default), use `sjc.dend` to inspect different cluster group solutions.
- Use `sjc.grpdisc` to inspect the goodness of grouping (accuracy of classification).

`method`
Method for computing the cluster analysis. By default (`"hclust"`), a hierarchical cluster analysis is computed. Use `"kmeans"` to compute a kmeans cluster analysis. You can specify the initial letters only.

`distance`
Distance measure to be used when `method = "hclust"` (for hierarchical clustering). Must be one of `"euclidean"`, `"maximum"`, `"manhattan"`, `"canberra"`, `"binary"` or `"minkowski"`. See `dist`. If `method = "kmeans"`, this argument is ignored.

`agglomeration`
Agglomeration method to be used when `method = "hclust"` (for hierarchical clustering). This should be one of `"ward"`, `"ward.D"`, `"ward.D2"`, `"single"`, `"complete"`, `"average"`, `"mcquitty"`, `"median"` or `"centroid"`. Default is `"ward"` (see `hclust`). If `method = "kmeans"`, this argument is ignored. See 'Note'.

`iter.max`
Maximum number of iterations allowed. Only applies if `method = "kmeans"`. See `kmeans` for details on this argument.

`algorithm`
Algorithm used for calculating the kmeans clusters. Only applies if `method = "kmeans"`. May be one of `"Hartigan-Wong"` (default), `"Lloyd"` (used by SPSS), or `"MacQueen"`. See `kmeans` for details on this argument.
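As a rough illustration of how a cluster count might be estimated before choosing `groupcount`, the sketch below computes the gap statistic directly with `clusGap()` and `maxSE()` from the cluster package (listed under References). Whether `sjc.kgap` uses these exact calls or settings is an assumption; scaling the data and the values `K.max = 6`, `B = 50` and `nstart = 10` are illustrative choices only.

```r
library(cluster)  # recommended package, shipped with standard R installations

set.seed(123)       # the gap statistic is based on random reference data
x <- scale(mtcars)  # assumption: standardise the variables before clustering

# Gap statistic for kmeans solutions with k = 1..6 (B simulated reference sets)
gap <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 50,
               nstart = 10, verbose = FALSE)

# Choose the cluster count with maxSE()'s default "firstSEmax" rule
k_opt <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"])
k_opt
```

The resulting `k_opt` could then be used as the `groupcount` value; other selection rules in `maxSE()` may suggest a different count.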

Value

The group classification for each observation, as a vector. This vector can be passed to the `sjc.grpdisc` function to check the goodness of classification. The returned vector includes missing values, so it has one entry per row of `data` and can be appended to the original data frame.
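To make the point about missing values concrete, here is a minimal base R sketch of how a group vector can be aligned with a data frame that contains incomplete rows. It is not sjPlot's implementation; the use of `airquality`, the scaling step and `k = 3` are assumptions made for illustration.

```r
dat <- airquality                # built-in data set that contains missing values
complete <- complete.cases(dat)  # rows that can be used for clustering

# Hierarchical clustering on the complete rows only
tree <- hclust(dist(scale(dat[complete, ]), method = "euclidean"),
               method = "ward.D2")

# One entry per original row; rows with missing data keep NA
groups <- rep(NA_integer_, nrow(dat))
groups[complete] <- cutree(tree, k = 3)

# Same length as the data frame, so it can be appended as a new column
dat$group <- groups
table(dat$group, useNA = "ifany")
```

Because the vector has the same length as the data frame, it can be added as a column without re-matching row names.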

Note

In R versions later than 3.0.3, the `"ward"` option of `hclust` has been replaced by `"ward.D"` and `"ward.D2"`, so you may use either of these values. If you specify `"ward"`, it is replaced by `"ward.D2"`.
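The two Ward variants can give different groupings. The following sketch calls `hclust` directly to compare them; it illustrates the base R options only and does not go through `sjc.cluster`. Scaling `mtcars` and cutting at five clusters are illustrative assumptions. Note that base `hclust` itself maps `"ward"` to `"ward.D"` (with a message), which differs from the replacement described above.

```r
d <- dist(scale(mtcars), method = "euclidean")

# "ward.D" matches the old "ward" option of R <= 3.0.3; "ward.D2" squares the
# dissimilarities before cluster updating (Ward's criterion)
groups_d  <- cutree(hclust(d, method = "ward.D"),  k = 5)
groups_d2 <- cutree(hclust(d, method = "ward.D2"), k = 5)

# Cross-tabulate the two five-cluster solutions to see where they differ
table(ward.D = groups_d, ward.D2 = groups_d2)
```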

To get results similar to those of the SPSS Quick Cluster function, consider the following points:

1. Use the `/PRINT INITIAL` option for SPSS Quick Cluster to get a table with initial cluster centers.

2. Create a `matrix` from this table by copying the values consecutively, one row after another, from the SPSS output into a matrix, specifying the `nrow` and `ncol` arguments. Pass this matrix as the `groupcount` argument.

3. Use `algorithm="Lloyd"`.

4. Use the same value for `iter.max` in both SPSS and `sjc.cluster`.

This ensures a fixed initial set of cluster centers (as in SPSS), whereas `kmeans` in R selects the initial centers randomly when only the number of clusters is given.
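The steps above can be sketched with base R's `kmeans`, which accepts a matrix of initial centers through its `centers` argument. The center values below are hypothetical and are not taken from any SPSS output; the choice of the `mpg` and `hp` columns is likewise only for illustration. The matrix is laid out with one row per cluster and one column per variable, which is what `kmeans` expects, so transpose it if your table is oriented the other way.

```r
x <- mtcars[, c("mpg", "hp")]

# Hypothetical initial centers, entered row by row
# (one row per cluster, one column per variable)
init <- matrix(c(30,  70,
                 20, 120,
                 15, 250),
               nrow = 3, ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("mpg", "hp")))

# Fixed starting centers, Lloyd's algorithm and an explicit iteration limit
fit <- kmeans(x, centers = init, iter.max = 20, algorithm = "Lloyd")
fit$cluster
```

Because the starting centers are fixed, repeated runs return the same assignment without any call to `set.seed()`.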

References

Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.

Examples

```r
library(sjPlot)

# Hierarchical clustering of mtcars-dataset
groups <- sjc.cluster(mtcars, 5)

# K-means clustering of mtcars-dataset
groups <- sjc.cluster(mtcars, 5, method = "k")
```

sjPlot documentation built on Aug. 23, 2018, 5:03 p.m.