View source: R/cluster_analysis.R
cluster_analysis | R Documentation |
Compute a hierarchical or k-means cluster analysis and return the group assignment for each observation as a vector.
cluster_analysis(
  x,
  n = NULL,
  method = "kmeans",
  include_factors = FALSE,
  standardize = TRUE,
  verbose = TRUE,
  distance_method = "euclidean",
  hclust_method = "complete",
  kmeans_method = "Hartigan-Wong",
  dbscan_eps = 15,
  iterations = 100,
  ...
)
x |
A data frame. |
n |
Number of clusters used for supervised cluster methods. If |
method |
Method for computing the cluster analysis. Can be |
include_factors |
Logical, if |
standardize |
Standardize the dataframe before clustering (default). |
verbose |
Toggle warnings and messages. |
distance_method |
Distance measure to be used for methods based on
distances (e.g., when |
hclust_method |
Agglomeration method to be used when |
kmeans_method |
Algorithm used for calculating kmeans cluster. Only applies,
if |
dbscan_eps |
The |
iterations |
The number of replications. |
... |
Arguments passed to or from other methods. |
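The default values of distance_method, hclust_method, and kmeans_method are also valid option names for base R's dist(), hclust(), and kmeans(). The following is a minimal base R sketch of what those options mean. It is not the internal code of cluster_analysis(), and how the function forwards these arguments is an assumption here.

```r
# Base R sketch only; NOT the internals of cluster_analysis().
x <- scale(iris[1:4])  # standardize the variables first

# Hierarchical clustering: a distance measure plus an agglomeration method
d <- dist(x, method = "euclidean")
hc <- hclust(d, method = "complete")
groups_hclust <- cutree(hc, k = 3)  # cut the tree into 3 groups

# k-means with an explicitly chosen algorithm
set.seed(33)
km <- kmeans(x, centers = 3, algorithm = "Hartigan-Wong")
groups_kmeans <- km$cluster

# Cluster sizes for each approach
table(groups_hclust)
table(groups_kmeans)
```

Both results are integer vectors with one group label per row of the input.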
The print() and plot() methods show the (standardized) mean value for each variable within each cluster. Thus, a higher absolute value indicates that a certain variable characteristic is more pronounced within that specific cluster (as compared to other cluster groups with lower absolute mean values).
The cluster classification can be obtained via predict(x, newdata = NULL, ...).
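To make the idea of standardized cluster means concrete, here is a small base R sketch that computes the mean of each standardized variable within each cluster. It uses stats::kmeans() directly as a stand-in; the object names are illustrative, and this is not the code behind the print() or plot() methods.

```r
# Base R sketch only; illustrates standardized means per cluster.
x <- as.data.frame(scale(iris[1:4]))  # each column now has mean 0 and SD 1

set.seed(33)
km <- kmeans(x, centers = 3)

# Mean of every standardized variable within each cluster
cluster_means <- aggregate(x, by = list(cluster = km$cluster), FUN = mean)
cluster_means
```

Because each variable is centered at 0 after scaling, a cluster mean far from 0 (in either direction) marks a variable on which that cluster differs strongly from the overall average.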
The group classification for each observation, as a vector. The returned vector includes missing values, so it has the same length as nrow(x).
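One way a result can keep the same length as nrow(x) when some rows contain missing values is to cluster only the complete rows and pad the remaining positions with NA. The sketch below shows that idea in base R. It is an assumption about the general approach, not a description of how cluster_analysis() is implemented, and the injected NA values are for illustration only.

```r
# Base R sketch only; the padding strategy is an assumption.
dat <- iris[1:4]
dat[c(5, 50), 1] <- NA  # introduce missing values for illustration

complete <- complete.cases(dat)  # TRUE for rows without any NA

set.seed(33)
km <- kmeans(scale(dat[complete, ]), centers = 3)

# Full-length result: NA where a row could not be clustered
groups <- rep(NA_integer_, nrow(dat))
groups[complete] <- km$cluster

length(groups) == nrow(dat)
which(is.na(groups))
```

The resulting vector can be attached to the original data frame as a new column without any row alignment step.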
There is also a plot() method implemented in the see package.
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.
n_clusters() to determine the number of clusters to extract.
cluster_discrimination() to determine the accuracy of cluster group classification via linear discriminant analysis (LDA).
performance::check_clusterstructure() to check suitability of data for clustering.
https://www.datanovia.com/en/lessons/
set.seed(33)

# K-Means ====================================================
rez <- cluster_analysis(iris[1:4], n = 3, method = "kmeans")
rez # Show results
predict(rez) # Get clusters
summary(rez) # Extract the centers values (can use 'plot()' on that)
if (requireNamespace("MASS", quietly = TRUE)) {
  cluster_discrimination(rez) # Perform LDA
}

# Hierarchical k-means (more robust k-means)
if (require("factoextra", quietly = TRUE)) {
  rez <- cluster_analysis(iris[1:4], n = 3, method = "hkmeans")
  rez # Show results
  predict(rez) # Get clusters
}

# Hierarchical Clustering (hclust) ===========================
rez <- cluster_analysis(iris[1:4], n = 3, method = "hclust")
rez # Show results
predict(rez) # Get clusters

# K-Medoids (pam) ============================================
if (require("cluster", quietly = TRUE)) {
  rez <- cluster_analysis(iris[1:4], n = 3, method = "pam")
  rez # Show results
  predict(rez) # Get clusters
}

# PAM with automated number of clusters
if (require("fpc", quietly = TRUE)) {
  rez <- cluster_analysis(iris[1:4], method = "pamk")
  rez # Show results
  predict(rez) # Get clusters
}

# DBSCAN ====================================================
if (require("dbscan", quietly = TRUE)) {
  # Note that you can assimilate more outliers (cluster 0) to neighbouring
  # clusters by setting borderPoints = TRUE.
  rez <- cluster_analysis(iris[1:4], method = "dbscan", dbscan_eps = 1.45)
  rez # Show results
  predict(rez) # Get clusters
}

# Mixture ====================================================
if (require("mclust", quietly = TRUE)) {
  library(mclust) # Needs the package to be loaded
  rez <- cluster_analysis(iris[1:4], method = "mixture")
  rez # Show results
  predict(rez) # Get clusters
}