View source: R/distantia_cluster_kmeans.R
distantia_cluster_kmeans | R Documentation |
This function combines the dissimilarity scores computed by distantia(), the K-means clustering method implemented in stats::kmeans(), and the clustering optimization method implemented in utils_cluster_kmeans_optimizer() to help group together time series with similar features.
When clusters = NULL, the function utils_cluster_kmeans_optimizer() is run underneath to perform a parallelized grid search for the number of clusters that maximizes the overall silhouette width of the clustering solution (see utils_cluster_silhouette()).
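The grid search described above can be sketched with stats::kmeans() and cluster::silhouette(). This is only an illustration under stated assumptions, not the package's implementation: the distance matrix is a toy Euclidean one rather than psi scores, the grid k_grid and the helper names are hypothetical, and the loop runs sequentially instead of in parallel.

```r
# Toy data: three well-separated groups of points standing in for time series.
set.seed(1)
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2)
)
rownames(x) <- paste0("ts_", seq_len(nrow(x)))

# Square dissimilarity matrix (assumption: Euclidean here, psi scores in distantia).
d <- as.matrix(stats::dist(x))

# Candidate numbers of clusters (hypothetical grid).
k_grid <- 2:6

# Mean silhouette width of the K-means solution for each candidate k.
mean_sil <- vapply(
  X = k_grid,
  FUN = function(k) {
    set.seed(1)
    km <- stats::kmeans(x = d, centers = k, nstart = 10)
    sil <- cluster::silhouette(x = km$cluster, dmatrix = d)
    mean(sil[, "sil_width"])
  },
  FUN.VALUE = numeric(1)
)

# Optimization table and the k with the highest mean silhouette width.
optimization <- data.frame(clusters = k_grid, silhouette_mean = mean_sil)
best_k <- optimization$clusters[which.max(optimization$silhouette_mean)]
```

Fixing the seed inside the loop keeps each K-means run reproducible, which mirrors the role of the seed argument documented below.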
This function supports a parallelization setup via future::plan(), and progress bars provided by the package progressr.
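A typical session setup might look like the sketch below. It assumes the packages future and progressr are installed; the number of workers is an arbitrary choice for illustration, and the placeholder comment marks where the clustering call would go.

```r
# Run subsequent computations in two parallel background R sessions.
# (workers = 2 is an arbitrary value chosen for this example)
future::plan(future::multisession, workers = 2)

# Progress bars are opt-in: enable progressr handlers for the whole session.
progressr::handlers(global = TRUE)

# ... call distantia_cluster_kmeans() with clusters = NULL here ...

# Return to sequential execution when done.
future::plan(future::sequential)
```

Parallelization only matters when clusters = NULL, since that is the case in which the grid search is run.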
distantia_cluster_kmeans(df = NULL, clusters = NULL, seed = 1)
df | (required, data frame) Output of |
clusters | (required, integer) Number of groups to generate. If NULL (default), |
seed | (optional, integer) Random seed to be used during the K-means computation. Default: 1 |
list:
cluster_object: kmeans object for further analyses and custom plotting.
clusters: integer, number of clusters.
silhouette_width: mean silhouette width of the clustering solution.
df: data frame with time series names, their cluster label, and their individual silhouette width scores.
d: psi distance matrix used for clustering.
optimization: only if clusters = NULL, data frame with optimization results from utils_cluster_kmeans_optimizer().
Other distantia_support: distantia_aggregate(), distantia_boxplot(), distantia_cluster_hclust(), distantia_matrix(), distantia_model_frame(), distantia_spatial(), distantia_stats(), distantia_time_delay(), utils_block_size(), utils_cluster_hclust_optimizer(), utils_cluster_kmeans_optimizer(), utils_cluster_silhouette()
library(distantia)

#weekly covid prevalence in California
tsl <- tsl_initialize(
  x = covid_prevalence,
  name_column = "name",
  time_column = "time"
)
#subset 10 elements to accelerate example execution
tsl <- tsl_subset(
  tsl = tsl,
  names = 1:10
)
if(interactive()){
  #plotting first three time series
  tsl_plot(
    tsl = tsl[1:3],
    guide_columns = 3
  )
}
#dissimilarity analysis
distantia_df <- distantia(
  tsl = tsl,
  lock_step = TRUE
)
#K-means clustering
#automated number of clusters
distantia_kmeans <- distantia_cluster_kmeans(
  df = distantia_df,
  clusters = NULL
)
#names of the output object
names(distantia_kmeans)
#kmeans object
distantia_kmeans$cluster_object
#distance matrix used for clustering
distantia_kmeans$d
#number of clusters
distantia_kmeans$clusters
#clustering data frame
#group label in column "cluster"
distantia_kmeans$df
#mean silhouette width of the clustering solution
distantia_kmeans$silhouette_width
#kmeans plot
# factoextra::fviz_cluster(
# object = distantia_kmeans$cluster_object,
# data = distantia_kmeans$d,
# repel = TRUE
# )