kmeans_procedure: K-means procedure
In helda: Preprocess Data and Get Better Insights from Machine Learning Models

This function performs k-means clustering with constraints on the size of the clusters.

kmeans_procedure(
  data,
  columns,
  threshold_min,
  threshold_max,
  verbose = FALSE,
  seed = 42
)

`data`	an R data frame.
`columns`	a vector of column names of the data frame on which the k-means algorithm is run. These features must be numeric.
`threshold_min`	an integer. The minimum size of a cluster.
`threshold_max`	an integer. The maximum size of a cluster.
`verbose`	a boolean. If TRUE, the current state of the procedure is printed (default: FALSE).
`seed`	an integer. The seed for the random number generator, used to make the output reproducible.
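
To make the idea of a size constraint concrete, here is a rough sketch of one way a maximum cluster size could be enforced: any cluster larger than `threshold_max` is split in two with `stats::kmeans()` until none is too large. This is an illustration only and not necessarily how `kmeans_procedure` works internally. The helper name `split_oversized` and the `id` column are hypothetical, and the minimum-size constraint is deliberately left out.

```r
# Illustrative sketch only: NOT the helda implementation.
# Hypothetical helper that repeatedly splits oversized clusters in two.
split_oversized <- function(data, columns, threshold_max, seed = 42) {
  set.seed(seed)
  # Start with every observation in a single cluster
  cluster <- rep(1L, nrow(data))
  repeat {
    sizes <- table(cluster)
    too_big <- as.integer(names(sizes)[sizes > threshold_max])
    if (length(too_big) == 0) break
    # Split the first oversized cluster with a 2-means fit
    rows <- which(cluster == too_big[1])
    fit <- kmeans(data[rows, columns, drop = FALSE], centers = 2)
    # Observations in the second sub-cluster receive a new label
    cluster[rows[fit$cluster == 2]] <- max(cluster) + 1L
  }
  # 'id' is assumed here to be the row index of the input
  data.frame(id = seq_len(nrow(data)), cluster = cluster)
}

res <- split_oversized(iris, colnames(iris)[1:4], threshold_max = 10)
max(table(res$cluster))  # no cluster exceeds threshold_max
```

Splitting in two keeps each step simple, at the cost of producing clusters that depend on the order in which oversized clusters are processed.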

An R data frame containing the id of each observation from the original data frame and a column 'cluster' giving the cluster to which the observation belongs.
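
Since the returned data frame has a 'cluster' column, the size constraints can be checked by tabulating it. The sketch below uses a hypothetical helper, `check_cluster_sizes`, and a small stand-in data frame in place of real output; the `id` column name is an assumption.

```r
# Hypothetical helper: TRUE when every cluster size lies within
# [threshold_min, threshold_max].
check_cluster_sizes <- function(result, threshold_min, threshold_max) {
  sizes <- table(result$cluster)
  all(sizes >= threshold_min & sizes <= threshold_max)
}

# Stand-in for the output of kmeans_procedure(); only 'cluster' matters here
toy <- data.frame(id = 1:6, cluster = c(1, 1, 1, 2, 2, 2))

check_cluster_sizes(toy, threshold_min = 2, threshold_max = 10)  # TRUE
check_cluster_sizes(toy, threshold_min = 4, threshold_max = 10)  # FALSE
```

The same helper can be applied to the `result` object from the example at the end of this page, using the thresholds passed to `kmeans_procedure`.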

Simon CORDE

Link to the author's GitHub package repository: https://github.com/Redcart/helda

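library(dplyr)
library(helda)

# Keep the four numeric measurement columns of iris
data <- iris %>% select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
features <- colnames(data)

# Cluster with sizes constrained by threshold_min and threshold_max
result <- kmeans_procedure(data = data, columns = features,
                           threshold_min = 2, threshold_max = 10,
                           verbose = FALSE, seed = 10)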