KMeansClustering | R Documentation |
This function can be used to train a k-means clustering machine learning model in R.
K-means clustering is a widely used unsupervised machine learning algorithm. This function performs unsupervised clustering or labelling based on the k-means algorithm. The package imports the mini-batch k-means function from the ClusterR package, which is written in C++ and is therefore computationally fast.
clusters
indicates the number of clusters; this is a hyperparameter and must be tuned.
b_size
indicates the size of the mini-batches to be used while fitting the model.
num_rep
indicates the number of times the algorithm is run, each time with different randomly chosen centroid seeds.
max_iterations
indicates the maximum number of epochs performed for clustering.
init_fraction
indicates the fraction of the data to be used for initializing the random centroid points; it applies if initializer is set to kmeans++. It shall be of type float in the range 0 to 1.
initializer
indicates the method used for initializing the centroids. It can take the values kmeans++, optimal_init, or quantile_init; usually kmeans++ is used.
early_stop_iterations
indicates the number of iterations for which the algorithm continues to run after the best within-cluster sum of squared error has been found.
verbose
indicates whether progress is printed to the console. It shall be logical, either TRUE or FALSE.
centroids
is a matrix of initial cluster centroids. The number of columns shall equal the number of features in the data, and the number of rows shall equal the number of centroids (clusters).
tolerance
shall be a floating-point number. If, at an iteration with iteration > 1 and iteration < max_iterations, the tolerance is greater than the squared norm of the centroids, the k-means algorithm is considered to have converged.
tolerance_optimal_init
is the tolerance value for the optimal_init initializer; a greater value is an indication of well-separated clusters.
seed
shall be an integer value used to seed the random number generator.
model
is used internally by superml.
max_clusters
can be either a single numeric value or a contiguous or non-contiguous numeric vector specifying the search space for the number of clusters.
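The convergence rule described for tolerance can be sketched in base R. This is only an illustration, not the ClusterR implementation: the helper name has_converged is hypothetical, and it assumes that "the squared norm of the centroids" refers to the squared norm of the change in the centroid matrix between two consecutive iterations.

```r
# Hypothetical helper, not part of superml or ClusterR.
# Assumption: convergence is judged on the squared Frobenius norm of the
# change in the centroid matrix between two consecutive iterations.
has_converged <- function(old_centroids, new_centroids, iteration,
                          max_iterations = 100, tolerance = 1e-04) {
  shift <- sum((new_centroids - old_centroids)^2)
  iteration > 1 && iteration < max_iterations && tolerance > shift
}

# Two centroids (rows) with two features (columns)
old <- matrix(c(0, 0, 5, 5), nrow = 2, byrow = TRUE)
small_move <- old + 1e-03   # every coordinate moves by 0.001
large_move <- old + 0.5     # every coordinate moves by 0.5

has_converged(old, small_move, iteration = 3)
has_converged(old, large_move, iteration = 3)
```

Under these assumptions a smaller tolerance demands that the centroids nearly stop moving before the run is declared converged, at the cost of more iterations.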
new()
KMeansClustering$new( clusters, b_size = 10, num_rep = 1, max_iterations = 100, init_fraction = 1, initializer = "kmeans++", early_stop_iterations = 10, verbose = FALSE, centroids = NULL, tolerance = 1e-04, tolerance_optimal_init = 0.3, seed = 1, max_clusters = NA )
clusters
It shall be of type numeric, indicating the number of clusters to fit.
b_size
It shall be of type numeric, indicating the mini-batch size used by the mini-batch k-means C++ implementation.
num_rep
It shall be of type integer, indicating the number of times the algorithm is run, each time with different randomly chosen centroid seeds.
max_iterations
It shall be of type integer, indicating the maximum number of iterations to be performed.
init_fraction
It shall be of type float in the range 0 to 1, indicating the fraction of the data to be used for initializing the random centroid points; it applies if initializer is set to kmeans++.
initializer
It shall be of type character, indicating the initializer for the centroids; the most common choice is kmeans++.
early_stop_iterations
It shall be of type integer, indicating the number of iterations for which the algorithm continues to run after the best within-cluster sum of squared error has been achieved.
verbose
It shall be of type logical, either TRUE or FALSE, indicating whether progress shall be printed to the console during calculations.
centroids
It shall be a matrix with entries of type integer or float, indicating the initial cluster centroids.
tolerance
It shall be of type float. If, at an iteration with iteration > 1 and iteration < max_iterations, the tolerance is greater than the squared norm of the centroids, the k-means algorithm is considered to have converged.
tolerance_optimal_init
It shall be of type float; it is the tolerance value for the optimal_init initializer, and a greater value is an indication of well-separated clusters.
seed
It shall be of type integer, indicating the seed for the random number generator.
max_clusters
max_clusters can be either a single numeric value or a contiguous or non-contiguous numeric vector specifying the search space for the number of clusters.
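For readers unfamiliar with the kmeans++ initializer, the base-R sketch below shows the usual k-means++ seeding idea: the first centroid is drawn uniformly from the data, and each further centroid is drawn with probability proportional to the squared distance to the nearest centroid already chosen. It is an illustration only, not the code that ClusterR runs; the function name kmeanspp_init is hypothetical. The result has one row per cluster and one column per feature, which is the shape described for the centroids argument.

```r
# Hypothetical helper illustrating k-means++ seeding; not part of superml.
kmeanspp_init <- function(X, k, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)               # first centroid: uniform draw
  # squared distance from every row to its nearest chosen centroid
  d2 <- rowSums(sweep(X, 2, X[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    idx[j] <- sample.int(n, 1, prob = d2)  # far-away rows are more likely
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[idx[j], ])^2))
  }
  X[idx, , drop = FALSE]
}

set.seed(42)
# Three groups of 100 rows with 2 features each
X <- rbind(matrix(rnorm(200, 3), ncol = 2),
           matrix(rnorm(200, -1), ncol = 2),
           matrix(rnorm(200, 5), ncol = 2))

centers <- kmeanspp_init(X, k = 3)
dim(centers)   # 3 rows (clusters) by 2 columns (features)
```

Rows that have already been chosen have a squared distance of zero and so cannot be drawn again, which keeps the initial centroids distinct.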
Create a new KMeansClustering object.
A KMeansClustering object.
data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
fit()
KMeansClustering$fit(X_data, y = NULL, find_optimal = FALSE)
X_data
X_data shall be either a data.frame or a matrix containing the features of interest.
y
y is set to NULL; it is kept only for consistency with the general superml interface, in which every X is mapped to a y.
find_optimal
find_optimal shall be logical; it indicates whether to search for the optimal number of clusters automatically.
This function fits the KMeansClustering model.
NULL
data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
predict()
KMeansClustering$predict(X_data)
X_data
X_data shall be either a data.frame or a matrix.
Returns the prediction on the provided data.
a vector containing predictions
data_set <- rbind(replicate(30, rnorm(1e4, 2)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
preds <- km$predict(data_set)
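As a usage sketch, the predictions can be tabulated to see how many rows fall into each cluster. This assumes the superml package is installed and that predict() returns one label per row, as stated above; the smaller data set and the choice of clusters = 3 are illustrative only.

```r
library(superml)

set.seed(1)
# Three groups of 200 rows and 5 features, centred at 3, -1 and 5
data_set <- rbind(replicate(5, rnorm(200, 3)),
                  replicate(5, rnorm(200, -1)),
                  replicate(5, rnorm(200, 5)))

km <- KMeansClustering$new(clusters = 3, b_size = 30)
km$fit(data_set)
preds <- km$predict(data_set)

length(preds) == nrow(data_set)   # checks that there is one label per row
table(as.vector(preds))           # number of rows assigned to each cluster
```

The cluster labels themselves are arbitrary identifiers, so only the grouping is meaningful, not which number a group receives.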
clone()
The objects of this class are cloneable with this method.
KMeansClustering$clone(deep = FALSE)
deep
Whether to make a deep clone.
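Because KMeansClustering objects are R6 objects, plain assignment creates a second reference to the same object, whereas clone() creates an independent copy. A minimal sketch, assuming superml is installed and that clusters is a writable public field as listed above:

```r
library(superml)

km <- KMeansClustering$new(clusters = 2, b_size = 30)

km_ref  <- km           # second name for the same object
km_copy <- km$clone()   # independent copy

km_ref$clusters  <- 4   # also changes km, since both names share one object
km_copy$clusters <- 6   # leaves km untouched

km$clusters
km_copy$clusters
```

Cloning before changing a hyperparameter is a convenient way to compare two configurations without rebuilding the object from scratch.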
## ------------------------------------------------
## Method `KMeansClustering$new`
## ------------------------------------------------

data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)

## ------------------------------------------------
## Method `KMeansClustering$fit`
## ------------------------------------------------

data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)

## ------------------------------------------------
## Method `KMeansClustering$predict`
## ------------------------------------------------

data_set <- rbind(replicate(30, rnorm(1e4, 2)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
preds <- km$predict(data_set)