| KMeansClustering | R Documentation |
This function can be used to train a k-means clustering machine learning model in R.
K-means clustering is a widely used unsupervised machine learning algorithm. This function performs unsupervised clustering or labelling based on the k-means clustering (KMC) algorithm. The package imports the mini-batch k-means function from the ClusterR package, which is written in C++ and is therefore computationally fast.
clusters: the number of clusters. This is a hyperparameter and must be tuned.
b_size: the size of the mini-batches used while fitting the model.
num_rep: the number of times the algorithm is run, each time with different randomly chosen centroid seeds.
max_iterations: the maximum number of epochs performed for clustering.
init_fraction: the fraction of the data used to initialize the random centroids. It applies if initializer is set to kmeans++ and must be a float in the range 0 to 1.
initializer: the method used to initialize the centroids. It can be kmeans++, optimal_init, or quantile_init; kmeans++ is the usual choice.
early_stop_iterations: the number of iterations the algorithm continues to run after the best within-cluster sum of squared error has been found.
verbose: whether progress is printed to the console. It must be logical, either TRUE or FALSE.
centroids: a matrix of initial cluster centroids. The number of columns must equal the number of features in the data, and the number of rows must equal the number of centroids (clusters).
tolerance: a float. If, at an iteration with iteration > 1 and iteration < max_iterations, the tolerance is greater than the squared norm of the centroids, the k-means algorithm is considered to have converged.
tolerance_optimal_init: the tolerance value for the optimal_init initializer. A greater value indicates well-separated clusters.
seed: an integer seed for the random number generator.
model: used internally by superml.
max_clusters: either a single numeric value or a contiguous or non-contiguous numeric vector specifying the search space for the number of clusters.
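The convergence rule attached to the tolerance field can be illustrated with a small base R sketch. It assumes that "the squared norm of the centroids" refers to the squared size of the centroid movement between two successive iterations; this reading is an assumption, and `has_converged` is a hypothetical helper, not a function of superml or ClusterR.

```r
# Hypothetical helper for illustration only; not part of superml or ClusterR.
# Assumption: convergence is judged on how far the centroids moved
# between two successive iterations.
has_converged <- function(old_centroids, new_centroids, tolerance = 1e-4) {
  # Sum of squared differences over all centroid coordinates
  shift <- sum((new_centroids - old_centroids)^2)
  shift < tolerance
}

old <- matrix(c(0, 0, 5, 5), nrow = 2, byrow = TRUE)  # 2 centroids, 2 features
moved <- old + 0.5      # every coordinate moved by 0.5
settled <- old + 1e-3   # every coordinate moved by 0.001

has_converged(old, moved)    # squared shift is 4 * 0.25 = 1, above the tolerance
has_converged(old, settled)  # squared shift is about 4e-6, below the tolerance
```

According to the field description, such a check is only relevant for iterations after the first and before max_iterations.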
new()
KMeansClustering$new(clusters, b_size = 10, num_rep = 1, max_iterations = 100, init_fraction = 1, initializer = "kmeans++", early_stop_iterations = 10, verbose = FALSE, centroids = NULL, tolerance = 1e-04, tolerance_optimal_init = 0.3, seed = 1, max_clusters = NA)
clusters: Numeric, the number of clusters.
b_size: Numeric, the mini-batch size used by the underlying C++ mini-batch implementation.
num_rep: Integer, the number of times the algorithm is run, each time with different randomly chosen centroid seeds.
max_iterations: Integer, the maximum number of iterations to perform.
init_fraction: Float in the range 0 to 1, the fraction of the data used to initialize the random centroids. It applies if initializer is set to kmeans++.
initializer: Character, the initializer for the centroids. The best known is kmeans++.
early_stop_iterations: Integer, the number of iterations to keep running after the best within-cluster sum of squared error has been achieved.
verbose: Logical, either TRUE or FALSE, indicating whether progress is printed to the console during calculations.
centroids: A matrix with integer or float entries giving the initial cluster centroids.
tolerance: Float. If, at an iteration with iteration > 1 and iteration < max_iterations, the tolerance is greater than the squared norm of the centroids, the k-means algorithm is considered to have converged.
tolerance_optimal_init: Float, the tolerance value for the optimal_init initializer. A greater value indicates well-separated clusters.
seed: Integer, the seed for the random number generator.
max_clusters: Either a single numeric value or a contiguous or non-contiguous numeric vector specifying the search space for the number of clusters.
Create a new KMeansClustering object.
A KMeansClustering object.
# Simulated data: 30,000 rows (observations) and 30 columns (features)
data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
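The centroids argument expects one row per cluster and one column per feature. The base R sketch below shows one simple way to build a matrix of that shape by sampling observations from the data. The sampling strategy and the name `initial_centroids` are illustrative assumptions, not something the package prescribes, and the data set is smaller than in the example above to keep it quick.

```r
set.seed(1)

# Two groups of 100 observations each, 30 features per observation
data_set <- rbind(replicate(30, rnorm(100, 3)),
                  replicate(30, rnorm(100, -1)))

n_clusters <- 2

# Use randomly chosen observations as starting centroids (one simple choice);
# drop = FALSE keeps the result a matrix even when n_clusters is 1
initial_centroids <- data_set[sample(nrow(data_set), n_clusters), , drop = FALSE]

dim(initial_centroids)  # rows = clusters, columns = features

# Under the constructor signature shown above, the matrix could then be passed as
# KMeansClustering$new(clusters = n_clusters, centroids = initial_centroids)
```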
fit()
KMeansClustering$fit(X_data, y = NULL, find_optimal = FALSE)
X_data: A data.frame or a matrix containing the features of interest.
y: NULL by default. It is kept only for consistency with the general superml interface, in which every X is mapped to a y.
find_optimal: Logical, indicating whether to search for the optimal number of clusters automatically.
This function fits the KMeansClustering model.
NULL
data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
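X_data is described as a data.frame or matrix of features. The base R sketch below checks the layout produced by the rbind(replicate(...)) pattern used in the examples: rows are observations and columns are features. The group size is reduced from 1e4 to 100 purely for speed, and the variable names are illustrative.

```r
set.seed(1)
n_per_group <- 100  # the examples above use 1e4

# replicate(30, rnorm(n, m)) returns an n x 30 matrix;
# rbind stacks the three groups on top of each other
data_set <- rbind(replicate(30, rnorm(n_per_group, 3)),
                  replicate(30, rnorm(n_per_group, -1)),
                  replicate(30, rnorm(n_per_group, 5)))

dim(data_set)        # 300 observations, 30 features
is.matrix(data_set)

# The same data as a data.frame, the other input type named for X_data
data_frame_version <- as.data.frame(data_set)
```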
predict()
KMeansClustering$predict(X_data)
X_data: An R data frame or matrix.
Returns the prediction on the provided data.
A vector containing predictions.
data_set <- rbind(replicate(30, rnorm(1e4, 2)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
preds <- km$predict(data_set)
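For intuition about what a vector of cluster predictions contains, the sketch below assigns each observation to its nearest centroid by squared Euclidean distance, which is the usual k-means assignment rule. It is a base R illustration only: `nearest_centroid` is a hypothetical helper, and the actual predict() method delegates to the package's own implementation, which may differ in detail.

```r
# Hypothetical helper for illustration; not part of superml or ClusterR.
nearest_centroid <- function(X, centroids) {
  # Squared Euclidean distance from every row of X to every centroid
  # (result: one row per observation, one column per centroid)
  d <- sapply(seq_len(nrow(centroids)), function(k) {
    rowSums(sweep(X, 2, centroids[k, ])^2)
  })
  # Index of the closest centroid for each observation
  max.col(-d, ties.method = "first")
}

X <- rbind(c(0, 0), c(0.2, -0.1), c(5, 5), c(4.8, 5.1))
centroids <- rbind(c(0, 0), c(5, 5))

labels <- nearest_centroid(X, centroids)
labels         # 1 1 2 2
table(labels)  # number of observations assigned to each cluster
```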
clone()
The objects of this class are cloneable with this method.
KMeansClustering$clone(deep = FALSE)
deep: Whether to make a deep clone.
## ------------------------------------------------
## Method `KMeansClustering$new`
## ------------------------------------------------
data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
## ------------------------------------------------
## Method `KMeansClustering$fit`
## ------------------------------------------------
data_set <- rbind(replicate(30, rnorm(1e4, 3)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
## ------------------------------------------------
## Method `KMeansClustering$predict`
## ------------------------------------------------
data_set <- rbind(replicate(30, rnorm(1e4, 2)),
                  replicate(30, rnorm(1e4, -1)),
                  replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
preds <- km$predict(data_set)