| KMeansTrainer | R Documentation |
Trains a k-means machine learning model in R
Trains an unsupervised k-means clustering model. It uses the mini-batch k-means function from the ClusterR package, which is written in C++ and is therefore fast.
clusters: the number of clusters
batch_size: the size of the mini batches
num_init: number of times the algorithm will be run with different centroid seeds
max_iters: the maximum number of clustering iterations
init_fraction: fraction of the data to use for the initialization centroids (applies if initializer is kmeans++ or optimal_init). Should be a float between 0.0 and 1.0.
initializer: the method of initialization. One of optimal_init, quantile_init, kmeans++ and random.
early_stop_iter: the number of iterations to continue after the best within-cluster sum of squared error has been calculated
verbose: either TRUE or FALSE, indicating whether progress is printed during clustering
centroids: a matrix of initial cluster centroids. The number of rows should equal the number of clusters and the number of columns should equal the number of columns of the data.
tol: a float. If, at an iteration (iteration > 1 and iteration < max_iters), tol is greater than the squared norm of the centroids, then k-means has converged.
tol_optimal_init: tolerance value for the 'optimal_init' initializer. The higher this value is, the farther apart from each other the centroids are.
seed: integer value for the random number generator (RNG)
model: for internal use
max_clusters: either a single numeric value or a numeric vector (contiguous or non-contiguous) specifying the cluster search space
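The shape requirement on centroids can be checked before a trainer is constructed. The following is a minimal base-R sketch under stated assumptions: the toy data and the names `X`, `k` and `init_centroids` are illustrative only and are not part of superml.

```r
set.seed(1)

# Toy data: 300 rows and 4 feature columns (illustrative only)
X <- matrix(rnorm(300 * 4), nrow = 300, ncol = 4)
k <- 3  # hypothetical number of clusters

# Pick k distinct rows of the data as starting centroids
init_centroids <- X[sample(nrow(X), k), , drop = FALSE]

# Rows must match the number of clusters, columns must match the data
stopifnot(nrow(init_centroids) == k, ncol(init_centroids) == ncol(X))
```

A matrix built this way could then be supplied as `centroids = init_centroids` together with `clusters = k`.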
new()
KMeansTrainer$new(clusters, batch_size = 10, num_init = 1, max_iters = 100, init_fraction = 1, initializer = "kmeans++", early_stop_iter = 10, verbose = FALSE, centroids = NULL, tol = 1e-04, tol_optimal_init = 0.3, seed = 1, max_clusters = NA)
clusters: the number of clusters
batch_size: the size of the mini batches
num_init: number of times the algorithm will be run with different centroid seeds
max_iters: the maximum number of clustering iterations
init_fraction: fraction of the data to use for the initialization centroids (applies if initializer is kmeans++ or optimal_init). Should be a float between 0.0 and 1.0.
initializer: the method of initialization. One of optimal_init, quantile_init, kmeans++ and random.
early_stop_iter: the number of iterations to continue after the best within-cluster sum of squared error has been calculated
verbose: either TRUE or FALSE, indicating whether progress is printed during clustering
centroids: a matrix of initial cluster centroids. The number of rows should equal the number of clusters and the number of columns should equal the number of columns of the data.
tol: a float. If, at an iteration (iteration > 1 and iteration < max_iters), tol is greater than the squared norm of the centroids, then k-means has converged.
tol_optimal_init: tolerance value for the 'optimal_init' initializer. The higher this value is, the farther apart from each other the centroids are.
seed: integer value for the random number generator (RNG)
max_clusters: either a single numeric value or a numeric vector (contiguous or non-contiguous) specifying the cluster search space
Create a new 'KMeansTrainer' object.
A 'KMeansTrainer' object.
data <- rbind(replicate(20, rnorm(1e4, 2)),
replicate(20, rnorm(1e4, -1)),
replicate(20, rnorm(1e4, 5)))
km_model <- KMeansTrainer$new(clusters=2, batch_size=30, max_clusters=6)
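Other constructor arguments can be set in the same call. The sketch below uses only argument names from the `new()` signature; the values are arbitrary, the names `valid_initializers`, `chosen` and `km_random` are illustrative, and the constructor call is skipped unless both superml and ClusterR are installed.

```r
# Initializer names as listed in the argument description
valid_initializers <- c("optimal_init", "quantile_init", "kmeans++", "random")

# match.arg() stops with an error if the choice is not in the list
chosen <- match.arg("random", valid_initializers)

# Only construct the trainer when the required packages are available
if (requireNamespace("superml", quietly = TRUE) &&
    requireNamespace("ClusterR", quietly = TRUE)) {
  km_random <- superml::KMeansTrainer$new(clusters = 3,
                                          batch_size = 50,
                                          num_init = 5,
                                          initializer = chosen,
                                          seed = 42)
}
```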
fit()
KMeansTrainer$fit(X, y = NULL, find_optimal = FALSE)
X: data.frame or matrix containing features
y: NULL; kept only for consistency with superml's standard method signature
find_optimal: logical, whether to find the optimal number of clusters automatically
Trains the KMeansTrainer model
NULL
data <- rbind(replicate(20, rnorm(1e4, 2)),
replicate(20, rnorm(1e4, -1)),
replicate(20, rnorm(1e4, 5)))
km_model <- KMeansTrainer$new(clusters=2, batch_size=30, max_clusters=6)
km_model$fit(data, find_optimal = FALSE)
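A search for a suitable number of clusters can be approximated by hand: fit several cluster counts and compare the total within-cluster sum of squares. The sketch below uses base R's `stats::kmeans` as a stand-in, not superml's own search, and the candidate range `2:6` is an assumption chosen to mirror `max_clusters = 6`. The data is a smaller version of the example data so that it runs quickly.

```r
set.seed(1)

# Three groups of rows with different means, 5 feature columns each
small <- rbind(replicate(5, rnorm(200, 2)),
               replicate(5, rnorm(200, -1)),
               replicate(5, rnorm(200, 5)))

candidates <- 2:6

# Total within-cluster sum of squares for each candidate cluster count
wss <- sapply(candidates, function(k) {
  kmeans(small, centers = k, nstart = 5)$tot.withinss
})
names(wss) <- candidates

print(round(wss))
```

The value typically drops sharply up to the true number of groups (three here) and flattens afterwards, which is the usual "elbow" heuristic.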
predict()
KMeansTrainer$predict(X)
X: data.frame or matrix
Returns the prediction on test data
a vector of predictions
data <- rbind(replicate(20, rnorm(1e4, 2)),
replicate(20, rnorm(1e4, -1)),
replicate(20, rnorm(1e4, 5)))
km_model <- KMeansTrainer$new(clusters=2, batch_size=30, max_clusters=6)
km_model$fit(data, find_optimal = FALSE)
predictions <- km_model$predict(data)
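Conceptually, predicting with a k-means model means assigning each row to its nearest centroid. The following base-R sketch illustrates that idea; the names `nearest_centroid`, `centers` and `new_points` are hypothetical, and this is not superml's internal code.

```r
# Assign each row of X to the index of its closest centroid
# (squared Euclidean distance)
nearest_centroid <- function(X, centers) {
  X <- as.matrix(X)
  # d[i, j] = squared distance between row i of X and centroid j
  d <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(X, 2, centers[j, ])^2)
  })
  d <- matrix(d, nrow = nrow(X))  # keep matrix shape even for a single row
  # The smallest distance is the largest negated distance
  max.col(-d, ties.method = "first")
}

centers <- rbind(c(0, 0), c(10, 10))
new_points <- rbind(c(1, -1), c(9, 11), c(0.5, 0.2))

nearest_centroid(new_points, centers)
# 1 2 1
```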
clone()
The objects of this class are cloneable with this method.
KMeansTrainer$clone(deep = FALSE)
deep: Whether to make a deep clone.
## ------------------------------------------------
## Method `KMeansTrainer$new`
## ------------------------------------------------
data <- rbind(replicate(20, rnorm(1e4, 2)),
replicate(20, rnorm(1e4, -1)),
replicate(20, rnorm(1e4, 5)))
km_model <- KMeansTrainer$new(clusters=2, batch_size=30, max_clusters=6)
## ------------------------------------------------
## Method `KMeansTrainer$fit`
## ------------------------------------------------
data <- rbind(replicate(20, rnorm(1e4, 2)),
replicate(20, rnorm(1e4, -1)),
replicate(20, rnorm(1e4, 5)))
km_model <- KMeansTrainer$new(clusters=2, batch_size=30, max_clusters=6)
km_model$fit(data, find_optimal = FALSE)
## ------------------------------------------------
## Method `KMeansTrainer$predict`
## ------------------------------------------------
data <- rbind(replicate(20, rnorm(1e4, 2)),
replicate(20, rnorm(1e4, -1)),
replicate(20, rnorm(1e4, 5)))
km_model <- KMeansTrainer$new(clusters=2, batch_size=30, max_clusters=6)
km_model$fit(data, find_optimal = FALSE)
predictions <- km_model$predict(data)