KMeansClustering: K-Means Clustering
In MalikShahidSultan/machinelearning: This package provides two machine learning algorithms: K Nearest Neighbour (supervised learning) and K-Means Clustering (unsupervised learning). It provides fit and predict methods similar to those in Python's scikit-learn library.

Description

This function can be used to train a k-means clustering machine learning model in R.

Details

K-Means Clustering (KMC) is a widely used unsupervised machine learning algorithm. This function can be used to perform unsupervised clustering or labelling based on the KMC algorithm. The package imports the mini-batch k-means function from the ClusterR package, which is written in C++ and is therefore computationally very fast.
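To make the mini-batch idea concrete, here is a minimal base R sketch. It is not the ClusterR C++ implementation, whose details may differ; the function name `mini_batch_kmeans` and its update rule (a per-centroid learning rate of 1/count) are assumptions chosen for illustration.

```r
# Illustrative mini-batch k-means in base R (hypothetical helper, not the package's code).
mini_batch_kmeans <- function(X, k, batch_size = 10, max_iter = 100, seed = 1) {
  set.seed(seed)
  X <- as.matrix(X)
  # Start from k rows of the data chosen at random.
  centroids <- X[sample(nrow(X), k), , drop = FALSE]
  counts <- numeric(k)  # number of points each centroid has absorbed so far
  for (iter in seq_len(max_iter)) {
    # Draw a small random batch instead of using the full data set.
    batch <- X[sample(nrow(X), batch_size), , drop = FALSE]
    # Squared Euclidean distance from every batch point to every centroid.
    d <- sapply(seq_len(k), function(j) rowSums(sweep(batch, 2, centroids[j, ])^2))
    nearest <- apply(matrix(d, ncol = k), 1, which.min)
    for (i in seq_len(nrow(batch))) {
      j <- nearest[i]
      counts[j] <- counts[j] + 1
      eta <- 1 / counts[j]  # step size shrinks as the centroid sees more points
      centroids[j, ] <- (1 - eta) * centroids[j, ] + eta * batch[i, ]
    }
  }
  centroids
}

set.seed(42)
X <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 6), ncol = 2))
centers <- mini_batch_kmeans(X, k = 2, batch_size = 20, max_iter = 50)
print(centers)
```

Because each iteration touches only `batch_size` rows, the cost per iteration does not grow with the size of the data set, which is the main appeal of the mini-batch variant.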

Public fields

clusters: the number of clusters. This is a hyperparameter and must be tuned.
b_size: the size of the mini-batches used while fitting the model.
num_rep: the number of times the algorithm is run, each time with different randomly chosen centroid seeds.
max_iterations: the maximum number of epochs performed for clustering.
init_fraction: the fraction of the data used to initialize the random centroids; it applies only if initializer is set to kmeans++. It shall be a float in the range 0 to 1.
initializer: the method used to initialize the centroids. It can be kmeans++, optimal_init, or quantile_init; kmeans++ is the usual choice.
early_stop_iterations: the number of iterations for which the algorithm keeps running after the best within-cluster sum of squared errors has been found.
verbose: whether progress is printed to the console. It shall be logical, either TRUE or FALSE.
centroids: a matrix of initial cluster centroids. The number of columns shall equal the number of features in the data, and the number of rows shall equal the number of clusters.
tolerance: a floating-point number. If, at an iteration with iteration > 1 and iteration < max_iterations, the tolerance is greater than the squared norm of the centroids, the k-means algorithm is considered to have converged.
tolerance_optimal_init: the tolerance value for the optimal_init initializer; a greater value indicates well-separated clusters.
seed: an integer value for the random number generator.
model: used for internal purposes by superml.
max_clusters: either a single numeric value or a contiguous or non-contiguous numeric vector specifying the search space for the number of clusters.

Active bindings

verbose: whether progress is printed to the console. It shall be logical, either TRUE or FALSE.

Methods

Method `new()`

Usage

KMeansClustering$new(
  clusters,
  b_size = 10,
  num_rep = 1,
  max_iterations = 100,
  init_fraction = 1,
  initializer = "kmeans++",
  early_stop_iterations = 10,
  verbose = FALSE,
  centroids = NULL,
  tolerance = 1e-04,
  tolerance_optimal_init = 0.3,
  seed = 1,
  max_clusters = NA
)

Arguments

clusters: It shall be of type numeric, indicating the number of clusters to form (the examples for this method use clusters = 2).
b_size: It shall be of type numeric, indicating the mini-batch size passed to the mini-batch C++ routine.
num_rep: It shall be of type integer, indicating the number of times the algorithm is run, each time with different randomly chosen centroid seeds.
max_iterations: It shall be of type integer, indicating the maximum number of iterations to be performed.
init_fraction: It shall be of type float in the range 0 to 1, indicating the fraction of the data used to initialize the random centroids; it applies only if initializer is set to kmeans++.
initializer: It shall be of type character, indicating the initializer for the centroids; the most common choice is kmeans++.
early_stop_iterations: It shall be of type integer, indicating the number of iterations for which the algorithm keeps running after the best within-cluster sum of squared errors has been achieved.
verbose: It shall be of type logical, either TRUE or FALSE, indicating whether progress shall be printed to the console during calculations.
centroids: It shall be a matrix with entries of type integer or float, indicating the initial cluster centroids.
tolerance: It shall be of type float. If, at an iteration with iteration > 1 and iteration < max_iterations, the tolerance is greater than the squared norm of the centroids, the k-means algorithm is considered to have converged.
tolerance_optimal_init: It shall be of type float, giving the tolerance value for the optimal_init initializer; a greater value indicates well-separated clusters.
seed: It shall be of type integer, indicating the value for the random number generator.
max_clusters: It can be either a single numeric value or a contiguous or non-contiguous numeric vector specifying the search space for the number of clusters.
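The shape requirement on `centroids` (one row per cluster, one column per feature) can be checked before fitting. The sketch below uses base R's `stats::kmeans` as a stand-in so that it runs without this package; passing the same matrix to `KMeansClustering$new(centroids = ...)` is assumed to follow the same convention, as the field description states.

```r
set.seed(1)
# 100 observations with 2 features, drawn around two different means.
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

n_clusters <- 2
# Pick n_clusters rows of the data as initial centroids.
init_centroids <- X[sample(nrow(X), n_clusters), , drop = FALSE]

# Rows must match the number of clusters, columns the number of features.
stopifnot(nrow(init_centroids) == n_clusters,
          ncol(init_centroids) == ncol(X))

# Stand-in fit with base R; stats::kmeans accepts a matrix of starting centres.
fit <- stats::kmeans(X, centers = init_centroids)
print(fit$centers)
```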

Details

Create a new KMeansClustering object.

Returns

A KMeansClustering object.

Examples

data_set <- rbind(replicate(30, rnorm(1e4, 3)),
             replicate(30, rnorm(1e4, -1)),
             replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
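The convergence rule described for `tolerance` can be sketched as a small predicate. This is an interpretation, not the package's code: it assumes "the squared norm of the centroids" means the squared norm of the change in centroids between two consecutive iterations, and `has_converged` is a hypothetical helper name.

```r
# Hypothetical helper illustrating the tolerance rule (assumes the norm is taken
# over the centroid shift between consecutive iterations).
has_converged <- function(old_centroids, new_centroids,
                          iteration, max_iterations, tolerance = 1e-4) {
  shift <- sum((new_centroids - old_centroids)^2)  # squared Frobenius norm
  iteration > 1 && iteration < max_iterations && tolerance > shift
}

old <- matrix(0, nrow = 2, ncol = 2)
small_move <- has_converged(old, old + 0.001, iteration = 5, max_iterations = 100)
large_move <- has_converged(old, old + 1, iteration = 5, max_iterations = 100)
first_iter <- has_converged(old, old + 0.001, iteration = 1, max_iterations = 100)
print(c(small_move, large_move, first_iter))
```

A shift of 0.001 in each of the four entries gives a squared norm of 4e-06, which is below the default tolerance of 1e-04, so only the first call reports convergence.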

Method `fit()`

Usage

KMeansClustering$fit(X_data, y = NULL, find_optimal = FALSE)

Arguments

X_data: shall be either a data.frame or a matrix containing the features of interest.
y: set to NULL; it is kept only for consistency with the general superml interface, in which every X is mapped to a y.
find_optimal: shall be logical, indicating whether to search for the optimal number of clusters automatically.

Details

This function fits the KMeansClustering model.

Returns

NULL

Examples

data_set <- rbind(replicate(30, rnorm(1e4, 3)),
             replicate(30, rnorm(1e4, -1)),
             replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
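The source does not state which criterion `find_optimal = TRUE` uses. As one plausible illustration, the sketch below scans a `max_clusters` search space and records the total within-cluster sum of squares for each candidate, using base R's `stats::kmeans` as a stand-in. The helpers `search_space` and `wss_by_k`, and the reading of a single number as the range 1 to that number, are assumptions.

```r
# Expand max_clusters into candidate values of k (assumption: a single number n
# means 1..n; a vector is used as given, contiguous or not).
search_space <- function(max_clusters) {
  if (length(max_clusters) == 1) seq_len(max_clusters) else sort(unique(max_clusters))
}

# Total within-cluster sum of squares for every candidate k.
wss_by_k <- function(X, max_clusters, seed = 1) {
  set.seed(seed)
  ks <- search_space(max_clusters)
  wss <- vapply(ks, function(k) {
    stats::kmeans(X, centers = k, nstart = 5)$tot.withinss
  }, numeric(1))
  data.frame(k = ks, wss = wss)
}

set.seed(7)
X <- rbind(matrix(rnorm(100, mean = 3), ncol = 2),
           matrix(rnorm(100, mean = -1), ncol = 2),
           matrix(rnorm(100, mean = 8), ncol = 2))
scan <- wss_by_k(X, max_clusters = 6)
sparse_scan <- wss_by_k(X, max_clusters = c(2, 4, 6))
print(scan)
```

A common heuristic is to look for the "elbow" in `wss`, the point after which adding clusters yields little further reduction.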

Method `predict()`

Usage

KMeansClustering$predict(X_data)

Arguments

X_data: shall be either a data.frame or a matrix.

Details

Returns the prediction on the provided data.

Returns

a vector containing predictions

Examples

data_set <- rbind(replicate(30, rnorm(1e4, 2)),
             replicate(30, rnorm(1e4, -1)),
             replicate(30, rnorm(1e4, 5)))
km <- KMeansClustering$new(clusters=2, b_size=30, max_clusters=6)
km$fit(data_set, find_optimal = FALSE)
preds <- km$predict(data_set)
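Prediction in k-means amounts to assigning each row to its nearest centroid. The base R sketch below shows that rule in isolation; `predict_nearest` is a hypothetical helper, and the package's own `predict()` delegates to ClusterR rather than to this code.

```r
# Assign each row of X to the index of its nearest centroid (squared Euclidean distance).
predict_nearest <- function(X, centroids) {
  X <- as.matrix(X)
  k <- nrow(centroids)
  d <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centroids[j, ])^2))
  apply(matrix(d, ncol = k), 1, which.min)
}

centroids <- rbind(c(0, 0), c(5, 5))
new_points <- rbind(c(0.2, -0.1), c(4.8, 5.3), c(1, 1))
labels <- predict_nearest(new_points, centroids)
print(labels)  # 1 2 1
```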

Method `clone()`

The objects of this class are cloneable with this method.

Usage

KMeansClustering$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

MalikShahidSultan/machinelearning documentation built on May 9, 2022, 8:32 p.m.
