R/RcppExports.R

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' Predict_mini_batch
#'
#' Prediction function for mini-batch k-means applied to matrix-like objects.
#'
#'
#'@param data matrix-like object containing numeric or
#'  integer data (observations in rows, variables in columns).
#'@param CENTROIDS a matrix of cluster centroids. The number of rows of
#'  CENTROIDS should equal the number of clusters and the number of columns
#'  should equal the number of columns of the data.
#'@return A vector with the cluster assignment of each observation.
#'@details
#'
#'This function takes the data and the centroids returned by
#'\code{mini_batch} and returns the cluster assignments.
#'
#'This implementation relies heavily on the
#'\code{\link[ClusterR]{MiniBatchKmeans}} implementation. We provide the
#'ability to work with matrix-like objects other than base matrices (e.g.,
#'DelayedMatrix and HDF5Matrix) through the \code{beachmat}
#'library.
#'
#'@author Yuwei Ni
#'
#'@examples
#'data(iris)
#'km = mini_batch(as.matrix(iris[,1:4]), clusters = 3,
#'                batch_size = 10, max_iters = 10)
#'clusters = predict_mini_batch(as.matrix(iris[,1:4]),
#'                              CENTROIDS = km$centroids)
#' @export
predict_mini_batch <- function(data, CENTROIDS) {
    .Call(`_mbkmeans_predict_mini_batch`, data, CENTROIDS)
}
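
# Illustrative sketch only; not part of the generated exports.
# The documentation above says that predict_mini_batch() maps each observation
# to a cluster given a centroid matrix. The base-R helper below shows one way
# to do that, under these ASSUMPTIONS: squared Euclidean distance, 1-based
# labels, and ties resolved in favour of the lowest cluster index. Whether the
# compiled routine makes the same choices is not confirmed by this file.
# `nearest_centroid_sketch` is a hypothetical name used only for illustration.
nearest_centroid_sketch <- function(data, CENTROIDS) {
    data <- as.matrix(data)
    # Squared distance from every observation (rows) to every centroid (columns)
    d2 <- sapply(seq_len(nrow(CENTROIDS)), function(k) {
        rowSums(sweep(data, 2, CENTROIDS[k, ], "-")^2)
    })
    d2 <- matrix(d2, nrow = nrow(data))
    # Index of the closest centroid for each observation
    apply(d2, 1, which.min)
}
# Hypothetical usage:
#   nearest_centroid_sketch(as.matrix(iris[, 1:4]), km$centroids)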

#' Compute Within-Cluster Sum of Squares
#'
#' Given a vector of cluster labels, a matrix of centroids, and a dataset, it
#' computes the WCSS.
#'
#'@param clusters numeric vector with the cluster assignments.
#'@param cent numeric matrix with the centroids (clusters in rows, variables
#'  in columns).
#'@param data matrix-like object containing the data (numeric or integer).
#'
#'@return A numeric vector with the value of WCSS per cluster.
#'
#'@examples
#'data = matrix(1:30,nrow = 10)
#'cl <- mini_batch(data, 2, 10, 10)
#'compute_wcss(cl$Clusters, cl$centroids, data)
#'
#' @export
compute_wcss <- function(clusters, cent, data) {
    .Call(`_mbkmeans_compute_wcss`, clusters, cent, data)
}
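
# Illustrative sketch only; not part of the generated exports.
# The per-cluster WCSS described above is, for each cluster, the sum of squared
# differences between its member observations and its centroid. The base-R
# helper below spells that out, ASSUMING 1-based integer cluster labels that
# index the rows of `cent`; the compiled routine may differ in details.
# `wcss_sketch` is a hypothetical name used only for illustration.
wcss_sketch <- function(clusters, cent, data) {
    data <- as.matrix(data)
    vapply(seq_len(nrow(cent)), function(k) {
        members <- data[clusters == k, , drop = FALSE]
        # An empty cluster contributes nothing
        if (nrow(members) == 0L) return(0)
        # Subtract the centroid from every member, then sum the squares
        sum(sweep(members, 2, cent[k, ], "-")^2)
    }, numeric(1))
}
# Hypothetical usage:
#   wcss_sketch(cl$Clusters, cl$centroids, data)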

#' Mini_batch
#'
#' Mini-batch k-means for matrix-like objects.
#'
#'@param data numeric or integer matrix-like object.
#'@param clusters the number of clusters.
#'@param batch_size the size of the mini batches.
#'@param num_init number of times the algorithm will be run with different
#'  centroid seeds.
#'@param max_iters the maximum number of clustering iterations.
#'@param init_fraction fraction of the data to use for initializing the
#'  centroids (applies if initializer is \emph{kmeans++}). Should be a
#'  number between 0.0 and 1.0.
#'@param initializer the method of initialization. One of \emph{kmeans++} and
#'  \emph{random}. See details for more information.
#'@param compute_labels logical indicating whether to compute the final cluster
#'  labels.
#'@param calc_wcss logical indicating whether the within-cluster sum of squares
#'  should be computed and returned (ignored if `compute_labels = FALSE`).
#'@param early_stop_iter number of additional iterations to run after the
#'  best within-cluster sum of squared errors has been found.
#'@param verbose logical indicating whether progress is printed on screen.
#'@param CENTROIDS an optional matrix of initial cluster centroids. The rows of
#'  the CENTROIDS matrix should be equal to the number of clusters and the
#'  columns should be equal to the columns of the data.
#'@param tol convergence tolerance.
#'@return
#'A list with the following elements:
#'
#'centroids: the final centroids;
#'
#'WCSS_per_cluster (optional): the final per-cluster WCSS;
#'
#'best_initialization: which initialization led to the best WCSS
#'solution;
#'
#'iters_per_initialization: number of iterations for each initialization;
#'
#'Clusters (optional): the final cluster labels.
#'
#'@details This function performs k-means clustering using mini batches. It was
#'inspired by the implementation in https://github.com/mlampros/ClusterR.
#'
#'The input matrix can be in any format supported by the `DelayedArray` /
#'`beachmat` framework, including the matrix classes defined in the `Matrix`
#'package and the `HDF5Matrix` class.
#'
#'There are two possible initializations.
#'
#'\strong{kmeans++}: kmeans++ initialization.
#'
#'\strong{random}: random selection of data rows as initial centroids.
#'
#'@references Sculley, D., 2010, April. Web-scale k-means clustering. In
#'Proceedings of the 19th international conference on World wide web (pp.
#'1177-1178). ACM.
#'
#'Arthur, D. and Vassilvitskii, S., 2007, January. k-means++: The advantages of
#'careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium
#'on Discrete algorithms (pp. 1027-1035). Society for Industrial and Applied
#'Mathematics.
#'
#'@examples
#'data = matrix(1:30,nrow = 10)
#'mini_batch(data, 2, 10, 10)
#'
#' @export
mini_batch <- function(data, clusters, batch_size, max_iters, num_init = 1L, init_fraction = 1.0, initializer = "kmeans++", compute_labels = TRUE, calc_wcss = FALSE, early_stop_iter = 10L, verbose = FALSE, CENTROIDS = NULL, tol = 1e-4) {
    .Call(`_mbkmeans_mini_batch`, data, clusters, batch_size, max_iters, num_init, init_fraction, initializer, compute_labels, calc_wcss, early_stop_iter, verbose, CENTROIDS, tol)
}
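
# Illustrative sketch only; not part of the generated exports.
# The documentation above cites Sculley (2010) for mini-batch k-means. The
# base-R helper below sketches a single mini-batch update in the spirit of that
# paper: assign every point in the batch to its nearest centroid, then move
# each centroid toward its points with a per-centroid learning rate of
# 1 / (number of points seen so far). This is an ASSUMPTION about the general
# technique, not a description of the compiled mbkmeans code, which may differ.
# `mini_batch_step_sketch` and `counts` are hypothetical names.
mini_batch_step_sketch <- function(batch, centroids, counts) {
    batch <- as.matrix(batch)
    # Cache the nearest centroid of each batch point before any update
    nearest <- apply(batch, 1, function(x) {
        which.min(colSums((t(centroids) - x)^2))
    })
    for (i in seq_len(nrow(batch))) {
        k <- nearest[i]
        counts[k] <- counts[k] + 1
        eta <- 1 / counts[k]  # per-centroid learning rate
        centroids[k, ] <- (1 - eta) * centroids[k, ] + eta * batch[i, ]
    }
    list(centroids = centroids, counts = counts)
}
# Hypothetical usage: start with counts = rep(0, nrow(centroids)) and call the
# helper once per sampled batch, feeding the returned values back in.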

mbkmeans documentation built on Nov. 15, 2020, 2:07 a.m.