Clara_Medoids: Clustering large applications
In ClusterR: Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering

Clara_Medoids

R Documentation

Clustering large applications

Description

Clustering large applications

Usage

Clara_Medoids(
  data,
  clusters,
  samples,
  sample_size,
  distance_metric = "euclidean",
  minkowski_p = 1,
  threads = 1,
  swap_phase = TRUE,
  fuzzy = FALSE,
  verbose = FALSE,
  seed = 1
)

Arguments

`data`	matrix or data frame
`clusters`	the number of clusters
`samples`	number of samples to draw from the data set
`sample_size`	fraction of data to draw in each sample iteration. It should be a float number greater than 0.0 and less or equal to 1.0
`distance_metric`	a string specifying the distance method. One of, euclidean, manhattan, chebyshev, canberra, braycurtis, pearson_correlation, simple_matching_coefficient, minkowski, hamming, jaccard_coefficient, Rao_coefficient, mahalanobis, cosine
`minkowski_p`	a numeric value specifying the minkowski parameter in case that distance_metric = "minkowski"
`threads`	an integer specifying the number of cores to run in parallel. Openmp will be utilized to parallelize the number of the different sample draws
`swap_phase`	either TRUE or FALSE. If TRUE then both phases ('build' and 'swap') will take place. The 'swap_phase' is considered more computationally intensive.
`fuzzy`	either TRUE or FALSE. If TRUE, then probabilities for each cluster will be returned based on the distance between observations and medoids
`verbose`	either TRUE or FALSE, indicating whether progress is printed during clustering
`seed`	integer value for random number generator (RNG)

Details

The Clara_Medoids function is implemented in the same way as the 'clara' (clustering large applications) algorithm (Kaufman and Rousseeuw(1990)). In the 'Clara_Medoids' the 'Cluster_Medoids' function will be applied to each sample draw.

Value

a list with the following attributes : medoids, medoid_indices, sample_indices, best_dissimilarity, clusters, fuzzy_probs (if fuzzy = TRUE), clustering_stats, dissimilarity_matrix, silhouette_matrix

Author(s)

Lampros Mouselimis

References

Anja Struyf, Mia Hubert, Peter J. Rousseeuw, (Feb. 1997), Clustering in an Object-Oriented Environment, Journal of Statistical Software, Vol 1, Issue 4

Examples


data(dietary_survey_IBS)

dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]

dat = center_scale(dat)

clm = Clara_Medoids(dat, clusters = 3, samples = 5, sample_size = 0.2, swap_phase = TRUE)

ClusterR documentation built on Nov. 5, 2025, 6:51 p.m.