clara_clust: CLARA clustering
In Ltochon/CLARA.seq: CLARA Clustering


View source: R/main.R

Built on the TraMineR package, CLARA clustering makes it possible to cluster large datasets.
The main objective is to cluster state sequences, using the "LCS" distance by default, and to find the best partition into N clusters.
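To make the idea concrete, here is a minimal sketch of the general CLARA procedure applied to state sequences: draw several random subsets, run PAM on each, assign every sequence to its nearest medoid, and keep the subset with the lowest mean distance. It is not the package's implementation. `clara_sketch` and its return fields are hypothetical names chosen for illustration, and the use of `cluster::pam` is an assumption.

```r
library(TraMineR)  # seqdef(), seqdist()
library(cluster)   # pam()

# Hypothetical helper for illustration only; not part of CLARA.seq.
clara_sketch <- function(seqdata, nb_cluster = 4, nb_sample = 5,
                         size_sample = 40 + 2 * nb_cluster) {
  n <- nrow(seqdata)
  best <- NULL
  for (s in seq_len(nb_sample)) {
    # 1. Draw a random subset and compute its pairwise LCS distances.
    idx <- sample(n, size_sample)
    d_sub <- seqdist(seqdata[idx, ], method = "LCS")
    # 2. Run PAM on the subset; map the medoids back to row indices of seqdata.
    medoids <- idx[pam(as.dist(d_sub), k = nb_cluster, diss = TRUE)$id.med]
    # 3. Distance from every sequence to each medoid (one column per medoid).
    d_med <- sapply(medoids, function(m) {
      seqdist(seqdata, method = "LCS", refseq = m)
    })
    # 4. Assign each sequence to its nearest medoid and score the partition.
    clustering <- max.col(-d_med, ties.method = "first")
    score <- mean(d_med[cbind(seq_len(n), clustering)])
    # 5. Keep the subset with the lowest mean distance to the medoids.
    if (is.null(best) || score < best$score) {
      best <- list(medoids = medoids, clustering = clustering, score = score)
    }
  }
  best
}

set.seed(1)
data(mvad)
mvad.seq <- seqdef(mvad, 17:86)
res <- clara_sketch(mvad.seq, nb_cluster = 4, nb_sample = 3)
table(res$clustering)
```

Only the sampled subsets need a full pairwise distance matrix; the complete dataset is compared against the medoids alone, which is what keeps the approach usable on large datasets.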

clara_clust(
  data,
  nb_sample = 100,
  size_sample = 40 + 2 * nb_cluster,
  nb_cluster = 4,
  distargs = list(method = "LCS"),
  plot = FALSE,
  find_best_method = "Distance",
  with.diss = TRUE,
  cores = detectCores() - 1
)

`data`	The dataset to cluster. For sequence data, create the object with seqdef (from the TraMineR package).
`nb_sample`	The number of subsets to draw and test.
`size_sample`	The size of each subset. Defaults to 40 + 2 * nb_cluster.
`nb_cluster`	The number of medoids, i.e. the number of clusters.
`distargs`	List of parameters for the distance computation (see the function seqdist in the TraMineR package).
`plot`	Logical. Whether to plot the result of the clustering.
`find_best_method`	Method used to select the best subset: "Distance" uses the mean distance, "DB" uses the Davies-Bouldin value.
`with.diss`	Logical. Whether the distance matrix should be returned.
`cores`	Number of cores to use for parallel computation.
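The two criteria named under `find_best_method` can be illustrated on a toy dissimilarity matrix. This is a sketch under assumptions: the function names are hypothetical, and the Davies-Bouldin formula shown is the standard definition adapted to medoids, which may differ in detail from what the package computes.

```r
# d: full dissimilarity matrix; medoids: row indices of the medoids;
# clustering: for each row, the position (1..k) of its medoid in `medoids`.

# Mean distance of every observation to the medoid of its cluster.
mean_distance <- function(d, medoids, clustering) {
  mean(d[cbind(seq_len(nrow(d)), medoids[clustering])])
}

# Davies-Bouldin index: for each cluster, the worst ratio of
# (within-cluster scatter of both clusters) / (distance between medoids),
# averaged over clusters. Lower values indicate better separation.
davies_bouldin <- function(d, medoids, clustering) {
  k <- length(medoids)
  scatter <- sapply(seq_len(k), function(i) {
    mean(d[clustering == i, medoids[i]])
  })
  ratios <- sapply(seq_len(k), function(i) {
    others <- setdiff(seq_len(k), i)
    max((scatter[i] + scatter[others]) / d[medoids[i], medoids[others]])
  })
  mean(ratios)
}

# Toy data: two well-separated groups on a line.
x <- c(0, 1, 2, 10, 11, 12)
d <- as.matrix(dist(x))
medoids <- c(2, 5)
clustering <- as.integer(apply(d[, medoids], 1, which.min))

md <- mean_distance(d, medoids, clustering)   # 2/3
db <- davies_bouldin(d, medoids, clustering)  # 4/27
```

Both scores are "lower is better", so under either criterion the subset with the smallest value would be retained.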

An object of class clara_seq

# Creating sequences
library(TraMineR)
library(CLARA.seq)
data(mvad)
mvad.labels <- c("employment", "further education", "higher education",
                 "joblessness", "school", "training")
mvad.scode <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, states = mvad.scode,
                   labels = mvad.labels, xtstep = 6)

# CLARA clustering
my_cluster <- clara_clust(mvad.seq, nb_cluster = 4, nb_sample = 10,
                          size_sample = 20, with.diss = TRUE)

# CLARA clustering with the Davies-Bouldin method
my_cluster <- clara_clust(mvad.seq, nb_cluster = 4, nb_sample = 10,
                          size_sample = 20, with.diss = TRUE,
                          find_best_method = "DB")