clara_clust: CLARA clustering

View source: R/main.R

Description

With the help of the TraMineR package, CLARA clustering provides a way to cluster large datasets.
The main objective is to cluster state sequences, using the "LCS" distance by default, and to find the best partition into N clusters.
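
To make the "LCS" distance concrete, here is a minimal base R sketch. It assumes the usual definition of the LCS dissimilarity (the sum of both sequence lengths minus twice the length of their longest common subsequence). The helper names lcs_length and lcs_distance are hypothetical and are not part of this package or of TraMineR, which computes the distance through seqdist.

```r
# Length of the longest common subsequence of two state vectors,
# computed with the classic dynamic-programming table.
lcs_length <- function(x, y) {
  n <- length(x)
  m <- length(y)
  tab <- matrix(0L, nrow = n + 1, ncol = m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (x[i] == y[j]) {
        tab[i + 1, j + 1] <- tab[i, j] + 1L
      } else {
        tab[i + 1, j + 1] <- max(tab[i, j + 1], tab[i + 1, j])
      }
    }
  }
  tab[n + 1, m + 1]
}

# LCS dissimilarity: total length minus twice the shared subsequence length.
lcs_distance <- function(x, y) {
  length(x) + length(y) - 2L * lcs_length(x, y)
}

# Two short, made-up sequences using the mvad state codes.
x <- c("SC", "SC", "FE", "EM", "EM")
y <- c("SC", "FE", "FE", "EM", "JL")
lcs_length(x, y)    # 3 (SC, FE, EM)
lcs_distance(x, y)  # 4
```

Identical sequences have a distance of 0, and the distance grows as fewer states can be matched in order.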

Usage

clara_clust(
  data,
  nb_sample = 100,
  size_sample = 40 + 2 * nb_cluster,
  nb_cluster = 4,
  distargs = list(method = "LCS"),
  plot = FALSE,
  find_best_method = "Distance",
  with.diss = TRUE,
  cores = detectCores() - 1
)
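
The default for size_sample refers to nb_cluster, which works because R evaluates default arguments lazily inside the function. The sketch below illustrates this with a hypothetical stand-in function, default_size, that only mirrors the two defaults shown above. It also shows one possible way to guard the cores default, since detectCores() - 1 can be 0 or NA on some machines; whether clara_clust applies such a guard internally is not stated here.

```r
library(parallel)

# Hypothetical stand-in that mirrors the defaults of nb_cluster and size_sample.
default_size <- function(nb_cluster = 4, size_sample = 40 + 2 * nb_cluster) {
  size_sample
}

default_size()                 # 48
default_size(nb_cluster = 10)  # 60
default_size(size_sample = 20) # 20, an explicit value overrides the default

# A defensive choice for the number of cores: never below 1, never NA.
safe_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)
```

The resulting value could then be passed explicitly, for example as cores = safe_cores.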

Arguments

data

The dataset to use. For sequence data, use seqdef (from the TraMineR package) to create a state sequence object.

nb_sample

The number of subsets to test.

size_sample

The size of each subset, that is, the number of observations drawn for each sample.

nb_cluster

The number of medoids, that is, the number of clusters.

distargs

List of parameters for the distance computation, such as the method to apply (see the function seqdist in the TraMineR package).

plot

Boolean indicating whether to plot the result of the clustering.

find_best_method

Method used to select the best subset. "Distance" uses the mean distance and "DB" uses the Davies-Bouldin index.

with.diss

Boolean indicating whether the distance matrix should be returned.

cores

Number of cores to use for parallelism
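
The two criteria named under find_best_method can be illustrated on a toy example. The sketch below uses the textbook definitions of the mean distance to the nearest medoid and of the Davies-Bouldin index, adapted to medoids; the data, the medoid indices, and all variable names are made up for illustration, and the package's internal computation may differ in detail.

```r
# Toy dissimilarity matrix for six objects on a line (hypothetical values).
pts <- c(0, 1, 2, 10, 11, 12)
diss <- as.matrix(dist(pts))
medoids <- c(2, 5)          # row indices of the two medoids
k <- length(medoids)

# Assign each object to its nearest medoid.
membership <- apply(diss[, medoids, drop = FALSE], 1, which.min)

# Distance from each object to its own medoid.
to_medoid <- diss[cbind(seq_len(nrow(diss)), medoids[membership])]

# Criterion "Distance": mean distance to the nearest medoid (lower is better).
mean_distance <- mean(to_medoid)

# Criterion "DB": Davies-Bouldin index (lower is better).
# scatter[i] is the mean distance of cluster i's members to its medoid.
scatter <- as.numeric(tapply(to_medoid, factor(membership, levels = seq_len(k)), mean))
db_terms <- sapply(seq_len(k), function(i) {
  others <- setdiff(seq_len(k), i)
  max((scatter[i] + scatter[others]) / diss[medoids[i], medoids[others]])
})
davies_bouldin <- mean(db_terms)

mean_distance   # 0.6666667
davies_bouldin  # 0.1333333
```

In a CLARA-style search, a criterion of this kind would be evaluated for the medoids found in each sample, and the sample with the lowest value would be kept.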

Value

An object of class clara_seq.

Examples

# Creating sequences
library(TraMineR)
data(mvad)
mvad.labels <- c("employment", "further education", "higher education",
                 "joblessness", "school", "training")
mvad.scode <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, states = mvad.scode,
                   labels = mvad.labels, xtstep = 6)

# CLARA clustering
my_cluster <- clara_clust(mvad.seq, nb_cluster = 4, nb_sample = 10,
                          size_sample = 20, with.diss = TRUE)

# CLARA clustering with the Davies-Bouldin method
my_cluster <- clara_clust(mvad.seq, nb_cluster = 4, nb_sample = 10,
                          size_sample = 20, with.diss = TRUE,
                          find_best_method = "DB")

Ltochon/CLARA.seq documentation built on Dec. 17, 2021, 1:12 a.m.