parKmeans: Parallel implementation of Kmeans


Description

Parallel implementation of Kmeans that runs the algorithm over several seeds and cluster counts to test for a global minimum

Usage

parKmeans(dataset, seeds, centers, cores = 10,
  algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")[1])

Arguments

dataset

the dataset on which the kmeans algorithm will be run

seeds

the number of seeds you would like to test for a global minimum

centers

vector of cluster counts (k) to try

cores

number of cores to run in parallel, default is 10

Value

Data frame with each of the runs' metrics and seed number
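
The idea behind a multi-seed search can be sketched in base R. This is only an illustration under stated assumptions, not the quotidieR implementation: `one_run` and `multi_seed_kmeans` are hypothetical helpers, the column names `seed`, `clusters` and `withinss` are borrowed from the plotting example, and `withinss` is assumed to be the total within-cluster sum of squares reported by `stats::kmeans`.

```r
library(parallel)

# Hypothetical worker: one k-means run for row i of the (seed, clusters) grid.
one_run <- function(i, dat, grid, algo) {
  set.seed(grid$seed[i])
  stats::kmeans(dat, centers = grid$clusters[i], algorithm = algo)$tot.withinss
}

# Hypothetical helper, not parKmeans itself: tries every seed for every k
# on a small socket cluster and returns one row per run.
multi_seed_kmeans <- function(dat, seeds, centers, cores = 2,
                              algorithm = "Hartigan-Wong") {
  grid <- expand.grid(seed = seq_len(seeds), clusters = centers)
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl))
  res <- parLapply(cl, seq_len(nrow(grid)), one_run,
                   dat = dat, grid = grid, algo = algorithm)
  grid$withinss <- unlist(res)
  grid
}

# Built-in iris measurements stand in for a real dataset.
runs <- multi_seed_kmeans(scale(iris[, 1:4]), seeds = 5, centers = 2:4)
head(runs)
```

Fixing the seed inside each run keeps the starting centers reproducible, so a promising run can be refitted later from its seed and cluster count alone.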

Examples

data("sales")

# number of seeds to test and cluster counts to try
seeds <- 50
centers <- 5:15

best_fit <- parKmeans(sales, seeds = seeds,  centers = centers)
best_fit

# plot runs
load_pkg("ggplot2")
library(dplyr)  # %>%, select() and mutate() are used in the pipeline

# one light line per seed, with a black smoothed trend on top
best_fit %>%
  select(clusters, withinss, seed) %>%
  mutate(seed = factor(seed)) %>%
  ggplot(aes(x = clusters, y = withinss, color = seed)) +
  geom_line(show.legend = FALSE) +
  geom_smooth(se = FALSE, color = "black", size = 1, span = 0.6) +
  ylab(label = "WithinSS") +
  xlab("Clusters") +
  theme_bw() +
  scale_colour_manual(values = rep("#8599bc", seeds))
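
Once the runs are collected, the lowest `withinss` per cluster count can be pulled out with base R. The sketch below is hedged: it builds a small stand-in data frame from the built-in `iris` data instead of calling `parKmeans`, and it assumes the result has the `seed`, `clusters` and `withinss` columns used in the plot above.

```r
# Stand-in for best_fit with the same three columns (assumed layout).
runs <- expand.grid(seed = 1:5, clusters = 2:4)
runs$withinss <- mapply(function(s, k) {
  set.seed(s)
  kmeans(scale(iris[, 1:4]), centers = k)$tot.withinss
}, runs$seed, runs$clusters)

# For each cluster count, keep the run with the smallest withinss.
best <- do.call(rbind, lapply(split(runs, runs$clusters), function(d) {
  d[which.min(d$withinss), ]
}))
best
```

The seed recorded in each retained row is the one to reuse when refitting the final model for that cluster count.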

HanjoStudy/quotidieR documentation built on May 5, 2019, 6:13 p.m.