parKmeans: Parallel implementation of Kmeans
In HanjoStudy/quotidieR: Everyday functions for my personal use in analytics


Parallel implementation of k-means that runs several seeds for each number of centers to search for a global minimum

parKmeans(dataset, seeds, centers, cores = 10, algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")[1])

`dataset`	the dataset on which the k-means algorithm will be run
`seeds`	the number of seeds to test when searching for a global minimum
`centers`	vector of k centers to try
`cores`	number of cores to run in parallel, default is 10
`algorithm`	the k-means algorithm to use, default is "Hartigan-Wong"

Data frame with each of the runs' metrics and seed number
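
The idea described above (one k-means fit per seed and per number of centers, spread across cores) can be sketched roughly as follows. This is not the package's source: `par_kmeans_sketch` is a hypothetical name, the use of `parallel::parLapply` is an assumption, and `withinss` is assumed to hold the total within-cluster sum of squares (`tot.withinss`) of each fit.

```r
library(parallel)

# Hypothetical sketch, not the quotidieR implementation:
# fit k-means once for every (seed, number of clusters) pair.
par_kmeans_sketch <- function(dataset, seeds, centers, cores = 2,
                              algorithm = "Hartigan-Wong") {
  # Every combination of seed (1..seeds) and number of clusters
  grid <- expand.grid(seed = seq_len(seeds), clusters = centers)

  # Socket cluster, so the sketch also works on Windows
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)

  runs <- parLapply(cl, seq_len(nrow(grid)),
    function(i, grid, dataset, algorithm) {
      set.seed(grid$seed[i])  # fixes the random starting centers of this run
      fit <- stats::kmeans(dataset, centers = grid$clusters[i],
                           algorithm = algorithm)
      data.frame(seed = grid$seed[i],
                 clusters = grid$clusters[i],
                 withinss = fit$tot.withinss)  # assumed metric
    },
    grid = grid, dataset = dataset, algorithm = algorithm)

  # One row per run
  do.call(rbind, runs)
}

# Usage on a built-in dataset: 3 seeds for each of k = 2, 3, 4
runs <- par_kmeans_sketch(iris[, 1:4], seeds = 3, centers = 2:4, cores = 2)
```

Because k-means only converges to a local minimum that depends on the starting centers, comparing `withinss` across seeds for the same `clusters` value gives a rough indication of how close a single run is to the best solution found.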

data("sales")

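# remove NAs (kmeans cannot handle missing values)
sales <- na.omit(sales)

# number of seeds and range of k to try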
seeds <- 50
centers <- c(5:15)

best_fit <- parKmeans(sales, seeds = seeds,  centers = centers)
best_fit

# plot runs (select(), mutate() and %>% come from dplyr)
load_pkg("ggplot2")
library(dplyr)

best_fit %>% select(clusters, withinss, seed) %>% mutate(seed = factor(seed)) %>%
  ggplot(aes(x = clusters, y = withinss, color = seed)) +
  geom_line(show.legend=FALSE) +
  geom_smooth(se = FALSE, color = "black", size = 1, span = 0.6) +
  ylab(label="WithinSS") + 
  xlab("Clusters") + 
  theme_bw() +
  scale_colour_manual(values=c(rep("#8599bc", seeds)))
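
Besides plotting, the returned data frame can be reduced to the best run for each number of clusters. The sketch below assumes the columns `seed`, `clusters` and `withinss` used in the plot above, and uses a small made-up data frame in place of a real `parKmeans` result.

```r
# Toy stand-in for the data frame returned by parKmeans (values are made up)
runs <- data.frame(
  seed     = c(1, 2, 1, 2),
  clusters = c(5, 5, 6, 6),
  withinss = c(120, 110, 95, 101)
)

# For each number of clusters, keep the run with the smallest withinss
best_per_k <- do.call(
  rbind,
  lapply(split(runs, runs$clusters), function(d) d[which.min(d$withinss), ])
)
best_per_k
```

The `seed` column of the result identifies which seed to reuse with `set.seed()` when refitting the final model for a chosen k.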