knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(package1)
library(matrixStats)
library(Rcpp)
To use the function k_means_cluster:
# Two groups of 50 points in 2 dimensions, centered at 0 and 1
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
k <- 2
k_means_cluster(x, k)
Note that if the data matrix has more than three dimensions, visualizing the clusters takes additional work. To plot the clusters when the input data matrix is two-dimensional:
output <- k_means_cluster(x, k)
# Color each point by its assigned cluster
plot(output$data, col = output$clusters,
     main = "plot of clustering on data",
     xlab = "output$data[,1]", ylab = "output$data[,2]")
The same dataset with a larger nstart value (trades efficiency for robustness):
output <- k_means_cluster(x, k, nstart = 25)
plot(output$data, col = output$clusters,
     main = "plot of clustering with large nstart on data",
     xlab = "output$data[,1]", ylab = "output$data[,2]")
The same dataset with a smaller nstart value (trades robustness for speed; not recommended):
output <- k_means_cluster(x, k, nstart = 1)
plot(output$data, col = output$clusters,
     main = "plot of clustering with small nstart on data",
     xlab = "output$data[,1]", ylab = "output$data[,2]")
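The trade-off behind nstart can also be checked numerically. The sketch below uses base R's stats::kmeans as a stand-in, on the assumption (not confirmed here) that nstart in k_means_cluster has the same meaning: the number of random initializations, of which the best result is kept. The names x_demo and wss_for are made up for this illustration.

```r
# Stand-in: stats::kmeans, assuming k_means_cluster treats nstart the same way.
set.seed(1)
x_demo <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Hypothetical helper: total within-cluster sum of squares over repeated runs.
wss_for <- function(nstart, reps = 20) {
  replicate(reps, kmeans(x_demo, centers = 2, nstart = nstart)$tot.withinss)
}

wss_1  <- wss_for(1)
wss_25 <- wss_for(25)

# Compare the spread of results across runs for each setting
range(wss_1)
range(wss_25)
```

On well-separated data such as this, the two ranges may coincide; the difference tends to show up when clusters overlap or k is larger.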
Another example, using simulated three-dimensional data:
# Two groups of 50 points in 3 dimensions
x <- rbind(matrix(rnorm(150, sd = 0.3), ncol = 3),
           matrix(rnorm(150, mean = 1, sd = 0.3), ncol = 3))
k_means_cluster(x, k)
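For data with more than two columns, one common way to get a picture is to project the points onto their first two principal components and color them by cluster. This is a minimal sketch, not part of the package: it uses stats::prcomp for the projection and stats::kmeans as a stand-in for the cluster labels, and the names x_hd, labels and proj are made up for this illustration. With this package, output$data and output$clusters could presumably be used in place of x_hd and labels.

```r
set.seed(2)
# Two groups of 50 points in 5 dimensions
x_hd <- rbind(matrix(rnorm(250, sd = 0.3), ncol = 5),
              matrix(rnorm(250, mean = 1, sd = 0.3), ncol = 5))

# Stand-in cluster labels from base R's kmeans
labels <- kmeans(x_hd, centers = 2, nstart = 10)$cluster

# Project onto the first two principal components
pca  <- prcomp(x_hd)
proj <- pca$x[, 1:2]

plot(proj, col = labels,
     main = "clusters projected onto the first two principal components",
     xlab = "PC1", ylab = "PC2")
```

The projection discards some variance, so clusters that overlap in the plot may still be separated in the full space.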
Another example, using simulated data with more than two clusters:
# Three groups of 50 points in 2 dimensions, centered at 0, 1 and 2
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2))
k <- 3
output <- k_means_cluster(x, k)
plot(output$data, col = output$clusters,
     main = "plot of clustering on data",
     xlab = "output$data[,1]", ylab = "output$data[,2]")
The same dataset when k does not match the number of groups in the data:
k <- 2
output <- k_means_cluster(x, k)
plot(output$data, col = output$clusters,
     main = "plot of clustering on data",
     xlab = "output$data[,1]", ylab = "output$data[,2]")
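When the number of groups is not known in advance, a common heuristic is the "elbow" plot: run the clustering for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. The sketch below uses stats::kmeans as a stand-in, since it exposes tot.withinss directly; whether k_means_cluster returns an equivalent quantity is not assumed here. The names x_demo, ks and wss are made up for this illustration.

```r
set.seed(3)
# Three groups of 50 points in 2 dimensions, as in the example above
x_demo <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2),
                matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2))

# Total within-cluster sum of squares for k = 1, ..., 6
ks  <- 1:6
wss <- sapply(ks, function(kk) {
  kmeans(x_demo, centers = kk, nstart = 10)$tot.withinss
})

plot(ks, wss, type = "b",
     main = "elbow plot",
     xlab = "number of clusters k",
     ylab = "total within-cluster sum of squares")
```

The elbow is a visual guide, not a formal test, and it can be ambiguous when groups overlap.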
Note that k-means clustering depends on the initial centroids, and the cluster labels are assigned arbitrarily, so labels from two runs cannot be compared directly. A better way to evaluate the package is to compare its output centroids with those from base R's kmeans:
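# Reference centroids from base R, sorted by the first coordinate
ref_c <- kmeans(x, k)$centers
ref_c <- ref_c[order(ref_c[, 1]), , drop = FALSE]
# Centroids from this package, sorted the same way so the rows line up
my_c <- k_means_cluster(x, k)$centroids
my_c <- my_c[order(my_c[, 1]), , drop = FALSE]
# Use absolute differences: a signed difference would pass for any large negative gap
all(abs(ref_c - my_c) < 1e-5)
# Compare run times
system.time(kmeans(x, k))
system.time(k_means_cluster(x, k)$centroids)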
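Because the row order of a centroid matrix is arbitrary, a comparison is only meaningful once both matrices are put into the same order. The following is a minimal sketch of an order-independent check; sort_centroids and centroids_match are hypothetical helpers written for this illustration, not functions from the package, and the matrices a and b are toy inputs.

```r
# Hypothetical helper: sort centroid rows by column 1, then column 2, and so on.
sort_centroids <- function(m) {
  m <- as.matrix(m)
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
  m[do.call(order, cols), , drop = FALSE]
}

# Hypothetical helper: TRUE if two centroid sets agree within tol, ignoring row order.
centroids_match <- function(a, b, tol = 1e-5) {
  a <- sort_centroids(a)
  b <- sort_centroids(b)
  identical(dim(a), dim(b)) && all(abs(a - b) < tol)
}

# Toy check: the same centroids in a different row order, with tiny noise
a <- matrix(c(0, 0, 1, 1, 2, 2), ncol = 2, byrow = TRUE)
b <- a[c(3, 1, 2), ] + 1e-8
centroids_match(a, b)
```

Sorting can mis-pair rows when two centroids are nearly tied in the leading coordinate, so this check suits well-separated centroids like the ones simulated here.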