An R package for k-means clustering.
Sophia Wang, Susan Fung, Guanchen Zhang
This is the repository for the R version of the ssgkmeansr
package. The Python version is available here.
This package implements k-means, a classical unsupervised clustering method, for two-dimensional datasets, with options for choosing the initial centroids (for example, random initialization and k-means++). Users can find clusters in their data, label new data, and visualize the clustering results.
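To make the method concrete, here is a minimal base-R sketch of k-means (Lloyd's algorithm) with the two initialization strategies mentioned above. The helper names init_centroids() and lloyd_kmeans() are hypothetical and are not part of ssgkmeansr. The package's own fit() may differ in details such as the stopping rule, tie handling, and what happens to empty clusters.

```r
# Hypothetical sketch, not ssgkmeansr's implementation.

# Choose K initial centroids from the rows of X.
# "random": K distinct rows sampled uniformly.
# "kmpp":   k-means++ seeding; each new centroid is sampled with probability
#           proportional to the squared distance to its nearest chosen centroid.
init_centroids <- function(X, K, method = c("random", "kmpp")) {
  method <- match.arg(method)
  n <- nrow(X)
  if (method == "random") {
    return(X[sample.int(n, K), , drop = FALSE])
  }
  centroids <- X[sample.int(n, 1), , drop = FALSE]
  while (nrow(centroids) < K) {
    # squared distance from every point to its nearest already-chosen centroid
    d2 <- apply(X, 1, function(p) min(colSums((t(centroids) - p)^2)))
    centroids <- rbind(centroids, X[sample.int(n, 1, prob = d2), , drop = FALSE])
  }
  centroids
}

# Lloyd's algorithm: alternate assignment and update steps
# until the labels stop changing (or max_iter is reached).
lloyd_kmeans <- function(X, K, method = "random", max_iter = 100) {
  X <- as.matrix(X)
  centroids <- init_centroids(X, K, method)
  labels <- integer(nrow(X))
  for (iter in seq_len(max_iter)) {
    # Assignment step: n x K matrix of squared distances, pick the nearest centroid
    d2 <- sapply(seq_len(K), function(k) rowSums(sweep(X, 2, centroids[k, ])^2))
    new_labels <- max.col(-d2, ties.method = "first")
    if (all(new_labels == labels)) break
    labels <- new_labels
    # Update step: move each centroid to the mean of its assigned points
    # (an empty cluster keeps its previous centroid)
    for (k in seq_len(K)) {
      if (any(labels == k)) {
        centroids[k, ] <- colMeans(X[labels == k, , drop = FALSE])
      }
    }
  }
  list(labels = labels, centroids = centroids)
}

# Usage on two well-separated blobs of 20 points each
set.seed(1)
X <- rbind(matrix(rnorm(40, -5, 0.3), ncol = 2),
           matrix(rnorm(40, 5, 0.3), ncol = 2))
res <- lloyd_kmeans(X, K = 2, method = "kmpp")
table(res$labels)
```

The k-means++ branch spreads the initial centroids apart, which usually reduces the chance of two starting centroids landing in the same true cluster compared with uniform random initialization.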
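The package implements the following functions:
- fit(): clusters two-dimensional data into K groups with k-means, using the chosen initialization method
- predict(): labels new data using the centroids returned by fit()
- kmplot(): plots clustered or labeled data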
The within-cluster sum of squared distances, a measure of clustering performance, is returned as part of the clustering output.
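The within-cluster sum of squared distances is W = sum over clusters k of sum over points x_i in cluster k of ||x_i - mu_k||^2, where mu_k is the centroid of cluster k. The sketch below computes it in base R with a hypothetical helper, within_ss(), which is not part of ssgkmeansr. Whether the package reports a single total or one value per cluster is not stated here, so the sketch returns both.

```r
# Hypothetical helper, not part of ssgkmeansr.
# X: n x d data; labels: cluster index per row; centroids: K x d matrix.
within_ss <- function(X, labels, centroids) {
  X <- as.matrix(X)
  # squared Euclidean distance from each point to its assigned centroid
  sq <- rowSums((X - centroids[labels, , drop = FALSE])^2)
  list(per_cluster = tapply(sq, labels, sum), total = sum(sq))
}

# Worked example: every point lies at distance 1 from its centroid
X <- matrix(c(0, 0,  2, 0,  10, 10,  10, 12), ncol = 2, byrow = TRUE)
labels <- c(1, 1, 2, 2)
centroids <- matrix(c(1, 0,  10, 11), ncol = 2, byrow = TRUE)
wss <- within_ss(X, labels, centroids)
wss$per_cluster  # 2 for each cluster
wss$total        # 4
```

Lower values mean tighter clusters, but W always decreases as K grows, so it is best used to compare runs with the same K (for example, different initializations).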
The package includes two datasets for testing and demonstration.
To install the package from GitHub, run the following command in R (this requires the devtools package):
devtools::install_github("UBC-MDS/ssgkmeansr")
library(ssgkmeansr)
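# Generate training data: three Gaussian blobs of N points each,
# centered at (-1, -1), (0, 0) and (1, 1)
set.seed(46)
sigma <- 0.5  # standard deviation of each blob (the sd argument of rnorm())
N <- 100
feature_one <- c(rnorm(N, -1, sigma), rnorm(N, 0, sigma), rnorm(N, 1, sigma))
feature_two <- c(rnorm(N, -1, sigma), rnorm(N, 0, sigma), rnorm(N, 1, sigma))
data_train <- data.frame(x1 = feature_one,
                         x2 = feature_two)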
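# Generate test data from the same three centers with a smaller spread
set.seed(1)
sigma <- 0.1  # standard deviation of each blob (the sd argument of rnorm())
feature_one <- c(rnorm(N, -1, sigma), rnorm(N, 0, sigma), rnorm(N, 1, sigma))
feature_two <- c(rnorm(N, -1, sigma), rnorm(N, 0, sigma), rnorm(N, 1, sigma))
data_test <- data.frame(x1 = feature_one,
                        x2 = feature_two)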
# training
cluster <- fit(data = data_train, K = 3, method = "kmpp") # using kmeans++
kmplot(cluster$data) # plot training results
cluster$centroids # show centroids
cluster$withinSS # show within cluster sum of squared distance
# predicting: label the test data with the fitted centroids
result <- predict(data = data_test, centroids = cluster$centroids)
kmplot(dat = result) # plot the labeled test data
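Conceptually, labeling new data in k-means means assigning each new point to its nearest centroid; the centroids themselves are not updated. A minimal base-R sketch of that rule follows, using a hypothetical assign_nearest() helper and assuming the centroids are a K x 2 matrix or data frame with one row per cluster. ssgkmeansr's predict() may return more than the labels (for example, the data with a label column attached, as its use with kmplot() suggests).

```r
# Hypothetical helper, not ssgkmeansr's predict().
# Returns, for each row of X_new, the index of the nearest centroid
# (squared Euclidean distance; ties go to the lower index).
assign_nearest <- function(X_new, centroids) {
  X_new <- as.matrix(X_new)
  centroids <- as.matrix(centroids)
  d2 <- sapply(seq_len(nrow(centroids)),
               function(k) rowSums(sweep(X_new, 2, centroids[k, ])^2))
  # matrix() keeps an n x K shape even when X_new has a single row
  max.col(-matrix(d2, nrow = nrow(X_new)), ties.method = "first")
}

# The three centers used to generate the example data above
centroids <- matrix(c(-1, -1,  0, 0,  1, 1), ncol = 2, byrow = TRUE)
new_points <- data.frame(x1 = c(-0.9, 0.1, 1.2), x2 = c(-1.1, -0.2, 0.8))
assign_nearest(new_points, centroids)  # 1 2 3
```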
Similar packages:
ssgkmeansr is intended to help users understand the fundamentals of k-means and its variants. Contributors are encouraged to build more advanced features on top of this base k-means package.