

An R package for k-means clustering.


Sophia Wang, Susan Fung, Guanchen Zhang


This is the repository for the R version of the ssgkmeansr package. The Python version is available here.

This package implements the classical unsupervised clustering method k-means for two-dimensional datasets, with options for choosing the initial centroids (e.g. random or k-means++). Users can find clusters in their data, label new data, and plot the clustering results.
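
To make the k-means++ option concrete, here is a minimal base-R sketch of the seeding step. The helper name `kmpp_init` and its arguments are hypothetical and are not taken from the package; the sketch only illustrates the general technique: the first centroid is drawn uniformly, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far.

```r
# Hypothetical helper, not part of ssgkmeansr: k-means++ seeding in base R.
kmpp_init <- function(X, K) {
  X <- as.matrix(X)
  n <- nrow(X)
  idx <- integer(K)
  # First centroid: chosen uniformly at random among the rows of X.
  idx[1] <- sample.int(n, 1)
  # Squared distance from every point to its nearest chosen centroid so far.
  d2 <- rowSums(sweep(X, 2, X[idx[1], ])^2)
  for (k in seq_len(K)[-1]) {
    # Next centroid: sampled with probability proportional to d2.
    idx[k] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[idx[k], ])^2))
  }
  X[idx, , drop = FALSE]
}

set.seed(1)
X <- cbind(x1 = rnorm(30), x2 = rnorm(30))
centroids <- kmpp_init(X, K = 3)
```

Because an already chosen point has squared distance zero, it cannot be drawn again, so the K starting centroids are distinct rows of the data.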


The package implements the following functions:

A performance measure (the within-cluster sum of squared distances) is returned as part of the clustering output.
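
As an illustration of what this measure is, the following base-R sketch computes the within-cluster sum of squared distances from a data matrix, a vector of cluster labels, and a centroid matrix. The function `within_ss` is a hypothetical name, not the package's implementation, and whether the package reports the value per cluster or as a single total is not stated here; the sketch returns per-cluster values, which can be summed for a total.

```r
# Hypothetical helper, not the package's implementation.
within_ss <- function(X, labels, centroids) {
  X <- as.matrix(X)
  centroids <- as.matrix(centroids)
  # Squared distance of each point to the centroid of its assigned cluster.
  d2 <- rowSums((X - centroids[labels, , drop = FALSE])^2)
  # One total per cluster.
  tapply(d2, labels, sum)
}

X <- rbind(c(0, 0), c(0, 2), c(10, 10), c(10, 14))
labels <- c(1, 1, 2, 2)
centroids <- rbind(c(0, 1), c(10, 12))
within_ss(X, labels, centroids)   # cluster 1: 2, cluster 2: 8
```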

The package includes two datasets for testing and demonstration.

Installing the Package

Run the following command in R:
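
Assuming the package is hosted in the GitHub repository UBC-MDS/ssgkmeansr and that the devtools package is available, installation would typically look like this:

```r
# Assumption: the package is installed from GitHub (UBC-MDS/ssgkmeansr).
# install.packages("devtools")   # only needed if devtools is not yet installed
devtools::install_github("UBC-MDS/ssgkmeansr")
```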




library(ssgkmeansr)

# Generating training data: three Gaussian blobs centred at (-1, -1), (0, 0) and (1, 1)
var <- .5   # passed to rnorm() as the standard deviation, not the variance
N <- 100    # points per blob
feature_one <- c(rnorm(N, -1, var), rnorm(N, 0, var), rnorm(N, 1, var))
feature_two <- c(rnorm(N, -1, var), rnorm(N, 0, var), rnorm(N, 1, var))

data_train <- data.frame(x1 = feature_one,
                         x2 = feature_two)

# Generating test data: same centres, tighter spread
var <- .1
feature_one <- c(rnorm(N, -1, var), rnorm(N, 0, var), rnorm(N, 1, var))
feature_two <- c(rnorm(N, -1, var), rnorm(N, 0, var), rnorm(N, 1, var))

data_test <- data.frame(x1 = feature_one,
                        x2 = feature_two)

# training
cluster <- fit(data = data_train, K = 3, method = "kmpp") # using kmeans++
kmplot(cluster$data)  # plot training results
cluster$centroids     # show centroids
cluster$withinSS      # show within cluster sum of squared distance

# predicting
result <- predict(data = data_test, centroids = cluster[[3]])
kmplot(dat = result)
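
Labelling new data against fixed centroids amounts to assigning each point to its nearest centroid. The base-R sketch below shows that idea. The helper `assign_nearest` is hypothetical and is not the package's `predict()`, whose exact return value is not described here.

```r
# Hypothetical helper, not the package's predict().
assign_nearest <- function(X, centroids) {
  X <- as.matrix(X)
  centroids <- as.matrix(centroids)
  # Squared distances from each point (rows) to each centroid (columns).
  d2 <- sapply(seq_len(nrow(centroids)), function(k) {
    rowSums(sweep(X, 2, centroids[k, ])^2)
  })
  # Label = index of the closest centroid for each row.
  max.col(-matrix(d2, nrow = nrow(X)), ties.method = "first")
}

new_points <- data.frame(x1 = c(-0.9, 0.1, 1.2), x2 = c(-1.1, 0.0, 0.8))
centroids  <- rbind(c(-1, -1), c(0, 0), c(1, 1))
assign_nearest(new_points, centroids)   # 1 2 3
```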


Similar packages:

ssgkmeansr is intended to help users understand the fundamentals of k-means and its variants. Contributors are encouraged to build advanced features on top of this base k-means package.

UBC-MDS/ssgkmeansr documentation built on May 25, 2019, 1:36 p.m.