ssgkmeansr

An R package for k-means clustering.

Sophia Wang, Susan Fung, Guanchen Zhang

This is the repository for the R version of the ssgkmeansr package. The Python version is available here.

This package implements the classical unsupervised clustering method, k-means, for two-dimensional datasets, with options for choosing the initial centroids (e.g. random selection and k-means++). Users can find clusters in their data, label new data, and visualize the clustering results.

The package implements the following functions:

initial point selection:
  basic k-means: initial centroids are picked randomly.
  k-means++: initial centroids are picked based on distance. More details can be found here.
clustering: builds clusters and saves the cluster attributes
prediction: predicts the labels of new data based on the cluster attributes
plotting: provides plotting functions to visualize the results and performance
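
As an illustration of distance-based seeding, here is a minimal base-R sketch of the standard k-means++ initialization. The helper name `kmpp_seed` is hypothetical and is not part of ssgkmeansr; the package's own implementation may differ in detail. The sketch assumes the data contain at least K distinct points.

```r
# Hypothetical helper (not part of ssgkmeansr): k-means++ seeding in base R.
kmpp_seed <- function(data, K) {
  X <- as.matrix(data)
  n <- nrow(X)
  stopifnot(K >= 1, K <= n)

  chosen <- integer(K)
  # First centroid: one data point drawn uniformly at random
  chosen[1] <- sample.int(n, 1)
  # Squared distance from every point to its nearest chosen centroid so far
  d2 <- rowSums(sweep(X, 2, X[chosen[1], ])^2)

  for (k in seq_len(K)[-1]) {
    # Next centroid: drawn with probability proportional to squared distance,
    # so points far from the existing centroids are favoured
    chosen[k] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[chosen[k], ])^2))
  }

  X[chosen, , drop = FALSE]
}

set.seed(1)
toy <- data.frame(x1 = c(0, 0.1, 10, 10.1), x2 = c(0, 0.1, 10, 10.1))
kmpp_seed(toy, K = 2)  # two rows of `toy`, usually one from each group
```

Points that are already chosen have zero distance to themselves, so they receive zero probability and are never picked twice.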

Outputs related to performance (the within-cluster sum of squared distances) are part of the output from clustering.
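
For reference, the within-cluster sum of squared distances can be computed as in the sketch below. The helper `within_ss` is hypothetical and only illustrates the quantity; it assumes labels are integer indices into the rows of the centroid matrix, and it returns a single total, which may not match the exact shape of the value the package reports.

```r
# Hypothetical helper (not part of ssgkmeansr): total within-cluster
# sum of squared Euclidean distances.
within_ss <- function(X, labels, centroids) {
  X <- as.matrix(X)
  centroids <- as.matrix(centroids)
  # Line up each point with its assigned centroid, then sum squared differences
  sum((X - centroids[labels, , drop = FALSE])^2)
}

X <- matrix(c(0, 0,
              2, 0,
              10, 10,
              10, 12), ncol = 2, byrow = TRUE)
labels <- c(1, 1, 2, 2)
centroids <- rbind(c(1, 0), c(10, 11))
within_ss(X, labels, centroids)  # each point is 1 unit from its centroid: 4
```

Lower values indicate tighter clusters for a fixed K.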

The package includes two datasets for testing and demonstration.

To install the package, run the following command in R:

devtools::install_github("UBC-MDS/ssgkmeansr")

library(ssgkmeansr)

# Generating training data: three Gaussian blobs centred at -1, 0 and 1
set.seed(46)
noise_sd <- 0.5  # rnorm() takes a standard deviation as its third argument
N <- 100         # points per blob
feature_one <- c(rnorm(N, -1, noise_sd), rnorm(N, 0, noise_sd), rnorm(N, 1, noise_sd))
feature_two <- c(rnorm(N, -1, noise_sd), rnorm(N, 0, noise_sd), rnorm(N, 1, noise_sd))

data_train <- data.frame(x1 = feature_one,
                         x2 = feature_two)

# Generating test data: same centres, tighter spread
set.seed(1)
noise_sd <- 0.1
feature_one <- c(rnorm(N, -1, noise_sd), rnorm(N, 0, noise_sd), rnorm(N, 1, noise_sd))
feature_two <- c(rnorm(N, -1, noise_sd), rnorm(N, 0, noise_sd), rnorm(N, 1, noise_sd))

data_test <- data.frame(x1 = feature_one,
                        x2 = feature_two)

# training
cluster <- fit(data = data_train, K = 3, method = "kmpp") # using kmeans++
kmplot(cluster$data)  # plot training results
cluster$centroids     # show centroids
cluster$withinSS      # show within cluster sum of squared distance

# predicting
result <- predict(data = data_test, centroids = cluster$centroids)
kmplot(dat = result)
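
Conceptually, labelling new data from cluster attributes amounts to assigning each point to its nearest centroid. The sketch below shows that idea in base R; `nearest_centroid` is a hypothetical name, not the package's `predict()`, whose return format may differ.

```r
# Hypothetical helper (not ssgkmeansr's predict()): nearest-centroid labelling.
nearest_centroid <- function(data, centroids) {
  X <- as.matrix(data)
  C <- as.matrix(centroids)
  # One column of squared Euclidean distances per centroid
  d2 <- sapply(seq_len(nrow(C)), function(k) rowSums(sweep(X, 2, C[k, ])^2))
  d2 <- matrix(d2, nrow = nrow(X))  # keep an n x K matrix even when n is 1
  # Label = index of the closest centroid (first one wins on ties)
  apply(d2, 1, which.min)
}

centres <- rbind(c(-1, -1), c(0, 0), c(1, 1))
new_points <- data.frame(x1 = c(-0.9, 0.1, 1.2), x2 = c(-1.1, -0.1, 0.8))
nearest_centroid(new_points, centres)  # 1 2 3
```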

Similar packages:

ssgkmeansr is intended to help users understand the fundamentals of k-means and its variants. Contributors are encouraged to build advanced features on top of this base k-means package.

UBC-MDS/ssgkmeansr documentation built on May 25, 2019, 1:36 p.m.
