ssgkmeansr

An R package for k-means clustering.

Contributors

Sophia Wang, Susan Fung, Guanchen Zhang

Description

This is the repository for the R version of the ssgkmeansr package. The Python version is available here.

This package implements the classical unsupervised clustering method k-means for two-dimensional datasets, with options for choosing the initial centroids (e.g. random or k-means++). Users can find clusters in their data, label new data, and inspect the clustering results.
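To make the k-means++ option concrete, here is a minimal base R sketch of k-means++ seeding. It is an illustration only and not the package's own code: the function name `kmpp_init_sketch` and its arguments are hypothetical. The first centroid is drawn uniformly at random, and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen.

```r
# Hypothetical helper (not part of ssgkmeansr): k-means++ seeding in base R.
kmpp_init_sketch <- function(data, K) {
  X <- as.matrix(data)
  n <- nrow(X)
  stopifnot(K >= 1, K <= n)

  chosen <- integer(K)
  chosen[1] <- sample.int(n, 1)  # first centroid: uniform at random

  # Squared distance from every point to its nearest chosen centroid so far
  d2 <- rowSums(sweep(X, 2, X[chosen[1], ])^2)

  for (k in seq_len(K)[-1]) {
    # Draw the next centroid with probability proportional to d2;
    # points already chosen have d2 = 0 and cannot be drawn again
    chosen[k] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[chosen[k], ])^2))
  }
  X[chosen, , drop = FALSE]
}

# Usage on a small two-cluster toy data set
set.seed(46)
toy <- data.frame(x1 = c(rnorm(20, -1, 0.5), rnorm(20, 1, 0.5)),
                  x2 = c(rnorm(20, -1, 0.5), rnorm(20, 1, 0.5)))
init <- kmpp_init_sketch(toy, K = 2)
init  # a 2 x 2 matrix: two rows of `toy` picked as starting centroids
```

Compared with purely random initialization, this weighting tends to spread the starting centroids across the data. The sketch assumes the data contain at least K distinct points; it will stop with an error if every remaining point coincides with a chosen centroid.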

Functions

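The package implements functions for clustering, labelling new data, and plotting results, including fit(), predict(), and kmplot(), which are used in the Examples section.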

A performance measure (the within-cluster sum of squared distances) is returned as part of the clustering output.
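For readers unfamiliar with this measure, the sketch below computes it by hand in base R. The helper name `within_ss_sketch` is hypothetical and not part of the package, and this README does not say whether the package reports the value per cluster or as one total, so the sketch returns per-cluster values that can be summed.

```r
# Hypothetical helper (not part of ssgkmeansr).
# Returns the within-cluster sum of squared distances, one value per cluster.
within_ss_sketch <- function(X, labels, centroids) {
  X <- as.matrix(X)
  centroids <- as.matrix(centroids)
  # Squared distance from each point to the centroid of its assigned cluster
  d2 <- rowSums((X - centroids[labels, , drop = FALSE])^2)
  tapply(d2, labels, sum)
}

# Four points in two clusters, each point exactly 1 unit from its centroid
X <- rbind(c(0, 0), c(0, 2), c(10, 10), c(10, 12))
labels <- c(1, 1, 2, 2)
centroids <- rbind(c(0, 1), c(10, 11))

per_cluster <- within_ss_sketch(X, labels, centroids)
per_cluster       # 2 for cluster 1, 2 for cluster 2
sum(per_cluster)  # 4 in total
```

Lower values indicate tighter clusters, but the measure always decreases as K grows, so it is most useful for comparing runs with the same K.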

The package includes two datasets for testing and demonstration.

Installing the Package

Run the following command in R:

devtools::install_github("UBC-MDS/ssgkmeansr")

Examples

library(ssgkmeansr)

# Generating training data
set.seed(46)
var <- .5  # passed to rnorm() as the standard deviation, despite the name
N <- 100
feature_one <- c(rnorm(N,-1, var),rnorm(N,0, var),rnorm(N,1, var))
feature_two <- c(rnorm(N,-1, var),rnorm(N,0, var),rnorm(N,1, var))

data_train<- data.frame(x1 = feature_one,
                        x2 = feature_two)

# Generating test data
set.seed(1)
var <- .1
feature_one <- c(rnorm(N,-1, var),rnorm(N,0, var),rnorm(N,1, var))
feature_two <- c(rnorm(N,-1, var),rnorm(N,0, var),rnorm(N,1, var))

data_test <- data.frame(x1 = feature_one,
                        x2 = feature_two)

# training
cluster <- fit(data = data_train, K = 3, method = "kmpp") # using kmeans++
kmplot(cluster$data)  # plot training results
cluster$centroids     # show centroids
cluster$withinSS      # show within cluster sum of squared distance

# predicting
result <- predict(data = data_test, centroids = cluster$centroids)  # label test data by fitted centroids
kmplot(dat = result)
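
Labelling new data in k-means amounts to assigning each point to its nearest centroid. The following base R sketch shows that step under the assumption of squared Euclidean distance. The helper `assign_nearest_sketch` is hypothetical and is not claimed to match how the package's predict() is implemented.

```r
# Hypothetical helper (not part of ssgkmeansr): nearest-centroid labelling.
assign_nearest_sketch <- function(data, centroids) {
  X <- as.matrix(data)
  C <- as.matrix(centroids)
  # Squared Euclidean distance from every point to every centroid
  d2 <- sapply(seq_len(nrow(C)), function(k) rowSums(sweep(X, 2, C[k, ])^2))
  d2 <- matrix(d2, nrow = nrow(X))  # keep an n x K shape even for one point
  # Index of the closest centroid per row; ties go to the lowest index
  max.col(-d2, ties.method = "first")
}

new_points <- data.frame(x1 = c(-0.9, 0.1, 1.2),
                         x2 = c(-1.1, 0.0, 0.8))
centers <- rbind(c(-1, -1), c(0, 0), c(1, 1))

labels_new <- assign_nearest_sketch(new_points, centers)
labels_new  # 1 2 3
```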

Ecosystem

Similar packages:

ssgkmeansr is intended to help users understand the fundamentals of k-means and its variants. Contributors are encouraged to build advanced features on top of this base k-means package.



UBC-MDS/ssgkmeansr documentation built on May 25, 2019, 1:36 p.m.