README.md

clusternor (Clustering NUMA Optimized Routines)

Repo contents

R bindings for Clustering NUMA Optimized Routines. This package is supported on Linux, Mac OS X, and Windows.

NOTE: This package is built from C++ source, which is compiled with your local gcc compiler during installation.

Tested on

Hardware requirements

License

This software is licensed under the Apache version 2.0 license.

Best Performance configuration

For the best performance on Linux, make sure the NUMA system packages are installed via

apt-get install -y build-essential libnuma-dbg libnuma-dev libnuma1

R Dependencies

Stable builds

Install from CRAN directly. Installation time is normally ~2min.

install.packages("clusternor")

Bleeding edge install

Install directly from GitHub. This depends on the autoconf system package:

Mac: Install via brew install autoconf

Ubuntu: Install via apt-get install autoconf

NOTE: The command may require administrator privileges (e.g., sudo)

Then clone the repository with its submodules and run the install script:

git clone --recursive https://github.com/flashxio/knorR.git
cd knorR
./install.sh

Docker

A Docker image with all dependencies installed can be obtained by:

docker pull flashxio/knorr-base

NOTE: The clusternor R package must still be installed on this image via: install.packages("clusternor")

If you prefer to build the image yourself, you can use this Dockerfile

Examples:

Work with data already in-memory

iris.mat <- as.matrix(iris[,1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
kms <- Kmeans(iris.mat, k)

Work with data from disk

To work with data from disk, store it as binary row-major data. Please see this link for a detailed description.

fn <- "/path/to/file.bin" # Use real file
k <- 2 # The number of clusters
nrow <- 50 # The number of rows
ncol <- 5 # The number of columns
kms <- Kmeans(fn, nrow, ncol, k, init="kmeanspp", nthread=2)
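
If your data starts out as an R matrix, one way to produce such a file is sketched below. This is a minimal sketch, not part of clusternor: `write_row_major` is a hypothetical helper, and it assumes the expected layout is headerless 8-byte doubles in row-major order. Check the detailed format description before relying on it.

```r
# Hypothetical helper (not part of clusternor): write a numeric matrix to a
# headerless binary file in row-major order.
# ASSUMPTION: the on-disk format is raw 8-byte doubles with no header.
write_row_major <- function(mat, path) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  con <- file(path, open = "wb")
  on.exit(close(con))
  # R stores matrices column-major, so transposing first and flattening
  # yields the values in row-major order.
  writeBin(as.double(t(mat)), con, size = 8L)
  invisible(path)
}

set.seed(1)
mat <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5)
fn <- tempfile(fileext = ".bin")
write_row_major(mat, fn)

# Read the file back and rebuild the matrix row by row to confirm that the
# layout round-trips.
vals <- readBin(fn, what = "double", n = length(mat), size = 8L)
roundtrip <- matrix(vals, nrow = nrow(mat), ncol = ncol(mat), byrow = TRUE)
stopifnot(identical(roundtrip, mat))
```

Under that assumption, the resulting `fn` could be passed to `Kmeans` with 50 rows and 5 columns, as in the call above.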

Test data

We provide test data as part of the package. It can be accessed directly via this link, or from the R interpreter as clusternor::test_data once the package has been loaded.

Reproduction and Verification

require(clusternor)
kms <- Kmeans(test_data, test_centroids)

Runtime for this action should be nearly instantaneous on any machine.

Expected output:

> kms
$nrow
[1] 50

$ncol
[1] 5

$iters
[1] 5

$k
[1] 8

$centers
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 2.881889 4.079735 4.243061 1.953790 2.690649
[2,] 2.494522 2.334093 2.204031 4.161763 2.444349
[3,] 3.630086 2.398294 3.793616 2.404824 4.490043
[4,] 3.909759 3.991190 2.947161 3.762090 1.950588
[5,] 4.574327 3.645658 3.975175 4.505870 3.595890
[6,] 3.190091 4.267428 1.643788 3.229366 3.700539
[7,] 2.110254 3.147714 2.153235 1.581510 3.102312
[8,] 2.186852 2.027695 3.938736 1.410910 2.383727

$cluster
 [1] 3 2 3 3 6 8 8 3 3 2 3 4 7 7 5 4 2 1 2 1 2 7 7 5 1 1 8 7 5 2 6 2 4 6 6 8 2 5
[39] 7 4 6 5 6 4 7 4 5 4 2 5

$size
[1] 4 9 6 7 7 6 7 4
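
As a quick sanity check when reproducing this output, the reported $size should equal the number of points assigned to each cluster in $cluster. The base-R sketch below uses the values copied from the printed output above; with a real result you would presumably use `kms$cluster` and `kms$size` directly.

```r
# Cluster assignments copied from the printed $cluster above.
cluster <- c(3, 2, 3, 3, 6, 8, 8, 3, 3, 2, 3, 4, 7, 7, 5, 4, 2, 1, 2, 1,
             2, 7, 7, 5, 1, 1, 8, 7, 5, 2, 6, 2, 4, 6, 6, 8, 2, 5, 7, 4,
             6, 5, 6, 4, 7, 4, 5, 4, 2, 5)

# Sizes copied from the printed $size above.
reported_size <- c(4L, 9L, 6L, 7L, 7L, 6L, 7L, 4L)

# Count how many points fall into each of the k = 8 clusters.
sizes <- tabulate(cluster, nbins = 8)

# The counts should match the reported sizes and cover all 50 rows.
stopifnot(identical(sizes, reported_size))
stopifnot(sum(sizes) == 50L)
```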

Help

Please refer to the docs provided:

?clusternor::Kmeans
?clusternor::Skmeans
?clusternor::KmeansPP
?clusternor::Hmeans
?clusternor::Xmeans
?clusternor::Gmeans
?clusternor::MiniBatchKmeans
?clusternor::FuzzyCMeans
?clusternor::Kmedoids


neurodata/knorR documentation built on May 25, 2019, 10:35 p.m.