bisectingkmeans

This is a sparklyr extension for the bisecting k-means algorithm in Spark MLlib.

Note: In order to use impute, you need Spark 2.2.0.

Installation:

library(devtools)
install_github("Yotabites/bisectingkmeans")

Usage:

The following code sample shows how to use the library:

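# Point sparklyr at your Spark installation; replace the placeholder with the actual path.
Sys.setenv("SPARK_HOME" = "$SPARK_HOME")
Sys.setenv("SPARK_HOME_VERSION" = "2.2.0")
library(sparklyr)
library(dplyr)
library(bisectingkmeans)
# Connect to a local Spark instance ...
sc <- spark_connect(master = "local", app_name = "sparklyr")
# ... or, alternatively, to a YARN cluster in client mode.
# sc <- spark_connect(master = "yarn-client", app_name = "sparklyr")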

Note: The data should contain only float values for the computation.
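One way to check this before loading a file into Spark is to inspect the column types locally. The sketch below is an illustration only: `non_numeric_columns` is a hypothetical helper, not part of bisectingkmeans, and the small data frame is made up to stand in for the contents of Data.csv.

```r
# Hypothetical helper (not part of bisectingkmeans): return the names of
# the columns of a local data frame that are not numeric.
non_numeric_columns <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  names(df)[!is_num]
}

# Made-up sample standing in for the contents of Data.csv.
sample_data <- data.frame(
  x = c(1.5, 2.0, NA),
  y = c(0.1, 0.2, 0.3),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Columns that would need to be dropped or converted first.
bad <- non_numeric_columns(sample_data)
print(bad)

# Keep only the numeric columns before handing the data to Spark.
numeric_only <- sample_data[, setdiff(names(sample_data), bad), drop = FALSE]
```

Missing values (such as the `NA` in `x`) do not make a column non-numeric; they are what the impute step in the sample below is meant to handle.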

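# Load the CSV file into Spark as a table named "SampleData".
sdf <- spark_read_csv(sc, path = "Data.csv", name = "SampleData")
count(sdf)
# Impute missing values using the "mean" strategy.
df <- impute(sdf, "mean")
# Fit a bisecting k-means model with 5 clusters.
bkm <- df %>% ml_bisectingkmeans(centers = 5L)
# Report the compute cost of the fitted model.
cat("The compute cost is", bkm$cost, "\n")
# Assign each row to a cluster and keep only the prediction column.
pred_df <- sdf_predict(bkm, df) %>% select(prediction)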


Yotabites/bisectingkmeans documentation built on Dec. 22, 2017, 1:47 a.m.