An R wrapper for the PCARD Spark package. The algorithm applies Random Discretization and Principal Component Analysis to the input data, joins the results, and trains a decision tree on the combined data.
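The steps described above can be sketched on a single machine in plain R. This is only an illustration of the idea, not the Spark package's implementation: it assumes random discretization means drawing cut points at random from the observed values of each feature, it builds a single tree rather than an ensemble, and it uses `prcomp` and `rpart` as stand-ins. The helper `random_discretize` and the `RD_` column prefix are hypothetical names introduced for this example.

```r
library(rpart)  # decision trees; a recommended package shipped with standard R installs

set.seed(42)
data(iris)
features <- iris[, 1:4]
n_cuts <- 5

# Hypothetical helper: pick n_cuts thresholds at random from the observed
# values of a feature, then replace each value by the index of its interval.
random_discretize <- function(x, n_cuts) {
  cuts <- sort(unique(sample(x, n_cuts)))
  findInterval(x, cuts)
}

# Step 1: random discretization of every feature
rd <- as.data.frame(lapply(features, random_discretize, n_cuts = n_cuts))
names(rd) <- paste0("RD_", names(features))

# Step 2: principal component analysis of the original features
pca <- prcomp(features, center = TRUE, scale. = TRUE)
pcs <- as.data.frame(pca$x)

# Step 3: join both representations and train a decision tree on the result
joined <- cbind(rd, pcs, Species = iris$Species)
tree <- rpart(Species ~ ., data = joined, method = "class")

# Training-set predictions and misclassification rate
pred <- predict(tree, joined, type = "class")
train_error <- mean(pred != joined$Species)
```

Combining the discretized columns with the principal components gives the tree two different views of the same features; the random thresholds are what would make individual trees differ from one another in an ensemble.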
library(sparklyr)   # provides spark_connect() and copy_to()
library(sparkPCARD)
library(dplyr)
library(tidyr)

sc <- spark_connect(master = "local")

# Load the iris dataset
copy_to(sc, iris, "iris", overwrite = TRUE)
iris <- tbl(sc, "iris")
model <- iris %>%
  ml_pcard(10, 5,
           response = "Species",
           features = c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"))
prediction <- predict(model, iris)
# Baseline: single decision tree
m.dt <- iris %>%
  ml_decision_tree(max.bins = 5,
                   response = "Species",
                   features = c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"))
p.dt <- predict(m.dt, iris)

# Baseline: random forest
m.rf <- iris %>%
  ml_random_forest(max.bins = 5, num.trees = 10,
                   response = "Species",
                   features = c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"))
p.rf <- predict(m.rf, iris)

# Collect the true labels and the predictions of all three models
results <- data.frame(
  Species = iris %>% select(Species) %>% collect(),
  PCARD = prediction,
  Decision.Tree = p.dt,
  Random.Forest = p.rf
)
Mis-classification on Training Dataset:
results %>%
  gather(model, prediction, -Species) %>%
  mutate(incorrect = if_else(Species != prediction, 1, 0)) %>%
  group_by(Species, model) %>%
  summarise(incorrect = sum(incorrect)) %>%
  spread(model, incorrect) %>%
  as.data.frame()
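In recent tidyr releases, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The sketch below shows the same per-species error count written with the newer functions. It runs on a small hand-made data frame named `toy`, whose labels are invented purely for illustration and are not output from any of the models above.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `results`; these labels are made up for illustration only.
toy <- data.frame(
  Species       = c("setosa", "setosa", "virginica", "virginica"),
  PCARD         = c("setosa", "setosa", "virginica", "versicolor"),
  Decision.Tree = c("setosa", "virginica", "virginica", "versicolor"),
  stringsAsFactors = FALSE
)

errors <- toy %>%
  # One row per (observation, model) pair
  pivot_longer(-Species, names_to = "model", values_to = "prediction") %>%
  group_by(Species, model) %>%
  # Count predictions that differ from the true label
  summarise(incorrect = sum(Species != prediction), .groups = "drop") %>%
  # One column per model again
  pivot_wider(names_from = model, values_from = incorrect) %>%
  as.data.frame()
```

Counting the mismatches directly inside `summarise()` removes the need for the intermediate `mutate()` step.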