knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
The package currently contains functions for plotting ROC curves, AUC bar charts, as well as gains, lift and response charts for communicating classification model performance.
install.packages("devtools") library(devtools) install_github("PeerCHristensen/autoMLviz")
We'll load some preprocessed sample data and create train, validation and test sets with the caret package before creating an H2OAutoML object, which will contain the performance metrics from multiple models.
library(tidyverse) library(caret) library(h2o) h2o.init() h2o.no_progress() df <- read_csv("https://raw.github.com/peerchristensen/churn_prediction/master/telecom_churn_prep.csv") %>% mutate_if(is.character,factor) index <- createDataPartition(df$Churn, p = 0.7, list = FALSE) train_data <- df[index, ] val_test_data <- df[-index, ] index2 <- createDataPartition(val_test_data$Churn, p = 0.5, list = FALSE) valid_data <- val_test_data[-index2, ] test_data <- val_test_data[index2, ] train_hf <- as.h2o(train_data) valid_hf <- as.h2o(valid_data) test_hf <- as.h2o(test_data) outcome <- "Churn" predictors <- setdiff(names(train_hf), outcome) autoML <- h2o.automl(x = predictors, y = outcome, training_frame = train_hf, validation_frame = valid_hf, leaderboard_frame = test_hf, balance_classes = TRUE, max_runtime_secs = 1000)
First we'll evaluate the performance of the trained models with ROC curves plotted using ggplot2.
By default, roc_curves will compare all models in the AutoML leaderboard. By setting best = TRUE
, only the best model will be shown. Setting save_png = TRUE
will save the plot as a picture. The function also returns the relevant data, in case you want to customise the plot. Note that the test_data
argument must be specified. Furthermore, you can control how many models you would like to include in your plots by setting the n_models
argument. The default number of models to return is five.
library(autoMLviz) roc_curves(autoML, test_data = test_hf)
auc_bars(autoML, test_data = test_hf)
Two functions producing gains, lift and response charts are included. Whereas lift4gains()
shows the performance of the best model, lift4gains2()
can be used to compare all models in the AutoML leaderboard.
To get a reference line in the response plot, you need to pass the proportion of target class observations to the response_ref
argument.
lift4gains(autoML, response_ref = .265)
Note that AutoML objects may return different numbers of quantiles for some models causing some of the lines to appear incomplete.
As gains, lift and response charts can sometimes be difficult to understand, the explain
argument can be set to TRUE
, which adds subtitles to each plot explaining how they should be interpreted.
lift4gains2(autoML, response_ref = .265, explain = TRUE)
With autoMLviz, you can also plot variable importannce for the best model. varImp_plot()
calls the standard h2o
plotting function, whereas varImp_ggplot()
creates the plot using ggplot and a theme that is consistent with the above figures. If the best model is an ensemble model, both functions will create two plots. One showing the model importance, and the second showing variable importance for the most important model within the ensemble.
varImp_plot(autoML)
varImp_ggplot(autoML)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.