Plots to evaluate the business performance of predictive models in R.
modelplotr makes it easy to create a number of widely used plots that assess the quality of a predictive model from a business perspective. These plots show how implementing the model would affect business targets such as the response on a campaign or the return on investment. modelplotr can be applied directly to predictive models developed with caret, mlr, h2o or keras. For other models, even those built outside of R, instructions are included. The package provides three categories of functions: data preparation, plot parameterization and plotting.
The data preparation functions are:
prepare_scores_and_ntiles
Function that builds a dataframe containing actuals and predictions on the target variable for each dataset in datasets and each model in models. As inputs, it takes the dataframes to score and model objects created with caret, mlr, h2o or keras.
Specifically for keras models, built with keras_model_sequential() or with the keras functional API, there is the prepare_scores_and_ntiles_keras function.
To use modelplotr on top of models created otherwise, even models built outside R, see aggregate_over_ntiles.
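The core idea behind ntiles is simple. Observations are sorted by their predicted probability for the target class and then cut into `ntiles` equal-sized groups, with ntile 1 holding the highest scores. The base-R sketch below illustrates this. `assign_ntiles()` is a hypothetical helper written for this illustration and is not a modelplotr function; the tie-breaking and group sizes modelplotr actually uses may differ.

```r
# Hypothetical helper (not part of modelplotr): assign each observation
# to an ntile, where ntile 1 holds the highest predicted probabilities.
assign_ntiles <- function(prob, ntiles) {
  n <- length(prob)
  # rank 1 = highest probability; ties are broken by order of appearance
  r <- rank(-prob, ties.method = "first")
  # map ranks 1..n onto ntiles 1..ntiles (equal-sized groups when n allows)
  floor(ntiles * (r - 1) / n) + 1
}

prob <- c(0.9, 0.1, 0.5, 0.7)
assign_ntiles(prob, ntiles = 2)  # values: 1 2 2 1
```

With `ntiles = 100`, as in the example on this page, each ntile is a percentile of the scored dataset.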
plotting_scope
Function that creates a dataframe in the format required by all modelplotr plots, restricted to the selected scope of evaluation. Each record in this dataframe represents a unique combination of dataset, model, target class and ntile. As input, plotting_scope accepts either a dataframe created with aggregate_over_ntiles or a dataframe created with prepare_scores_and_ntiles (or with prepare_scores_and_ntiles_keras, or created otherwise with a similar layout).
aggregate_over_ntiles
Function that aggregates the output of prepare_scores_and_ntiles into a dataframe with aggregated actuals and predictions. Each record in this dataframe represents a unique combination of dataset, model, target class and ntile. In most cases you do not need to call this function yourself, since plotting_scope calls it automatically.
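At its heart, aggregation over ntiles means counting two things per ntile: how many observations fall in it and how many of those belong to the target class. Running totals of both counts are then added. The sketch below shows this for a single dataset, model and target class. `aggregate_ntiles()` is a hypothetical helper and is not modelplotr's implementation; the real function also splits by dataset, model and target class and adds further columns.

```r
# Hypothetical helper (not part of modelplotr): per-ntile counts for one
# dataset, one model and one target class.
aggregate_ntiles <- function(ntile, is_target, ntiles = max(ntile)) {
  tot <- tabulate(ntile, nbins = ntiles)             # observations per ntile
  pos <- tabulate(ntile[is_target], nbins = ntiles)  # target class observations per ntile
  data.frame(ntile  = seq_len(ntiles),
             tot    = tot,
             pos    = pos,
             cumtot = cumsum(tot),   # observations selected up to this ntile
             cumpos = cumsum(pos))   # target class observations selected up to this ntile
}

ntile     <- c(1, 1, 2, 2, 3, 3)
is_target <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
aggregate_ntiles(ntile, is_target)
```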
Most parameterization functions are internal. However, one is available for customization:
customize_plot_text
Function that returns a list containing all textual elements of all plots that modelplotr can create. To customize plot texts to your (language) preferences, overwrite values in this list and pass the list to the custom_plot_text parameter of the plot functions.
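The annotationtext element in the customization example at the end of this page contains the placeholders &NTL, &MDL and &VALUE. These appear to stand for the highlighted ntile, the model label and the plotted value. The sketch below shows how such a template can be filled by literal substitution. `fill_placeholders()` is a hypothetical helper and not modelplotr code, and the arguments are dummy values.

```r
# Hypothetical helper (not modelplotr code): fill an annotation template by
# literal (fixed = TRUE) substitution of each placeholder.
fill_placeholders <- function(template, ntl, mdl, value) {
  out <- gsub("&NTL", ntl, template, fixed = TRUE)
  out <- gsub("&MDL", mdl, out, fixed = TRUE)
  gsub("&VALUE", value, out, fixed = TRUE)
}

txt <- "Selecting up until the &NTL percentile with model &MDL has an expected conversion rate of &VALUE"
fill_placeholders(txt, ntl = "15th", mdl = "random forest", value = "12%")  # dummy values
```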
The plotting functions are:
plot_cumgains
Generates the cumulative gains plot. This plot, often referred to as the gains chart, helps answer the question: When we apply the model and select the best X ntiles, what percentage of the actual target class observations can we expect to target?
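As a rough illustration, cumulative gains for ntile k is the number of target class observations in ntiles 1 to k divided by the total number of target class observations. The base-R sketch below assumes four ntiles and made-up counts in `pos`; these are not results from bank_td and not modelplotr's internal code. It also computes the diagonal that selection without a model would follow.

```r
# Made-up target class counts per ntile (4 ntiles, best ntile first)
pos <- c(48, 24, 16, 12)

# Share of all target class observations captured up to each ntile
cumgains <- cumsum(pos) / sum(pos)
# Random selection (no model) captures k/ntiles of them after k ntiles
baseline <- seq_along(pos) / length(pos)

cumgains  # values: 0.48 0.72 0.88 1.00
baseline  # values: 0.25 0.50 0.75 1.00
```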
plot_cumlift
Generates the cumulative lift plot. This plot, often referred to as the lift plot or index plot, helps you answer the question: When we apply the model and select the best X ntiles, how many times better is that than using no model at all?
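"How many times better than no model" can be computed as the cumulative response up to ntile k divided by the overall response, which is what a random selection would achieve. A minimal base-R sketch on made-up counts (`tot` and `pos` are illustrative, not modelplotr output):

```r
# Made-up counts per ntile (4 ntiles, best ntile first)
tot <- c(100, 100, 100, 100)  # observations per ntile
pos <- c(48, 24, 16, 12)      # target class observations per ntile

cumresponse <- cumsum(pos) / cumsum(tot)  # response in the selection so far
overall     <- sum(pos) / sum(tot)        # response without a model
cumlift     <- cumresponse / overall

round(cumlift, 2)  # values: 1.92 1.44 1.17 1.00
```

Selecting the best ntile here yields almost twice the response of a random selection. Selecting all ntiles always yields a lift of 1.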
plot_response
Generates the response plot. It plots the percentage of target class observations per ntile. It can be used to answer the following business question: When we apply the model and select ntile X, what is the expected percentage of target class observations in that ntile?
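The response per ntile is simply the share of target class observations within that ntile. A base-R sketch on the same kind of made-up counts (illustrative only):

```r
# Made-up counts per ntile (4 ntiles, best ntile first)
tot <- c(100, 100, 100, 100)  # observations per ntile
pos <- c(48, 24, 16, 12)      # target class observations per ntile

response <- pos / tot
response  # values: 0.48 0.24 0.16 0.12
```

For a useful model the response should decrease from the first to the last ntile.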
plot_cumresponse
Generates the cumulative response plot. It plots the cumulative percentage of target class observations up until that ntile. It helps answer the question: When we apply the model and select up until ntile X, what is the expected percentage of target class observations in the selection?
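Unlike the response per ntile, the cumulative response divides the running total of target class observations by the running total of all selected observations. A base-R sketch on made-up counts (illustrative only):

```r
# Made-up counts per ntile (4 ntiles, best ntile first)
tot <- c(100, 100, 100, 100)  # observations per ntile
pos <- c(48, 24, 16, 12)      # target class observations per ntile

# Share of target class observations in the selection "ntile 1 up to ntile k"
cumresponse <- cumsum(pos) / cumsum(tot)
cumresponse
```

Here the cumulative response falls from 0.48 for the best ntile to 0.25 when all ntiles are selected. That last value equals the overall response.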
plot_multiplot
Generates one canvas that combines all four evaluation plots: cumulative gains, cumulative lift, response and cumulative response.
plot_costsrevs
Generates the costs & revenues plot. It plots the cumulative costs and revenues up until that ntile when the model is used for campaign selection. It can be used to answer the following business question: When we apply the model and select up until ntile X, what are the expected costs and revenues of the campaign?
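The sketch below shows one plausible reading of the financial parameters used in the example on this page. It assumes investments are fixed_costs plus variable_costs_per_unit times the number of selected observations, and revenues are profit_per_unit times the number of selected target class observations. This is an assumption based on the parameter names, not a statement of modelplotr's exact internals; the counts are made up.

```r
# Made-up counts per ntile (4 ntiles, best ntile first)
tot <- c(100, 100, 100, 100)  # observations per ntile
pos <- c(48, 24, 16, 12)      # target class observations per ntile

# Parameter values as in the example on this page
fixed_costs <- 1000
variable_costs_per_unit <- 10
profit_per_unit <- 50

# ASSUMPTION: costs scale with everyone selected, revenues with target class hits
investments <- fixed_costs + variable_costs_per_unit * cumsum(tot)
revenues    <- profit_per_unit * cumsum(pos)

investments  # values: 2000 3000 4000 5000
revenues     # values: 2400 3600 4400 5000
```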
plot_profit
Generates the profit plot. It plots the cumulative profit up until that ntile when the model is used for campaign selection. It can be used to answer the following business question: When we apply the model and select up until ntile X, what is the expected profit of the campaign?
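Profit is then revenues minus investments, and the best selection under this criterion is the ntile where cumulative profit peaks. The sketch below makes the same assumptions as before about how costs and revenues are built from fixed_costs, variable_costs_per_unit and profit_per_unit. It uses made-up counts and is not modelplotr's code.

```r
# Made-up counts per ntile (4 ntiles, best ntile first)
tot <- c(100, 100, 100, 100)
pos <- c(48, 24, 16, 12)

fixed_costs <- 1000
variable_costs_per_unit <- 10
profit_per_unit <- 50

# ASSUMPTION: same cost and revenue definitions as in the costs & revenues sketch
investments <- fixed_costs + variable_costs_per_unit * cumsum(tot)
revenues    <- profit_per_unit * cumsum(pos)
profit      <- revenues - investments

profit             # values: 400 600 400 0
which.max(profit)  # 2: selecting up until ntile 2 maximizes profit here
```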
plot_roi
Generates the return on investment (ROI) plot. It plots the cumulative revenues as a percentage of investments up until that ntile when the model is used for campaign selection. It can be used to answer the following business question: When we apply the model and select up until ntile X, what is the expected return on investment of the campaign?
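The sketch below assumes the common definition ROI = (revenues minus investments) / investments, with the same cost and revenue assumptions as the sketches above and made-up counts. If modelplotr instead reports revenues divided by investments, every value is simply shifted up by 1 (100%).

```r
# Made-up counts per ntile (4 ntiles, best ntile first)
tot <- c(100, 100, 100, 100)
pos <- c(48, 24, 16, 12)

fixed_costs <- 1000
variable_costs_per_unit <- 10
profit_per_unit <- 50

investments <- fixed_costs + variable_costs_per_unit * cumsum(tot)
revenues    <- profit_per_unit * cumsum(pos)
# ASSUMPTION: ROI defined as profit relative to investments
roi <- (revenues - investments) / investments

roi  # values: 0.2 0.2 0.1 0.0
```

Note that ROI and profit can favor different selections. Here ROI is equally high for the first two ntiles, while profit peaks at ntile 2.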
Author(s)
Jurriaan Nagelkerke <jurriaan.nagelkerke@gmail.com> [aut, cre]
Pieter Marcus <pieter.marcus@persgroep.net> [aut]
See Also
https://github.com/modelplot/modelplotr for details on the package
https://modelplot.github.io/ for our blog posts on using modelplotr
Examples
## Not run:
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
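# bank_td ships with modelplotr, so name the package (library(modelplotr) is only loaded further down)
data("bank_td", package = "modelplotr")
# prepare data for training models on the binary target has_td, then train the models
set.seed(42)  # make the random train/test split reproducible
train_index = sample(seq(1, nrow(bank_td)), size = 0.5*nrow(bank_td), replace = FALSE)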
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
# train models using caret... (or use mlr, h2o or keras ... see ?prepare_scores_and_ntiles)
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~., data = train, method = "ranger", trControl = fitControl,
                  tuneGrid = expand.grid(.mtry = 2, .splitrule = "gini", .min.node.size = 10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
# load modelplotr
library(modelplotr)
# transform datasets and model objects to input for modelplotr
scores_and_ntiles <- prepare_scores_and_ntiles(datasets = list("train", "test"),
                                               dataset_labels = list("train data", "test data"),
                                               models = list("rf", "mnl"),
                                               model_labels = list("random forest", "multinomial logit"),
                                               target_column = "has_td",
                                               ntiles = 100)
# set scope for analysis (default: no comparison)
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)
head(plot_input)
# ALL PLOTS, with defaults
plot_cumgains(data=plot_input)
plot_cumlift(data=plot_input)
plot_response(data=plot_input)
plot_cumresponse(data=plot_input)
plot_multiplot(data=plot_input)
# financial plots - these need some financial parameters
plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
# CHANGING THE SCOPE OF ANALYSIS
# changing the scope - compare models:
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_models")
plot_cumgains(data=plot_input)
# changing the scope - compare datasets:
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_datasets")
plot_roi(data = plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
# changing the scope - compare target classes:
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_targetclasses")
plot_response(data=plot_input)
# HIGHLIGHTING OPTIONS
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = 'compare_datasets', select_model_label = 'random forest')
plot_cumgains(data=plot_input,highlight_ntile=20)
plot_cumlift(data=plot_input,highlight_ntile=20,highlight_how = 'plot')
plot_response(data=plot_input,highlight_ntile=20,highlight_how = 'text')
plot_cumresponse(data=plot_input,highlight_ntile=20,highlight_how = 'plot_text')
plot_costsrevs(data=plot_input, fixed_costs = 1000, variable_costs_per_unit = 10,
               profit_per_unit = 50, highlight_ntile = 'max_roi')
plot_profit(data=plot_input,fixed_costs = 1500,variable_costs_per_unit = 10,profit_per_unit = 50)
plot_roi(data=plot_input,fixed_costs = 1500,variable_costs_per_unit = 10,profit_per_unit = 50)
# OTHER PLOT CUSTOMIZATIONS
# customize line colors
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope = 'compare_models')
plot_cumgains(data=plot_input,custom_line_colors = c('pink','navyblue'))
# customize all textual elements of plots
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)
mytexts <- customize_plot_text(plot_input = plot_input)
mytexts$cumresponse$plottitle <- 'Expected conversion rate for Campaign XYZ'
mytexts$cumresponse$plotsubtitle <- 'proposed selection: best 15 percentiles according to our model'
mytexts$cumresponse$y_axis_label <- '% Conversion'
mytexts$cumresponse$x_axis_label <- 'percentiles (percentile = 1% of customers)'
mytexts$cumresponse$annotationtext <-
  "Selecting up until the &NTL percentile with model &MDL has an expected conversion rate of &VALUE"
plot_cumresponse(data=plot_input,custom_plot_text = mytexts,highlight_ntile = 15)
## End(Not run)