View source: R/plottingmodelplots.R
Generates the cumulative lift plot, often referred to as the lift plot or index plot. It helps you answer the question: When we apply the model and select the best X ntiles, how many times better is that than using no model at all?
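The quantity on the y-axis can be illustrated with a small base R calculation. This is a minimal sketch on simulated data, not modelplotr's internal code: the variable names, the simulated scores and the use of deciles are all assumptions made for illustration.

```r
# Toy data (hypothetical): 1,000 cases with a model score and a 0/1 outcome.
set.seed(42)
n      <- 1000
score  <- runif(n)
actual <- rbinom(n, size = 1, prob = 0.05 + 0.4 * score^2)  # higher score, more positives

# Assign ntiles (here deciles): ntile 1 holds the cases with the highest scores.
ntiles <- 10
ntile  <- ceiling(rank(-score, ties.method = "first") / n * ntiles)

# Positives and cases per ntile, accumulated from the best ntile downwards.
pos_per_ntile <- tapply(actual, ntile, sum)
n_per_ntile   <- tapply(actual, ntile, length)
cum_rate      <- cumsum(pos_per_ntile) / cumsum(n_per_ntile)

# Cumulative lift: positive rate in the selection relative to the overall rate.
cumlift <- cum_rate / mean(actual)
round(cumlift, 2)
```

By construction the value at the last ntile is 1, since selecting everyone is the same as using no model.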
data
Dataframe. Needs to be created with plotting_scope, as shown in the examples.
highlight_ntile
Integer. Specifies the ntile at which the plot is annotated and/or performances are highlighted.
highlight_how
String. How to annotate the plot. Possible values: "plot_text", "plot", "text". Default is "plot_text", which highlights the ntile and value on the plot and also adds explanatory text below the plot. "plot" only highlights the plot and adds no text below it. "text" adds text below the plot explaining the plot at the chosen ntile but does not highlight the plot.
save_fig
Logical. Save plot to file? Default = FALSE. When set to TRUE, saved plots are optimized for 18x12cm.
save_fig_filename
String. Filename of saved plot. By default, the plot is saved as tempdir()/plotname.png.
custom_line_colors
Vector of Strings. Specifies colors for the lines in the plot. When not specified, colors from the RColorBrewer palette "Set1" are used.
custom_plot_text
List. List with customized textual elements for plot. Create a list with defaults
by using |
ggplot object. Lift plot.
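The annotation arguments can be combined as in the sketch below. It assumes that an object named plot_input, created with plotting_scope, already exists in the workspace; the guard and the names p_plot_only and p_text_only are illustrative and not part of the package.

```r
# Sketch only: `plot_input` is assumed to exist (created with plotting_scope()).
# Nothing is run when modelplotr is not installed or plot_input is missing.
if (requireNamespace("modelplotr", quietly = TRUE) && exists("plot_input")) {
  # highlight ntile 10 on the plot only, without the explanatory text below it
  p_plot_only <- modelplotr::plot_cumlift(data = plot_input,
                                          highlight_ntile = 10,
                                          highlight_how = "plot")
  # add the explanatory text for ntile 10, without highlighting the plot
  p_text_only <- modelplotr::plot_cumlift(data = plot_input,
                                          highlight_ntile = 10,
                                          highlight_how = "text")
}
```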
modelplotr for generic info on the package modelplotr
plotting_scope for details on the function plotting_scope, which transforms a dataframe created with prepare_scores_and_ntiles or aggregate_over_ntiles into a dataframe in the required format for all modelplotr plots.
aggregate_over_ntiles for details on the function aggregate_over_ntiles, which aggregates the output of prepare_scores_and_ntiles to create a dataframe with aggregated actuals and predictions. In most cases, you do not need to use it, since the plotting_scope function will call this function automatically.
https://github.com/modelplot/modelplotr for details on the package
https://modelplot.github.io/ for our blog on the value of the model plots
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")
# prepare data for training model for binomial target has_td and train models
train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
#train models using caret... (or use mlr or H2o or keras ... see ?prepare_scores_and_ntiles)
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
                  tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
# load modelplotr
library(modelplotr)
# transform datasets and model objects to input for modelplotr
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
                                               dataset_labels = list("train data","test data"),
                                               models = list("rf","mnl"),
                                               model_labels = list("random forest","multinomial logit"),
                                               target_column="has_td",
                                               ntiles=100)
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope="compare_datasets")
plot_cumlift(data=plot_input)
plot_cumlift(data=plot_input,custom_line_colors=c("orange","purple"))
plot_cumlift(data=plot_input,highlight_ntile=2)
Package modelplotr loaded! Happy model plotting!
Loading required package: lattice
Loading required package: ggplot2
... scoring caret model "rf" on dataset "train".
... scoring caret model "mnl" on dataset "train".
... scoring caret model "rf" on dataset "test".
... scoring caret model "mnl" on dataset "test".
Data preparation step 1 succeeded! Dataframe created.
Warning message:
`select_()` is deprecated as of dplyr 0.7.0.
Please use `select()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
Data preparation step 2 succeeded! Dataframe created.
"prepared_input" aggregated...
Data preparation step 3 succeeded! Dataframe created.
Datasets "test data", "train data" compared for model "multinomial logit" and target value "term.deposit".
Warning message:
`group_by_()` is deprecated as of dplyr 0.7.0.
Please use `group_by()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
Plot annotation for plot: Cumulative lift
- When we select 2% with the highest probability according to model multinomial logit in test data, this selection for term.deposit cases is 6.9 times better than selecting without a model.
- When we select 2% with the highest probability according to model multinomial logit in train data, this selection for term.deposit cases is 7.7 times better than selecting without a model.
Warning message:
Vectorized input to `element_text()` is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.
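The annotation for the test data above can also be read as a share of target cases captured, because cumulative lift is the share of target cases in the selection divided by the share of cases selected. A minimal sketch of that arithmetic, using the rounded values from the annotation (variable names are illustrative):

```r
# Values taken from the annotation for the test data above.
selected_fraction <- 0.02  # top 2% of cases (ntile 2 of 100)
cumlift           <- 6.9   # reported cumulative lift at that ntile

# Cumulative lift = share of target cases captured / share of cases selected,
# so the share of all target cases captured in the selection is:
captured_share <- cumlift * selected_fraction
captured_share
```

Because the lift in the annotation is rounded to one decimal, the result is approximate.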