
View source: R/dataprepmodelplots.R

Build a dataframe in the required format for all modelplotr plots, relevant to the selected scope of evaluation.
Each record in this dataframe represents a unique combination of datasets, models, target classes and ntiles.
As input, plotting_scope can handle both a dataframe created with `aggregate_over_ntiles` and a dataframe created with `prepare_scores_and_ntiles` (or a dataframe created otherwise with a similar layout).
There are four perspectives:

- "no_comparison" (default)
In this perspective, you're interested in the performance of one model on one dataset for one target class. Therefore, only one line is plotted in the plots. The parameters `select_model_label`, `select_dataset_label` and `select_targetclass` determine which group is plotted. When not specified, the first alphabetic model, the first alphabetic dataset and the smallest (when `select_smallest_targetclass=TRUE`) or first alphabetic target value are selected.
- "compare_models"
In this perspective, you're interested in how well different models perform in comparison to each other on the same dataset and for the same target value. This results in a comparison between the models available in ntiles_aggregate$model_label for a selected dataset (default: first alphabetic dataset) and for a selected target value (default: smallest (when `select_smallest_targetclass=TRUE`) or first alphabetic target value).
- "compare_datasets"
In this perspective, you're interested in how well a model performs across different datasets for the same target value. This results in a comparison between the datasets available in ntiles_aggregate$dataset_label for a selected model (default: first alphabetic model) and for a selected target value (default: smallest (when `select_smallest_targetclass=TRUE`) or first alphabetic target value).
- "compare_targetclasses"
In this perspective, you're interested in how well a model performs for different target values on a specific dataset. This results in a comparison between the target classes available in ntiles_aggregate$target_class for a selected model (default: first alphabetic model) and for a selected dataset (default: first alphabetic dataset).
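The four perspectives map directly onto `plotting_scope()` calls. A minimal sketch, assuming a `scores_and_ntiles` dataframe built with `prepare_scores_and_ntiles` as in the Examples section below; the labels "random forest" and "test data" are taken from that example:

```r
# default scope ("no_comparison"): one model, one dataset, smallest target class
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)

# compare all models on one dataset and one target class
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = "compare_models",
                             select_dataset_label = "test data")

# compare datasets (e.g. train vs test) for one model
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = "compare_datasets",
                             select_model_label = "random forest")

# compare target classes for one model on one dataset
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = "compare_targetclasses")
```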

```
plotting_scope(
  prepared_input,
  scope = "no_comparison",
  select_model_label = NA,
  select_dataset_label = NA,
  select_targetclass = NA,
  select_smallest_targetclass = TRUE
)
```

argument | description
--- | ---
`prepared_input` | Dataframe. Dataframe created with `prepare_scores_and_ntiles` or `aggregate_over_ntiles` (or created otherwise with a similar layout).
`scope` | String. Evaluation type of interest. Possible values: "compare_models", "compare_datasets", "compare_targetclasses", "no_comparison". Default is NA, equivalent to "no_comparison".
`select_model_label` | String. Selected model when scope is "compare_datasets", "compare_targetclasses" or "no_comparison". Needs to be identical to the model descriptions as specified in model_labels (or models when model_labels is not specified). When scope is "compare_models", select_model_label can be used to take a subset of available models.
`select_dataset_label` | String. Selected dataset when scope is "compare_models", "compare_targetclasses" or "no_comparison". Needs to be identical to the dataset descriptions as specified in dataset_labels (or datasets when dataset_labels is not specified). When scope is "compare_datasets", select_dataset_label can be used to take a subset of available datasets.
`select_targetclass` | String. Selected target value when scope is "compare_models", "compare_datasets" or "no_comparison". Default is the smallest value when select_smallest_targetclass=TRUE, otherwise the first alphabetical value. When scope is "compare_targetclasses", select_targetclass can be used to take a subset of available target classes.
`select_smallest_targetclass` | Boolean. Select the target value with the smallest number of cases in the dataset as the group of interest. Default is TRUE, hence the target value with the fewest observations is selected.

Dataframe. `plot_input` is a subset of `ntiles_aggregate`.

To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. You can create your own dataframe containing actuals, probabilities and ntiles (1st ntile = (1/#ntiles) percent with the highest model probability, last ntile = (1/#ntiles) percent with the lowest probability according to the model). In that case, make sure the input dataframe contains the following columns & formats:

column | type | definition
--- | --- | ---
model_label | Factor | Name of the model object
dataset_label | Factor | Datasets to include in the plot as factor levels
y_true | Factor | Target with actual values
prob_[tv1] | Decimal | Probability according to model for target value 1
prob_[tv2] | Decimal | Probability according to model for target value 2
... | ... | ...
prob_[tvn] | Decimal | Probability according to model for target value n
ntl_[tv1] | Integer | Ntile based on probability according to model for target value 1
ntl_[tv2] | Integer | Ntile based on probability according to model for target value 2
... | ... | ...
ntl_[tvn] | Integer | Ntile based on probability according to model for target value n

See build_input_yourself for an example to build the required input yourself.
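The layout in the table above can also be produced without modelplotr's helper functions. A minimal, self-contained sketch in base R, using hypothetical model scores for a binary target with values "yes" and "no" (the model name, dataset label and probabilities are made up for illustration); note that ntile 1 must hold the highest probabilities:

```r
set.seed(42)
n  <- 100   # number of scored cases
nt <- 10    # number of ntiles (deciles here)

prob_yes <- runif(n)  # hypothetical model probabilities for target value "yes"

my_input <- data.frame(
  model_label   = factor("my model"),    # name of the model object
  dataset_label = factor("test data"),   # dataset to include in the plot
  y_true        = factor(ifelse(prob_yes > 0.7, "yes", "no")),  # actual values
  prob_yes      = prob_yes,              # prob_[tv1]
  prob_no       = 1 - prob_yes           # prob_[tv2]
)

# ntl_[tv]: ntile 1 = (1/nt) percent with the HIGHEST probability,
# ntile nt = (1/nt) percent with the lowest probability
my_input$ntl_yes <- as.integer(ceiling(nt * rank(-my_input$prob_yes, ties.method = "first") / n))
my_input$ntl_no  <- as.integer(ceiling(nt * rank(-my_input$prob_no,  ties.method = "first") / n))

# my_input now has the column layout required by plotting_scope()
```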

`modelplotr` for generic info on the package modelplotr.

`aggregate_over_ntiles` for details on the function `aggregate_over_ntiles` that generates the required input.

`prepare_scores_and_ntiles` for details on the function `prepare_scores_and_ntiles` that generates the required input.

`build_input_yourself` for an example to build the required input yourself.

plotting_scope filters the output of `aggregate_over_ntiles` to prepare it for the required evaluation.

https://github.com/modelplot/modelplotr for details on the package

https://modelplot.github.io/ for our blog on the value of the model plots

```
## Not run:
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")
# prepare data for training model for binomial target has_td and train models
train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
#train models using mlr...
trainTask <- mlr::makeClassifTask(data = train, target = "has_td")
testTask <- mlr::makeClassifTask(data = test, target = "has_td")
mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::)
task = mlr::makeClassifTask(data = train, target = "has_td")
lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob")
rf = mlr::train(lrn, task)
lrn = mlr::makeLearner("classif.multinom", predict.type = "prob")
mnl = mlr::train(lrn, task)
#... or train models using caret...
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
#... or train models using h2o...
h2o::h2o.init()
h2o::h2o.no_progress()
h2o_train = h2o::as.h2o(train)
h2o_test = h2o::as.h2o(test)
gbm <- h2o::h2o.gbm(y = "has_td",
x = setdiff(colnames(train), "has_td"),
training_frame = h2o_train,
nfolds = 5)
#... or train models using keras.
x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1)
`%>%` <- magrittr::`%>%`
nn <- keras::keras_model_sequential() %>%
keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu',
input_shape = NCOL(x_train))%>%
keras::layer_dense(units=16,kernel_initializer="uniform",activation='relu') %>%
keras::layer_dense(units=length(levels(train[,1])),activation='softmax')
nn %>% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy'))
nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0)
# preparation steps
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
dataset_labels = list("train data","test data"),
models = list("rf","mnl", "gbm","nn"),
model_labels = list("random forest","multinomial logit",
"gradient boosting machine","artificial neural network"),
target_column="has_td")
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)
plot_cumgains(data = plot_input)
plot_cumlift(data = plot_input)
plot_response(data = plot_input)
plot_cumresponse(data = plot_input)
plot_multiplot(data = plot_input)
plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
## End(Not run)
```
