View source: R/dataprepmodelplots.R

Description
Build a dataframe in the required format for all modelplotr plots, relevant to the selected scope of evaluation. Each record in this dataframe represents a unique combination of datasets, models, target classes and ntiles. As input, plotting_scope can handle both a dataframe created with aggregate_over_ntiles and a dataframe created with prepare_scores_and_ntiles (or one created otherwise with a similar layout).
There are four evaluation perspectives, selected with the scope parameter:

scope = "no_comparison" (default). You're interested in the performance of one model on one dataset for one target class, so only one line is plotted. The parameters select_model_label, select_dataset_label and select_targetclass determine which group is plotted. When not specified, the first alphabetical model, the first alphabetical dataset and the smallest (when select_smallest_targetclass = TRUE) or first alphabetical target value are selected.

scope = "compare_models". You're interested in how well different models perform in comparison to each other on the same dataset and for the same target value. This results in a comparison between the models available in ntiles_aggregate$model_label for a selected dataset (default: first alphabetical dataset) and a selected target value (default: smallest (when select_smallest_targetclass = TRUE) or first alphabetical target value).

scope = "compare_datasets". You're interested in how well a model performs on different datasets for the same target value. This results in a comparison between the datasets available in ntiles_aggregate$dataset_label for a selected model (default: first alphabetical model) and a selected target value (default: smallest (when select_smallest_targetclass = TRUE) or first alphabetical target value).

scope = "compare_targetclasses". You're interested in how well a model performs for different target values on a specific dataset. This results in a comparison between the target classes available in ntiles_aggregate$target_class for a selected model (default: first alphabetical model) and a selected dataset (default: first alphabetical dataset).
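The four perspectives translate into calls like the following. This is a sketch, not part of the manual page: it assumes a dataframe scores_and_ntiles prepared as in the Examples section below, including the model and dataset labels used there.

```r
# Assumes scores_and_ntiles was created with prepare_scores_and_ntiles(),
# as in the Examples section (labels are assumptions taken from there).
library(modelplotr)

# one model, one dataset, one target class (default scope)
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)

# compare models on one dataset and target class
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = "compare_models",
                             select_dataset_label = "test data")

# compare datasets for one model and target class
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = "compare_datasets",
                             select_model_label = "random forest")

# compare target classes for one model on one dataset
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = "compare_targetclasses")
```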
Usage

plotting_scope(
  prepared_input,
  scope = "no_comparison",
  select_model_label = NA,
  select_dataset_label = NA,
  select_targetclass = NA,
  select_smallest_targetclass = TRUE
)
Arguments

prepared_input
    Dataframe. Dataframe created with prepare_scores_and_ntiles or aggregate_over_ntiles (or created otherwise with a similar layout).

scope
    String. Evaluation type of interest. Possible values: "compare_models", "compare_datasets", "compare_targetclasses", "no_comparison". Default is "no_comparison".

select_model_label
    String. Selected model when scope is "compare_datasets", "compare_targetclasses" or "no_comparison". Needs to be identical to the model description as specified in model_labels (or models when model_labels is not specified). When scope is "compare_models", select_model_label can be used to take a subset of available models.

select_dataset_label
    String. Selected dataset when scope is "compare_models", "compare_targetclasses" or "no_comparison". Needs to be identical to the dataset description as specified in dataset_labels (or datasets when dataset_labels is not specified). When scope is "compare_datasets", select_dataset_label can be used to take a subset of available datasets.

select_targetclass
    String. Selected target value when scope is "compare_models", "compare_datasets" or "no_comparison". Default is the smallest value when select_smallest_targetclass = TRUE, otherwise the first alphabetical value. When scope is "compare_targetclasses", select_targetclass can be used to take a subset of available target classes.

select_smallest_targetclass
    Boolean. Select the target value with the smallest number of cases in the dataset as the group of interest. Default is TRUE, so the target value with the fewest observations is selected.
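Within a comparison scope, the matching select_* parameter can narrow the comparison to a subset. The sketch below assumes this subset is passed as a character vector, and reuses the model labels from the Examples section; both are assumptions, not taken from this page.

```r
# Compare only two of the available models (the labels "random forest" and
# "multinomial logit" are assumptions matching the Examples section, and
# passing a character vector to select_model_label is an assumption too).
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,
                             scope = "compare_models",
                             select_model_label = c("random forest",
                                                    "multinomial logit"))
```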
Value

Dataframe plot_input, a subset of ntiles_aggregate.
When you build input for plotting_scope() yourself

To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. You can create your own dataframe containing actuals, probabilities and ntiles (1st ntile = (1/#ntiles) percent with the highest model probability, last ntile = (1/#ntiles) percent with the lowest probability according to the model). In that case, make sure the input dataframe contains the following columns & formats:

column        | type    | definition
model_label   | Factor  | Name of the model object
dataset_label | Factor  | Datasets to include in the plot as factor levels
y_true        | Factor  | Target with actual values
prob_[tv1]    | Decimal | Probability according to model for target value 1
prob_[tv2]    | Decimal | Probability according to model for target value 2
...           | ...     | ...
prob_[tvn]    | Decimal | Probability according to model for target value n
ntl_[tv1]     | Integer | Ntile based on probability according to model for target value 1
ntl_[tv2]     | Integer | Ntile based on probability according to model for target value 2
...           | ...     | ...
ntl_[tvn]     | Integer | Ntile based on probability according to model for target value n
See build_input_yourself for an example to build the required input yourself.
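As a minimal sketch of that layout in base R (illustrative data only; the target values "yes"/"no", the label texts and the 10-ntile split are assumptions, not part of the package):

```r
# Hand-built input with the required columns & formats, assuming a binary
# target with values "yes"/"no" and 10 ntiles (all values illustrative).
set.seed(42)
n <- 100
prob_yes <- runif(n)  # model probability for target value "yes"
my_input <- data.frame(
  model_label   = factor(rep("my model", n)),
  dataset_label = factor(rep("test data", n)),
  y_true        = factor(ifelse(runif(n) < prob_yes, "yes", "no")),
  prob_yes      = prob_yes,
  prob_no       = 1 - prob_yes,
  # ntile 1 = highest model probability, ntile 10 = lowest
  ntl_yes       = as.integer(ceiling(10 * rank(-prob_yes) / n)),
  ntl_no        = as.integer(ceiling(10 * rank(-(1 - prob_yes)) / n))
)
str(my_input)
```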
See Also

modelplotr for generic info on the package modelplotr.

aggregate_over_ntiles for details on the function aggregate_over_ntiles that generates the required input.

prepare_scores_and_ntiles for details on the function prepare_scores_and_ntiles that generates the required input.

build_input_yourself for an example to build the required input yourself.

plotting_scope filters the output of aggregate_over_ntiles to prepare it for the required evaluation.

https://github.com/modelplot/modelplotr for details on the package

https://modelplot.github.io/ for our blog on the value of the model plots
Examples

## Not run:
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")
# prepare data for training model for binomial target has_td and train models
train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
#train models using mlr...
trainTask <- mlr::makeClassifTask(data = train, target = "has_td")
testTask <- mlr::makeClassifTask(data = test, target = "has_td")
mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::)
task = mlr::makeClassifTask(data = train, target = "has_td")
lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob")
rf = mlr::train(lrn, task)
lrn = mlr::makeLearner("classif.multinom", predict.type = "prob")
mnl = mlr::train(lrn, task)
#... or train models using caret...
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
#... or train models using h2o...
h2o::h2o.init()
h2o::h2o.no_progress()
h2o_train = h2o::as.h2o(train)
h2o_test = h2o::as.h2o(test)
gbm <- h2o::h2o.gbm(y = "has_td",
x = setdiff(colnames(train), "has_td"),
training_frame = h2o_train,
nfolds = 5)
#... or train models using keras.
x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1)
`%>%` <- magrittr::`%>%`
nn <- keras::keras_model_sequential() %>%
keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu',
input_shape = NCOL(x_train))%>%
keras::layer_dense(units=16,kernel_initializer="uniform",activation='relu') %>%
keras::layer_dense(units=length(levels(train[,1])),activation='softmax')
nn %>% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy'))
nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0)
# preparation steps
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
dataset_labels = list("train data","test data"),
models = list("rf","mnl", "gbm","nn"),
model_labels = list("random forest","multinomial logit",
"gradient boosting machine","artificial neural network"),
target_column="has_td")
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)
plot_cumgains(data = plot_input)
plot_cumlift(data = plot_input)
plot_response(data = plot_input)
plot_cumresponse(data = plot_input)
plot_multiplot(data = plot_input)
plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
## End(Not run)