plotting_scope: Build dataframe with formatted input for all plots.

Description Usage Arguments Value When you build input for plotting_scope() yourself See Also Examples

View source: R/dataprepmodelplots.R

Description

Build a dataframe in the required format for all modelplotr plots, relevant to the selected scope of evaluation. Each record in this dataframe represents a unique combination of datasets, models, target classes and ntiles. As an input, plotting_scope can handle both a dataframe created with aggregate_over_ntiles as well as a dataframe created with prepare_scores_and_ntiles (or created otherwise with similar layout). There are four perspectives:

"no_comparison" (default)

In this perspective, you're interested in the performance of one model on one dataset for one target class. Therefore, only one line is plotted in the plots. The parameters select_model_label, select_dataset_label and select_targetclass determine which group is plotted. When not specified, the first alphabetic model, the first alphabetic dataset and the smallest (when select_smallest_targetclass=TRUE) or first alphabetic target value are selected

"compare_models"

In this perspective, you're interested in how well different models perform in comparison to each other on the same dataset and for the same target value. This results in a comparison between models available in ntiles_aggregate$model_label for a selected dataset (default: first alphabetic dataset) and for a selected target value (default: smallest (when select_smallest_targetclass=TRUE) or first alphabetic target value).

"compare_datasets"

In this perspective, you're interested in how well a model performs in different datasets for a specific model on the same target value. This results in a comparison between datasets available in ntiles_aggregate$dataset_label for a selected model (default: first alphabetic model) and for a selected target value (default: smallest (when select_smallest_targetclass=TRUE) or first alphabetic target value).

"compare_targetclasses"

In this perspective, you're interested in how well a model performs for different target values on a specific dataset.This resuls in a comparison between target classes available in ntiles_aggregate$target_class for a selected model (default: first alphabetic model) and for a selected dataset (default: first alphabetic dataset).

Usage

1
2
3
4
5
6
7
8
plotting_scope(
  prepared_input,
  scope = "no_comparison",
  select_model_label = NA,
  select_dataset_label = NA,
  select_targetclass = NA,
  select_smallest_targetclass = TRUE
)

Arguments

prepared_input

Dataframe. Dataframe created with prepare_scores_and_ntiles or dataframe created with aggregate_over_ntiles or a dataframe that is created otherwise with similar layout as the output of these functions (see ?prepare_scores_and_ntiles and ?aggregate_over_ntiles for layout details).

scope

String. Evaluation type of interest. Possible values: "compare_models","compare_datasets", "compare_targetclasses","no_comparison". Default is NA, equivalent to "no_comparison".

select_model_label

String. Selected model when scope is "compare_datasets" or "compare_targetclasses" or "no_comparison". Needs to be identical to model descriptions as specified in model_labels (or models when model_labels is not specified). When scope is "compare_models", select_model_label can be used to take a subset of available models.

select_dataset_label

String. Selected dataset when scope is compare_models or compare_targetclasses or no_comparison. Needs to be identical to dataset descriptions as specified in dataset_labels (or datasets when dataset_labels is not specified). When scope is "compare_datasets", select_dataset_label can be used to take a subset of available datasets.

select_targetclass

String. Selected target value when scope is compare_models or compare_datasets or no_comparison. Default is smallest value when select_smallest_targetclass=TRUE, otherwise first alphabetical value. When scope is "compare_targetclasses", select_targetclass can be used to take a subset of available target classes.

select_smallest_targetclass

Boolean. Select the target value with the smallest number of cases in dataset as group of interest. Default is True, hence the target value with the least observations is selected.

Value

Dataframe plot_input is a subset of ntiles_aggregate.

When you build input for plotting_scope() yourself

To make plots with modelplotr, is not required to use the function prepare_scores_and_ntiles to generate the required input data. You can create your own dataframe containing actuals and probabilities and ntiles (1st ntile = (1/#ntiles) percent with highest model probability, last ntile = (1/#ntiles) percent with lowest probability according to model) , In that case, make sure the input dataframe contains the folowing columns & formats:

column type definition
model_label Factor Name of the model object
dataset_label Factor Datasets to include in the plot as factor levels
y_true Factor Target with actual values
prob_[tv1] Decimal Probability according to model for target value 1
prob_[tv2] Decimal Probability according to model for target value 2
... ... ...
prob_[tvn] Decimal Probability according to model for target value n
ntl_[tv1] Integer Ntile based on probability according to model for target value 1
ntl_[tv2] Integerl Ntile based on probability according to model for target value 2
... ... ...
ntl_[tvn] Integer Ntile based on probability according to model for target value n

See build_input_yourself for an example to build the required input yourself.

See Also

modelplotr for generic info on the package moddelplotr

vignette('modelplotr')

aggregate_over_ntiles for details on the function aggregate_over_ntiles that generates the required input.

prepare_scores_and_ntiles for details on the function prepare_scores_and_ntiles that generates the required input.

build_input_yourself for an example to build the required input yourself. filters the output of aggregate_over_ntiles to prepare it for the required evaluation.

https://github.com/modelplot/modelplotr for details on the package

https://modelplot.github.io/ for our blog on the value of the model plots

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
## Not run: 
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")

# prepare data for training model for binomial target has_td and train models
train_index =  sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]

#train models using mlr...
trainTask <- mlr::makeClassifTask(data = train, target = "has_td")
testTask <- mlr::makeClassifTask(data = test, target = "has_td")
mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::)
task = mlr::makeClassifTask(data = train, target = "has_td")
lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob")
rf = mlr::train(lrn, task)
lrn = mlr::makeLearner("classif.multinom", predict.type = "prob")
mnl = mlr::train(lrn, task)
#... or train models using caret...
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
                  tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
#... or train models using h2o...
h2o::h2o.init()
h2o::h2o.no_progress()
h2o_train = h2o::as.h2o(train)
h2o_test = h2o::as.h2o(test)
gbm <- h2o::h2o.gbm(y = "has_td",
                          x = setdiff(colnames(train), "has_td"),
                          training_frame = h2o_train,
                          nfolds = 5)
#... or train models using keras.
x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1)
`%>%` <- magrittr::`%>%`
nn <- keras::keras_model_sequential() %>%
keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu',
                   input_shape = NCOL(x_train))%>%
  keras::layer_dense(units=16,kernel_initializer="uniform",activation='relu') %>%
  keras::layer_dense(units=length(levels(train[,1])),activation='softmax')
nn %>% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy'))
nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0)

# preparation steps
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
                      dataset_labels = list("train data","test data"),
                      models = list("rf","mnl", "gbm","nn"),
                      model_labels = list("random forest","multinomial logit",
                                          "gradient boosting machine","artificial neural network"),
                      target_column="has_td")
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)
plot_cumgains(data = plot_input)
plot_cumlift(data = plot_input)
plot_response(data = plot_input)
plot_cumresponse(data = plot_input)
plot_multiplot(data = plot_input)
plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)

## End(Not run)

modelplotr documentation built on Oct. 23, 2020, 8:20 p.m.