prepare_scores_and_ntiles: Build a dataframe containing Actuals, Probabilities and...

Description Usage Arguments Value When you build scores_and_ntiles yourself See Also Examples

View source: R/dataprepmodelplots.R

Description

Build dataframe object that contains actuals and predictions on the target variable for each dataset in datasets and each model in models

Usage

1
2
3
4
5
6
7
8
prepare_scores_and_ntiles(
  datasets,
  dataset_labels,
  models,
  model_labels,
  target_column,
  ntiles = 10
)

Arguments

datasets

List of Strings. A list of the names of the dataframe objects to include in model evaluation. All dataframes need to contain a target variable and feature variables.

dataset_labels

List of Strings. A list of labels for the datasets, shown in plots. When dataset_labels is not specified, the names from datasets are used.

models

List of Strings. List of the names of the model objects, containing parameters to apply models to datasets. To use this function, model objects need to be generated by the mlr package or the caret package or the h20 package or the keras package. Modelplotr automatically detects whether the model is built using mlr or caret or h2o or keras.

model_labels

List of Strings. Labels for the models, shown in plots. When model_labels is not specified, the names from moddels are used.

target_column

String. Name of the target variable in datasets. Target can be either binary or multinomial. Continuous targets are not supported.

ntiles

Integer. Number of ntiles. The ntile parameter represents the specified number of equally sized buckets the observations in each dataset are grouped into. By default, observations are grouped in 10 equally sized buckets, often referred to as deciles.

Value

Dataframe. A dataframe is built, based on the datasets and models specified. It contains the dataset name, actuals on the target_column , the predicted probabilities for each target class (eg. unique target value) and attribution to ntiles in the dataset for each target class.

When you build scores_and_ntiles yourself

To make plots with modelplotr, is not required to use this function to generate input for function plotting_scope You can create your own dataframe containing actuals and predictions and ntiles, See build_input_yourself for an example to build the required input for plotting_scope or aggregate_over_ntiles yourself, within r or even outside of r.

See Also

modelplotr for generic info on the package moddelplotr

vignette('modelplotr')

plotting_scope for details on the function plotting_scope that transforms a dataframe created with prepare_scores_and_ntiles or aggregate_over_ntiles to a dataframe in the required format for all modelplotr plots.

aggregate_over_ntiles for details on the function aggregate_over_ntiles that aggregates the output of prepare_scores_and_ntiles to create a dataframe with aggregated actuals and predictions. In most cases, you do not need to use it since the plotting_scope function will call this function automatically.

https://github.com/modelplot/modelplotr for details on the package

https://modelplot.github.io/ for our blog on the value of the model plots

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
## Not run: 
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")

# prepare data for training model for binomial target has_td and train models
train_index =  sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]

#train models using mlr...
trainTask <- mlr::makeClassifTask(data = train, target = "has_td")
testTask <- mlr::makeClassifTask(data = test, target = "has_td")
mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::)
task = mlr::makeClassifTask(data = train, target = "has_td")
lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob")
rf = mlr::train(lrn, task)
lrn = mlr::makeLearner("classif.multinom", predict.type = "prob")
mnl = mlr::train(lrn, task)
#... or train models using caret...
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
                  tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
#... or train models using h2o...
h2o::h2o.init()
h2o::h2o.no_progress()
h2o_train = h2o::as.h2o(train)
h2o_test = h2o::as.h2o(test)
gbm <- h2o::h2o.gbm(y = "has_td",
                          x = setdiff(colnames(train), "has_td"),
                          training_frame = h2o_train,
                          nfolds = 5)
#... or train models using keras.
x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1);
`%>%` <- magrittr::`%>%`
nn <- keras::keras_model_sequential() %>%
keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu',
                   input_shape = NCOL(x_train))%>%
  keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') %>%
  keras::layer_dense(units = length(levels(train[,1])),activation='softmax')
nn %>% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy'))
nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0)

# preparation steps
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
                      dataset_labels = list("train data","test data"),
                      models = list("rf","mnl", "gbm","nn"),
                      model_labels = list("random forest","multinomial logit",
                                          "gradient boosting machine","artificial neural network"),
                      target_column="has_td")
plot_input <- plotting_scope(prepared_input = scores_and_ntiles)
plot_cumgains(data = plot_input)
plot_cumlift(data = plot_input)
plot_response(data = plot_input)
plot_cumresponse(data = plot_input)
plot_multiplot(data = plot_input)
plot_costsrevs(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_profit(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)
plot_roi(data=plot_input,fixed_costs=1000,variable_costs_per_unit=10,profit_per_unit=50)

## End(Not run)

modelplotr documentation built on Oct. 23, 2020, 8:20 p.m.