aggregate_over_ntiles: Build a dataframe with aggregated evaluation measures


View source: R/dataprepmodelplots.R

Description

Build a dataframe with aggregated actuals and predictions. Records in this dataframe represent the unique combinations of models [m], datasets [d], target values [t] and ntiles [n]. This dataframe therefore has (m*d*t*n) rows and 23 columns.
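To make the row count concrete, the R sketch below counts the unique combinations for a hypothetical setup resembling the example at the end of this page: 4 models, 2 datasets, a binary target (2 target values) and 10 ntiles. The labels are placeholders. The sketch only counts combinations with base R's expand.grid; it does not call aggregate_over_ntiles, so it makes no claim about any extra rows the actual function might add.

```r
# Hypothetical labels: 4 models, 2 datasets, 2 target values, 10 ntiles
models   <- c("random forest", "multinomial logit",
              "gradient boosting machine", "artificial neural network")
datasets <- c("train data", "test data")
targets  <- c("no", "yes")
ntiles   <- 1:10

# Every unique combination of model, dataset, target value and ntile
combos <- expand.grid(model_label = models, dataset_label = datasets,
                      target_class = targets, ntile = ntiles,
                      stringsAsFactors = FALSE)

nrow(combos)  # 4 * 2 * 2 * 10 = 160
```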

In most cases you do not need to call this function yourself, since the plotting_scope function calls it automatically.

Usage

aggregate_over_ntiles(prepared_input)

Arguments

prepared_input

Dataframe resulting from the function prepare_scores_and_ntiles, or a dataframe that meets the requirements specified in the section "When you build input for aggregate_over_ntiles() yourself" below.

Value

A dataframe is returned, containing the following columns:

column         type               definition
model_label    String             Name of the model object
dataset_label  Factor             Datasets to include in the plot as factor levels
target_class   String or Integer  Target classes to include in the plot
ntile          Integer            Ntile groups based on model probability for target class
neg            Integer            Number of cases not belonging to target class in dataset in ntile
pos            Integer            Number of cases belonging to target class in dataset in ntile
tot            Integer            Total number of cases in dataset in ntile
pct            Decimal            Percentage of cases in dataset in ntile that belong to target class (pos/tot)
negtot         Integer            Total number of cases not belonging to target class in dataset
postot         Integer            Total number of cases belonging to target class in dataset
tottot         Integer            Total number of cases in dataset
pcttot         Decimal            Percentage of cases in dataset that belong to target class (postot/tottot)
cumneg         Integer            Cumulative number of cases not belonging to target class in dataset, from ntile 1 up to and including ntile
cumpos         Integer            Cumulative number of cases belonging to target class in dataset, from ntile 1 up to and including ntile
cumtot         Integer            Cumulative number of cases in dataset, from ntile 1 up to and including ntile
cumpct         Decimal            Cumulative percentage of cases belonging to target class in dataset, from ntile 1 up to and including ntile (cumpos/cumtot)
gain           Decimal            Gains value for dataset for ntile (pos/postot)
cumgain        Decimal            Cumulative gains value for dataset for ntile (cumpos/postot)
gain_ref       Decimal            Lower reference for gains value for dataset for ntile (ntile/#ntiles)
gain_opt       Decimal            Upper reference for gains value for dataset for ntile
lift           Decimal            Lift value for dataset for ntile (pct/pcttot)
cumlift        Decimal            Cumulative lift value for dataset for ntile ((cumpos/cumtot)/pcttot)
cumlift_ref    Decimal            Reference value for cumulative lift (constant: 1)
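The column definitions above can be illustrated with a minimal base R sketch for a single model, dataset and target class. It simulates probabilities and outcomes (hypothetical data), assigns ntiles so that ntile 1 holds the highest probabilities, and then applies the formulas from the table. It is not the modelplotr implementation. In particular, the gain_opt formula used here (the cumulative gain of a model that ranks every target case first) is an assumption, since the table does not state it.

```r
set.seed(42)
n_cases  <- 1000
n_ntiles <- 10

# Hypothetical scores: probability for the target class and actual outcome
prob      <- runif(n_cases)
is_target <- rbinom(n_cases, size = 1, prob = prob) == 1

# Ntile 1 = the 1/n_ntiles fraction of cases with the highest probability
ntile <- ceiling(rank(-prob, ties.method = "first") * n_ntiles / n_cases)

# Per-ntile counts of target cases and of all cases
agg <- data.frame(
  ntile = seq_len(n_ntiles),
  pos   = as.vector(tapply(is_target, ntile, sum)),
  tot   = as.vector(tapply(is_target, ntile, length))
)
agg$neg    <- agg$tot - agg$pos
agg$pct    <- agg$pos / agg$tot

# Dataset-level totals (recycled to every row)
agg$postot <- sum(agg$pos)
agg$tottot <- sum(agg$tot)
agg$negtot <- agg$tottot - agg$postot
agg$pcttot <- agg$postot / agg$tottot

# Cumulative values from ntile 1 up to and including each ntile
agg$cumpos <- cumsum(agg$pos)
agg$cumneg <- cumsum(agg$neg)
agg$cumtot <- cumsum(agg$tot)
agg$cumpct <- agg$cumpos / agg$cumtot

# Gains and lift
agg$gain     <- agg$pos / agg$postot
agg$cumgain  <- agg$cumpos / agg$postot
agg$gain_ref <- agg$ntile / n_ntiles
# Assumption: optimal model ranks all target cases first, capped at 1
agg$gain_opt <- pmin(1, agg$cumtot / agg$postot)
agg$lift     <- agg$pct / agg$pcttot
agg$cumlift  <- agg$cumpct / agg$pcttot
agg$cumlift_ref <- 1

agg[, c("ntile", "pos", "tot", "cumgain", "gain_opt", "cumlift")]
```

Whatever the model, the last ntile has cumgain and cumlift equal to 1, because by then every case in the dataset has been included.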

When you build input for aggregate_over_ntiles() yourself

To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. You can create your own dataframe containing actuals, probabilities and ntiles (1st ntile = the 1/#ntiles fraction of cases with the highest model probability, last ntile = the 1/#ntiles fraction of cases with the lowest model probability). In that case, make sure the input dataframe contains the following columns and formats:

column         type     definition
model_label    Factor   Name of the model object
dataset_label  Factor   Datasets to include in the plot as factor levels
y_true         Factor   Target with actual values
prob_[tv1]     Decimal  Probability according to model for target value 1
prob_[tv2]     Decimal  Probability according to model for target value 2
...            ...      ...
prob_[tvn]     Decimal  Probability according to model for target value n
ntl_[tv1]      Integer  Ntile based on probability according to model for target value 1
ntl_[tv2]      Integer  Ntile based on probability according to model for target value 2
...            ...      ...
ntl_[tvn]      Integer  Ntile based on probability according to model for target value n
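As a sketch of building this input yourself, the base R code below creates a hypothetical dataframe for one model and one dataset with a binary target whose values are "no" and "yes", so the probability and ntile columns become prob_no, prob_yes, ntl_no and ntl_yes. The model and dataset labels are placeholders, and the helper to_ntile is hypothetical (not part of modelplotr). It assigns ntile 1 to the 1/#ntiles fraction of cases with the highest probability and breaks ties by order of appearance, which may differ from how prepare_scores_and_ntiles handles ties.

```r
set.seed(1)
n_cases  <- 500
n_ntiles <- 10

# Hypothetical helper (not part of modelplotr): ntile 1 = the 1/n fraction of
# cases with the highest probability; ties broken by order of appearance
to_ntile <- function(p, n) {
  as.integer(ceiling(rank(-p, ties.method = "first") * n / length(p)))
}

# Hypothetical scores of one model on one dataset, binary target "no"/"yes"
prob_yes <- runif(n_cases)
input <- data.frame(
  model_label   = factor("my model"),
  dataset_label = factor("test data"),
  y_true        = factor(ifelse(runif(n_cases) < prob_yes, "yes", "no"),
                         levels = c("no", "yes")),
  prob_no       = 1 - prob_yes,
  prob_yes      = prob_yes
)

# One ntile column per target value, named ntl_[target value]
input$ntl_no  <- to_ntile(input$prob_no, n_ntiles)
input$ntl_yes <- to_ntile(input$prob_yes, n_ntiles)

head(input)
```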

See build_input_yourself for an example to build the required input yourself.

See Also

modelplotr for generic info on the package modelplotr

vignette('modelplotr')

prepare_scores_and_ntiles for details on the function prepare_scores_and_ntiles that generates the required input.

plotting_scope for details on the function plotting_scope that filters the output of aggregate_over_ntiles to prepare it for the required evaluation.

build_input_yourself for an example to build the required input yourself.

https://github.com/modelplot/modelplotr for details on the package

https://modelplot.github.io/ for our blog on the value of the model plots

Examples

## Not run: 
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")

# prepare data for training model for binomial target has_td and train models
train_index =  sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]

#train models using mlr...
trainTask <- mlr::makeClassifTask(data = train, target = "has_td")
testTask <- mlr::makeClassifTask(data = test, target = "has_td")
mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::)
task = mlr::makeClassifTask(data = train, target = "has_td")
lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob")
rf = mlr::train(lrn, task)
lrn = mlr::makeLearner("classif.multinom", predict.type = "prob")
mnl = mlr::train(lrn, task)
#... or train models using caret...
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
                  tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
#... or train models using h2o...
h2o::h2o.init()
h2o::h2o.no_progress()
h2o_train = h2o::as.h2o(train)
h2o_test = h2o::as.h2o(test)
gbm <- h2o::h2o.gbm(y = "has_td",
                    x = setdiff(colnames(train), "has_td"),
                    training_frame = h2o_train,
                    nfolds = 5)
#... or train models using keras.
x_train <- as.matrix(train[, -1])
y <- train[, 1]
y_train <- keras::to_categorical(as.numeric(y) - 1)
`%>%` <- magrittr::`%>%`
nn <- keras::keras_model_sequential() %>%
  keras::layer_dense(units = 16, kernel_initializer = "uniform", activation = "relu",
                     input_shape = NCOL(x_train)) %>%
  keras::layer_dense(units = 16, kernel_initializer = "uniform", activation = "relu") %>%
  keras::layer_dense(units = length(levels(train[, 1])), activation = "softmax")
nn %>% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy'))
nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0)

# preparation steps
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
                      dataset_labels = list("train data","test data"),
                      models = list("rf","mnl", "gbm","nn"),
                      model_labels = list("random forest","multinomial logit",
                                          "gradient boosting machine","artificial neural network"),
                      target_column="has_td")
aggregated <- aggregate_over_ntiles(prepared_input=scores_and_ntiles)
head(aggregated)
plot_input <- plotting_scope(prepared_input = aggregated)
head(plot_input)

## End(Not run)

modelplotr documentation built on Oct. 23, 2020, 8:20 p.m.