View source: R/dataprepmodelplots.R
Description

Build a dataframe with aggregated actuals and predictions.
Records in this dataframe represent the unique combinations of models [m], datasets [d], targetvalues [t] and ntiles [n].
The size of this dataframe therefore is (m*d*t*n) rows and 23 columns.
In most cases, you do not need to call this function yourself, since the plotting_scope function calls it automatically.

Usage

aggregate_over_ntiles(prepared_input)

Arguments

prepared_input | Dataframe resulting from function prepare_scores_and_ntiles |
Value

Dataframe object is returned, containing:
column | type | definition |
model_label | String | Name of the model object |
dataset_label | Factor | Datasets to include in the plot as factor levels |
target_class | String or Integer | Target classes to include in the plot |
ntile | Integer | Ntile groups based on model probability for target class |
neg | Integer | Number of cases not belonging to target class in dataset in ntile |
pos | Integer | Number of cases belonging to target class in dataset in ntile |
tot | Integer | Total number of cases in dataset in ntile |
pct | Decimal | Percentage of cases in dataset in ntile that belongs to target class (pos/tot) |
negtot | Integer | Total number of cases not belonging to target class in dataset |
postot | Integer | Total number of cases belonging to target class in dataset |
tottot | Integer | Total number of cases in dataset |
pcttot | Decimal | Percentage of cases in dataset that belongs to target class (postot / tottot) |
cumneg | Integer | Cumulative number of cases not belonging to target class in dataset from ntile 1 up until ntile |
cumpos | Integer | Cumulative number of cases belonging to target class in dataset from ntile 1 up until ntile |
cumtot | Integer | Cumulative number of cases in dataset from ntile 1 up until ntile |
cumpct | Decimal | Cumulative percentage of cases belonging to target class in dataset from ntile 1 up until ntile (cumpos/cumtot) |
gain | Decimal | Gains value for dataset for ntile (pos/postot) |
cumgain | Decimal | Cumulative gains value for dataset for ntile (cumpos/postot) |
gain_ref | Decimal | Lower reference for gains value for dataset for ntile (ntile/#ntiles) |
gain_opt | Decimal | Upper reference for gains value for dataset for ntile |
lift | Decimal | Lift value for dataset for ntile (pct/pcttot) |
cumlift | Decimal | Cumulative lift value for dataset for ntile ((cumpos/cumtot)/pcttot) |
cumlift_ref | Decimal | Reference value for Cumulative lift value (constant: 1) |
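To illustrate how the derived columns relate to the base counts, here is a small sketch with toy numbers (not from a real model) for a single model/dataset/target class combination, computing pct, cumgain, lift and cumlift exactly as defined in the table above:

```r
# toy counts per ntile: 4 ntiles of 100 cases each, 100 positives in total
agg <- data.frame(ntile = 1:4,
                  pos   = c(40, 30, 20, 10),
                  tot   = rep(100, 4))
agg$neg    <- agg$tot - agg$pos
agg$postot <- sum(agg$pos)                       # 100
agg$tottot <- sum(agg$tot)                       # 400
agg$pcttot <- agg$postot / agg$tottot            # 0.25
agg$pct    <- agg$pos / agg$tot                  # pos/tot
agg$cumpos <- cumsum(agg$pos)
agg$cumtot <- cumsum(agg$tot)
agg$cumgain  <- agg$cumpos / agg$postot          # cumulative gains
agg$gain_ref <- agg$ntile / max(agg$ntile)       # lower reference (random model)
agg$lift     <- agg$pct / agg$pcttot             # lift per ntile
agg$cumlift  <- (agg$cumpos / agg$cumtot) / agg$pcttot
```

With these numbers, cumgain runs 0.4, 0.7, 0.9, 1.0 and cumlift ends at 1 at the last ntile, as it always does by construction.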
When you build input for aggregate_over_ntiles() yourself

To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. You can create your own dataframe containing actuals, probabilities and ntiles (1st ntile = the (1/#ntiles) percent of cases with the highest model probability, last ntile = the (1/#ntiles) percent with the lowest probability according to the model). In that case, make sure the input dataframe contains the following columns and formats:
column | type | definition |
model_label | Factor | Name of the model object |
dataset_label | Factor | Datasets to include in the plot as factor levels |
y_true | Factor | Target with actual values |
prob_[tv1] | Decimal | Probability according to model for target value 1 |
prob_[tv2] | Decimal | Probability according to model for target value 2 |
... | ... | ... |
prob_[tvn] | Decimal | Probability according to model for target value n |
ntl_[tv1] | Integer | Ntile based on probability according to model for target value 1 |
ntl_[tv2] | Integer | Ntile based on probability according to model for target value 2 |
... | ... | ... |
ntl_[tvn] | Integer | Ntile based on probability according to model for target value n |
See build_input_yourself for an example to build the required input yourself.
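As a minimal sketch of what such a hand-built input could look like for a binary target with target values "yes" and "no" (the data here are randomly generated for illustration only; the column names follow the prob_[tv] / ntl_[tv] convention from the table above):

```r
set.seed(42)
n      <- 1000
ntiles <- 10
prob_yes <- runif(n)  # fake model probabilities, stand-in for real model scores
my_input <- data.frame(
  model_label   = factor("my model"),
  dataset_label = factor("test data"),
  y_true        = factor(ifelse(prob_yes > runif(n), "yes", "no")),
  prob_yes      = prob_yes,
  prob_no       = 1 - prob_yes)
# ntile 1 = highest probability, last ntile = lowest, per the convention above
my_input$ntl_yes <- ntiles + 1 -
  ceiling(ntiles * rank(my_input$prob_yes, ties.method = "first") / n)
my_input$ntl_no  <- ntiles + 1 -
  ceiling(ntiles * rank(my_input$prob_no,  ties.method = "first") / n)
```

A dataframe like my_input can then be passed directly as prepared_input to aggregate_over_ntiles().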
See Also

modelplotr for generic info on the package modelplotr.

prepare_scores_and_ntiles for details on the function prepare_scores_and_ntiles that generates the required input.

plotting_scope for details on the function plotting_scope that filters the output of aggregate_over_ntiles to prepare it for the required evaluation.

build_input_yourself for an example to build the required input yourself.
https://github.com/modelplot/modelplotr for details on the package
https://modelplot.github.io/ for our blog on the value of the model plots
Examples

## Not run:
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")
# prepare data for training model for binomial target has_td and train models
train_index = sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
#train models using mlr...
trainTask <- mlr::makeClassifTask(data = train, target = "has_td")
testTask <- mlr::makeClassifTask(data = test, target = "has_td")
mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::)
task = mlr::makeClassifTask(data = train, target = "has_td")
lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob")
rf = mlr::train(lrn, task)
lrn = mlr::makeLearner("classif.multinom", predict.type = "prob")
mnl = mlr::train(lrn, task)
#... or train models using caret...
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
#... or train models using h2o...
h2o::h2o.init()
h2o::h2o.no_progress()
h2o_train = h2o::as.h2o(train)
h2o_test = h2o::as.h2o(test)
gbm <- h2o::h2o.gbm(y = "has_td",
x = setdiff(colnames(train), "has_td"),
training_frame = h2o_train,
nfolds = 5)
#... or train models using keras.
x_train <- as.matrix(train[,-1]); y=train[,1]; y_train <- keras::to_categorical(as.numeric(y)-1);
`%>%` <- magrittr::`%>%`
nn <- keras::keras_model_sequential() %>%
keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu',
input_shape = NCOL(x_train))%>%
keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') %>%
keras::layer_dense(units = length(levels(train[,1])),activation='softmax')
nn %>% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy'))
nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0)
# preparation steps
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
dataset_labels = list("train data","test data"),
models = list("rf","mnl", "gbm","nn"),
model_labels = list("random forest","multinomial logit",
"gradient boosting machine","artificial neural network"),
target_column="has_td")
aggregated <- aggregate_over_ntiles(prepared_input=scores_and_ntiles)
head(aggregated)
plot_input <- plotting_scope(prepared_input = aggregated)
head(plot_input)
## End(Not run)