build_input_yourself: Example: build required input from a custom model
In modelplotr: Plots to Evaluate the Business Performance of Predictive Models


It's very easy to apply modelplotr to predictive models developed in caret, mlr, h2o or keras. Models that are developed differently, even those built outside of R, need only a bit more work before modelplotr can be used on top of them. This section introduces the required input format and gives an example.

To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles() to generate the input data. You can create your own data frame containing actuals, probabilities and ntiles. The 1st ntile holds the (1/#ntiles) fraction of cases with the highest model probability; the last ntile holds the (1/#ntiles) fraction with the lowest model probability. In that case, make sure the input data frame contains the following columns and formats:

column	type	definition
model_label	Factor	Name of the model object
dataset_label	Factor	Datasets to include in the plot as factor levels
y_true	Factor	Target with actual values
prob_[tv1]	Decimal	Probability according to model for target value 1
prob_[tv2]	Decimal	Probability according to model for target value 2
...	...	...
prob_[tvn]	Decimal	Probability according to model for target value n
ntl_[tv1]	Integer	Ntile based on probability according to model for target value 1
ntl_[tv2]	Integer	Ntile based on probability according to model for target value 2
...	...	...
ntl_[tvn]	Integer	Ntile based on probability according to model for target value n
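
The ntile definition above (1st ntile = highest probabilities, last ntile = lowest) can be sketched in base R as follows. The helper name assign_ntiles is hypothetical and not part of modelplotr. It uses ranks instead of quantile cutoffs, so tied probabilities are split by order of appearance, and it assumes there are no missing probabilities.

```r
# Hypothetical helper (not part of modelplotr): assign ntiles so that
# ntile 1 holds the highest probabilities and ntile `ntiles` the lowest.
# Assumes `prob` contains no missing values.
assign_ntiles <- function(prob, ntiles = 10) {
  # rank 1 = highest probability; ties are broken by order of appearance
  r <- rank(-prob, ties.method = "first")
  # map ranks 1..n onto ntile numbers 1..ntiles
  as.integer(ceiling(r * ntiles / length(prob)))
}

# toy probabilities, for illustration only
prob_example <- c(0.90, 0.10, 0.50, 0.70, 0.30, 0.80, 0.20, 0.60, 0.40, 0.05)
ntl_example <- assign_ntiles(prob_example, ntiles = 5)
```

For a binary target, the ntiles for the other target value can then be derived as (ntiles + 1) minus these values, as the example below does.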

# load modelplotr (provides bank_td and the plotting functions) and dplyr
library(modelplotr)
library(dplyr)
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")
# prepare data for training model for binomial target has_td and train models
train_index =  sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]

#train logistic regression model with stats package
glm.model <- glm(has_td ~.,family=binomial(link='logit'),data=train)
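#score model
# glm() models the probability of the second factor level of has_td; the naming below
# assumes that level is 'no.term.deposit' (check with levels(train$has_td))
prob_no.term.deposit <- stats::predict(glm.model,newdata=train,type='response')
prob_term.deposit <- 1-prob_no.term.deposit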
#set number of ntiles
ntiles = 10
# determine cutoffs
cutoffs = c(stats::quantile(prob_term.deposit,probs = seq(0,1,1/ntiles),na.rm = TRUE))
#calculate ntile values
ntl_term.deposit <- (ntiles+1)-as.numeric(cut(prob_term.deposit,breaks=cutoffs,include.lowest=TRUE))
ntl_no.term.deposit <- (ntiles+1)-ntl_term.deposit
# create scored data frame
scores_and_ntiles <- train %>%
    select(has_td) %>%
    mutate(model_label=factor('logistic regression'),
           dataset_label=factor('train data'),
           y_true=factor(has_td),
           prob_term.deposit = prob_term.deposit,
           prob_no.term.deposit = prob_no.term.deposit,
           ntl_term.deposit = ntl_term.deposit,
           ntl_no.term.deposit = ntl_no.term.deposit) %>%
    select(-has_td)

# add test data
#score model on test data
prob_no.term.deposit <- stats::predict(glm.model,newdata=test,type='response')
prob_term.deposit <- 1-prob_no.term.deposit
#set number of ntiles
ntiles = 10
# determine cutoffs
cutoffs = c(stats::quantile(prob_term.deposit,probs = seq(0,1,1/ntiles),na.rm = TRUE))
#calculate ntile values
ntl_term.deposit <- (ntiles+1)-as.numeric(cut(prob_term.deposit,breaks=cutoffs,include.lowest=TRUE))
ntl_no.term.deposit <- (ntiles+1)-ntl_term.deposit
scores_and_ntiles <- scores_and_ntiles %>%
  rbind(
   test %>%
    select(has_td) %>%
    mutate(model_label=factor('logistic regression'),
           dataset_label=factor('test data'),
           y_true=factor(has_td),
           prob_term.deposit = prob_term.deposit,
           prob_no.term.deposit = prob_no.term.deposit,
           ntl_term.deposit = ntl_term.deposit,
           ntl_no.term.deposit = ntl_no.term.deposit) %>%
    select(-has_td)
    )

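# set the plotting scope: compare the train and test datasets for one model
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope='compare_datasets')
# plot cumulative gains, passing the plot input explicitly
plot_cumgains(data = plot_input)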
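
Before handing a self-built data frame to plotting_scope(), it can help to check it against the column table in this section. The following is a minimal sketch of such a check. The function name check_custom_input and the toy data frame are hypothetical, and the check only covers column names and factor types, not the values themselves.

```r
# Hypothetical check (not part of modelplotr): returns a character vector of
# problems found; an empty vector means the expected columns are present.
check_custom_input <- function(df) {
  base_cols <- c("model_label", "dataset_label", "y_true")
  missing_base <- setdiff(base_cols, names(df))
  if (length(missing_base) > 0) {
    return(paste("missing column:", missing_base))
  }
  problems <- character(0)
  # the label columns and the target are expected to be factors
  for (col in base_cols) {
    if (!is.factor(df[[col]])) {
      problems <- c(problems, paste("not a factor:", col))
    }
  }
  # one prob_ and one ntl_ column is expected per target value of y_true
  target_values <- levels(factor(df$y_true))
  expected <- c(paste0("prob_", target_values), paste0("ntl_", target_values))
  for (col in setdiff(expected, names(df))) {
    problems <- c(problems, paste("missing column:", col))
  }
  problems
}

# toy input with two target values 'a' and 'b', for illustration only
toy_input <- data.frame(
  model_label   = factor("toy model"),
  dataset_label = factor("toy data"),
  y_true        = factor(c("a", "b")),
  prob_a        = c(0.8, 0.3),
  prob_b        = c(0.2, 0.7),
  ntl_a         = c(1, 2),
  ntl_b         = c(2, 1)
)
toy_problems <- check_custom_input(toy_input)
```

Applied to the example above, the call would be check_custom_input(scores_and_ntiles); an empty result suggests the columns match the expected naming.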