build_input_yourself: Example: build required input from a custom model


Description

modelplotr is straightforward to apply to predictive models developed in caret, mlr, h2o or keras. For models developed in another way, including models built outside of R, it takes only a little more work to use modelplotr on top of them. This section introduces the required input format and gives an example.

When you build input for plotting_scope() yourself

To make plots with modelplotr, you are not required to use the function prepare_scores_and_ntiles to generate the input data. You can create your own dataframe containing actuals, probabilities and ntiles. The first ntile contains the (100/#ntiles) percent of cases with the highest model probability; the last ntile contains the (100/#ntiles) percent of cases with the lowest model probability. In that case, make sure the input dataframe contains the following columns and formats:

column           type      definition
model_label      Factor    Name of the model object
dataset_label    Factor    Datasets to include in the plot as factor levels
y_true           Factor    Target with actual values
prob_[tv1]       Decimal   Probability according to model for target value 1
prob_[tv2]       Decimal   Probability according to model for target value 2
...              ...       ...
prob_[tvn]       Decimal   Probability according to model for target value n
ntl_[tv1]        Integer   Ntile based on probability according to model for target value 1
ntl_[tv2]        Integer   Ntile based on probability according to model for target value 2
...              ...       ...
ntl_[tvn]        Integer   Ntile based on probability according to model for target value n
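
As an illustration of the column layout above, the following is a minimal sketch, not part of modelplotr. The helper check_prepared_input and the toy data are hypothetical. It assumes that the [tv] suffixes of the prob_ and ntl_ columns equal the factor levels of y_true, as in the example in the Examples section, and it assigns ntiles by rank purely for brevity.

```r
# Hypothetical helper (not a modelplotr function): checks that a data frame
# follows the column layout described in the table above.
# Assumption: prob_/ntl_ suffixes match the factor levels of y_true.
check_prepared_input <- function(df) {
  fixed <- c("model_label", "dataset_label", "y_true")
  stopifnot(all(fixed %in% names(df)))
  stopifnot(all(vapply(df[fixed], is.factor, logical(1))))
  target_values <- levels(df$y_true)
  prob_cols <- paste0("prob_", target_values)
  ntl_cols <- paste0("ntl_", target_values)
  stopifnot(all(c(prob_cols, ntl_cols) %in% names(df)))
  stopifnot(all(vapply(df[prob_cols], is.numeric, logical(1))))
  # ntiles must be whole numbers (stored as integer or as numeric)
  stopifnot(all(vapply(df[ntl_cols], function(x) all(x == round(x)), logical(1))))
  invisible(TRUE)
}

# Toy input for a binary target with invented values "yes" and "no"
set.seed(1)
n <- 20
ntiles <- 10
prob_yes <- runif(n)
toy_input <- data.frame(
  model_label = factor("my model"),
  dataset_label = factor("train data"),
  y_true = factor(ifelse(prob_yes > 0.5, "yes", "no"), levels = c("yes", "no")),
  prob_yes = prob_yes,
  prob_no = 1 - prob_yes
)
# Rank-based ntiles: ntile 1 = highest probability for the target value
toy_input$ntl_yes <- as.integer(
  ceiling(rank(-prob_yes, ties.method = "first") * ntiles / n)
)
toy_input$ntl_no <- as.integer(ntiles + 1 - toy_input$ntl_yes)

check_prepared_input(toy_input)
```

The check only covers names and types; it does not verify that the ntiles are consistent with the probabilities.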

Examples

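# load modelplotr (provides bank_td, plotting_scope and the plot functions) and dplyr
library(modelplotr)
library(dplyr)
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")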
# prepare data for training model for binomial target has_td and train models
train_index =  sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]

#train logistic regression model with stats package
glm.model <- glm(has_td ~.,family=binomial(link='logit'),data=train)
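# score model: for a factor target, predict(type='response') returns the
# probability of the levels other than the first (see levels(train$has_td))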
prob_no.term.deposit <- stats::predict(glm.model,newdata=train,type='response')
prob_term.deposit <- 1-prob_no.term.deposit
#set number of ntiles
ntiles = 10
# determine cutoffs
cutoffs = c(stats::quantile(prob_term.deposit,probs = seq(0,1,1/ntiles),na.rm = TRUE))
#calculate ntile values
ntl_term.deposit <- (ntiles+1)-as.numeric(cut(prob_term.deposit,breaks=cutoffs,include.lowest=TRUE))
ntl_no.term.deposit <- (ntiles+1)-ntl_term.deposit
# create scored data frame
scores_and_ntiles <- train %>%
    select(has_td) %>%
    mutate(model_label=factor('logistic regression'),
           dataset_label=factor('train data'),
           y_true=factor(has_td),
           prob_term.deposit = prob_term.deposit,
           prob_no.term.deposit = prob_no.term.deposit,
           ntl_term.deposit = ntl_term.deposit,
           ntl_no.term.deposit = ntl_no.term.deposit) %>%
    select(-has_td)

# add test data
#score model on test data
prob_no.term.deposit <- stats::predict(glm.model,newdata=test,type='response')
prob_term.deposit <- 1-prob_no.term.deposit
#set number of ntiles
ntiles = 10
# determine cutoffs
cutoffs = c(stats::quantile(prob_term.deposit,probs = seq(0,1,1/ntiles),na.rm = TRUE))
#calculate ntile values
ntl_term.deposit <- (ntiles+1)-as.numeric(cut(prob_term.deposit,breaks=cutoffs,include.lowest=TRUE))
ntl_no.term.deposit <- (ntiles+1)-ntl_term.deposit
scores_and_ntiles <- scores_and_ntiles %>%
  rbind(
   test %>%
    select(has_td) %>%
    mutate(model_label=factor('logistic regression'),
           dataset_label=factor('test data'),
           y_true=factor(has_td),
           prob_term.deposit = prob_term.deposit,
           prob_no.term.deposit = prob_no.term.deposit,
           ntl_term.deposit = ntl_term.deposit,
           ntl_no.term.deposit = ntl_no.term.deposit) %>%
    select(-has_td)
    )

# set the plotting scope and plot cumulative gains for both datasets
plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope='compare_datasets')
plot_cumgains(data = plot_input)
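
The ntile calculation in the example above is written out twice, once per dataset. A minimal sketch of how it could be factored into a helper is given here. The function compute_ntiles is hypothetical and not part of modelplotr; it uses the same quantile and cut logic as the example and is applied to simulated probabilities instead of model scores.

```r
# Hypothetical helper (not a modelplotr function): assigns ntiles so that
# ntile 1 holds the highest probabilities and ntile `ntiles` the lowest.
compute_ntiles <- function(prob, ntiles = 10) {
  # cutoffs at the 0%, 10%, ..., 100% quantiles when ntiles = 10
  cutoffs <- stats::quantile(prob, probs = seq(0, 1, 1 / ntiles), na.rm = TRUE)
  # cut() numbers bins from low to high, so reverse the numbering
  (ntiles + 1) - as.numeric(cut(prob, breaks = cutoffs, include.lowest = TRUE))
}

# Simulated probabilities stand in for model scores
set.seed(42)
prob <- runif(1000)
ntl <- compute_ntiles(prob, ntiles = 10)

# Number of cases per ntile
table(ntl)
```

Like the example, this approach relies on the quantile cutoffs being unique; cut() stops with an error when many tied probabilities produce duplicate breaks.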

jurrr/modelplotr documentation built on May 12, 2019, 5:46 a.m.