build_input_yourself: Example: build required input from a custom model (package jurrr/modelplotr: Plots to Evaluate the Business Performance of Predictive Models)

Description

It's very easy to apply modelplotr to predictive models that are developed in caret, mlr, h2o or keras. Models that are developed differently, even those built outside of R, take only a bit more work to use with modelplotr. This section describes the required input format and gives an example.

When you build input for plotting_scope() yourself

To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the input data. You can also create your own dataframe containing actuals, probabilities and ntiles, where the 1st ntile holds the (100/#ntiles) percent of observations with the highest model probability and the last ntile holds the (100/#ntiles) percent with the lowest model probability. In that case, make sure the input dataframe contains the following columns & formats:

| column | type | definition |
|---|---|---|
| model_label | Factor | Name of the model object |
| dataset_label | Factor | Datasets to include in the plot as factor levels |
| y_true | Factor | Target with actual values |
| prob_[tv1] | Decimal | Probability according to model for target value 1 |
| prob_[tv2] | Decimal | Probability according to model for target value 2 |
| ... | ... | ... |
| prob_[tvn] | Decimal | Probability according to model for target value n |
| ntl_[tv1] | Integer | Ntile based on probability according to model for target value 1 |
| ntl_[tv2] | Integer | Ntile based on probability according to model for target value 2 |
| ... | ... | ... |
| ntl_[tvn] | Integer | Ntile based on probability according to model for target value n |
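The ntile definition and the column table above can be illustrated with a small hand-built frame. The sketch below assumes a binary target with hypothetical target values 'yes' and 'no' and made-up toy probabilities; ntile_from_prob() is a hypothetical helper, not a modelplotr function, that uses the same quantile-and-cut approach as the example below.

```
# Hypothetical helper (not part of modelplotr): assign ntiles so that
# ntile 1 holds the 1/ntiles share of rows with the HIGHEST probability
# and ntile `ntiles` the share with the LOWEST probability.
ntile_from_prob <- function(prob, ntiles = 10) {
  # quantile cutoffs from 0% to 100% in steps of 1/ntiles
  # (cut() fails if these are not unique, e.g. with many tied probabilities)
  cutoffs <- stats::quantile(prob, probs = seq(0, 1, 1 / ntiles), na.rm = TRUE)
  # cut() numbers bins from low to high; flip so that 1 = highest probability
  bins <- as.numeric(cut(prob, breaks = cutoffs, include.lowest = TRUE))
  as.integer((ntiles + 1) - bins)
}

# Toy scores for a binary target with hypothetical target values 'yes' / 'no'
prob_yes <- c(0.91, 0.05, 0.62, 0.33, 0.78, 0.12, 0.47, 0.29, 0.84, 0.56)
y_true   <- factor(c('yes', 'no', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes'))
ntiles   <- 5

# one prob_ and one ntl_ column per level of y_true, as in the table above
scores_and_ntiles <- data.frame(
  model_label   = factor('my model'),   # recycled to every row
  dataset_label = factor('test data'),  # recycled to every row
  y_true        = y_true,
  prob_yes      = prob_yes,
  prob_no       = 1 - prob_yes,
  ntl_yes       = ntile_from_prob(prob_yes, ntiles),
  ntl_no        = ntile_from_prob(1 - prob_yes, ntiles)
)
str(scores_and_ntiles)
```

For a binary target, prob_no is simply 1 - prob_yes, so ntl_no equals (ntiles + 1) - ntl_yes as long as no probability falls exactly on a cutoff; the example below relies on this to derive ntl_no.term.deposit from ntl_term.deposit.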

Examples

```
# load packages
library(modelplotr)
library(dplyr)

# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")

# prepare data for training model for binomial target has_td and train models
train_index = sample(seq(1, nrow(bank_td)), size = 0.5*nrow(bank_td), replace = FALSE)
train = bank_td[train_index, c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index, c('has_td','duration','campaign','pdays','previous','euribor3m')]

# train logistic regression model with stats package
glm.model <- glm(has_td ~ ., family = binomial(link = 'logit'), data = train)

# score model on train data
# note: for a binomial glm, predict(type = 'response') returns the probability of the
# SECOND factor level of has_td; check levels(bank_td$has_td) and swap the two
# assignments below if prob_no.term.deposit does not hold P(no term deposit)
prob_no.term.deposit <- stats::predict(glm.model, newdata = train, type = 'response')
prob_term.deposit <- 1 - prob_no.term.deposit

# set number of ntiles
ntiles = 10
# determine cutoffs
cutoffs = c(stats::quantile(prob_term.deposit, probs = seq(0, 1, 1/ntiles), na.rm = TRUE))
# calculate ntile values (ntile 1 = highest probability)
ntl_term.deposit <- (ntiles+1) - as.numeric(cut(prob_term.deposit, breaks = cutoffs, include.lowest = TRUE))
ntl_no.term.deposit <- (ntiles+1) - ntl_term.deposit

# create scored data frame
scores_and_ntiles <- train %>%
  select(has_td) %>%
  mutate(model_label = factor('logistic regression'),
         dataset_label = factor('train data'),
         y_true = factor(has_td),
         prob_term.deposit = prob_term.deposit,
         prob_no.term.deposit = prob_no.term.deposit,
         ntl_term.deposit = ntl_term.deposit,
         ntl_no.term.deposit = ntl_no.term.deposit) %>%
  select(-has_td)

# add test data
# score model on test data
prob_no.term.deposit <- stats::predict(glm.model, newdata = test, type = 'response')
prob_term.deposit <- 1 - prob_no.term.deposit
# set number of ntiles
ntiles = 10
# determine cutoffs
cutoffs = c(stats::quantile(prob_term.deposit, probs = seq(0, 1, 1/ntiles), na.rm = TRUE))
# calculate ntile values
ntl_term.deposit <- (ntiles+1) - as.numeric(cut(prob_term.deposit, breaks = cutoffs, include.lowest = TRUE))
ntl_no.term.deposit <- (ntiles+1) - ntl_term.deposit

scores_and_ntiles <- scores_and_ntiles %>%
  rbind(
    test %>%
      select(has_td) %>%
      mutate(model_label = factor('logistic regression'),
             dataset_label = factor('test data'),
             y_true = factor(has_td),
             prob_term.deposit = prob_term.deposit,
             prob_no.term.deposit = prob_no.term.deposit,
             ntl_term.deposit = ntl_term.deposit,
             ntl_no.term.deposit = ntl_no.term.deposit) %>%
      select(-has_td)
  )

# aggregate the prepared input and plot the cumulative gains chart
plot_input <- plotting_scope(prepared_input = scores_and_ntiles, scope = 'compare_datasets')
plot_cumgains(data = plot_input)
```
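Before passing a hand-built frame such as scores_and_ntiles to plotting_scope(), a quick check against the format table can catch missing columns early. The helper check_scores_and_ntiles() below is hypothetical and not part of modelplotr; it only tests the requirements listed in the table (factor label columns, plus a prob_ and an ntl_ column for every level of y_true) and accepts integer-valued numeric ntiles, since the example above stores them via as.numeric().

```
# Hypothetical helper (not part of modelplotr): return a character vector
# describing every deviation from the column & format table (empty = OK).
check_scores_and_ntiles <- function(df) {
  problems <- character(0)
  # the three label columns must be present and be factors
  for (col in c('model_label', 'dataset_label', 'y_true')) {
    if (!col %in% names(df)) {
      problems <- c(problems, paste0("missing column '", col, "'"))
    } else if (!is.factor(df[[col]])) {
      problems <- c(problems, paste0("column '", col, "' is not a factor"))
    }
  }
  # every target value (factor level of y_true) needs a prob_ and an ntl_ column
  if (is.factor(df[['y_true']])) {
    for (tv in levels(df[['y_true']])) {
      prob_col <- paste0('prob_', tv)
      ntl_col  <- paste0('ntl_', tv)
      if (!is.numeric(df[[prob_col]])) {
        problems <- c(problems, paste0("missing or non-numeric column '", prob_col, "'"))
      }
      ntl <- df[[ntl_col]]
      # ntiles may be stored as integer or as integer-valued numeric
      if (!is.numeric(ntl) || any(ntl != round(ntl), na.rm = TRUE)) {
        problems <- c(problems, paste0("missing or non-integer column '", ntl_col, "'"))
      }
    }
  }
  problems
}

# usage on a toy frame with hypothetical target values 'yes' / 'no'
toy <- data.frame(
  model_label   = factor('my model'),
  dataset_label = factor('test data'),
  y_true        = factor(c('yes', 'no', 'yes', 'no')),
  prob_yes      = c(0.9, 0.2, 0.7, 0.4),
  prob_no       = c(0.1, 0.8, 0.3, 0.6),
  ntl_yes       = c(1L, 2L, 1L, 2L),
  ntl_no        = c(2L, 1L, 2L, 1L)
)
check_scores_and_ntiles(toy)        # character(0): no problems found
check_scores_and_ntiles(toy[, -7])  # reports the missing ntl_no column
```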

jurrr/modelplotr documentation built on May 12, 2019, 5:46 a.m.