prepare_scores_and_ntiles_keras: Build a dataframe containing Actuals, Probabilities and...
In modelplotr: Plots to Evaluate the Business Performance of Predictive Models


Builds a dataframe that contains the actuals and the predictions on the target variable for each input list in `inputlists` and each keras model (sequential or functional API) in `models`.

prepare_scores_and_ntiles_keras(
  inputlists,
  inputlist_labels,
  outputlists,
  select_output_index = 1,
  models,
  model_labels,
  targetclass_labels,
  ntiles = 10
)

`inputlists`	List of Strings. A list of list names, referring to the input list objects to include in model evaluation.
`inputlist_labels`	List of Strings. A list of labels for the inputlists, shown in plots. When inputlist_labels is not specified, the names from `inputlists` are used.
`outputlists`	List of Strings. A list of list names, referring to the output list objects to include in model evaluation.
`select_output_index`	Integer. The index of the output in `outputlists` to evaluate and show in plots. Only relevant for multi-output models. Default: 1.
`models`	List of Strings. List of the names of the keras model objects, containing parameters to apply models to datasets. To use this function, model objects need to be generated by the keras package. Both models created with `keras_model_sequential()` as well as models created with the keras functional API are supported by modelplotr.
`model_labels`	List of Strings. Labels for the models, shown in plots. When model_labels is not specified, the names from `models` are used.
`targetclass_labels`	List of Strings. A list of names to use in plots for the target class values for the selected output. If not specified, the model output column indices are used. Specify the labels in the same order as the model output columns.
`ntiles`	Integer. Number of ntiles. The ntiles parameter sets the number of equally sized buckets into which the observations in each dataset are grouped. By default, observations are grouped into 10 equally sized buckets, often referred to as deciles.
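
To illustrate what grouping into ntiles means, here is a minimal base R sketch. It is not the package's internal code: the vector `prob`, the random tie-breaking, and the convention that ntile 1 holds the highest probabilities are assumptions made for this illustration.

```r
# Illustration only: assign observations to equally sized ntiles
# based on (hypothetical) predicted probabilities.
set.seed(42)
prob   <- runif(1000)   # hypothetical predicted probabilities
ntiles <- 10            # 10 buckets, i.e. deciles

# Rank 1 = highest probability; ties are broken at random (assumption)
rnk <- rank(-prob, ties.method = "random")

# Map ranks 1..n onto buckets 1..ntiles of equal size
ntl <- ceiling(rnk * ntiles / length(prob))

table(ntl)   # number of observations per ntile
```

With 1000 observations and `ntiles = 10`, every bucket contains 100 observations, and the observation with the highest probability falls in ntile 1.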

Dataframe. A dataframe is built, based on the datasets and models specified. It contains the dataset name, the actuals on the target_column, the predicted probabilities for each target class (i.e. each unique target value) and the assignment to ntiles in the dataset for each target class.

To make plots with modelplotr, it is not required to use this function to generate input for the function plotting_scope. You can create your own dataframe containing actuals, predictions and ntiles. See build_input_yourself for an example of how to build the required input for plotting_scope or aggregate_over_ntiles yourself, within R or even outside of R.
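
As a rough sketch of such a self-built dataframe, the base R code below assembles actuals, probabilities and ntiles for a binary target. The data are simulated, the helper `assign_ntiles` is hypothetical, and the column names (`model_label`, `dataset_label`, `y_true`, `prob_*`, `ntl_*`) are assumed to follow the layout described in build_input_yourself; check that help page before relying on them.

```r
# Illustration only: build a scores-and-ntiles dataframe by hand.
set.seed(1)
n       <- 200
prob_td <- runif(n)   # hypothetical predicted probability of "term.deposit"
y_true  <- factor(ifelse(runif(n) < prob_td, "term.deposit", "no.term.deposit"))

# Hypothetical helper: ntile 1 holds the highest probabilities
assign_ntiles <- function(p, ntiles = 10) {
  ceiling(rank(-p, ties.method = "random") * ntiles / length(p))
}

scores_and_ntiles <- data.frame(
  model_label          = factor("my model"),    # assumed column name
  dataset_label        = factor("test data"),   # assumed column name
  y_true               = y_true,
  prob_no.term.deposit = 1 - prob_td,
  prob_term.deposit    = prob_td
)
# One ntile column per target class, based on that class's probability
scores_and_ntiles$ntl_no.term.deposit <- assign_ntiles(scores_and_ntiles$prob_no.term.deposit)
scores_and_ntiles$ntl_term.deposit    <- assign_ntiles(scores_and_ntiles$prob_term.deposit)

str(scores_and_ntiles)
```

A dataframe of this shape would then be passed to plotting_scope through its `prepared_input` argument, in the same way as the output of prepare_scores_and_ntiles_keras in the examples.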

modelplotr for generic info on the package modelplotr

vignette('modelplotr')

plotting_scope for details on the function plotting_scope that transforms a dataframe created with prepare_scores_and_ntiles or aggregate_over_ntiles to a dataframe in the required format for all modelplotr plots.

aggregate_over_ntiles for details on the function aggregate_over_ntiles that aggregates the output of prepare_scores_and_ntiles to create a dataframe with aggregated actuals and predictions. In most cases, you do not need to use it since the plotting_scope function will call this function automatically.

https://github.com/modelplot/modelplotr for details on the package

https://modelplot.github.io/ for our blog on the value of the model plots

## Not run: 
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")

# prepare data for training model for binomial target has_td and train models
train_index =  sample(seq(1, nrow(bank_td)),size = 0.5*nrow(bank_td) ,replace = FALSE)
train = bank_td[train_index,]
test = bank_td[-train_index,]

train_seq = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test_seq = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]


# train a keras model using keras_model_sequential()
x_train <- as.matrix(train[,-c(1:2)]); y_train <- 2-as.numeric(train[,1]);
input_train = list(x_train); output_train = list(y_train)
x_test  <- as.matrix(test[,-c(1:2)]);  y_test <- 2-as.numeric(test[,1]);
input_test = list(x_test); output_test = list(y_test)

`%>%` <- magrittr::`%>%`
nn_seq <- keras::keras_model_sequential() %>%
 keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu',
                    input_shape = NCOL(x_train))%>%
 keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') %>%
 keras::layer_dense(units = 1,activation='sigmoid')
nn_seq %>% keras::compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=c('accuracy'))
nn_seq %>% keras::fit(input_train,output_train,epochs = 20,batch_size = 1028,verbose=0)

scores_and_ntiles <- prepare_scores_and_ntiles_keras(inputlists = list("input_train","input_test"),
                          inputlist_labels = list("train data","test data"),
                          models = list("nn_seq"),
                          model_labels = list("keras sequential model"),
                          outputlists = list("output_train","output_test"),
                          select_output_index = 1,
                          targetclass_labels = list("no.term.deposit","term.deposit"),
                          ntiles = 10)

plot_input <- plotting_scope(prepared_input = scores_and_ntiles,scope = "compare_datasets")
plot_cumgains(data = plot_input)
plot_cumlift(data = plot_input)
plot_response(data = plot_input)
plot_cumresponse(data = plot_input)
plot_multiplot(data = plot_input)


# ... or train a keras model using the keras functional API (multi-input / multi-output is supported)
x1_train <- as.matrix(train[,c(3:4)]); y1_train <- as.numeric(train[,1])-1;
x2_train <- as.matrix(train[,c(5:7)]); y2_train <- keras::to_categorical(as.numeric(train[,2])-1,
                                                                         num_classes = 4);
input_train = list(x1_train,x2_train); output_train = list(y1_train,y2_train)
x1_test <- as.matrix(test[,c(3:4)]); y1_test <- as.numeric(test[,1])-1;
x2_test <- as.matrix(test[,c(5:7)]); y2_test <- keras::to_categorical(as.numeric(test[,2])-1,
                                                                         num_classes = 4);
input_test = list(x1_test,x2_test); output_test = list(y1_test,y2_test)

x1_input <- keras::layer_input(shape = NCOL(x1_train))
x2_input <- keras::layer_input(shape = NCOL(x2_train))
concatenated <- keras::layer_concatenate(list(x1_input, x2_input)) %>%
 keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') %>%
 keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu')
y1_output <- concatenated %>% keras::layer_dense(1, activation = "sigmoid", name = "has_td")
y2_output <- concatenated %>% keras::layer_dense(4, activation = "softmax", name = "td_type")
nn_api <- keras::keras_model(list(x1_input,x2_input), list(y1_output,y2_output))
nn_api %>% keras::compile(optimizer = "rmsprop",
                         loss = c("binary_crossentropy","categorical_crossentropy"))
nn_api %>% keras::fit(list(x1_train, x2_train),list(y1_train, y2_train),epochs = 20,batch_size = 1028)

scores_and_ntiles <- prepare_scores_and_ntiles_keras(inputlists = list("input_train","input_test"),
                          inputlist_labels = list("train data","test data"),
                          models = list("nn_api"),
                          model_labels = list("keras api model"),
                          outputlists = list("output_train","output_test"),
                          select_output_index = 2,
                          targetclass_labels = list('no.td','td.type.A','td.type.B','td.type.C'),
                          ntiles = 100)
plot_input <- plotting_scope(prepared_input=scores_and_ntiles,scope="compare_targetclasses")
plot_cumgains(data = plot_input)
plot_cumlift(data = plot_input)
plot_response(data = plot_input)
plot_cumresponse(data = plot_input)
plot_multiplot(data = plot_input)

## End(Not run)