aggregate_over_ntiles: Build a dataframe with aggregated evaluation measures
In jurrr/modelplotr: Plots to Evaluate the Business Performance of Predictive Models


Build a dataframe with aggregated actuals and predictions. Each record in this dataframe represents a unique combination of model [m], dataset [d], target value [t] and ntile [n]. The dataframe therefore has (m*d*t*n) rows and 23 columns.
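
As a rough illustration of the row count described above, the following base R sketch enumerates every combination of model, dataset, target value and ntile. The labels are hypothetical and chosen only for this example; the sketch does not call modelplotr itself.

```r
# Hypothetical labels, chosen only for illustration
models   <- c("random forest", "multinomial logit")  # m = 2
datasets <- c("train data", "test data")             # d = 2
targets  <- c("no.term.deposit", "term.deposit")     # t = 2
ntiles   <- 1:10                                     # n = 10

# One row per unique combination of model, dataset, target value and ntile
combos <- expand.grid(model_label   = models,
                      dataset_label = datasets,
                      target_class  = targets,
                      ntile         = ntiles,
                      stringsAsFactors = FALSE)

nrow(combos)  # m * d * t * n = 2 * 2 * 2 * 10 = 80
```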

In most cases, you do not need to call this function yourself, since the plotting_scope function calls it automatically.

aggregate_over_ntiles(prepared_input)

prepared_input

Dataframe resulting from function prepare_scores_and_ntiles, or a dataframe that meets the requirements specified in the section below: When you build input for aggregate_over_ntiles() yourself.

Dataframe object is returned, containing:

column	type	definition
model_label	String	Name of the model object
dataset_label	Factor	Datasets to include in the plot as factor levels
target_class	String or Integer	Target classes to include in the plot
ntile	Integer	Ntile groups based on model probability for target class
neg	Integer	Number of cases not belonging to target class in dataset in ntile
pos	Integer	Number of cases belonging to target class in dataset in ntile
tot	Integer	Total number of cases in dataset in ntile
pct	Decimal	Percentage of cases in dataset in ntile that belong to target class (pos/tot)
negtot	Integer	Total number of cases not belonging to target class in dataset
postot	Integer	Total number of cases belonging to target class in dataset
tottot	Integer	Total number of cases in dataset
pcttot	Decimal	Percentage of cases in dataset that belong to target class (postot / tottot)
cumneg	Integer	Cumulative number of cases not belonging to target class in dataset from ntile 1 up until ntile
cumpos	Integer	Cumulative number of cases belonging to target class in dataset from ntile 1 up until ntile
cumtot	Integer	Cumulative number of cases in dataset from ntile 1 up until ntile
cumpct	Decimal	Cumulative percentage of cases belonging to target class in dataset from ntile 1 up until ntile (cumpos/cumtot)
gain	Decimal	Gains value for dataset for ntile (pos/postot)
cumgain	Decimal	Cumulative gains value for dataset for ntile (cumpos/postot)
gain_ref	Decimal	Lower reference for gains value for dataset for ntile (ntile/#ntiles)
gain_opt	Decimal	Upper reference for gains value for dataset for ntile
lift	Decimal	Lift value for dataset for ntile (pct/pcttot)
cumlift	Decimal	Cumulative lift value for dataset for ntile ((cumpos/cumtot)/pcttot)
cumlift_ref	Decimal	Reference value for Cumulative lift value (constant: 1)
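
The formulas in the table above can be traced by hand on a small example. The sketch below uses base R only and a made-up toy dataset for a single model, dataset and target class; it is not the package's implementation, only an illustration of the definitions. gain_opt is left out because the table does not state its formula.

```r
# Toy data (made up): 20 cases already assigned to 4 ntiles,
# where ntile 1 holds the cases with the highest model probability
toy <- data.frame(
  ntile  = rep(1:4, each = 5),
  y_true = c(1, 1, 1, 1, 0,   1, 1, 0, 0, 0,   1, 0, 0, 0, 0,   0, 0, 0, 0, 0)
)

# Counts per ntile
agg <- data.frame(ntile = sort(unique(toy$ntile)))
agg$pos <- as.integer(tapply(toy$y_true == 1, toy$ntile, sum))
agg$tot <- as.integer(tapply(toy$y_true, toy$ntile, length))
agg$neg <- agg$tot - agg$pos
agg$pct <- agg$pos / agg$tot

# Totals over the whole dataset
postot <- sum(agg$pos)
tottot <- sum(agg$tot)
pcttot <- postot / tottot

# Cumulative measures from ntile 1 up until each ntile
agg$cumpos <- cumsum(agg$pos)
agg$cumtot <- cumsum(agg$tot)
agg$cumpct <- agg$cumpos / agg$cumtot

# Gains and lift, following the definitions in the table
agg$gain        <- agg$pos / postot
agg$cumgain     <- agg$cumpos / postot
agg$gain_ref    <- agg$ntile / max(agg$ntile)
agg$lift        <- agg$pct / pcttot
agg$cumlift     <- agg$cumpct / pcttot
agg$cumlift_ref <- 1

print(agg)
```

By construction, cumgain reaches 1 and cumlift falls back to 1 in the last ntile, since all cases are then included.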

To make plots with modelplotr, it is not required to use the function prepare_scores_and_ntiles to generate the required input data. You can create your own dataframe containing actuals, probabilities and ntiles (the 1st ntile holds the (1/#ntiles) share of cases with the highest model probability, the last ntile the (1/#ntiles) share with the lowest model probability). In that case, make sure the input dataframe contains the following columns and formats:

column	type	definition
model_label	Factor	Name of the model object
dataset_label	Factor	Datasets to include in the plot as factor levels
y_true	Factor	Target with actual values
prob_[tv1]	Decimal	Probability according to model for target value 1
prob_[tv2]	Decimal	Probability according to model for target value 2
...	...	...
prob_[tvn]	Decimal	Probability according to model for target value n
ntl_[tv1]	Integer	Ntile based on probability according to model for target value 1
ntl_[tv2]	Integer	Ntile based on probability according to model for target value 2
...	...	...
ntl_[tvn]	Integer	Ntile based on probability according to model for target value n

See build_input_yourself for an example to build the required input yourself.
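
A minimal sketch of such a hand-built input, assuming a binary target with the values "no" and "yes" and randomly generated probabilities. The helper assign_ntiles, the labels and the data are all hypothetical and serve only to show the expected column names and types.

```r
set.seed(42)
n <- 200

# Hypothetical actuals and model probabilities for a binary target
y_true   <- factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.7, 0.3)))
prob_yes <- runif(n)
prob_no  <- 1 - prob_yes

# Hypothetical helper: ntile 1 gets the highest probabilities,
# the last ntile gets the lowest
assign_ntiles <- function(prob, n_ntiles = 10) {
  as.integer(ceiling(rank(-prob, ties.method = "first") * n_ntiles / length(prob)))
}

# Column names follow the prob_[tv] / ntl_[tv] pattern from the table above,
# with [tv] replaced by the target values "no" and "yes"
my_input <- data.frame(
  model_label   = factor("my model"),
  dataset_label = factor("my data"),
  y_true        = y_true,
  prob_no       = prob_no,
  prob_yes      = prob_yes,
  ntl_no        = assign_ntiles(prob_no),
  ntl_yes       = assign_ntiles(prob_yes)
)

str(my_input)
```

A dataframe shaped like my_input is what the prepared_input argument is described as accepting.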

modelplotr for generic info on the package modelplotr

vignette('modelplotr')

prepare_scores_and_ntiles for details on the function prepare_scores_and_ntiles that generates the required input.

plotting_scope for details on the function plotting_scope that filters the output of aggregate_over_ntiles to prepare it for the required evaluation.

build_input_yourself for an example to build the required input yourself.

https://github.com/modelplot/modelplotr for details on the package

https://modelplot.github.io/ for our blog on the value of the model plots

## Not run: 
# load example data (Bank clients with/without a term deposit - see ?bank_td for details)
data("bank_td")

# prepare data for training model for binomial target has_td and train models
train_index = sample(seq(1, nrow(bank_td)), size = 0.5*nrow(bank_td), replace = FALSE)
train = bank_td[train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]
test = bank_td[-train_index,c('has_td','duration','campaign','pdays','previous','euribor3m')]

# train models using mlr...
mlr::configureMlr() # this line is needed when using mlr without loading it (mlr::)
task = mlr::makeClassifTask(data = train, target = "has_td")
lrn = mlr::makeLearner("classif.randomForest", predict.type = "prob")
rf = mlr::train(lrn, task)
lrn = mlr::makeLearner("classif.multinom", predict.type = "prob")
mnl = mlr::train(lrn, task)
#... or train models using caret...
# setting caret cross validation, here tuned for speed (not accuracy!)
fitControl <- caret::trainControl(method = "cv",number = 2,classProbs=TRUE)
# random forest using ranger package, here tuned for speed (not accuracy!)
rf = caret::train(has_td ~.,data = train, method = "ranger",trControl = fitControl,
                  tuneGrid = expand.grid(.mtry = 2,.splitrule = "gini",.min.node.size=10))
# mnl model using glmnet package
mnl = caret::train(has_td ~.,data = train, method = "glmnet",trControl = fitControl)
#... or train models using h2o...
h2o::h2o.init()
h2o::h2o.no_progress()
h2o_train = h2o::as.h2o(train)
h2o_test = h2o::as.h2o(test)
gbm <- h2o::h2o.gbm(y = "has_td",
                          x = setdiff(colnames(train), "has_td"),
                          training_frame = h2o_train,
                          nfolds = 5)
#... or train models using keras.
x_train <- as.matrix(train[,-1])
y <- train[,1]
y_train <- keras::to_categorical(as.numeric(y)-1)
`%>%` <- magrittr::`%>%`
nn <- keras::keras_model_sequential() %>%
  keras::layer_dense(units = 16,kernel_initializer = "uniform",activation = 'relu',
                     input_shape = NCOL(x_train)) %>%
  keras::layer_dense(units = 16,kernel_initializer = "uniform", activation='relu') %>%
  keras::layer_dense(units = length(levels(train[,1])),activation='softmax')
nn %>% keras::compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=c('accuracy'))
nn %>% keras::fit(x_train,y_train,epochs = 20,batch_size = 1028,verbose=0)

# preparation steps
scores_and_ntiles <- prepare_scores_and_ntiles(datasets=list("train","test"),
                      dataset_labels = list("train data","test data"),
                      models = list("rf","mnl", "gbm","nn"),
                      model_labels = list("random forest","multinomial logit",
                                          "gradient boosting machine","artificial neural network"),
                      target_column="has_td")
aggregated <- aggregate_over_ntiles(prepared_input=scores_and_ntiles)
head(aggregated)
plot_input <- plotting_scope(prepared_input = aggregated)
head(plot_input)

## End(Not run)

jurrr/modelplotr documentation built on Oct. 15, 2020, 10:37 p.m.
