plot_pred_improve: Plot the Model Performance Improvement of Each Predictor...

Description

Plots the improvement in model performance that each predictor provides relative to the null model. If the outcome is categorical, a logistic regression model is used and performance is assessed with the area under the ROC curve. If the outcome is numeric, an ordinary least squares model is used and performance is assessed with the root mean squared error (RMSE).

Performance is estimated across resamples, and the p-value for each predictor is determined using a one-sided paired t-test of that predictor's results against the null model's results. The p-values are adjusted using the Benjamini-Hochberg method to control the false discovery rate.
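
The comparison described above can be sketched in base R. This is a minimal illustration with simulated resample scores, not the package's internal code; the object names (`null_auc`, `pred_auc`, `x1` to `x3`) and all simulated values are assumptions made for the example.

```r
# Simulated AUC values over 30 resamples (e.g. 10 folds x 3 repeats).
# These numbers are made up for illustration only.
set.seed(1)
n_resamples <- 30
null_auc <- rnorm(n_resamples, mean = 0.50, sd = 0.02)

# Hypothetical per-resample scores for three predictors
pred_auc <- list(
  x1 = null_auc + rnorm(n_resamples, mean = 0.15, sd = 0.03),  # clear improvement
  x2 = null_auc + rnorm(n_resamples, mean = 0.02, sd = 0.03),  # small improvement
  x3 = null_auc + rnorm(n_resamples, mean = 0.00, sd = 0.03)   # no real improvement
)

# Mean improvement over the null model for each predictor
improvement <- sapply(pred_auc, function(x) mean(x - null_auc))

# One-sided paired t-test: is the predictor's AUC greater than the null model's?
p_raw <- sapply(pred_auc, function(x) {
  t.test(x, null_auc, paired = TRUE, alternative = "greater")$p.value
})

# Benjamini-Hochberg adjustment to control the false discovery rate
p_adj <- p.adjust(p_raw, method = "BH")

data.frame(improvement, p_raw, p_adj)
```

For a numeric outcome, where a lower RMSE is better, the direction of the comparison would presumably be reversed; that detail is an assumption here, since the description above does not spell it out.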

Usage

plot_pred_improve(data, outcome, seed, folds = 10, repeats = 3)

Arguments

data

The data frame containing the predictors and the outcome.

outcome

The outcome variable name.

seed

A numeric seed for reproducibility. L'Ecuyer-CMRG is used as the RNG kind.

folds

Defaults to 10. The number of folds to use with repeated cross-validation.

repeats

Defaults to 3. The number of repeats to use with repeated cross-validation.
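
As a rough illustration of what `seed`, `folds`, and `repeats` control, the sketch below sets the RNG kind named above and builds repeated cross-validation fold assignments in base R. The helper `make_repeated_folds()` is hypothetical and written only for this example; the function itself may construct its resamples differently.

```r
# Hypothetical helper: assign each of n rows to one of `folds` folds,
# independently for each of `repeats` repeats.
make_repeated_folds <- function(n, folds = 10, repeats = 3) {
  lapply(seq_len(repeats), function(r) {
    sample(rep(seq_len(folds), length.out = n))
  })
}

# Use the RNG kind named in the `seed` argument description, then seed it
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
fold_ids <- make_repeated_folds(n = 100, folds = 10, repeats = 3)

# Each repeat yields one held-out set per fold, so the number of
# resamples is folds x repeats
n_resamples <- length(fold_ids) * length(unique(fold_ids[[1]]))

# Resetting the same seed reproduces the same fold assignments
set.seed(42)
same_again <- identical(fold_ids, make_repeated_folds(n = 100, folds = 10, repeats = 3))
```

With the defaults, each predictor and the null model are therefore compared over 10 x 3 = 30 resampled performance estimates.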

Value

Returns a ggplot object with the improvement value on the x-axis and the negative log10 of the adjusted p-value on the y-axis.

A vertical dashed red line marks the 0 improvement level on the x-axis while a horizontal dashed red line marks the 0.05 p-value level after adjustment.
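
Because the y-axis shows the negative log10 of the adjusted p-value, smaller p-values plot higher, and the 0.05 reference line sits at roughly 1.3. A quick check in base R, using made-up adjusted p-values for hypothetical predictors `x1` to `x3`:

```r
# Position of the horizontal reference line on the y-axis
threshold <- -log10(0.05)
round(threshold, 3)

# Made-up adjusted p-values for illustration only
p_adj <- c(x1 = 0.001, x2 = 0.04, x3 = 0.30)

# Points above the line correspond to adjusted p-values below 0.05
above_line <- -log10(p_adj) > threshold
above_line
```

Predictors of interest therefore appear in the upper right of the plot: to the right of the vertical line (positive improvement) and above the horizontal line (adjusted p-value below 0.05).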

Author(s)

Andrew Kostandy (andrew.kostandy@gmail.com)

References

This technique was discussed in the book Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson.

Examples

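library(tidyverse)
library(mlbench)
library(MLtoolkit)  # provides plot_pred_improve()

# Load the data and drop the Id column
data(BreastCancer)
dat <- BreastCancer %>% select(-Id)

# Convert the nine predictors to numeric and reverse the levels of Class
dat <- dat %>%
  modify_at(1:9, as.numeric) %>%
  mutate(Class = fct_rev(Class))

# Plot each predictor's improvement over the null model
plot_pred_improve(data = dat, outcome = Class,
                  seed = 42, folds = 10, repeats = 3)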

AndrewKostandy/MLtoolkit documentation built on May 7, 2019, 9:51 p.m.