pred_improve: Get the Model Performance Improvement of Each Predictor...
In AndrewKostandy/MLtoolkit: Functions to Help with Machine Learning & Feature Engineering Tasks


Description

Get the model performance improvement of each predictor relative to the null model. If the outcome is categorical, a logistic regression model is used and performance is assessed with the area under the ROC curve (AUROC). If the outcome is numeric, an ordinary least squares model is used and performance is assessed with the root mean squared error (RMSE).
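To make the comparison concrete, here is a minimal sketch of scoring a single predictor against an intercept-only null model on one train/test split. The helpers `auroc`, `rmse`, `auc_improvement` and `rmse_improvement` are hypothetical and are not MLtoolkit's internals; the RMSE sign convention (null RMSE minus predictor RMSE, so positive means better) is an assumption, since the documentation does not state it.

```r
# Hypothetical helpers, not MLtoolkit's internals: score one predictor against
# an intercept-only null model on a single train/test split.

# AUROC via the Mann-Whitney rank statistic; `event` is a logical vector.
auroc <- function(prob, event) {
  r <- rank(prob)                       # tied scores receive their average rank
  n_pos <- sum(event)
  n_neg <- sum(!event)
  (sum(r[event]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))

# Categorical outcome: glm(family = binomial) models the probability of the
# second factor level, so that level is treated as the event here.
auc_improvement <- function(train, test, outcome, predictor) {
  event <- test[[outcome]] == levels(train[[outcome]])[2]
  fit_pred <- glm(reformulate(predictor, outcome), family = binomial, data = train)
  fit_null <- glm(reformulate("1", outcome), family = binomial, data = train)
  auroc(predict(fit_pred, newdata = test, type = "response"), event) -
    auroc(predict(fit_null, newdata = test, type = "response"), event)  # null AUROC is 0.5
}

# Numeric outcome: OLS with one predictor. Assumed sign convention:
# improvement = null RMSE - predictor RMSE, so a positive value means lower error.
rmse_improvement <- function(train, test, outcome, predictor) {
  fit_pred <- lm(reformulate(predictor, outcome), data = train)
  fit_null <- lm(reformulate("1", outcome), data = train)  # predicts the training mean
  obs <- test[[outcome]]
  rmse(predict(fit_null, newdata = test), obs) -
    rmse(predict(fit_pred, newdata = test), obs)
}

# Usage on simulated data (illustrative only)
set.seed(1)
n <- 200
x <- rnorm(n)
sim <- data.frame(
  x = x,
  y = factor(ifelse(x + rnorm(n) > 0, "yes", "no")),
  z = 2 * x + rnorm(n)
)
train <- sim[1:150, ]
test  <- sim[151:200, ]
auc_improvement(train, test, "y", "x")
rmse_improvement(train, test, "z", "x")
```

Because the null model predicts the same value for every row, all its scores are tied and its AUROC is exactly 0.5, so the AUROC improvement is simply the predictor's AUROC minus 0.5 on that split.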

Performance is estimated across resamples, and each predictor's p-value comes from a one-sided paired t-test comparing its resample results with those of the null model on the same resamples. The p-values are then adjusted with the Benjamini-Hochberg method to control the false discovery rate.
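The testing step can be sketched in base R with `t.test(..., paired = TRUE, alternative = "greater")` and `p.adjust(..., method = "BH")`. The per-resample AUROC values below are simulated for illustration and are not results from this package; `paired_p` is a hypothetical helper. For RMSE, where lower is better, the direction would flip (for example, `alternative = "less"`).

```r
# Hypothetical helper: one-sided paired t-test asking whether the predictor's
# AUROC exceeds the null model's AUROC on the same resamples.
paired_p <- function(pred_scores, null_scores) {
  t.test(pred_scores, null_scores, paired = TRUE, alternative = "greater")$p.value
}

set.seed(42)
n_res <- 30                                 # e.g. 10 folds x 3 repeats
null_auc <- rep(0.5, n_res)                 # intercept-only model
pred_auc <- list(                           # simulated, illustrative only
  strong = 0.75 + rnorm(n_res, sd = 0.03),
  weak   = 0.51 + rnorm(n_res, sd = 0.03),
  none   = 0.50 + rnorm(n_res, sd = 0.03)
)

raw_p <- vapply(pred_auc, paired_p, numeric(1), null_scores = null_auc)
adj_p <- p.adjust(raw_p, method = "BH")     # control the false discovery rate
adj_p
```

The Benjamini-Hochberg adjustment never lowers a p-value, so a predictor that only clears the threshold before adjustment may no longer do so afterwards.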

Usage

pred_improve(data, outcome, seed, folds = 10, repeats = 3)

Arguments

`data`	A data frame containing the predictors and the outcome.
`outcome`	The outcome column, given as an unquoted name (as in the Examples).
`seed`	A numeric seed for reproducibility. L'Ecuyer-CMRG is used as the RNG kind.
`folds`	The number of folds to use in repeated cross-validation. Defaults to 10.
`repeats`	The number of repeats to use in repeated cross-validation. Defaults to 3.
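With the defaults, each predictor and the null model are evaluated on `folds * repeats = 30` resamples. A minimal sketch of such fold assignments in base R follows; `make_repeated_folds` is a hypothetical helper, and MLtoolkit may build its resamples differently.

```r
# Hypothetical helper (not part of MLtoolkit): fold assignments for repeated
# V-fold cross-validation, seeded with the L'Ecuyer-CMRG generator.
make_repeated_folds <- function(n, folds = 10, repeats = 3, seed = 42) {
  # Note: this switches the session's RNG kind to L'Ecuyer-CMRG.
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  # One column per repeat; entry [i, r] is the fold in which row i is held
  # out during repeat r. Fold sizes differ by at most one.
  sapply(seq_len(repeats), function(r) sample(rep_len(seq_len(folds), n)))
}

f <- make_repeated_folds(n = 100, folds = 10, repeats = 3, seed = 42)
dim(f)                                 # 100 rows, one column per repeat
holdout <- which(f[, 1] == 1)          # rows held out in fold 1 of repeat 1
train_rows <- setdiff(seq_len(100), holdout)
```

Calling the helper again with the same seed reproduces the same assignments, which is what makes the paired comparison across resamples repeatable.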

Value

A data frame, sorted in descending order of improvement, with the following columns:

`predictor`	The predictor names.
`improvement`	The improvement in AUROC (categorical outcome) or RMSE (numeric outcome) of the single-predictor model over the null model.
`significance`	The p-value of the one-sided paired t-test comparing the predictor's resample results with those of the null model, adjusted with the Benjamini-Hochberg method to control the false discovery rate.
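A table with these three columns can be assembled from per-resample differences, as in the sketch below. It assumes `improvement` is the mean difference across resamples and that RMSE differences are taken as null minus predictor so that larger is always better; neither detail is confirmed by the documentation. `summarise_improvement` is hypothetical, and a one-sample t-test on paired differences is equivalent to the paired t-test.

```r
# Hypothetical sketch: build the documented columns from per-resample
# differences (predictor minus null for AUROC; assumed null minus predictor
# for RMSE, so larger is always better).
summarise_improvement <- function(diffs) {
  # A one-sample t-test on paired differences is equivalent to a paired t-test.
  p <- vapply(diffs, function(d) t.test(d, alternative = "greater")$p.value,
              numeric(1))
  out <- data.frame(
    predictor    = names(diffs),
    improvement  = vapply(diffs, mean, numeric(1)),  # assumed: mean over resamples
    significance = p.adjust(p, method = "BH"),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  out <- out[order(out$improvement, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

set.seed(7)
diffs <- list(                              # simulated, illustrative only
  pred_b = rnorm(30, mean = 0.02, sd = 0.03),
  pred_a = rnorm(30, mean = 0.20, sd = 0.03)
)
summarise_improvement(diffs)
```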

Author(s)

Andrew Kostandy (andrew.kostandy@gmail.com)

References

This technique is discussed in Max Kuhn and Kjell Johnson, Feature Engineering and Selection: A Practical Approach for Predictive Models.

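Examples

library(MLtoolkit)
library(tidyverse)
library(mlbench)

data(BreastCancer)
dat <- BreastCancer %>% select(-Id)

# The nine predictors are factors labelled "1" to "10"; convert the labels
# (not the internal level codes) to numbers. fct_rev() reverses the outcome's
# level order so that "malignant" becomes the first level.
dat <- dat %>%
  modify_at(1:9, ~ as.numeric(as.character(.x))) %>%
  mutate(Class = fct_rev(Class))

pred_improve(data = dat, outcome = Class,
             seed = 42, folds = 10, repeats = 3)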

AndrewKostandy/MLtoolkit documentation built on May 7, 2019, 9:51 p.m.