backwd_stepwise_glm: Automated Backward Stepwise GLM


Description

Takes a data frame and the name of the dependent variable (in quotes) as arguments, splits the data into training and testing sets, and uses automated backward stepwise selection to build a series of multiple regression models on the training data. Each model is then evaluated on the test data, and evaluation metrics are computed for each model. These metrics are provided as plots. The models are also ranked on each metric, and the ranks are averaged. The model with the best average rank across the metrics is displayed, along with its formula. By default, all metrics carry the same weight when the average rank is calculated (the default for each weight is 1); to give one or more metrics more influence than the others, specify their weights as arguments. As of v 0.2.0, only family = gaussian(link = 'identity') is used within the glm function.
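The backward stepwise step can be pictured with a minimal sketch in base R. This page does not say which criterion the function uses to pick the variable to drop, so the sketch assumes the predictor with the largest p-value is removed at each step; the helper name backward_formulas is hypothetical and not part of the package.

```r
# Hypothetical helper: builds the sequence of formulas a backward
# elimination would fit, starting from the full model.
# Assumption: the predictor with the largest p-value is dropped each step.
backward_formulas <- function(data, dv) {
  predictors <- setdiff(names(data), dv)
  formulas <- character(0)
  while (length(predictors) > 1) {
    f <- reformulate(predictors, response = dv)
    formulas <- c(formulas, paste(deparse(f), collapse = " "))
    fit <- glm(f, family = gaussian(link = "identity"), data = data)
    # Drop the intercept row, keep the p-value column
    p_values <- summary(fit)$coefficients[-1, "Pr(>|t|)"]
    worst <- names(which.max(p_values))
    predictors <- setdiff(predictors, worst)
  }
  formulas
}

formulas <- backward_formulas(mtcars, "mpg")
length(formulas)
```

Under this assumption, a data set with 10 independent variables yields 9 candidate formulas, each with one fewer predictor than the one before it.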

Usage

backwd_stepwise_glm(data, dv, aic_wt = 1, r_wt = 1, mae_wt = 1,
  r_squ_wt = 1, train_prop = 0.7, random_seed = 7)

Arguments

data

A dataframe with one column as the dependent variable and the others as independent variables

dv

The column name of the (continuous) dependent variable (must be in quotes, e.g., 'Dependent_Variable')

aic_wt

Weight given to the rank value of the AIC of the model fitted on the training data (used when calculating mean model performance, default = 1)

r_wt

Weight given to the rank value of the Pearson Correlation between the predicted and actual values on the test data (used when calculating mean model performance, default = 1)

mae_wt

Weight given to the rank value of Mean Absolute Error on the test data (used when calculating mean model performance, default = 1)

r_squ_wt

Weight given to the rank value of R-Squared on the test data (used when calculating mean model performance, default = 1)

train_prop

Proportion of the data used for the training data set (default = 0.7)

random_seed

Random seed to use when splitting into training and testing data (default = 7)
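As a rough illustration of how train_prop, random_seed, and the four metrics fit together, the sketch below splits mtcars in base R, fits one model on the training rows, and computes each metric. The splitting method and the exact metric formulas used inside the function are not documented here, so sample() and the 1 - SSE/SST form of R-squared are assumptions. (The example output on this page reports r = 0.8931 alongside R-squared = 0.7388, so R-squared there is evidently not simply r squared.)

```r
train_prop <- 0.7
random_seed <- 7

# Assumption: a simple random split with base R
set.seed(random_seed)
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(train_prop * n))
dt_train <- mtcars[train_idx, ]
dt_test <- mtcars[-train_idx, ]

# One candidate model, fitted on the training data only
fit <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"),
           data = dt_train)

pred <- predict(fit, newdata = dt_test)
actual <- dt_test$mpg

aic_train <- AIC(fit)                       # AIC on the training data
r_test <- cor(pred, actual)                 # Pearson correlation on test data
mae_test <- mean(abs(actual - pred))        # Mean Absolute Error on test data
# Assumption: R-squared computed as 1 - SSE/SST on the test data
r_squ_test <- 1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2)
```

Only the AIC comes from the training fit; the other three metrics compare predictions with held-out values.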

Value

Returns a plot of each metric by model, along with the best overall model and the formula used to fit it.
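How the "best overall model" might be chosen from weighted ranks can be sketched as follows. The metric values are made up for illustration, and the weighted mean of ranks is an assumption about how the weights are applied; the function's internal calculation may differ.

```r
# Made-up metric values for three candidate models (not package output)
metrics <- data.frame(
  model = c("Model 1", "Model 2", "Model 3"),
  aic   = c(90.1, 85.9, 83.7),   # lower is better
  r     = c(0.85, 0.89, 0.80),   # higher is better
  mae   = c(2.9, 2.3, 3.1),      # lower is better
  r_squ = c(0.72, 0.79, 0.64)    # higher is better
)

# Same names and defaults as the *_wt arguments
weights <- c(aic = 1, r = 1, mae = 1, r_squ = 1)

# Rank 1 is best on every metric; negate the "higher is better" metrics
ranks <- data.frame(
  aic   = rank(metrics$aic),
  r     = rank(-metrics$r),
  mae   = rank(metrics$mae),
  r_squ = rank(-metrics$r_squ)
)

# Assumption: weighted mean of the per-metric ranks
weighted_avg_rank <- as.vector(as.matrix(ranks) %*% weights) / sum(weights)
best <- metrics$model[which.min(weighted_avg_rank)]
best
```

Raising one weight, for example the one for mae, pulls the average rank toward the ordering on that metric, which can change which model comes out on top.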

Examples

dt <- mtcars
stepwise_model <- backwd_stepwise_glm(data = dt,
                                      dv = 'mpg',
                                      aic_wt = 1,
                                      r_wt = 0.8,
                                      mae_wt = 1,
                                      r_squ_wt = 0.8,
                                      train_prop = 0.6,
                                      random_seed = 5)
stepwise_model

Example output

Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2
Loading required package: formula.tools
[1] Success: There were 9 models build and there were 10 independent variables in the origninal model
[1] "Model 9 has the lowest AIC: 83.745"
[1] "Model 6 has the highest r: 0.8931"
[1] "Model 6 has the lowest MAE: 2.2556"
[1] "Model 6 has the highest R-Squared: 0.7388"
[1] "Model 6 has the lowest weighted average rank (lower is better)\n amongst model evaluation metrics"

Call:  glm(formula = paste(model_formulas_vector[best_model]), family = gaussian(link = "identity"), 
    data = dt_train)

Coefficients:
(Intercept)           wt           hp         disp         qsec         gear  
   23.94094     -4.67457     -0.03285      0.01543      0.34559      1.53229  

Degrees of Freedom: 18 Total (i.e. Null);  13 Residual
Null Deviance:	    519.2 
Residual Deviance: 49.12 	AIC: 85.97

AutoStepwiseGLM documentation built on May 1, 2019, 10:52 p.m.