backwd_stepwise_glm: Automated Backward Stepwise GLM
In AutoStepwiseGLM: Builds Stepwise GLMs via Train and Test Approach


Takes a dataframe and the name of the dependent variable (in quotes) as arguments, splits the data into training and testing sets, and uses automated backward stepwise selection to build a series of multiple regression models on the training data. Each model is then evaluated on the test data, and evaluation metrics are computed for each model and shown as plots. The models are ranked on each metric and the ranks are averaged. The model with the best average rank across the metrics is displayed, along with its formula. By default, all metrics carry the same weight when the average rank is calculated (each weight defaults to 1); to give some metrics more weight than others, pass the corresponding weight arguments. As of v 0.2.0, only family = gaussian(link = 'identity') is supported in the underlying glm call.
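A minimal sketch of how a backward stepwise series of models could be generated, offered for illustration only. The elimination rule used here (drop the predictor with the largest p-value at each step) is an assumption, not something this page confirms about the package, and the sketch fits on the full mtcars data rather than a training split to stay short.

```r
# Illustrative backward elimination; the drop rule is an assumption.
dv <- "mpg"
predictors <- setdiff(names(mtcars), dv)
formulas <- character(0)

while (length(predictors) > 1) {
  # Build and record the formula for the current set of predictors
  f <- reformulate(predictors, response = dv)
  formulas <- c(formulas, deparse(f, width.cutoff = 500L))

  fit <- glm(f, family = gaussian(link = "identity"), data = mtcars)

  # p-values for each predictor, excluding the intercept row
  p_vals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

  # Assumed rule: remove the least significant predictor
  worst <- names(p_vals)[which.max(p_vals)]
  predictors <- setdiff(predictors, worst)
}

formulas
```

Each element of `formulas` is one candidate model, from the full model down to a two-predictor model.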

backwd_stepwise_glm(data, dv, aic_wt = 1, r_wt = 1, mae_wt = 1, r_squ_wt = 1, train_prop = 0.7, random_seed = 7)

`data`	A dataframe with one column as the dependent variable and the others as independent variables
`dv`	The column name of the (continuous) dependent variable (must be in quotes, e.g., 'Dependent_Variable')
`aic_wt`	Weight given to the rank value of the AIC of the model fitted on the training data (used when calculating mean model performance, default = 1)
`r_wt`	Weight given to the rank value of the Pearson Correlation between the predicted and actual values on the test data (used when calculating mean model performance, default = 1)
`mae_wt`	Weight given to the rank value of Mean Absolute Error on the test data (used when calculating mean model performance, default = 1)
`r_squ_wt`	Weight given to the rank value of R-Squared on the test data (used when calculating mean model performance, default = 1)
`train_prop`	Proportion of the data used for the training data set (default = 0.7)
`random_seed`	Random seed to use when splitting into training and testing data (default = 7)
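The arguments above map onto a seeded train/test split and four metrics. The following base R sketch shows one way those quantities could be computed for a single model. The splitting method, the example formula `mpg ~ wt + hp`, and the use of squared correlation for R-Squared are all assumptions made for illustration; the package may compute them differently.

```r
# Illustrative only: base R stand-ins for the split and the four metrics.
set.seed(7)                                  # plays the role of random_seed
dt <- mtcars
train_prop <- 0.7

# Randomly pick the training rows; the rest become the test set
train_idx <- sample(nrow(dt), size = floor(train_prop * nrow(dt)))
dt_train <- dt[train_idx, ]
dt_test  <- dt[-train_idx, ]

# Hypothetical candidate model, fitted on the training data
fit <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = dt_train)
pred <- predict(fit, newdata = dt_test)

aic_train  <- AIC(fit)                         # AIC on the training data
r_test     <- cor(pred, dt_test$mpg)           # Pearson correlation on test data
mae_test   <- mean(abs(pred - dt_test$mpg))    # Mean Absolute Error on test data
r_squ_test <- r_test^2                         # assumed definition of R-Squared
```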

This function returns a plot of each metric by model, along with the best overall model and the formula used to fit it.
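To make the selection of the best overall model concrete, here is a small sketch of a weighted average rank. The metric values are made up for three hypothetical models, and the exact ranking and weighting formula is an assumption; only the idea of ranking each metric and weighting the ranks comes from the description.

```r
# Hypothetical metrics for three candidate models (illustrative values only).
metrics <- data.frame(
  model = 1:3,
  aic   = c(90.1, 85.9, 83.7),   # lower is better
  r     = c(0.85, 0.89, 0.88),   # higher is better
  mae   = c(2.60, 2.25, 2.40),   # lower is better
  r_squ = c(0.70, 0.74, 0.72)    # higher is better
)

# Weights mirroring the aic_wt, r_wt, mae_wt and r_squ_wt arguments
wts <- c(aic = 1, r = 0.8, mae = 1, r_squ = 0.8)

# Rank each metric so that rank 1 is always the best model
ranks <- cbind(
  aic   = rank(metrics$aic),
  r     = rank(-metrics$r),
  mae   = rank(metrics$mae),
  r_squ = rank(-metrics$r_squ)
)

# Weighted mean of the ranks per model (assumed formula); lower is better
weighted_rank <- as.vector(ranks %*% wts) / sum(wts)
best_model <- metrics$model[which.min(weighted_rank)]
best_model
```

In this made-up data, the model with the lowest AIC is not the one with the lowest weighted average rank, which is the kind of trade-off the weight arguments are meant to control.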

dt <- mtcars
stepwise_model <- backwd_stepwise_glm(data = dt,
                                      dv = 'mpg',
                                      aic_wt = 1,
                                      r_wt = 0.8,
                                      mae_wt = 1,
                                      r_squ_wt = 0.8,
                                      train_prop = 0.6,
                                      random_seed = 5)
stepwise_model

Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2
Loading required package: formula.tools
[1] Success: There were 9 models build and there were 10 independent variables in the origninal model
[1] "Model 9 has the lowest AIC: 83.745"
[1] "Model 6 has the highest r: 0.8931"
[1] "Model 6 has the lowest MAE: 2.2556"
[1] "Model 6 has the highest R-Squared: 0.7388"
[1] "Model 6 has the lowest weighted average rank (lower is better)\n amongst model evaluation metrics"

Call:  glm(formula = paste(model_formulas_vector[best_model]), family = gaussian(link = "identity"), 
    data = dt_train)

Coefficients:
(Intercept)           wt           hp         disp         qsec         gear  
   23.94094     -4.67457     -0.03285      0.01543      0.34559      1.53229  

Degrees of Freedom: 18 Total (i.e. Null);  13 Residual
Null Deviance:	    519.2 
Residual Deviance: 49.12 	AIC: 85.97