This stepwise variable selection procedure (with iterations between the 'forward' and 'backward' steps) can be applied to obtain the best candidate final linear regression model.
My.stepwise.lm(Y, variable.list, in.variable = "NULL", data, sle = 0.15,
    sls = 0.15)
Y | The response variable.
variable.list | A list of covariates to be selected.
in.variable | A list of covariate(s) to be always included in the regression model.
data | The data to be analyzed.
sle | The chosen significance level for entry (SLE).
sls | The chosen significance level for stay (SLS).
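To make the role of `sle` concrete, here is a minimal sketch of a single forward step in base R. It is not the package's implementation: it uses the partial F test from `add1()` as a stand-in for the likelihood ratio test reported in the output, and the names `candidates`, `null_fit`, `best`, and `enter` are illustrative only.

```r
# One forward step, sketched with base R only (not My.stepwise internals).
# Assumption: the partial F test from add1() stands in for the LR test.
data("LifeCycleSavings")

sle <- 0.15                                    # significance level for entry
candidates <- c("pop15", "pop75", "dpi", "ddpi")

# Start from the intercept-only model.
null_fit <- lm(sr ~ 1, data = LifeCycleSavings)

# Test each candidate covariate added on its own to the current model.
step1 <- add1(null_fit, scope = reformulate(candidates), test = "F")
pvals <- setNames(step1[candidates, "Pr(>F)"], candidates)

# The candidate with the smallest p-value enters only if it beats sle.
best  <- names(which.min(pvals))
enter <- unname(pvals[best] < sle)
print(pvals)
print(best)
```

A backward step would apply the same idea in reverse, comparing the largest p-value among the covariates already in the model against `sls`.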
The goal of regression analysis is to find one or a few parsimonious regression models that fit the observed data well for effect estimation and/or outcome prediction. To ensure a good quality of analysis, the model-fitting techniques for (1) variable selection, (2) goodness-of-fit assessment, and (3) regression diagnostics and remedies should be used in regression analysis.

The stepwise variable selection procedure (with iterations between the 'forward' and 'backward' steps) is one of the best ways to obtain the best candidate final regression model. All relevant covariates, whether significant or non-significant in bivariate analysis, and some of their interaction terms (or moderators) are put on the 'variable list' to be selected. To be conservative, the significance levels for entry (SLE) and for stay (SLS) should be set at 0.15 or larger. Then, with the aid of substantive knowledge, the best candidate final regression model is identified manually by dropping the covariates with p value > 0.05 one at a time until all regression coefficients are significantly different from 0 at the chosen alpha level of 0.05.

Since the statistical testing at each step of the stepwise variable selection procedure is conditioned on the other covariates in the regression model, the multiple testing problem is not of concern. Any discrepancy between the results of bivariate analysis and regression analysis is likely due to the confounding effects of uncontrolled covariates in bivariate analysis or the masking effects of intermediate variables (or mediators) in regression analysis.
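The manual pruning step described above (dropping covariates with p value > 0.05 one at a time) can be sketched in base R as follows. `prune_by_pvalue` is a hypothetical helper, not a function of this package, and it assumes all covariates are numeric so that coefficient names match term names. In practice the choice of which covariate to drop should also draw on substantive knowledge, which this sketch ignores.

```r
# Hypothetical helper (not part of My.stepwise): repeatedly drop the covariate
# with the largest p-value until every remaining p-value is <= alpha.
# Assumes numeric covariates, so coefficient names equal term names.
prune_by_pvalue <- function(fit, alpha = 0.05) {
  repeat {
    coefs <- summary(fit)$coefficients
    terms_in <- setdiff(rownames(coefs), "(Intercept)")
    if (length(terms_in) == 0) break           # nothing left to drop
    pvals <- setNames(coefs[terms_in, "Pr(>|t|)"], terms_in)
    worst <- names(which.max(pvals))
    if (pvals[worst] <= alpha) break           # all covariates significant
    # Refit without the least significant covariate.
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

data("LifeCycleSavings")
stepwise_fit <- lm(sr ~ ddpi + pop15 + pop75, data = LifeCycleSavings)
final_fit <- prune_by_pvalue(stepwise_fit, alpha = 0.05)
print(formula(final_fit))
```

Applied to the stepwise final model from the Examples, this drops `pop75` (p = 0.072 in the output shown there) and keeps `ddpi` and `pop15`.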
A model object representing the identified "Stepwise Final Model" is displayed, together with the variance inflating factor (VIF) values for all included covariates.
A variance inflating factor (VIF) greater than 10 for a continuous covariate, or greater than 2.5 for a categorical covariate, indicates a multicollinearity problem among some of the covariates in the fitted regression model.
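For numeric covariates, the VIF of covariate j equals 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing covariate j on the other covariates. The sketch below computes it with base R only. `vif_manual` is an illustrative name, not a function of this package, and it assumes at least two numeric covariates.

```r
# Illustrative helper (not part of My.stepwise): VIF_j = 1 / (1 - R_j^2),
# where R_j^2 comes from regressing covariate j on the remaining covariates.
# Assumes at least two numeric covariates.
vif_manual <- function(data, covariates) {
  sapply(covariates, function(v) {
    others <- setdiff(covariates, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

data("LifeCycleSavings")
vif_values <- vif_manual(LifeCycleSavings, c("ddpi", "pop15", "pop75"))
print(vif_values)

# Flag continuous covariates above the threshold of 10 stated above.
flagged <- names(vif_values)[vif_values > 10]
```

On the `LifeCycleSavings` data this should agree with the VIF values printed for the final model in the Examples, none of which exceeds 10.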
My.stepwise.glm
My.stepwise.coxph
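library(My.stepwise)  # needed when running outside the package's own examples

# Inspect the example data set: 50 countries, 5 variables.
data("LifeCycleSavings")
names(LifeCycleSavings)
dim(LifeCycleSavings)

# Example 1: always keep ddpi in the model; default sle = sls = 0.15.
my.variable.list <- c("pop15", "pop75", "dpi")
My.stepwise.lm(Y = "sr", variable.list = my.variable.list, in.variable = c("ddpi"),
    data = LifeCycleSavings)

# Example 2: no forced covariates; looser entry and stay levels of 0.25.
my.variable.list <- c("pop15", "pop75", "dpi", "ddpi")
My.stepwise.lm(Y = "sr", variable.list = my.variable.list,
    data = LifeCycleSavings, sle = 0.25, sls = 0.25)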
[1] "sr" "pop15" "pop75" "dpi" "ddpi"
[1] 50 5
# --------------------------------------------------------------------------------------------------
### iter num = 0, Initial Model
Call:
lm(formula = as.formula(paste(Y, paste(in.variable, collapse = "+"),
sep = "~")), data = data)
Residuals:
Min 1Q Median 3Q Max
-8.5535 -3.7349 0.9835 2.7720 9.3104
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.8830 1.0110 7.797 4.46e-10 ***
ddpi 0.4758 0.2146 2.217 0.0314 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.311 on 48 degrees of freedom
Multiple R-squared: 0.0929, Adjusted R-squared: 0.074
F-statistic: 4.916 on 1 and 48 DF, p-value: 0.03139
# --------------------------------------------------------------------------------------------------
### iter num = 1, Forward Selection by LR Test: + pop15
Call:
lm(formula = sr ~ ddpi + pop15, data = data)
Residuals:
Min 1Q Median 3Q Max
-7.5831 -2.8632 0.0453 2.2273 10.4753
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.59958 2.33439 6.682 2.48e-08 ***
ddpi 0.44283 0.19240 2.302 0.025837 *
pop15 -0.21638 0.06033 -3.586 0.000796 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.861 on 47 degrees of freedom
Multiple R-squared: 0.2878, Adjusted R-squared: 0.2575
F-statistic: 9.496 on 2 and 47 DF, p-value: 0.0003438
--------------- Variance Inflating Factor (VIF) ---------------
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
ddpi pop15
1.002293 1.002293
# --------------------------------------------------------------------------------------------------
### iter num = 2, Forward Selection by LR Test: + pop75
Call:
lm(formula = sr ~ ddpi + pop15 + pop75, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.2539 -2.6159 -0.3913 2.3344 9.7070
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.1247 7.1838 3.915 0.000297 ***
ddpi 0.4278 0.1879 2.277 0.027478 *
pop15 -0.4518 0.1409 -3.206 0.002452 **
pop75 -1.8354 0.9984 -1.838 0.072473 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.767 on 46 degrees of freedom
Multiple R-squared: 0.3365, Adjusted R-squared: 0.2933
F-statistic: 7.778 on 3 and 46 DF, p-value: 0.0002646
--------------- Variance Inflating Factor (VIF) ---------------
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
ddpi pop15 pop75
1.004186 5.745478 5.736014
# ==================================================================================================
*** Stepwise Final Model (in.lr.test: sle = 0.15; out.lr.test: sls = 0.15):
Call:
lm(formula = sr ~ ddpi + pop15 + pop75, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.2539 -2.6159 -0.3913 2.3344 9.7070
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.1247 7.1838 3.915 0.000297 ***
ddpi 0.4278 0.1879 2.277 0.027478 *
pop15 -0.4518 0.1409 -3.206 0.002452 **
pop75 -1.8354 0.9984 -1.838 0.072473 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.767 on 46 degrees of freedom
Multiple R-squared: 0.3365, Adjusted R-squared: 0.2933
F-statistic: 7.778 on 3 and 46 DF, p-value: 0.0002646
--------------- Variance Inflating Factor (VIF) ---------------
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
ddpi pop15 pop75
1.004186 5.745478 5.736014
# --------------------------------------------------------------------------------------------------
### iter num = 0, Initial Model
Call:
lm(formula = as.formula(paste(Y, paste(in.variable, collapse = "+"),
sep = "~")), data = data)
Residuals:
Min 1Q Median 3Q Max
-9.071 -2.701 0.839 2.946 11.429
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6710 0.6336 15.26 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.48 on 49 degrees of freedom
# --------------------------------------------------------------------------------------------------
### iter num = 1, Forward Selection by LR Test: + pop15
Call:
lm(formula = sr ~ pop15, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.637 -2.374 0.349 2.022 11.155
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.49660 2.27972 7.675 6.85e-10 ***
pop15 -0.22302 0.06291 -3.545 0.000887 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.03 on 48 degrees of freedom
Multiple R-squared: 0.2075, Adjusted R-squared: 0.191
F-statistic: 12.57 on 1 and 48 DF, p-value: 0.0008866
--------------- Variance Inflating Factor (VIF) ---------------
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
# --------------------------------------------------------------------------------------------------
### iter num = 2, Forward Selection by LR Test: + ddpi
Call:
lm(formula = sr ~ pop15 + ddpi, data = data)
Residuals:
Min 1Q Median 3Q Max
-7.5831 -2.8632 0.0453 2.2273 10.4753
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.59958 2.33439 6.682 2.48e-08 ***
pop15 -0.21638 0.06033 -3.586 0.000796 ***
ddpi 0.44283 0.19240 2.302 0.025837 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.861 on 47 degrees of freedom
Multiple R-squared: 0.2878, Adjusted R-squared: 0.2575
F-statistic: 9.496 on 2 and 47 DF, p-value: 0.0003438
--------------- Variance Inflating Factor (VIF) ---------------
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
pop15 ddpi
1.002293 1.002293
# --------------------------------------------------------------------------------------------------
### iter num = 3, Forward Selection by LR Test: + pop75
Call:
lm(formula = sr ~ pop15 + ddpi + pop75, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.2539 -2.6159 -0.3913 2.3344 9.7070
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.1247 7.1838 3.915 0.000297 ***
pop15 -0.4518 0.1409 -3.206 0.002452 **
ddpi 0.4278 0.1879 2.277 0.027478 *
pop75 -1.8354 0.9984 -1.838 0.072473 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.767 on 46 degrees of freedom
Multiple R-squared: 0.3365, Adjusted R-squared: 0.2933
F-statistic: 7.778 on 3 and 46 DF, p-value: 0.0002646
--------------- Variance Inflating Factor (VIF) ---------------
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
pop15 ddpi pop75
5.745478 1.004186 5.736014
# ==================================================================================================
*** Stepwise Final Model (in.lr.test: sle = 0.25; out.lr.test: sls = 0.25):
Call:
lm(formula = sr ~ pop15 + ddpi + pop75, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.2539 -2.6159 -0.3913 2.3344 9.7070
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.1247 7.1838 3.915 0.000297 ***
pop15 -0.4518 0.1409 -3.206 0.002452 **
ddpi 0.4278 0.1879 2.277 0.027478 *
pop75 -1.8354 0.9984 -1.838 0.072473 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.767 on 46 degrees of freedom
Multiple R-squared: 0.3365, Adjusted R-squared: 0.2933
F-statistic: 7.778 on 3 and 46 DF, p-value: 0.0002646
--------------- Variance Inflating Factor (VIF) ---------------
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
pop15 ddpi pop75
5.745478 1.004186 5.736014