My.stepwise.glm: Stepwise Variable Selection Procedure for Generalized Linear...



Description

This stepwise variable selection procedure (with iterations between the 'forward' and 'backward' steps) can be applied to obtain the best candidate final generalized linear model.

Usage

My.stepwise.glm(Y, variable.list, in.variable = "NULL", data, sle = 0.15,
  sls = 0.15, myfamily, myoffset = "NULL")

Arguments

Y

The response variable.

variable.list

A list of covariates to be selected.

in.variable

A list of covariate(s) to be always included in the regression model.

data

The data to be analyzed.

sle

The chosen significance level for entry (SLE).

sls

The chosen significance level for stay (SLS).

myfamily

The 'family' for the specified generalized linear model, as in glm().

myoffset

The 'offset' for the specified generalized linear model, as in glm().
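
To make the roles of 'family' and 'offset' concrete, the following minimal sketch fits a Poisson rate model directly with glm(). It does not call My.stepwise.glm; the data frame `toy` and its columns `exposure`, `x`, and `events` are simulated and purely hypothetical.

```r
# Simulated (hypothetical) count data: events observed over varying exposure.
set.seed(1)
toy <- data.frame(exposure = runif(50, 1, 10), x = rnorm(50))
toy$events <- rpois(50, lambda = toy$exposure * exp(0.3 * toy$x))

# 'family' picks the error distribution and link function; 'offset' adds
# log(exposure) to the linear predictor with its coefficient fixed at 1,
# so the model describes the event rate per unit of exposure.
fit <- glm(events ~ x, family = poisson(link = "log"),
           offset = log(exposure), data = toy)
coef(fit)
```

In the Examples on this page the family is passed to My.stepwise.glm as a character string (myfamily = "binomial").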

Details

The goal of regression analysis is to find one or a few parsimonious regression models that fit the observed data well for effect estimation and/or outcome prediction. To ensure a good quality of analysis, model-fitting techniques for (1) variable selection, (2) goodness-of-fit assessment, and (3) regression diagnostics and remedies should be used. The stepwise variable selection procedure (with iterations between the 'forward' and 'backward' steps) is one of the best ways to obtain the best candidate final regression model.

All relevant covariates, whether significant or non-significant in bivariate analysis, and some of their interaction terms (or moderators) are put on the 'variable list' to be selected. The significance levels for entry (SLE) and for stay (SLS) are suggested to be set at 0.15 or larger to be conservative. Then, with the aid of substantive knowledge, the best candidate final regression model is identified manually by dropping the covariates with p value > 0.05 one at a time until all regression coefficients are significantly different from 0 at the chosen alpha level of 0.05.

Since the statistical testing at each step of the stepwise variable selection procedure is conditional on the other covariates in the regression model, the multiple testing problem is not of concern. Any discrepancy between the results of bivariate analysis and regression analysis is likely due to the confounding effects of uncontrolled covariates in bivariate analysis or the masking effects of intermediate variables (or mediators) in regression analysis.
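
The alternation between 'forward' and 'backward' steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's source code: `lr_pvalue` and `stepwise_lr` are hypothetical helper names, the likelihood-ratio test is computed from deviance differences (appropriate as written only for families whose dispersion is fixed at 1, such as binomial and Poisson), and all covariates are assumed to be single terms.

```r
# p-value of the likelihood-ratio test between two nested GLMs.
lr_pvalue <- function(small, big) {
  stat <- deviance(small) - deviance(big)
  df <- df.residual(small) - df.residual(big)
  pchisq(stat, df = df, lower.tail = FALSE)
}

# Hypothetical helper: forward entry at level `sle`, backward removal at
# level `sls`; covariates in `always` are never removed.
stepwise_lr <- function(y, candidates, data, family = binomial(),
                        always = character(0), sle = 0.15, sls = 0.15,
                        max_iter = 100) {
  fit_terms <- function(terms) {
    rhs <- if (length(terms) == 0) "1" else terms
    glm(reformulate(rhs, response = y), family = family, data = data)
  }
  selected <- always
  fit <- fit_terms(selected)
  for (iter in seq_len(max_iter)) {
    # Forward step: enter the candidate with the smallest LR p-value.
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    p_in <- vapply(remaining,
                   function(v) lr_pvalue(fit, fit_terms(c(selected, v))),
                   numeric(1))
    if (min(p_in) >= sle) break
    selected <- c(selected, names(which.min(p_in)))
    fit <- fit_terms(selected)
    # Backward step: remove the weakest non-forced covariate if p > sls.
    removable <- setdiff(selected, always)
    p_out <- vapply(removable,
                    function(v) lr_pvalue(fit_terms(setdiff(selected, v)), fit),
                    numeric(1))
    if (max(p_out) > sls) {
      selected <- setdiff(selected, names(which.max(p_out)))
      fit <- fit_terms(selected)
    }
  }
  fit
}

# Usage on the iris subset from the Examples section.
my.data <- iris[51:150, ]
my.data$Width <- (my.data$Sepal.Width + my.data$Petal.Width) / 2
my.data$Species1 <- ifelse(my.data$Species == "virginica", 1, 0)
fit <- stepwise_lr("Species1", c("Sepal.Length", "Petal.Length"),
                   data = my.data, always = "Width")
attr(terms(fit), "term.labels")
```

The `max_iter` guard is a design choice of this sketch: it bounds the loop in case a covariate is repeatedly entered and removed.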

Value

A model object representing the identified "Stepwise Final Model" is displayed, together with the variance inflating factor (VIF) values of all included covariates.

Warning

A variance inflating factor (VIF) greater than 10 for a continuous covariate, or greater than 2.5 for a categorical covariate, indicates a multicollinearity problem among some of the covariates in the fitted regression model.
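
As a rough illustration of what a VIF measures, the sketch below computes the classical quantity 1 / (1 - R^2), where R^2 comes from regressing each covariate on the others. `vif_simple` is a hypothetical helper that assumes at least two continuous covariates; it is not the function this package uses, and because it ignores the fitted GLM its values may differ from those printed by My.stepwise.glm.

```r
# Classical VIF for each predictor: 1 / (1 - R^2), with R^2 taken from a
# linear regression of that predictor on all the other predictors.
vif_simple <- function(data, predictors) {
  vapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Same derived covariate as in the Examples section.
my.data <- iris[51:150, ]
my.data$Width <- (my.data$Sepal.Width + my.data$Petal.Width) / 2

v <- vif_simple(my.data, c("Width", "Sepal.Width", "Sepal.Length"))
round(v, 2)
names(v)[v > 10]  # covariates exceeding the threshold for continuous variables
```

Because Width is built from Sepal.Width, these two covariates are strongly related by construction, which is the situation the thresholds above are meant to flag.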

See Also

My.stepwise.lm

My.stepwise.coxph

Examples

data("iris")
names(iris)
# Keep rows 51-150 (versicolor and virginica) for a binary outcome
my.data <- iris[51:150, ]
# Derived covariate: mean of sepal width and petal width
my.data$Width <- (my.data$Sepal.Width + my.data$Petal.Width)/2
names(my.data)
dim(my.data)
# Binary response: 1 = virginica, 0 = versicolor
my.data$Species1 <- ifelse(my.data$Species == "virginica", 1, 0)

# Width is always included; selection runs over the two length covariates
my.variable.list <- c("Sepal.Length", "Petal.Length")
My.stepwise.glm(Y = "Species1", variable.list = my.variable.list,
    in.variable = c("Width"), data = my.data, myfamily = "binomial")

# No forced covariates; looser entry and stay levels (0.25)
my.variable.list <- c("Sepal.Length", "Sepal.Width", "Width")
My.stepwise.glm(Y = "Species1", variable.list = my.variable.list,
    data = my.data, sle = 0.25, sls = 0.25, myfamily = "binomial")

Example output

[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
[6] "Width"       
[1] 100   6
# --------------------------------------------------------------------------------------------------
# Initial Model:

Call:
glm(formula = as.formula(paste(Y, paste(in.variable, collapse = "+"), 
    sep = "~")), family = binomial(logit), data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.00174  -0.67631  -0.00153   0.56553   2.60904  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -18.250      3.699  -4.934 8.04e-07 ***
Width          8.044      1.627   4.943 7.70e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  77.335  on 98  degrees of freedom
AIC: 81.335

Number of Fisher Scoring iterations: 6

# -------------------------------------------------------------------------------------------------- 
### iter num = 1, Forward Selection by LR Test: + Petal.Length 

Call:
glm(formula = Species1 ~ Width + Petal.Length, family = binomial(logit), 
    data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.96297  -0.10813  -0.00005   0.07046   2.64996  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -46.543     12.380  -3.759 0.000170 ***
Width           2.137      2.487   0.859 0.390172    
Petal.Length    8.572      2.387   3.592 0.000329 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  32.677  on 97  degrees of freedom
AIC: 38.677

Number of Fisher Scoring iterations: 8

--------------- Variance Inflating Factor (VIF) --------------- 
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
       Width Petal.Length 
    1.010299     1.010299 
# -------------------------------------------------------------------------------------------------- 
### iter num = 2, Forward Selection by LR Test: + Sepal.Length 

Call:
glm(formula = Species1 ~ Width + Petal.Length + Sepal.Length, 
    family = binomial(logit), data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.61739  -0.03752   0.00001   0.01995   1.53322  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)   -43.103     14.434  -2.986  0.00282 **
Width           3.697      2.956   1.251  0.21096   
Petal.Length   12.847      4.014   3.201  0.00137 **
Sepal.Length   -4.495      1.791  -2.511  0.01206 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  22.205  on 96  degrees of freedom
AIC: 30.205

Number of Fisher Scoring iterations: 9

--------------- Variance Inflating Factor (VIF) --------------- 
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
       Width Petal.Length Sepal.Length 
    1.188266     2.294930     2.564059 
# ================================================================================================== 
*** Stepwise Final Model (in.lr.test: sle = 0.15; out.lr.test: sls = 0.15): 

Call:
glm(formula = Species1 ~ Width + Petal.Length + Sepal.Length, 
    family = binomial(logit), data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.61739  -0.03752   0.00001   0.01995   1.53322  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)   -43.103     14.434  -2.986  0.00282 **
Width           3.697      2.956   1.251  0.21096   
Petal.Length   12.847      4.014   3.201  0.00137 **
Sepal.Length   -4.495      1.791  -2.511  0.01206 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  22.205  on 96  degrees of freedom
AIC: 30.205

Number of Fisher Scoring iterations: 9

--------------- Variance Inflating Factor (VIF) --------------- 
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
       Width Petal.Length Sepal.Length 
    1.188266     2.294930     2.564059 
# --------------------------------------------------------------------------------------------------
# Initial Model:

Call:
glm(formula = as.formula(paste(Y, paste(in.variable, collapse = "+"), 
    sep = "~")), family = binomial(logit), data = data)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.177  -1.177   0.000   1.177   1.177  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.0        0.2       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.63  on 99  degrees of freedom
Residual deviance: 138.63  on 99  degrees of freedom
AIC: 140.63

Number of Fisher Scoring iterations: 2

# -------------------------------------------------------------------------------------------------- 
### iter num = 1, Forward Selection by LR Test: + Width 

Call:
glm(formula = Species1 ~ Width, family = binomial(logit), data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.00174  -0.67631  -0.00153   0.56553   2.60904  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -18.250      3.699  -4.934 8.04e-07 ***
Width          8.044      1.627   4.943 7.70e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  77.335  on 98  degrees of freedom
AIC: 81.335

Number of Fisher Scoring iterations: 6

--------------- Variance Inflating Factor (VIF) --------------- 
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
# -------------------------------------------------------------------------------------------------- 
### iter num = 2, Forward Selection by LR Test: + Sepal.Width 

Call:
glm(formula = Species1 ~ Width + Sepal.Width, family = binomial(logit), 
    data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.79060  -0.12811  -0.00554   0.08197   2.29438  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -14.379      5.097  -2.821  0.00478 ** 
Width         31.401      7.323   4.288 1.80e-05 ***
Sepal.Width  -19.607      4.882  -4.016 5.92e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  27.399  on 97  degrees of freedom
AIC: 33.399

Number of Fisher Scoring iterations: 7

--------------- Variance Inflating Factor (VIF) --------------- 
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
      Width Sepal.Width 
   11.72704    11.72704 
# -------------------------------------------------------------------------------------------------- 
### iter num = 3, Forward Selection by LR Test: + Sepal.Length 

Call:
glm(formula = Species1 ~ Width + Sepal.Width + Sepal.Length, 
    family = binomial(logit), data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.62754  -0.12171  -0.00435   0.06825   2.32596  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -20.287      8.055  -2.519 0.011785 *  
Width          31.845      7.961   4.000 6.33e-05 ***
Sepal.Width   -20.746      5.469  -3.794 0.000149 ***
Sepal.Length    1.295      1.089   1.189 0.234357    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  25.902  on 96  degrees of freedom
AIC: 33.902

Number of Fisher Scoring iterations: 8

--------------- Variance Inflating Factor (VIF) --------------- 
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
       Width  Sepal.Width Sepal.Length 
   11.260830    12.139373     1.274881 
# ================================================================================================== 
*** Stepwise Final Model (in.lr.test: sle = 0.25; out.lr.test: sls = 0.25): 

Call:
glm(formula = Species1 ~ Width + Sepal.Width + Sepal.Length, 
    family = binomial(logit), data = data)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.62754  -0.12171  -0.00435   0.06825   2.32596  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -20.287      8.055  -2.519 0.011785 *  
Width          31.845      7.961   4.000 6.33e-05 ***
Sepal.Width   -20.746      5.469  -3.794 0.000149 ***
Sepal.Length    1.295      1.089   1.189 0.234357    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.629  on 99  degrees of freedom
Residual deviance:  25.902  on 96  degrees of freedom
AIC: 33.902

Number of Fisher Scoring iterations: 8

--------------- Variance Inflating Factor (VIF) --------------- 
Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
       Width  Sepal.Width Sepal.Length 
   11.260830    12.139373     1.274881 
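
The second final model above still contains Sepal.Length with p = 0.234. The manual step described in Details (dropping covariates with p value > 0.05 one at a time) can be sketched with base R as below. This is an illustration, not part of My.stepwise.glm: it refits the model with glm(), uses the Wald p-values from summary(), and assumes every covariate is continuous so that coefficient names match term names. In practice each drop should also be checked against substantive knowledge.

```r
my.data <- iris[51:150, ]
my.data$Width <- (my.data$Sepal.Width + my.data$Petal.Width) / 2
my.data$Species1 <- ifelse(my.data$Species == "virginica", 1, 0)

# Start from the covariates of the second stepwise final model.
fit <- glm(Species1 ~ Width + Sepal.Width + Sepal.Length,
           family = binomial, data = my.data)

alpha <- 0.05
repeat {
  tab <- coef(summary(fit))[-1, , drop = FALSE]  # exclude the intercept
  if (nrow(tab) == 0) break
  p <- setNames(tab[, "Pr(>|z|)"], rownames(tab))
  if (max(p) <= alpha) break
  worst <- names(which.max(p))                   # least significant covariate
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}
coef(summary(fit))
```

The remaining pair Width and Sepal.Width is the same model shown at iter num = 2 above, where both VIF values are 11.73, so the multicollinearity flagged under Warning still has to be addressed separately.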

My.stepwise documentation built on May 2, 2019, 4:03 p.m.