abe: Augmented Backward Elimination


View source: R/abe.R

Description

The function abe performs augmented backward elimination (ABE), in which variable selection is based on the change-in-estimate criterion combined with either significance or an information criterion. Turning off the change-in-estimate criterion reduces the procedure to ordinary backward selection based on significance or an information criterion alone.

Usage

abe(fit, data = NULL, include = NULL, active = NULL, tau = 0.05,
  exp.beta = TRUE, exact = FALSE, criterion = "alpha", alpha = 0.2,
  type.test = "Chisq", type.factor = NULL, verbose = T)

Arguments

fit

An object of class "lm", "glm" or "coxph" representing the fit. Note that the model should be fitted with the arguments x=TRUE and y=TRUE.

data

data frame used when fitting the object fit.

include

A vector containing the names of variables that will be included in the final model. These variables are used only as passive variables during modeling. They might be exposure variables of interest or known confounders. They will never be dropped from the working model in the selection process, but they will be used passively in evaluating the change-in-estimate criterion for other variables. Note that variables in the model fit that are specified as neither include nor active are treated as both active and passive.

active

A vector containing the names of active variables. These less important explanatory variables will be used only as active variables, not as passive variables, when evaluating the change-in-estimate criterion.

tau

Value that specifies the threshold of the relative change-in-estimate criterion. Default is set to 0.05.

exp.beta

Logical specifying whether the exponent is used in the formula that standardizes the criterion. Default is set to TRUE.

exact

Logical specifying whether the method uses the exact change-in-estimate or its approximation. Default is set to FALSE, which means that the method uses the approximation proposed by Dunkler et al. Note that setting it to TRUE can severely slow down the algorithm, whereas setting it to FALSE can in some cases lead to a poor approximation of the change-in-estimate criterion.

criterion

String that specifies the strategy used to select variables for the black list. Currently supported options are the significance level 'alpha', the Akaike information criterion 'AIC' and the Bayesian information criterion 'BIC'. When the significance level is used, you also have to specify the value of alpha (see parameter alpha) and the type of the test statistic (see parameter type.test). Default is set to "alpha".

alpha

Value that specifies the level of significance as explained above. Default is set to 0.2.

type.test

String that specifies which test should be performed when criterion = "alpha". Possible values are: "F" and "Chisq" (default) for class "lm"; "Rao", "LRT", "Chisq" (default) and "F" for class "glm"; and "Chisq" for class "coxph". See also drop1.

type.factor

String that specifies how to treat factors. Possible values are "factor" and "individual"; see Details.

verbose

Logical that specifies if the variable selection process should be printed. Note: this can severely slow down the algorithm.
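To make the exact argument more concrete, the following base-R sketch compares the exact change in a coefficient, obtained by refitting without a candidate variable, with an approximation computed from the full model only. The approximation is assumed here to have the form delta = -b_a * cov(b_p, b_a) / var(b_a), following Dunkler et al. The sketch does not call abe and does not apply the standardization controlled by exp.beta or the threshold tau; the data and variable names are made up for illustration.

```r
# Illustrative data (hypothetical, not taken from the package)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x3 <- 0.5 * x1 + rnorm(n)          # x3 is correlated with x1
y  <- 1 + 2 * x1 + 0.3 * x3 + rnorm(n)
d  <- data.frame(y, x1, x3)

full    <- lm(y ~ x1 + x3, data = d)
reduced <- lm(y ~ x1, data = d)    # refit without the candidate x3

# Exact change in the estimate of x1 caused by omitting x3
delta_exact <- unname(coef(reduced)["x1"] - coef(full)["x1"])

# Approximation that uses only the full fit (assumed form, see text above)
V <- vcov(full)
delta_approx <- unname(-coef(full)["x3"] * V["x1", "x3"] / V["x3", "x3"])

print(c(exact = delta_exact, approx = delta_approx))
```

For ordinary least squares the two quantities agree up to rounding, because dropping a variable is equivalent to a linear restriction on the full fit. For models such as those fitted by glm the two can differ, which is the situation the note on exact = FALSE refers to.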

Details

With the default settings, ABE performs augmented backward elimination based on significance. The significance level is set to 0.2, all variables are treated as "passive or active", the approximated change-in-estimate is used, and the threshold of the relative change-in-estimate criterion is 0.05. Setting tau to a very large number (e.g. Inf) turns off the change-in-estimate criterion, so that ABE performs only backward elimination. Specifying alpha = 0 will include variables only because of the change-in-estimate criterion, since no variable is then protected from exclusion by its p-value. Specifying alpha = 1 will always include all variables.
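As a rough illustration of what remains when the change-in-estimate criterion is turned off, the sketch below implements plain significance-based backward elimination for lm using drop1 from base R. The helper backward_by_p is hypothetical and not part of the abe package, and it ignores the include and active roles; whether abe uses drop1 internally in this way is an assumption.

```r
# Hypothetical helper, not part of the abe package: repeatedly drop the
# term with the largest p-value until all remaining p-values are <= alpha.
backward_by_p <- function(formula, data, alpha = 0.2, test = "Chisq") {
  fit <- lm(formula, data = data)
  repeat {
    # Stop when only the intercept is left
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab  <- drop1(fit, test = test)
    pcol <- grep("^Pr", names(tab), value = TRUE)
    p    <- tab[[pcol]][-1]              # skip the "<none>" row
    names(p) <- rownames(tab)[-1]
    if (max(p) <= alpha) break           # nothing left to remove
    worst <- names(which.max(p))
    new_formula <- update(formula(fit), as.formula(paste(". ~ . -", worst)))
    fit <- lm(new_formula, data = data)
  }
  fit
}

# Illustrative data: only x1 has a real effect
set.seed(11)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 3 * d$x1 + rnorm(n)

final <- backward_by_p(y ~ x1 + x2 + x3, data = d, alpha = 0.2)
print(formula(final))
```

With tau set to a finite value, ABE additionally checks whether removing a black-listed variable changes the estimates of passive variables by more than the threshold before dropping it; that step is deliberately left out here.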

When using type.factor="individual" each dummy variable of a factor is treated as an individual explanatory variable, hence only this dummy variable can be removed from the model (warning: use sensible coding for the reference group). Using type.factor="factor" will look at the significance of removing all dummy variables of the factor and can drop the entire variable from the model.
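The difference between the two treatments can be sketched in base R without calling abe. In the "factor" view a single joint test covers all dummy variables of the factor, while in the "individual" view each dummy becomes its own column with its own test. The data are made up, and the column names x3.1, x3.2 and x3.3 are chosen here only to resemble the names shown in the example output; how abe constructs them internally is an assumption.

```r
# Illustrative data with a four-level covariate (reference level 0)
set.seed(3)
n <- 120
d <- data.frame(x1 = rnorm(n), x3 = rep(0:3, length.out = n))
d$y <- 1 + d$x1 + 0.5 * (d$x3 == 3) + rnorm(n)

# "factor" view: one term, one joint test with 3 degrees of freedom
fit <- lm(y ~ x1 + factor(x3), data = d)
tab_factor <- drop1(fit, test = "Chisq")
print(tab_factor)

# "individual" view: expand the factor into separate dummy columns
dummies <- model.matrix(~ factor(x3), data = d)[, -1]
colnames(dummies) <- paste0("x3.", 1:3)
d_ind <- cbind(d[c("y", "x1")], dummies)
fit_ind <- lm(y ~ x1 + x3.1 + x3.2 + x3.3, data = d_ind)
tab_ind <- drop1(fit_ind, test = "Chisq")
print(tab_ind)
```

Both fits have identical coefficients; only the unit of removal differs. Dropping a single dummy merges that level with the reference group, which is why the choice of reference coding matters.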

Value

An object of class "lm", "glm" or "coxph" representing the model chosen by the abe method.

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Sladana Babic

References

Daniela Dunkler, Max Plischke, Karen Leffondré, and Georg Heinze. Augmented backward elimination: a pragmatic and purposeful way to develop statistical models. PLoS ONE, 9(11):e113677, 2014.

See Also

abe.boot, lm, glm and coxph

Examples

# load the package
library(abe)

# simulate some data:

set.seed(1)
n=100
x1<-runif(n)
x2<-runif(n)
x3<-runif(n)
y<--5+5*x1+5*x2+ rnorm(n,sd=5)
dd<-data.frame(y,x1,x2,x3)

# fit a simple model containing only numeric covariates
fit<-lm(y~x1+x2+x3,x=TRUE,y=TRUE,data=dd)

# perform ABE with "x1" as only passive and "x2" as only active,
# using the exact change-in-estimate with a threshold of 5% and
# significance with 0.2 as the threshold
abe.fit<-abe(fit,data=dd,include="x1",active="x2",
tau=0.05,exp.beta=FALSE,exact=TRUE,criterion="alpha",alpha=0.2,
type.test="Chisq",verbose=TRUE)

summary(abe.fit)

# similar example, but turn off the change-in-estimate and perform
# only backward elimination

abe.fit<-abe(fit,data=dd,include="x1",active="x2",
tau=Inf,exp.beta=FALSE,exact=TRUE,criterion="alpha",alpha=0.2,
type.test="Chisq",verbose=TRUE)

summary(abe.fit)

# an example with the model containing categorical covariates:
dd$x3<-rbinom(n,size=3,prob=1/3)
dd$y1<--5+5*x1+5*x2+ rnorm(n,sd=5)
fit<-lm(y1~x1+x2+factor(x3),x=TRUE,y=TRUE,data=dd)

# treat "x3" as a single covariate:

abe.fit.fact<-abe(fit,data=dd,include="x1",active="x2",
tau=0.05,exp.beta=FALSE,exact=TRUE,criterion="alpha",alpha=0.2,
type.test="Chisq",verbose=TRUE,type.factor="factor")

summary(abe.fit.fact)

# treat each dummy of "x3" as a separate covariate:

abe.fit.ind<-abe(fit,data=dd,include="x1",active="x2",
tau=0.05,exp.beta=FALSE,exact=TRUE,criterion="alpha",alpha=0.2,
type.test="Chisq",verbose=TRUE,type.factor="individual")

summary(abe.fit.ind)

Example output

Model under investigation:
lm(formula = y ~ x1 + x2 + x3, data = dd, x = TRUE, y = TRUE)
Criterion for non-passive variables: x2 : 0.1107 , x3 : 0.9205
   black list:  x3 : 0.9205 
           Investigating change in b or exp(b) due to omitting variable  x3  ;  x1 : 0.0014 


Model under investigation:
lm(formula = y ~ x1 + x2, data = dd, x = TRUE, y = TRUE)
Criterion for non-passive variables: x2 : 0.1106
black list: empty 


Final model:
lm(formula = y ~ x1 + x2, data = dd, x = TRUE, y = TRUE)



Call:
lm(formula = y ~ x1 + x2, data = dd, x = TRUE, y = TRUE)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.0130  -3.4283  -0.4639   3.4122  12.7056 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -3.434      1.519  -2.261   0.0260 *
x1             4.116      1.978   2.081   0.0401 *
x2             3.079      1.947   1.581   0.1170  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.265 on 97 degrees of freedom
Multiple R-squared:  0.06682,	Adjusted R-squared:  0.04758 
F-statistic: 3.473 on 2 and 97 DF,  p-value: 0.03494



Model under investigation:
lm(formula = y ~ x1 + x2 + x3, data = dd, x = TRUE, y = TRUE)
Criterion for non-passive variables: x2 : 0.1107 , x3 : 0.9205
   black list:  x3 : 0.9205 
           Investigating change in b or exp(b) due to omitting variable  x3  ;  x1 : 0.0014 


Model under investigation:
lm(formula = y ~ x1 + x2, data = dd, x = TRUE, y = TRUE)
Criterion for non-passive variables: x2 : 0.1106
black list: empty 


Final model:
lm(formula = y ~ x1 + x2, data = dd, x = TRUE, y = TRUE)



Call:
lm(formula = y ~ x1 + x2, data = dd, x = TRUE, y = TRUE)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.0130  -3.4283  -0.4639   3.4122  12.7056 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -3.434      1.519  -2.261   0.0260 *
x1             4.116      1.978   2.081   0.0401 *
x2             3.079      1.947   1.581   0.1170  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.265 on 97 degrees of freedom
Multiple R-squared:  0.06682,	Adjusted R-squared:  0.04758 
F-statistic: 3.473 on 2 and 97 DF,  p-value: 0.03494



Model under investigation:
lm(formula = y1 ~ x1 + x2 + factor(x3), data = dd, x = TRUE, 
    y = TRUE)
Criterion for non-passive variables: x2 : 0.0019 , factor(x3) : 0.0335
black list: empty 


Final model:
lm(formula = y1 ~ x1 + x2 + factor(x3), data = dd, x = TRUE, 
    y = TRUE)



Call:
lm(formula = y1 ~ x1 + x2 + factor(x3), data = dd, x = TRUE, 
    y = TRUE)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.2619  -2.9392  -0.1022   3.2385   9.0229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -6.231      1.611  -3.867 0.000203 ***
x1             3.406      1.849   1.842 0.068601 .  
x2             5.594      1.814   3.084 0.002682 ** 
factor(x3)1    2.817      1.247   2.260 0.026150 *  
factor(x3)2    1.397      1.474   0.948 0.345714    
factor(x3)3    6.886      3.009   2.289 0.024339 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.86 on 94 degrees of freedom
Multiple R-squared:  0.1907,	Adjusted R-squared:  0.1476 
F-statistic: 4.429 on 5 and 94 DF,  p-value: 0.001149



Model under investigation:
y1 ~ x1 + x2 + x3.1 + x3.2 + x3.3
<environment: 0x55a4dbeeecb8>
Criterion for non-passive variables: x2 : 0.0019 , x3.1 : 0.0215 , x3.2 : 0.3295 , x3.3 : 0.0199
   black list:  x3.2 : 0.3295 
           Investigating change in b or exp(b) due to omitting variable  x3.2  ;  x1 : 0.0095, x3.1 : 0.0675, x3.3 : 0.0223 


Final model:
y1 ~ x1 + x2 + x3.1 + x3.2 + x3.3
<environment: 0x55a4dbeeecb8>



Call:
lm(formula = y1 ~ x1 + x2 + x3.1 + x3.2 + x3.3, data = df, x = TRUE, 
    y = TRUE)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.2619  -2.9392  -0.1022   3.2385   9.0229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -6.231      1.611  -3.867 0.000203 ***
x1             3.406      1.849   1.842 0.068601 .  
x2             5.594      1.814   3.084 0.002682 ** 
x3.1           2.817      1.247   2.260 0.026150 *  
x3.2           1.397      1.474   0.948 0.345714    
x3.3           6.886      3.009   2.289 0.024339 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.86 on 94 degrees of freedom
Multiple R-squared:  0.1907,	Adjusted R-squared:  0.1476 
F-statistic: 4.429 on 5 and 94 DF,  p-value: 0.001149

abe documentation built on May 2, 2019, 6:49 a.m.
