knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(fig.width=8, fig.height=5)
library(regSel)
library(olsrr)  # provides ols_regress, ols_step_backward_p and the rivers data used below

First we will examine the 'lin_model' function of the 'regSel' package, comparing it with the 'lm' function in R and the 'ols_regress' function of the 'olsrr' package.

We will start by generating random data:

set.seed(123)
x = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)
y = rnorm(100)
df = data.frame(y, x, x2, x3)

The help page for the 'lin_model' function can be viewed as follows:

help(lin_model)

We will now fit the same model with each of the three functions and compare their output. We expect every 'all.equal' call below to return TRUE.

# Here is basic usage of the lin_model function
mod_regSel = lin_model(y~x*x2+x3, data = df)

#This will be compared with the results of lm and ols_regress
mod_lm = lm(y~x*x2+x3, data = df)
mod_olsrr = ols_regress(y~x*x2+x3, data = df)

# We will first compare the beta estimates of the three models
all.equal(mod_regSel$betas, mod_lm$coefficients)
all.equal(mod_regSel$betas, mod_olsrr$betas)

# We will also compare the standard errors of the betas and the test statistics output by olsrr
all.equal(mod_regSel$se_beta, mod_olsrr$std_errors)
all.equal(mod_regSel$test_statistic, mod_olsrr$tvalues)

# Finally we will compare the residuals of the three models
all.equal(mod_regSel$res, unname(mod_olsrr$model$residuals))
all.equal(mod_regSel$res, unname(mod_lm$residuals))
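
The quantities compared above (coefficients, standard errors, test statistics and residuals) can also be reproduced by hand, which may help when reading the checks. The following is a minimal base R sketch using the normal equations; it is not taken from the 'regSel' source and makes no claim about how 'lin_model' is implemented. The data frame `d` and all variable names are assumptions made for illustration.

```r
# Minimal OLS sketch in base R (illustrative only, not regSel's implementation)
set.seed(123)
d <- data.frame(y = rnorm(100), x = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

X <- model.matrix(y ~ x * x2 + x3, data = d)  # design matrix, including the intercept
y_vec <- d$y

# Coefficients from the normal equations: (X'X) beta = X'y
XtX_inv <- solve(crossprod(X))
beta_hat <- drop(XtX_inv %*% crossprod(X, y_vec))

# Residuals and the unbiased estimate of the residual variance
res <- drop(y_vec - X %*% beta_hat)
sigma2 <- sum(res^2) / (nrow(X) - ncol(X))

# Standard errors of the coefficients and the corresponding t statistics
se_beta <- sqrt(diag(XtX_inv) * sigma2)
t_stat <- beta_hat / se_beta

# Compare with lm(); unname() avoids spurious differences caused by names
fit <- lm(y ~ x * x2 + x3, data = d)
fit_coefs <- summary(fit)$coefficients
isTRUE(all.equal(unname(beta_hat), unname(coef(fit))))
isTRUE(all.equal(unname(se_beta), unname(fit_coefs[, "Std. Error"])))
isTRUE(all.equal(unname(t_stat), unname(fit_coefs[, "t value"])))
isTRUE(all.equal(unname(res), unname(residuals(fit))))
```

Note that 'all.equal' returns either TRUE or a character description of the differences, and it also compares attributes such as names. This is why the residual checks above wrap one side in 'unname', and why 'isTRUE' is useful when the result feeds into a condition.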

We will now compare the computing speed of the three functions.

library(bench)
comparison_betas = bench::mark(lin_model(y~x*x2+x3, data = df)$betas, 
                               lm(y~x*x2+x3, data = df)$coefficients,
                               ols_regress(y~x*x2+x3, data = df)$betas)

comparison_betas
plot(comparison_betas)

As you can see, the 'lin_model' function is close to the speed of the 'lm' function and substantially faster than the function from 'olsrr'.
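
When reading such benchmarks it helps to know that 'bench::mark' checks by default (argument 'check = TRUE') that all expressions return equal results, so a timing comparison doubles as a consistency check. The sketch below illustrates this with base R only; the expression labels `lm` and `normal_eq` and the data frame `d` are assumptions made for illustration and are not part of 'regSel'.

```r
library(bench)

set.seed(123)
d <- data.frame(y = rnorm(100), x = rnorm(100))
X <- cbind(1, d$x)  # intercept column plus the single predictor

# Both expressions must return equal values, otherwise bench::mark() stops with an error
timing <- bench::mark(
  lm = unname(coef(lm(y ~ x, data = d))),
  normal_eq = drop(solve(crossprod(X), crossprod(X, d$y))),
  check = TRUE
)

# The median time and iterations per second are usually the most useful columns
timing[, c("expression", "median", "itr/sec")]
```

The absolute timings depend on the machine and on the size of the data, so relative differences are more informative than the raw numbers.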

The 'lin_model' function is also capable of handling weights (input as a vector), similar to the 'lm' function. The 'olsrr' package does not have this capability.

# Here is an example of basic usage of the lin_model function with weights
wmod_regSel = lin_model(y~x*x2+x3, data = df, weights = 1:100)
wmod_lm = lm(y~x*x2+x3, data = df, weights = 1:100)

# We will compare the beta estimates of the two models
all.equal(wmod_regSel$betas, wmod_lm$coefficients)
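
For reference, the weighted fit that 'lm' computes solves the weighted normal equations, (X'WX) beta = X'Wy, where W is a diagonal matrix of the weights. The following base R sketch reproduces the 'lm' coefficients under that formula; it is an illustration only and does not describe how 'lin_model' handles weights internally. The names `d`, `w` and `beta_w` are assumptions made for the example.

```r
# Weighted least squares sketch in base R (illustrative only)
set.seed(123)
d <- data.frame(y = rnorm(100), x = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
w <- 1:100  # one weight per observation, as in the example above

X <- model.matrix(y ~ x * x2 + x3, data = d)

# w * X multiplies row i of X by w[i], which equals W %*% X for W = diag(w)
XtWX <- crossprod(X, w * X)
XtWy <- crossprod(X, w * d$y)
beta_w <- drop(solve(XtWX, XtWy))

# Compare with the weighted lm() fit
fit_w <- lm(y ~ x * x2 + x3, data = d, weights = w)
isTRUE(all.equal(unname(beta_w), unname(coef(fit_w))))
```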

We can compare the speeds of the two functions using weights:

comparison_wbetas = bench::mark(lin_model(y~x*x2+x3, data = df, weights = 1:100)$betas, 
                               lm(y~x*x2+x3, data = df, weights = 1:100)$coefficients)
comparison_wbetas
plot(comparison_wbetas)

Again, the 'lin_model' function is only slightly slower than the 'lm' function.

The 'regSel' package is also capable of performing backwards selection using the 'back_select' function. This is similar to the 'ols_step_backward_p' function in 'olsrr'.

The help page for the 'back_select' function can be viewed as follows:

help(back_select)

We will first compare the outputs of the two functions. Note that 'back_select' takes a model formula as input, while 'ols_step_backward_p' is given an 'lm' model. Also note that 'ols_step_backward_p' cannot handle removing every variable from a model; when this occurs, it stops with an error.

Therefore, we will use the 'rivers' dataset from the 'olsrr' package to produce a model still containing some covariates.

head(rivers)
mod = lm(Nitrogen~Agr*Forest,data = rivers)

# Here is an example of the basic usage of the back_select function

back_regSel = back_select(Nitrogen~Agr*Forest, rivers, prem = .1)

#It will be compared with the output of the olsrr backwards selection function
back_olsrr = ols_step_backward_p(mod, prem = .1)
#Then we will compare the equality of the coefficients and residuals and the removed covariates
all.equal(back_regSel$betas, back_olsrr$model$coefficients)
all.equal(back_regSel$res, unname(back_olsrr$model$residuals))
all.equal(back_regSel$removed_vars, rev(back_olsrr$removed))
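
To make the idea behind p-value-based backward selection concrete, the sketch below implements one simple form of it in base R: fit the model, drop the term with the largest p-value if it exceeds the threshold 'prem', and repeat. The function `backward_p_sketch` is hypothetical and written only for illustration; it is not the algorithm used by 'back_select' or 'ols_step_backward_p', and it assumes numeric main-effect predictors (no interactions or factors), so that each term has exactly one coefficient.

```r
# Hypothetical helper for illustration; not regSel's or olsrr's implementation.
# Assumes numeric main-effect predictors, so each term maps to one coefficient.
backward_p_sketch <- function(formula, data, prem = 0.1) {
  response <- deparse(formula[[2]])
  kept <- attr(terms(formula), "term.labels")
  removed <- character(0)
  repeat {
    # Refit with the remaining terms; fall back to an intercept-only model
    rhs <- if (length(kept) > 0) kept else "1"
    fit <- lm(reformulate(rhs, response = response), data = data)
    if (length(kept) == 0) break
    pvals <- summary(fit)$coefficients[kept, "Pr(>|t|)", drop = FALSE]
    pvals <- setNames(as.vector(pvals), kept)
    # Stop once every remaining term is at or below the removal threshold
    if (max(pvals) <= prem) break
    worst <- names(which.max(pvals))
    removed <- c(removed, worst)
    kept <- setdiff(kept, worst)
  }
  list(model = fit, removed_vars = removed)
}

# Usage on a built-in data set
out <- backward_p_sketch(mpg ~ wt + hp + qsec + drat, data = mtcars, prem = 0.1)
out$removed_vars
coef(out$model)
```

Unlike 'ols_step_backward_p' as described above, this sketch falls back to an intercept-only model when every term is removed instead of stopping with an error.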

Finally we will compare the efficiency of the two techniques.

select_comparison = bench::mark(back_select(Nitrogen~Agr*Forest, rivers, prem = .1)$betas,
                                ols_step_backward_p(mod, prem = .1)$model$coefficients)
plot(select_comparison)

As you can see, the 'back_select' function is substantially faster than the function from the 'olsrr' package and does not have the extensive printout from each run.

This is the end of the tutorial for the 'regSel' package. Hopefully you now have a better understanding of the efficiency and application of the 'lin_model' and 'back_select' functions.



EvanWie/regSel documentation built on Nov. 26, 2019, 2:11 a.m.