knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The LinearRegression package aims to provide almost every piece of information needed for linear regression, except diagnostics. The LR( ) function mimics the output from lm( ), summary(lm( )), confint( ), hatvalues(lm( )), rstandard( ), and rstudent( ). The ANOVA( ) function mimics the output from anova(lm( )) function for sequential sums of squares and output from car::Anova(lm( ), type="III") function for partial sums of squares.

library(LinearRegression)

Example of the using LR( ) function.

LR(mpg ~ cyl + wt, mtcars)
LR(mpg ~ cyl + wt, mtcars, include.intercept = FALSE)

You can either fit the linear regression model with or without the intercept. The LR( ) function does not explicitly return any result until you extract the desired output with $.

model = LR(mpg ~ cyl + wt + qsec, mtcars)
names(model)

The LR( ) function is very versatile, it contains lots of information. We can choose from above to obtain the desired output. Let's demonstrate via a few examples and compare the result with the original r function to see if they yields the same results.

model$coeff_summary
model.r = lm(mpg ~ cyl + wt + qsec, mtcars)
summary(model.r)$coefficients
all.equal(as.numeric(model$coeff_summary), as.numeric(summary(model.r)$coefficients))

Since we are only interested in whether LR( ) and lm( ) yield the same numeric results, I coerced the both outputs to "numeric" class. Through all.equal( ), we observed that the outputs are indeed equal. What about the its speed? We would next compare their speed via bench::mark( ).

library(bench)
plot(mark(as.numeric(model$coeff_summary), as.numeric(summary(model.r)$coefficients)))

Using this naive built-in data set mtcars, LR( ) function perform better than lm( ) function. One of the main reason why LR( ) outperforms the built-in r function is that there is no need to call for another function for coefficients summary.

Next, let's demonstrate two more example with LR( ) and compare it with the original r function.

(a) Usage of to.predict parameter

MyPredict = LR(cyl ~ mpg + wt, mtcars, 
               to.predict = matrix(c(mean(mtcars$mpg), 18, mean(mtcars$wt), 4), 2, 2))$predicted
print(MyPredict)
RPredict = predict(lm(cyl~mpg+wt,mtcars), newdata=data.frame(mpg=c(mean(mtcars$mpg),18), 
                                                             wt = c(mean(mtcars$wt),4)))
print(RPredict)
all.equal(as.numeric(MyPredict), as.numeric(RPredict))
plot(mark(as.numeric(MyPredict), as.numeric(RPredict)))

Both function yields the same predicted values;however, the built-in r function performs slightly better than LR( ) in terms of speed.

(b) Usage of include.intercept parameter

model2 = LR(mpg ~ cyl + wt + qsec, mtcars, include.intercept = FALSE)
model.r2 = lm(mpg ~ -1 + cyl + wt + qsec, mtcars)
model2$CI
confint(model.r2)
all.equal(as.numeric(model2$CI), as.numeric(confint(model.r2)))
plot(mark(as.numeric(model2$CI), as.numeric(confint(model.r2))))

Again, same numeric outputs from both functions. However, LR( ) outperforms the built-in r function because we do not need to call another function to compute the confidence interval of the fitted model.

Example of the using ANOVA( ) function.

We could choose either to obtain an ANOVA table for sequential sums of squares or partial sums of squares.

Note: ANOVA( ) automatically fit the model with an intercept.

(a) Example of obtaining sequential sums of squares.

ANOVA(mpg ~ cyl + wt + disp, mtcars, type = "Sequential")
anova(lm(mpg ~ cyl + wt + disp, mtcars))
MyNumeric = as.numeric(unlist(ANOVA(mpg ~ cyl + wt, mtcars, type = "Sequential")))
RNumeric = as.numeric(unlist(anova(lm(mpg ~ cyl + wt, mtcars))))
all.equal(MyNumeric, RNumeric)
plot(mark(as.numeric(unlist(ANOVA(mpg ~ cyl + wt, mtcars, type = "Sequential"))), as.numeric(unlist(anova(lm(mpg ~ cyl + wt, mtcars))))))

From above, we have shown that ANOVA( ) and built-in function anova( ) output the same numeric results via all.equal( ). ANOVA( ) actually outperforms anova( ) in terms of speed.

(b) Example of obtaining partial sums of squares.

require("car")
ANOVA(mpg ~ cyl + wt + disp, mtcars, type = "Partial")
Anova(lm(mpg ~ cyl + wt + disp, mtcars), type = "III")
MyNumeric = as.numeric(unlist(ANOVA(mpg ~ cyl + wt + disp, mtcars, type = "Partial")))
RNumeric = as.numeric(unlist(Anova(lm(mpg ~ cyl + wt + disp, mtcars), type = "III")))
all.equal(MyNumeric, RNumeric)
plot(mark(as.numeric(unlist(ANOVA(mpg ~ cyl + wt + disp, mtcars, type = "Partial"))), as.numeric(unlist(Anova(lm(mpg ~ cyl + wt + disp, mtcars), type = "III")))))

Again, we have shown that ANOVA( ) and original r function Anova( ) in cars package output the same numeric results via all.equal( ). ANOVA( ) is more efficient than anova( ) in terms of speed.

Tips: Since ANOVA( ) outputs a data.frame, we can indexing or selecting desired element with all means of data.frame such as ANOVA(mpg ~ cyl + wt + disp, mtcars, type = "Sequential")["Sum Sq"].



AChenAC/LinearRegression documentation built on Dec. 17, 2021, 6:41 a.m.