In ZimanJiang/mylm: an own package for linear regression in R

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(stats)
library(datasets)
library(zmjianglm)

Introduction

The main function mylm is trying to do the linear regression procedures including estimation and inference and get some important values. mylm combines lm and summary.lm in pacakge stats together. Users can choose to print or not to print the output and can also choose either a simple version or a complete version to print. The simple version is similar to the output of lm and the complete version is similar to the output of summary.lm. The function would return a list of elements which is almost the same as summary.lm no matter what printing style is chosen. And the function predictlm is written for get predicted value in a new data set based on the fitted model. The example data set is mtcars, which is included in package datasets. More information could be viewed by:

?mtcars

Usage

Usage of `mylm`

To use the function mylm:

mylm(mpg~wt,mtcars,"nothing")

To print out the simple version of output:

mylm(mpg~wt,mtcars,"simple")

To print out the complete version of output:

mylm(mpg~wt,mtcars,"summary")

Do multiple linear regression:

mylm(mpg~wt+drat,mtcars,"summary")

Add categorical covariate:

mylm(mpg~wt+drat+factor(cyl),mtcars,"summary")

Add interaction term:

mylm(mpg~wt+drat+I(wt*drat),mtcars,"summary")

Do transformation on covariate:

mylm(mpg~wt+drat+I(wt^2),mtcars,"summary")

Usage of `predictlm`

To use function predictlm

n <- nrow(mtcars)
train_n <- floor(n*0.7)
traindata <- mtcars[1:train_n, ]#the original data set
testdata <- mtcars[(train_n+1):n, ]#the new data set

mylm1 <- mylm(mpg~wt,traindata,"nothing")
predicted_value <- predictlm(mylm1,testdata)

Comparison

Correctness

The correctness is tested by testthat. All the tests are passed. Basically, I compared the result of summary.lm and mylm as well as the result of predict.lm and predictlm. The example of correctness testing is as the following:

#mylm
summary(lm(mpg~wt+drat,mtcars))
cat("--------","\n")
mylm(mpg~wt+drat,mtcars,"summary")
cat("--------","\n")
all.equal(unname(mylm(mpg~wt+drat,mtcars,"nothing")$coefficients[,1]),unname(lm(mpg~wt+drat,mtcars)$coefficients), tolerance = 1e-6)

#predictlm
n <- nrow(mtcars)
train_n <- floor(n*0.7)
traindata <- mtcars[1:train_n, ]#the original data set
testdata <- mtcars[(train_n+1):n, ]#the new data set
mylm1 <- mylm(mpg~wt,traindata,"nothing")
lm1 <- lm(mpg~wt,traindata)
all.equal(sum(predictlm(mylm1,testdata)-predict(lm1,testdata)), 0, tolerance = 1e-6)

Efficiency

I compare the efficiency between mylm and summary.lm. mylm has a lower efficiency.

system.time(for(i in 1:100)
  mylm(mpg~wt+drat,mtcars,"nothing"))
system.time(for(i in 1:100)
  summary(lm(mpg~wt+drat,mtcars)))

I also compare the efficiency between predictlm and predict.lm. The efficiency of predictlm is a bit lower than predict.lm.

n <- nrow(mtcars)
train_n <- floor(n*0.7)
traindata <- mtcars[1:train_n, ]#the original data set
testdata <- mtcars[(train_n+1):n, ]#the new data set
system.time(for(i in 1:100)
  predictlm(mylm(mpg~wt+drat,mtcars,"nothing"),testdata))
system.time(for(i in 1:100)
  predict(lm(mpg~wt+drat,traindata),testdata))