knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width = 7,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
library(tidyverse)
library(ggbeeswarm)
library(bench)
library(nlme)
library(stats)
library(LinearModel)

Introduction

This package implements linear regression based on the model
$$Y = X\beta + \epsilon,$$ where $\epsilon$ follows the multivariate normal distribution $N(0,\sigma^2 I_n)$. The package includes three main functions: linearFit() fits a linear regression model to data; lmPredict() predicts response values for new observations based on a fitted model; and GLHtest() conducts general linear hypothesis (GLH) testing. We will use simulated data to introduce these functions.
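For reference (standard linear-model theory, not specific to this package), the ordinary least-squares estimator and its estimated variance are
$$\hat\beta = (X^\top X)^{-1} X^\top Y, \qquad \widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1},$$
where $\hat\sigma^2$ is the mean squared error of the residuals.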

Examples

Simulate Data

Generate simulated data: suppose the true linear model is $Y = X\beta + \epsilon$, where the error standard deviation is $\sigma = 0.1$. Set the true coefficients to $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4) = (-1, 0, 1, 2, 1.5)$. We generate $n=1000$ observations with $p=4$ predictor variables, combined with an intercept term.

  # simulate data
  beta = c(-1,0,1,2,1.5)
  n = 1e3; p = 4
  set.seed(100)
  X = matrix(runif(n*p, -100, 100),n,p)
  error = rnorm(n, 0, 0.1)
  Y = cbind(rep(1,n),X) %*% beta + error

Fit the models

Use both linearFit() and the base R function lm() to fit the model on the simulated data.

lm1 = linearFit(X,Y)
lm2 = lm(Y~X)
# Print coefficients from both models alongside the true values.
print(c(lm1$coefficients))
print(lm2$coefficients)
print(beta) # true coefficient values

From the results, we see that the estimates are quite precise. Single-variable testing results are saved in lm1$test, the 95% confidence interval of each $\hat\beta_i$ is stored in lm1$CI, $\widehat{\mathrm{Var}}(\hat\beta)$ is saved in lm1$cov_beta, and $\hat\sigma^2$ is saved in lm1$MSE.

print(lm1$test)
print(c(lm1$MSE, lm1$df.res, lm1$R_square)) # MSE, residual df, R-squared
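
The confidence intervals mentioned above can be inspected the same way:

# 95% confidence intervals for each estimated coefficient
print(lm1$CI)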

Predict with models

Randomly generate a new observation and predict its response value.

  # predict the response for a randomly generated new observation
  newX = matrix(runif(p, -100, 100), ncol = p)
  lmPredict(lm1, newX)
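
As a quick sanity check (a base-R sketch, not part of the package), the same prediction can be recomputed by hand from the lm() fit obtained earlier:

  # manual prediction: prepend the intercept, then multiply by coef(lm2)
  drop(cbind(1, newX) %*% coef(lm2))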

GLH testing

Test $H_0: \beta_2 = \beta_3 = \beta_4$. In general linear hypothesis form this is $H_0: T\beta = 0$, where each row of the contrast matrix $T$ encodes one equality.

# contrast matrix (named Tmat to avoid masking R's built-in T/TRUE);
# its rows encode beta_2 - beta_3 = 0 and beta_3 - beta_4 = 0
Tmat = matrix(c(0, 0, 1, -1, 0,
                0, 0, 0, 1, -1), nrow = 2, byrow = TRUE)
GLHtest(lm1, Tmat)
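
The same test can be reproduced with base R (a sketch of the standard GLH F-statistic under $H_0: T\beta = 0$; it uses the lm() fit lm2 and the Tmat defined above):

# F = (T bhat)' [T Vhat T']^{-1} (T bhat) / r, compared against F(r, df.residual)
bhat  = coef(lm2)
Vbhat = vcov(lm2)   # estimated covariance of bhat
r     = nrow(Tmat)  # number of restrictions
Fstat = drop(t(Tmat %*% bhat) %*%
             solve(Tmat %*% Vbhat %*% t(Tmat)) %*%
             (Tmat %*% bhat)) / r
pf(Fstat, df1 = r, df2 = lm2$df.residual, lower.tail = FALSE) # p-value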

Correctness and Efficiency

Generate a $10^4 \times 30$ predictor matrix and compare linearFit() with the base lm() function with regard to correctness and efficiency.

  n = 1e4; p = 30
  beta = runif(p+1, -5, 5)
  X = matrix(rnorm(n*p, 10, 0.1),n,p)
  error = rnorm(n, 0, 2)
  Y = cbind(rep(1,n),X) %*% beta + error
  compare = bench::mark(
    as.vector(lm(Y ~ X)$coefficients),
    as.vector(linearFit(X, Y)$coefficients)
  )
  print(compare)
  plot(compare)
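
By default bench::mark() already checks that both expressions return the same value; the agreement can also be confirmed explicitly:

  # explicit correctness check of the two coefficient vectors
  all.equal(as.vector(lm(Y ~ X)$coefficients),
            as.vector(linearFit(X, Y)$coefficients))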

My implementation does not use Rcpp for optimization and is nearly ten times slower than the lm() function.
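
One possible speed-up that avoids Rcpp entirely (a hypothetical alternative, not part of this package) is the bare-bones base-R fitter .lm.fit(), a minimal interface to the QR solver underlying lm() that skips its formula and model-frame overhead:

  # .lm.fit() solves least squares by QR without building a model frame
  coef_fast = .lm.fit(cbind(1, X), Y)$coefficients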


