knitr::opts_chunk$set(echo = TRUE)
library(lmcoef)

lm_coef

Assume we have a vector y of length n for the response variable and an n × m matrix x for the independent variables.

y = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4)
x = cbind(X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10), 
                     X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68), 
                     X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8), 
                     X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12))
data = data.frame(cbind(y,x))

To use the function lm_coef:

rst = lm_coef(y,x)

To compare the estimated coefficients from this new implementation with those from the original lm function:

lm_est = summary(lm(y~., data))$coefficients
lm_coef_est = rst$coefficients

row.names(lm_est) <- NULL
colnames(lm_est) <- NULL

row.names(lm_coef_est) <- NULL
colnames(lm_coef_est) <- NULL

all.equal(lm_est, lm_coef_est)

Since an 'anova' object must keep its column names, all.equal cannot be applied in the same way here. To compare the ANOVA table from this new implementation with the one from the original lm function, we simply print both outputs and observe that they are the same:

lm_anova = anova(lm(y~., data))
lm_coef_anova = rst$anova

lm_anova
lm_coef_anova
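A numeric comparison is still possible by converting each table to a plain matrix and stripping its dimnames first. The sketch below demonstrates the idea with two identical lm fits on a built-in dataset (rst$anova from lmcoef is assumed to have the same tabular layout, so the same conversion would apply to it):

```r
# Compare two ANOVA tables numerically: convert to plain matrices and
# drop the dimnames; all.equal() then treats matching NA entries (the
# F and Pr(>F) cells of the Residuals row) as equal.
a1 <- anova(lm(mpg ~ wt + hp, data = mtcars))
a2 <- anova(lm(mpg ~ wt + hp, data = mtcars))
m1 <- as.matrix(a1); dimnames(m1) <- NULL
m2 <- as.matrix(a2); dimnames(m2) <- NULL
all.equal(m1, m2)  # TRUE
```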

To compare the residuals from this new implementation with those from the original lm function:

lm_res = as.vector(summary(lm(y~x))$residuals)
lm_coef_res = as.vector(rst$residuals)

all.equal(lm_res, lm_coef_res)

To compare the fitted values from this new implementation with those from the original lm function:

lm_fit = as.vector(summary(lm(y~x))$fitted.values)
lm_coef_fit = as.vector(rst$fitted.values)

all.equal(lm_fit, lm_coef_fit)

We compare the speed of the two implementations; the new implementation is slower than the original function:

library(microbenchmark)
microbenchmark(summary(lm(y~x))$coefficients, lm_coef(y,x)$coefficients)
microbenchmark(anova(lm(y~x)), lm_coef(y,x)$anova)
microbenchmark(summary(lm(y~x))$residuals, lm_coef(y,x)$residuals)
microbenchmark(summary(lm(y~x))$fitted.values, lm_coef(y,x)$fitted.values)
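Much of lm's overhead comes from formula parsing and model-frame construction rather than the least-squares solve itself. As a baseline for benchmarks like the ones above, base R's lm.fit works directly on a design matrix and skips that machinery; the sketch below (using the built-in mtcars data, independent of lmcoef) confirms it returns the same coefficients:

```r
# lm.fit() solves least squares directly on the design matrix,
# skipping lm()'s formula and model-frame handling; the intercept
# column must be added by hand.
X <- cbind(1, as.matrix(mtcars[, c("wt", "hp")]))
y_mt <- mtcars$mpg
fit_fast <- lm.fit(X, y_mt)
fit_full <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(unname(coef(fit_full)), unname(fit_fast$coefficients))  # TRUE
```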

cpr_coef

cpr_coef is an integrated function that performs single-step estimation of constrained polynomial regression. To the author's knowledge, no function like this has been included in any R package. The conventional implementation is based on the lm function and requires a preliminary transformation of the predictor before lm can be called. The new implementation saves considerable time in situations such as the following example.

As an example, assume we want to fit a polynomial regression of degree p = 3, and we have a vector of outcome data y:

p = 3
y = c(0.5 ,1 ,1.5)

Suppose the following constrained model is proposed:

for $i = 1,2,...,n$ and $\epsilon_i \sim N(0,\sigma^2)$,

$$Y_i = \beta_0 + \sum_{j=1}^p{\beta_j * (X_i) ^ j} + \epsilon_i$$

where coefficient $\beta_j = \beta_{j-1} / j$ for $j = 2,...,p$, and predictor $X_i = i / n$,
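To see why a single-step fit is possible, note that the constraint implies $\beta_j = \beta_1 / j!$, so the model collapses to a simple linear regression of $Y$ on the transformed predictor $g(X_i) = \sum_{j=1}^{p} X_i^j / j!$. A minimal base-R sketch (the names g_horner and g_direct are illustrative, not part of lmcoef) checks that Horner's rule evaluates this transform correctly:

```r
# Evaluate g(x) = sum_{j=1}^p x^j / j! by Horner's rule, avoiding
# explicit powers and factorials, and verify against the direct sum.
g_horner <- function(x, p) {
  out <- x / p
  if (p > 1) {
    for (j in seq(p - 1, 1, by = -1)) out <- x / j * (out + 1)
  }
  out
}
g_direct <- function(x, p) {
  rowSums(outer(x, 1:p, function(x, j) x^j / factorial(j)))
}
x <- (1:3) / 3          # X_i = i / n with n = 3
all.equal(g_horner(x, 3), g_direct(x, 3))  # TRUE
```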

the function cpr_coef can be called in a single step:

rst = cpr_coef(p,y)

To compare the estimated coefficients from this new implementation with those from the original lm function:

# Using lm function
lm_res <- function(p, y){
  n = length(y)
  x = (1:n) / n
  x_final = x / p
  ## apply Horner's rule to compute polynomial ##
  if(p > 1){
    for(j in seq(p-1, 1, by = -1)){
      x_final = x_final + 1
      x_final = x / j * x_final
    }
  }
  est = summary(lm(y ~ x_final))$coefficients
  b0 = est[1, 1]
  b1 = est[2, 1]
  ## recover the constrained coefficients: beta_j = beta_{j-1} / j ##
  beta_n = c(b1)
  if (p > 1) {
    for (k in 1:(p-1)) {
      beta_n = append(beta_n, beta_n[k] / (k+1))
    }
  }
  beta = c(b0, beta_n)
  return(beta)
}

lm_est = lm_res(p,y)
cpr_coef_est = as.vector(cpr_coef(p,y)$coefficients)

all.equal(lm_est, cpr_coef_est)

Comparing the speed of the two implementations shows that cpr_coef saves substantial time:

library(microbenchmark)
microbenchmark(lm_res(p,y), cpr_coef(p,y)$coefficients)


hzhang0311/lmcoef documentation built on Dec. 20, 2021, 5:55 p.m.