source('_format.R')
Chapter 10 of the text discusses fitting regression models. Thankfully, R's support for such models is very good. Again, we must be careful to set the contrasts of unordered factors to "contr.sum" as follows:
options(contrasts=c(unordered='contr.sum', ordered='contr.poly'))
The example investigates the relationship of reaction temperature and catalyst feed rate to the viscosity of a polymer. First, let's do the regression through matrix operations. Define the $X$ matrix using the model.matrix
command:
X <- model.matrix(~ReactionTemperature + CatalystFeedRate, data=Table10.2) print(X)
\noindent The response $y$ is just the Viscosity
column:
y <- Table10.2$Viscosity print(y)
\noindent Now we solve $(X^\mathrm{T} X)^{-1} X^\mathrm{T} y$. We can find the inverse $(X^\mathrm{T} X)^{-1}$ with solve(t(X) %*% X)
and then multiply by $X^\mathrm{T} y$, but it's numerically more stable to supply $X^\mathrm{T} y$ as the second argument to solve
:
beta.hat <- solve(t(X) %*% X, t(X) %*% y) print(beta.hat)
\noindent Now, to do this using lm
is far simpler:
model <- lm(Viscosity ~ ReactionTemperature + CatalystFeedRate, data=Table10.2) print(summary(model))
Table 10.3 gives many types of residuals and diagnostic measures, many of which we'll create below:
Table10.3 <- data.frame( 'y'=Table10.2$Viscosity, 'Predicted'=predict(model), 'Residual'=resid(model), 'hii'=hatvalues(model), 'StudentizedResids'=rstudent(model), 'CooksD'=cooks.distance(model) )
kable(Table10.3) %>% kable_styling()
Consider the viscosity data from Table 10.2. If we wish to test $H_0: \beta_2 = 0$ versus $H_1: \beta_2 \ne 0$ by partial $F$-test we can use the anova
function:
model01 <- lm(Viscosity ~ ReactionTemperature, data=Table10.2) model012 <- lm(Viscosity ~ ReactionTemperature + CatalystFeedRate, data=Table10.2) print(anova(model01, model012))
Parameter confidence intervals can be retrieved with the confint
function:
model <- lm(Viscosity ~ ReactionTemperature + CatalystFeedRate, data=Table10.2) confint(model)
\noindent We can also get other intervals. For example, the prediction interval for the response over ReactionTemperature
at CatalystFeedRate
set to 10.5 is given below:
x1 <- seq(from=80, to=100, length.out=100) x2 <- rep(10.5, 100) y <-predict(model, newdata=data.frame('ReactionTemperature'=x1, 'CatalystFeedRate'=x2), interval="confidence") plot(x1, y[,1], type='l', ylim=c(2248, 2434), main='Prediction Interval', xlab='Reaction Temperature', ylab='Viscosity') lines(x1, y[,2], lty=2) lines(x1, y[,3], lty=2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.