knitr::opts_chunk$set(echo = TRUE)
Multicollinearity is present in the following regression.
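library(Intro2MLR)
data(ftc2)
names(ftc2)
# Full model: CO regressed on all three predictors
fit1 <- lm(CO ~ TAR + NICOTINE + WEIGHT, data = ftc2)
summary(fit1)
# Variance inflation factors, then correlations among the first three columns
car::vif(fit1)
cor(ftc2[,1:3])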
We should remove the predictor with the most variance inflation. NICOTINE is highly correlated with TAR, so one of the two should be dropped; we remove NICOTINE because it has the largest VIF. The VIF is calculated here with the `car` package. For predictor $X_i$, with $R^2_i$ the coefficient of determination from regressing $X_i$ on the other predictors, it is defined as
$$VIF_i = \frac{1}{1-R^2_i}$$
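To make the formula concrete, here is a minimal sketch that computes each VIF directly from its definition. It uses simulated data rather than `ftc2`, and the names `x1`, `x2`, `x3`, `X`, and `vif_by_hand` are hypothetical, chosen only for illustration.

```r
# Simulated data (hypothetical): x2 is built to be nearly collinear with x1.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_i = 1 / (1 - R^2_i), where R^2_i comes from regressing
# predictor i on all of the other predictors.
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
vif_by_hand
```

In this sketch the two collinear predictors should show large VIFs while the independent one stays close to 1, which mirrors the pattern described for TAR, NICOTINE, and WEIGHT.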
fit2 <- lm(CO ~ TAR + WEIGHT, data = ftc2)
summary(fit2)
car::vif(fit2)
WEIGHT shows very little evidence of affecting the predictions (see its p-value and t statistic in the summary), so we will remove WEIGHT.
fit3 <- lm(CO ~ TAR, data = ftc2)
summary(fit3)
anova(fit3, fit2)
Notice that the F test produced by the `anova` function tests the null hypothesis
$$H_0: \beta_2 = 0$$
where $\beta_2$ is the coefficient of WEIGHT. The complete and reduced models differ by a single parameter, so $k-g=1$, and hence the F test corresponds to a two-tailed t test, where $F=T^2$.
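For reference, the nested-model F statistic behind this comparison can be written as below. This assumes the usual notation in which the complete model has $k$ predictors, the reduced model has $g$, and $n$ is the sample size; the symbols $SSE_R$ and $SSE_C$ (error sums of squares for the reduced and complete models) and $n$ do not appear in the text above and are introduced here only for illustration.

$$F = \frac{(SSE_R - SSE_C)/(k-g)}{SSE_C/[n-(k+1)]}$$

With $k-g=1$ the numerator has one degree of freedom, and the statistic equals the square of the t statistic for the single coefficient being tested.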
sm1 <- summary(fit1)
sm2 <- summary(fit2)
sm3 <- summary(fit3)
sm2$coefficients[3,]
In this case $f_{calc} = t_{calc}^2 =$ `r sm2$coefficients[3,3]^2`, and the two p-values are identical.
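The same identity can be checked numerically on any pair of nested models that differ by one term. The sketch below uses the built-in `mtcars` data as a stand-in for `ftc2`; the model choice and the names `full`, `reduced`, `f_calc`, and `t_calc` are assumptions made for illustration.

```r
# Built-in mtcars data stands in for ftc2 (an assumption for illustration).
full    <- lm(mpg ~ wt + hp, data = mtcars)   # complete model
reduced <- lm(mpg ~ wt, data = mtcars)        # drops hp, so k - g = 1

# Partial F test comparing the two nested models.
f_test <- anova(reduced, full)
f_calc <- f_test$F[2]
f_p    <- f_test$`Pr(>F)`[2]

# t test for the dropped coefficient in the complete model.
hp_row <- summary(full)$coefficients["hp", ]
t_calc <- unname(hp_row["t value"])
t_p    <- unname(hp_row["Pr(>|t|)"])

# The F statistic matches t squared, and the p-values agree.
c(F = f_calc, t_squared = t_calc^2)
c(p_F = f_p, p_t = t_p)
```

Each printed pair should agree up to floating-point error, which is the relationship $F=T^2$ described above.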
We will now trace the adjusted R squared.
sm1$adj.r.squared
sm2$adj.r.squared
sm3$adj.r.squared
The last model has the largest adjusted R squared.
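Adjusted R squared penalizes each additional predictor, which is why dropping a weak predictor can raise it even though the ordinary R squared cannot increase. As a rough sketch of what `adj.r.squared` reports, the value can be recomputed by hand. The example uses the built-in `mtcars` data rather than `ftc2`, and the names `fit`, `sm`, and `adj_by_hand` are hypothetical.

```r
# Built-in mtcars data stands in for ftc2 (an assumption for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)
sm  <- summary(fit)

n <- nrow(mtcars)            # number of observations
k <- length(coef(fit)) - 1   # number of predictors, excluding the intercept

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1))
adj_by_hand <- 1 - (1 - sm$r.squared) * (n - 1) / (n - (k + 1))
c(by_hand = adj_by_hand, from_summary = sm$adj.r.squared)
```

The two values should match, confirming that the summary output applies this degrees-of-freedom correction.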