```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```r
library(appregr)
library(pander)
```
```r
modelname <- "prostate"
mdf    <- appregr::getmodel(modelname)
df     <- mdf$data
lm.fit <- mdf$fit
```
```r
high.leverage <- appregr::checkleverage(lm.fit, df)
pander(high.leverage, caption = "High Leverage Data Elements")
```
We've used the rule of thumb that points with a leverage greater than $\frac{2p}{n}$, where $p$ is the number of model parameters and $n$ the number of observations, should be looked at.
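For reference, here is a minimal sketch of that rule applied directly to the fit; whether `appregr::checkleverage` uses exactly this computation is an assumption on our part.

```r
# Hat values measure leverage; flag points above 2p/n, where p is the
# number of fitted parameters and n the number of observations.
h <- hatvalues(lm.fit)
p <- length(coef(lm.fit))
n <- nobs(lm.fit)
h[h > 2 * p / n]  # leverages worth a closer look
```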
```r
results <- appregr::checkoutliers(lm.fit = lm.fit, df = df)
pander(results$residualsrange, caption = "Range of Studentized residuals")
pander(results$correctedtval, caption = "Bonferroni corrected t-value")
if (nrow(results$outliers) >= 1) {
  pander(results$outliers, caption = "Outliers")
}
```
Here we look for studentized residuals that fall outside the interval given by the Bonferroni-corrected t-values.
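A sketch of the standard calculation behind this check, at an overall 5% level; we assume `checkoutliers` reports the equivalent quantities.

```r
# Externally studentized residuals are t-distributed with
# n - p - 1 degrees of freedom under the model assumptions.
stud <- rstudent(lm.fit)
n <- nobs(lm.fit)
p <- length(coef(lm.fit))
tcrit <- qt(1 - 0.05 / (2 * n), df = n - p - 1)  # Bonferroni-corrected cutoff
stud[abs(stud) > tcrit]  # candidate outliers
```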
We plot the Cook's distances, and the residuals against leverage with level-set contours of Cook's distance overlaid.
```r
plot(lm.fit, which = 4)  # Cook's distance
plot(lm.fit, which = 5)  # residuals vs. leverage
```
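As a numeric complement to these plots, the distances themselves are easy to rank; the $4/n$ cutoff used below is a common screening heuristic, not something the package prescribes.

```r
# Rank observations by Cook's distance and screen with the 4/n heuristic
cd <- cooks.distance(lm.fit)
head(sort(cd, decreasing = TRUE))
which(cd > 4 / nobs(lm.fit))
```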
Next we plot the residuals against each predictor, adding a loess smooth to expose any non-linear structure.

```r
library(ggplot2)
predictors <- names(lm.fit$coefficients)
predictors <- predictors[2:length(predictors)]  # drop the intercept
for (predictor in predictors) {
  gg <- ggplot(data.frame(x = df[, predictor], y = residuals(lm.fit)),
               aes(x = x, y = y)) +
    geom_point() +
    geom_smooth(method = "loess", se = FALSE) +
    labs(title = paste0(predictor, " versus residuals"),
         subtitle = "residual plot",
         x = predictor,
         y = "residuals",
         caption = modelname)
  print(gg)
}
```
Partial regression plots - also referred to as added-variable plots - attempt to show the effect of adding a variable to a model that already has one or more predictors. For univariate linear regression, a plot of $y$ against $x$ gives a good idea of the relationship between the predictor and the response, but it is hard to visualize the individual predictor-response relationships when there are more than one or two predictors. Partial regression plots attempt to do this.
Let the full model be $y = \beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n + \epsilon$. To form the partial regression for $x_i$, we regress $y$ on all predictors except $x_i$ and calculate the residuals $\epsilon_{y \sim x_{[j \neq i]}}$; we plot those against the residuals $\epsilon_{x_i \sim x_{[j \neq i]}}$ of the regression $x_i \sim x_{[j \neq i]}$.
The plot of $\epsilon_{y \sim x_{[j \neq i]}} \sim \epsilon_{x_i \sim x_{[j \neq i]}}$ has some nice properties: the least squares fit through it is a line with intercept zero and slope equal to the coefficient $\beta_i$, and its residuals are the same as the residuals of the full model.
Many kinds of violations of the linear model assumptions show up in a partial regression plot, such as high-leverage points, non-linearity, and non-constant variance.
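To make the construction concrete, here is a hand-rolled sketch for a single predictor; `appregr::partialregression` below does this for every predictor. The sketch assumes the model formula has the usual `response ~ predictors` shape and that the predictor names match columns of `df`, as elsewhere in this vignette.

```r
# Partial regression for one predictor, built by hand: regress y and x_i
# on the remaining predictors and keep both residual vectors.
yname  <- all.vars(formula(lm.fit))[1]    # response variable name
xi     <- names(coef(lm.fit))[2]          # first predictor, for illustration
others <- setdiff(names(coef(lm.fit))[-1], xi)
e_y <- residuals(lm(reformulate(others, response = yname), data = df))
e_x <- residuals(lm(reformulate(others, response = xi),    data = df))
# The least squares slope through the residual cloud recovers the
# full-model coefficient for xi (up to numerical noise).
coef(lm(e_y ~ e_x))[2]
coef(lm.fit)[xi]
```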
```r
prr <- appregr::partialregression(lm.fit = lm.fit, df = df)
predictors <- names(prr)
for (predictor in predictors) {
  responseresiduals  <- prr[[predictor]]$responseresiduals
  covariateresiduals <- prr[[predictor]]$covariateresiduals
  plot(covariateresiduals, responseresiduals,
       xlab = paste0(predictor, " residuals"),
       ylab = "response residuals",
       main = paste0("Partial regression plot for ", predictor))
}
```