knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(GGally)
GGally::ggnostic()
ggnostic() is a display wrapper around ggduo() that displays full model diagnostics for each given explanatory variable. By default, ggnostic() displays the residuals, the leave-one-out model sigma value, the leverage (hat values), and Cook's distance against each explanatory variable. The rows of the plot matrix can be expanded to include fitted values, the standard error of the fitted values, standardized residuals, and any of the response variables. If the model is a linear model, stars are added according to the stats::anova significance of each explanatory variable.
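The significance stars can be approximated by hand. The sketch below is an assumption about the mapping, not GGally's own code: it takes the p-values from stats::anova() and converts them with the cutpoints that R's summary output conventionally uses. The mtcars model and all variable names are purely illustrative.

```r
# Illustrative model on a built-in dataset (not part of the original example)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Sequential ANOVA table; the "Residuals" row has no p-value, so drop it
anova_table <- stats::anova(fit)
term_rows <- rownames(anova_table) != "Residuals"
p_values <- anova_table[term_rows, "Pr(>F)"]
names(p_values) <- rownames(anova_table)[term_rows]

# Assumed star mapping: the conventional cutpoints used in R summary output
stars <- stats::symnum(
  p_values,
  corr = FALSE, na = FALSE,
  cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  symbols = c("***", "**", "*", ".", " ")
)
star_labels <- as.character(stars)
names(star_labels) <- names(p_values)
print(star_labels)
```

Whether ggnostic() uses exactly these cutpoints should be checked against the GGally documentation before relying on them.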
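Most diagnostic plots contain reference line(s) to help determine whether the model is fitting properly. Each diagnostic column is listed below with its key, its plot function, and the related stats function:
".resid": residuals, plotted with ggally_nostic_resid(). See stats::residuals.
".std.resid": standardized residuals, plotted with ggally_nostic_std_resid(). See stats::rstandard.
".sigma": leave-one-out model sigma, plotted with ggally_nostic_sigma(). See stats::influence's value on sigma.
".hat": leverage, plotted with ggally_nostic_hat(). See stats::influence's value on hat.
".cooksd": Cook's distance, plotted with ggally_nostic_cooksd(). See also stats::cooks.distance().
".fitted": fitted values, plotted with ggally_points(). See stats::predict.
".se.fit": standard error of the fitted values, plotted with ggally_nostic_se_fit(). See stats::fitted.
Response variables: plotted with ggally_points().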
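The stats functions named above can be called directly to see the values behind each column. The following is a minimal base R sketch, not GGally's internal code: the mtcars model is illustrative, and the column names simply mirror the keys listed above. For an lm fit without new data, stats::predict() returns the same fitted values as stats::fitted(), and its se.fit = TRUE option supplies the standard errors.

```r
# Illustrative model on a built-in dataset (not part of the original example)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Leave-one-out sigma and hat (leverage) values both come from influence()
infl <- stats::influence(fit)

# Fitted values and their standard errors
pred <- stats::predict(fit, se.fit = TRUE)

# One column per diagnostic key; check.names = FALSE keeps the leading dots
diagnostics <- data.frame(
  .resid     = stats::residuals(fit),
  .std.resid = stats::rstandard(fit),
  .sigma     = infl$sigma,
  .hat       = infl$hat,
  .cooksd    = stats::cooks.distance(fit),
  .fitted    = pred$fit,
  .se.fit    = pred$se.fit,
  check.names = FALSE
)
head(diagnostics)
```

Inspecting this table alongside the plot matrix can help when a single point stands out in several panels at once.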
Looking at the dataset datasets::state.x77, we will fit a multiple regression model for Life Expectancy.
# make a data.frame and fix column names
state <- as.data.frame(state.x77)
colnames(state)[c(4, 6)] <- c("Life.Exp", "HS.Grad")
str(state)

# fit full model
model <- lm(Life.Exp ~ ., data = state)
# reduce to "best fit" model with stepwise selection
model <- step(model, trace = FALSE)
summary(model)
Next, we look at the variables for any high (|value| > 0.8) correlation values and general interaction behavior.
# look at variables for high correlation (none)
ggscatmat(state, columns = c("Population", "Murder", "HS.Grad", "Frost"))
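The 0.8 threshold can also be checked numerically rather than by eye. This is a small base R sketch that rebuilds the state data frame so it runs on its own; the names vars, cor_mat, and high_pairs are illustrative.

```r
# Rebuild the data frame used in this example
state <- as.data.frame(state.x77)
colnames(state)[c(4, 6)] <- c("Life.Exp", "HS.Grad")

# Pairwise Pearson correlations among the explanatory variables
vars <- c("Population", "Murder", "HS.Grad", "Frost")
cor_mat <- cor(state[, vars])
print(round(cor_mat, 2))

# Flag each pair once (upper triangle only) if |correlation| > 0.8
high_pairs <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
nrow(high_pairs)
```

A count of zero flagged pairs agrees with the "(none)" noted in the comment above.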
All variables appear to be ok. Next, we look at the model diagnostics.
# look at model diagnostics
ggnostic(model)
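The Cook's distance panel can be cross-checked by ranking the values directly. The sketch below assumes the stepwise model keeps the same four predictors that are used later in this example (Population, Murder, HS.Grad, Frost) and refits it explicitly so the code runs on its own.

```r
# Rebuild the data frame used in this example
state <- as.data.frame(state.x77)
colnames(state)[c(4, 6)] <- c("Life.Exp", "HS.Grad")

# Assumed form of the reduced model (the same predictors used later on)
reduced <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = state)

# Cook's distance for every state, largest first
cooksd <- stats::cooks.distance(reduced)
head(sort(cooksd, decreasing = TRUE), 5)

# Row index and name of the single most influential observation
which.max(cooksd)
```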
Let's first remove the most extreme data point, Hawaii (row 11 of state, with a very high life expectancy), to try to define a better model.
# very high life expectancy
state[11, ]
state_no_hawaii <- state[-11, ]
model_no_hawaii <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = state_no_hawaii)
ggnostic(model_no_hawaii)
No extreme Cook's distance values remain. The model without Hawaii appears to fit well.
summary(model)
summary(model_no_hawaii)
Since there is only a marginal improvement by removing Hawaii, the original model should be used to explain life expectancy.
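One way to judge how marginal the improvement is: place a few summary statistics from both fits side by side. The sketch below is self-contained base R and assumes both models use the four predictors shown in the formula above; the comparison data frame and its column names are illustrative.

```r
# Rebuild the data frame used in this example
state <- as.data.frame(state.x77)
colnames(state)[c(4, 6)] <- c("Life.Exp", "HS.Grad")

# Same (assumed) formula for both fits; row 11 is Hawaii
form <- Life.Exp ~ Population + Murder + HS.Grad + Frost
fit_all <- lm(form, data = state)
fit_no_hawaii <- lm(form, data = state[-11, ])

# Adjusted R-squared and residual standard error for each fit
comparison <- data.frame(
  model = c("all states", "without Hawaii"),
  adj_r_squared = c(summary(fit_all)$adj.r.squared,
                    summary(fit_no_hawaii)$adj.r.squared),
  sigma = c(summary(fit_all)$sigma, summary(fit_no_hawaii)$sigma)
)
print(comparison)
```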
The following lines of code display different model diagnostic plot matrices for the same statistical model. The first uses the default settings. The second adds color according to the species. Finally, the third displays all possible columns and uses ggally_smooth() to display the fitted points and response variables.
flea_model <- step(lm(head ~ ., data = flea), trace = FALSE)
summary(flea_model)

# default output
ggnostic(flea_model)

# colored output
ggnostic(flea_model, mapping = ggplot2::aes(color = species))

# full colored output
ggnostic(
  flea_model,
  mapping = ggplot2::aes(color = species),
  columnsY = c("head", ".fitted", ".se.fit", ".resid", ".std.resid", ".hat", ".sigma", ".cooksd"),
  continuous = list(default = ggally_smooth, .fitted = ggally_smooth)
)