knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Here we will use the wine quality data (archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv) to present the breakDown package for lm models.
library("breakDown")
head(wine, 3)
Now let's create a linear model for quality.
model <- lm(quality ~ fixed.acidity + volatile.acidity + citric.acid +
              residual.sugar + chlorides + free.sulfur.dioxide +
              total.sulfur.dioxide + density + pH + sulphates + alcohol,
            data = wine)
Common goodness-of-fit measures for an lm model are R^2, adjusted R^2, AIC, and BIC.
summary(model)$r.squared
summary(model)$adj.r.squared
BIC(model)
These measures assess the overall quality of the fit. But how can we understand which factors drive the prediction for a single observation?
With the breakDown package!
library(breakDown)
library(ggplot2)

new_observation <- wine[1,]
br <- broken(model, new_observation)
br

# different roundings
print(br, digits = 2, rounding_function = signif)
print(br, digits = 6, rounding_function = round)

plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
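To see what such a decomposition means for a linear model, here is a minimal sketch in base R. It assumes that each variable's contribution is its centered term, that is the coefficient times the distance of the observation from the training mean; this is what `predict(..., type = "terms")` returns, and it is shown only as an illustration, not as a description of the internals of broken(). The built-in mtcars data stands in for the wine data so that the snippet runs without extra packages.

```r
# Base-R sketch; mtcars is a stand-in for the wine data (an assumption).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
obs <- mtcars[1, ]

# Centered per-term contributions: beta_j * (x_j - mean(x_j))
term_matrix <- predict(fit, newdata = obs, type = "terms")
baseline <- attr(term_matrix, "constant")  # mean fitted value on the training data
contributions <- term_matrix[1, ]

# The same contribution for wt, computed by hand
wt_by_hand <- unname(coef(fit)["wt"] * (obs$wt - mean(mtcars$wt)))

# Baseline plus contributions reproduces the model prediction
reconstructed <- baseline + sum(contributions)
prediction <- unname(predict(fit, newdata = obs))
all.equal(reconstructed, prediction)
```

Because the terms are centered, a positive contribution means the observation's value of that variable pushes the prediction above the average prediction, and a negative one pushes it below.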
Use the baseline argument to set the origin of plots.
br <- broken(model, new_observation, baseline = "Intercept")
br
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
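The effect of moving the origin can be sketched in base R. The helper `bar_ends()` below is hypothetical and is not part of breakDown; it only illustrates the idea that the bars are cumulative sums measured from a chosen origin, under the assumption that contributions are the centered terms of the linear model. Again mtcars is used as a stand-in data set.

```r
# Base-R sketch; mtcars and bar_ends() are illustrative assumptions.
fit <- lm(mpg ~ wt + hp, data = mtcars)
obs <- mtcars[1, ]

term_matrix <- predict(fit, newdata = obs, type = "terms")
intercept <- attr(term_matrix, "constant")  # mean fitted value on the training data
contributions <- term_matrix[1, ]

# Hypothetical helper: cumulative end points of the bars, measured from `origin`
bar_ends <- function(origin) {
  steps <- c("(Intercept)" = intercept - origin, contributions)
  cumsum(steps)
}

from_zero <- bar_ends(0)          # last value equals the prediction
from_mean <- bar_ends(intercept)  # first bar has zero length
from_zero
from_mean
```

With the origin at the intercept, the first bar vanishes and the remaining bars show only how the observation deviates from the average prediction.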
It works for models with interactions as well.
model <- lm(quality ~ (alcohol + density + residual.sugar)^2, data = wine)
new_observation <- wine[1,]
br <- broken(model, new_observation, baseline = "Intercept")
br
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
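As a rough illustration of how interaction terms can enter such a decomposition, the base-R sketch below fits a model with all pairwise interactions and extracts the centered terms with `predict(..., type = "terms")`. Each interaction appears as its own term, and the terms still add up to the prediction. The mtcars data is an assumption used to keep the snippet self-contained, and the sketch does not claim to mirror the internals of broken().

```r
# Base-R sketch; mtcars is a stand-in for the wine data (an assumption).
fit <- lm(mpg ~ (wt + hp + qsec)^2, data = mtcars)
obs <- mtcars[1, ]

term_matrix <- predict(fit, newdata = obs, type = "terms")
contributions <- term_matrix[1, ]
names(contributions)  # main effects followed by the pairwise interactions

# An interaction contribution by hand: beta * (x1 * x2 - mean(x1 * x2))
wt_hp_by_hand <- unname(
  coef(fit)["wt:hp"] * (obs$wt * obs$hp - mean(mtcars$wt * mtcars$hp))
)

# Main effects and interactions together reproduce the prediction
reconstructed <- attr(term_matrix, "constant") + sum(contributions)
all.equal(reconstructed, unname(predict(fit, newdata = obs)))
```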