knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Here we use the white wine quality data (archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv), available in the package as `wine`, to show how the breakDown package explains predictions of lm models.

library("breakDown")
head(wine, 3)

Now let's fit a linear model that predicts quality from all eleven physicochemical measurements.

model <- lm(quality ~ fixed.acidity + volatile.acidity + citric.acid +
              residual.sugar + chlorides + free.sulfur.dioxide +
              total.sulfur.dioxide + density + pH + sulphates + alcohol,
            data = wine)

Common goodness-of-fit measures for an lm model are R^2, adjusted R^2, AIC and BIC.

summary(model)$r.squared
summary(model)$adj.r.squared
AIC(model)
BIC(model)
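To make these measures concrete, the sketch below recomputes them from their textbook definitions and checks each one against `summary()`, `AIC()` and `BIC()`. It assumes the built-in `mtcars` data and the illustrative model `mpg ~ wt + hp` instead of `wine`, so it runs with base R alone. Here `p` counts the predictors and `k` counts every estimated parameter, including the residual variance.

```r
# Sketch: goodness-of-fit measures recomputed by hand.
# Assumption: illustrative model on the built-in mtcars data, not the wine model.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nobs(fit)                                  # number of observations
p <- length(coef(fit)) - 1                      # predictors, excluding the intercept
rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares

r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalises extra predictors

ll <- logLik(fit)
k <- attr(ll, "df")                             # coefficients + residual variance
aic <- -2 * as.numeric(ll) + 2 * k
bic <- -2 * as.numeric(ll) + log(n) * k         # heavier penalty than AIC once n > 7

# Each hand-computed value should agree with R's built-in result
stopifnot(
  isTRUE(all.equal(r2, summary(fit)$r.squared)),
  isTRUE(all.equal(adj_r2, summary(fit)$adj.r.squared)),
  isTRUE(all.equal(aic, AIC(fit))),
  isTRUE(all.equal(bic, BIC(fit)))
)
```

All four numbers summarise the whole data set at once, which is why they cannot say anything about an individual prediction.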

These statistics assess the overall fit of the model. But which factors drive the prediction for a single observation?

With the breakDown package!

library(breakDown)
library(ggplot2)

new_observation <- wine[1,]
br <- broken(model, new_observation)
br
# the same explanation printed with different rounding rules
print(br, digits = 2, rounding_function = signif)
print(br, digits = 6, rounding_function = round)
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
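For a linear model, the idea behind such a decomposition can be reproduced with base R. The sketch below assumes contributions of the kind returned by `predict(..., type = "terms")`, which splits one prediction into a centred term per variable plus a constant equal to the model's average prediction. It illustrates the idea only and is not the breakDown implementation; it also assumes the built-in `mtcars` data in place of `wine`.

```r
# Sketch: additive decomposition of a single lm prediction.
# Assumption: illustrative model on mtcars; not breakDown's own code.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
obs <- mtcars[1, ]

# One column per model term, centred on the training data, plus a "constant"
tt <- predict(fit, newdata = obs, type = "terms")
contributions <- tt[1, ]
constant <- attr(tt, "constant")

# Order variables by the size of their effect, largest first
contributions <- contributions[order(abs(contributions), decreasing = TRUE)]
print(round(contributions, 3))

# The constant plus all contributions recovers the ordinary prediction
stopifnot(isTRUE(all.equal(constant + sum(contributions),
                           unname(predict(fit, newdata = obs)))))
```

Because the model is additive, the contributions do not depend on the order in which variables are added; ordering by absolute size only makes the table easier to read.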

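Use the `baseline` argument to set the origin of the plot. With `baseline = "Intercept"` the bars start from the model intercept instead of from the default origin.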

br <- broken(model, new_observation, baseline = "Intercept")
br
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
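One way to picture what changing the origin does is a small waterfall table. `waterfall_table()` below is a hypothetical helper written for this illustration, not part of breakDown. It assumes the origin is a number and that an intercept row covers the gap between the origin and the model constant, so the variable contributions stay the same and only the starting point of the bars moves. It again uses `mtcars` rather than `wine`.

```r
# Sketch: how the origin of a breakDown-style plot shifts the bars.
# waterfall_table() is hypothetical; it is not the package's implementation.
waterfall_table <- function(contributions, constant, baseline = 0) {
  # The intercept row bridges the origin and the model constant
  steps <- c("(Intercept)" = constant - baseline, contributions)
  ends <- baseline + cumsum(steps)
  data.frame(
    variable = names(steps),
    contribution = unname(steps),
    start = unname(c(baseline, head(ends, -1))),  # where each bar begins
    end = unname(ends),                           # where each bar ends
    row.names = NULL
  )
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
tt <- predict(fit, newdata = mtcars[1, ], type = "terms")
constant <- attr(tt, "constant")

from_zero <- waterfall_table(tt[1, ], constant, baseline = 0)
from_constant <- waterfall_table(tt[1, ], constant, baseline = constant)
print(from_zero)
print(from_constant)
```

Whatever the origin, the last bar ends at the model's prediction; the choice only decides whether the first bar carries the whole constant or none of it.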

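The same approach works for models with interactions. The formula `(alcohol + density + residual.sugar)^2` expands to the three main effects plus all two-way interactions between them.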

model <- lm(quality ~ (alcohol + density + residual.sugar)^2,
            data = wine)
new_observation <- wine[1,]

br <- broken(model, new_observation, baseline = "Intercept")
br
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
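With interactions, a terms-based decomposition simply gains extra additive pieces, one per interaction. The sketch below assumes the built-in `mtcars` data and an illustrative formula with the same `(a + b + c)^2` shape as the wine model; it shows base R behaviour, not breakDown internals.

```r
# Sketch: interaction terms in a terms-based decomposition.
# Assumption: illustrative model on mtcars, mirroring the (a + b + c)^2 formula.
fit <- lm(mpg ~ (wt + hp + qsec)^2, data = mtcars)
obs <- mtcars[1, ]

tt <- predict(fit, newdata = obs, type = "terms")
print(colnames(tt))  # three main effects, then the three two-way interactions

# Interaction terms are further additive pieces of the same prediction
stopifnot(isTRUE(all.equal(attr(tt, "constant") + sum(tt[1, ]),
                           unname(predict(fit, newdata = obs)))))
```

A contribution such as `wt:hp` cannot be credited to either variable alone, which is why it appears as its own bar.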


pbiecek/breakDown documentation built on March 15, 2024, 10:46 a.m.