Predicting Lego Set Price

knitr::opts_chunk$set(
  collapse = TRUE,
  out.width = '100%',
  fig.width = 6,
  fig.height = 4,
  comment = "#>"
)
library(brickset)
library(ggplot2)
library(dplyr)
theme_set(theme_minimal())

In this vignette we will show how the legosets dataset can be used to teach basic regression. This, of course, can be extended to other modeling techniques. To begin, we will filter the data frame to include sets from the last 10 years and remove any sets with missing values.

data(legosets)
last_year <- max(legosets$year)
legosets <- legosets |> 
    dplyr::select(year, US_retailPrice, pieces, minifigs, themeGroup) |>
    # Keep the 10 most recent years, inclusive of last_year
    dplyr::filter(year %in% seq(last_year - 9, last_year)) |>
    na.omit()
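Note that na.omit() performs listwise deletion: a row is dropped if any of the selected columns is missing. A minimal sketch on a toy data frame (the values are invented for illustration and are not taken from legosets):

```r
# Toy data with invented values; NOT taken from the legosets dataset.
toy <- data.frame(
    US_retailPrice = c(9.99, NA, 49.99, 19.99),
    pieces         = c(100, 250, NA, 180),
    minifigs       = c(1, 2, 3, 0)
)

complete <- na.omit(toy)

# Rows 2 and 3 each contain one missing value, so both are removed.
nrow(toy) - nrow(complete)
```

Comparing the row count before and after filtering in this way is a quick check on how much data the complete-case requirement costs.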

Our goal is to predict US_retailPrice from pieces (number of Lego pieces in the set), minifigs (number of mini figures in the set), and themeGroup (the set theme).

lego_model <- US_retailPrice ~ pieces + minifigs + themeGroup

First, let's plot the data to see whether there is a relationship between our dependent variable and our independent variables.

ggplot(legosets, aes(x = pieces, y = US_retailPrice, size = minifigs, color = themeGroup)) +
    geom_point(alpha = 0.2)

The frequency table below shows that "Licensed" themes are the largest category. To make the results easier to interpret, we will convert the themeGroup variable to a factor and set "Licensed" as the reference group, so that each theme coefficient is read as a difference from "Licensed" sets.

table(legosets$themeGroup, useNA = 'ifany')
legosets$themeGroup <- as.factor(legosets$themeGroup)
legosets$themeGroup <- relevel(legosets$themeGroup, ref = 'Licensed')
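To see what relevel() changes, here is a small sketch on a toy factor. The three group names are used only for illustration. By default R orders factor levels alphabetically and treats the first level as the reference in a regression; relevel() moves the chosen level to the front and leaves the others in their existing order.

```r
# Toy factor; the group names here are illustrative only.
theme <- factor(c("Basic", "Licensed", "Modern day", "Licensed"))

# Default ordering is alphabetical, so "Basic" would be the reference.
levels(theme)

# Move "Licensed" to the first position so it becomes the reference.
theme <- relevel(theme, ref = "Licensed")
levels(theme)
```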

Now we can run our linear regression.

lm_out <- lm(US_retailPrice ~ pieces + minifigs + themeGroup, data = legosets)
summary(lm_out)
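Once fitted, a model of this form can be used to estimate the price of a set that is not in the data, using predict() with its newdata argument. The sketch below runs on simulated data so that it is self-contained: the data frame sim, the object sim_fit, and the "true" effects (0.10 per piece, 2 per minifig, no theme effect) are all invented for illustration and are not estimates from legosets.

```r
set.seed(42)
n <- 200

# Simulated data with invented effects; NOT estimates from legosets.
sim <- data.frame(
    pieces     = round(runif(n, 50, 2000)),
    minifigs   = rpois(n, 2),
    themeGroup = factor(sample(c("Licensed", "Basic"), n, replace = TRUE),
                        levels = c("Licensed", "Basic"))
)
sim$US_retailPrice <- 5 + 0.10 * sim$pieces + 2 * sim$minifigs +
    rnorm(n, sd = 5)

sim_fit <- lm(US_retailPrice ~ pieces + minifigs + themeGroup, data = sim)

# A hypothetical new set: 500 pieces, 3 minifigs, "Licensed" theme.
new_set <- data.frame(
    pieces     = 500,
    minifigs   = 3,
    themeGroup = factor("Licensed", levels = levels(sim$themeGroup))
)

# Point estimate plus a 95% prediction interval for a single new set.
predict(sim_fit, newdata = new_set, interval = "prediction")
```

The same call should work with lm_out in place of sim_fit, provided the themeGroup value in newdata is one of the levels present when the model was fitted.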

The adjusted R-squared for our model is `r round(100 * summary(lm_out)$adj.r.squared, digits = 1)`%.
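Adjusted R-squared penalizes the ordinary R-squared for the number of estimated coefficients, which matters here because each level of themeGroup other than the reference adds one coefficient. As a sketch of the calculation, the example below reproduces the value reported by summary() by hand. It uses the built-in mtcars data rather than legosets so that it runs on its own; the names fit and adj_by_hand are chosen only for this illustration.

```r
# Built-in mtcars data, used only to keep the example self-contained.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

n <- nrow(mtcars)            # number of observations
p <- length(coef(fit)) - 1   # number of coefficients, excluding the intercept

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_by_hand <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

all.equal(adj_by_hand, s$adj.r.squared)
```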

Finally, we can check whether our residuals are approximately normally distributed by plotting a histogram.

legosets$predicted <- predict(lm_out)
legosets$residuals <- resid(lm_out)
ggplot(legosets, aes(x = residuals)) + geom_histogram(binwidth = 10)
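The shape of a histogram depends on the chosen bin width, so a normal Q-Q plot is a common complementary check: if the residuals are roughly normal, the points fall close to the reference line. The sketch below uses base R graphics and simulated data (the variables x and y and the object fit_qq are invented for illustration) so that it runs without legosets.

```r
set.seed(1)

# Simulated data with normally distributed errors; for illustration only.
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100)

fit_qq <- lm(y ~ x)
res <- resid(fit_qq)

# Normal Q-Q plot with a reference line through the quartiles.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test as a numeric companion to the plot.
shapiro.test(res)
```

The same approach should apply to the model above by passing resid(lm_out) to qqnorm() and qqline(). With many observations, formal tests such as Shapiro-Wilk tend to flag even small departures from normality, so the plot is usually the more informative of the two.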


brickset documentation built on Jan. 15, 2026, 1:06 a.m.