| cereal | R Documentation |
A multivariate dataset describing seventy-seven commonly available breakfast cereals, based on the information
now available on the FDA food label. The variable rating is a likely response variable in
statistical models.
data("cereal")
A data frame with 77 observations on the following 16 variables.
name: cereal name, a character vector
mfr: manufacturer (A G K N P Q R), a character vector
type: type (cold/hot), a character vector
calories: calories (number), a numeric vector
protein: protein (g), a numeric vector
fat: fat (g), a numeric vector
sodium: sodium (mg), a numeric vector
fiber: dietary fiber (g), a numeric vector
carbo: complex carbohydrates (g), a numeric vector
sugars: sugars (g), a numeric vector
potass: potassium (mg), a numeric vector
vitamins: vitamins & minerals (0, 25, or 100, respectively indicating "none added"; "enriched"; "FDA recommended"), a numeric vector
shelf: display shelf (1, 2, or 3, counting from the floor), a numeric vector
weight: weight (in ounces) of one serving (serving size), a numeric vector
cups: cups per serving, a numeric vector
rating: health rating of the cereal (unknown calculation method), a numeric vector
This dataset was used in the poster competition for the American Statistical Association 1993 Statistical Graphics Exposition, titled "Serial Correlation or Cereal Correlation ??".
The call for participation reads: "A multivariate dataset describing seventy-seven commonly available breakfast cereals, based on the information now available on the newly-mandated F&DA food label. What are you getting when you eat a bowl of cereal? Can you get a lot of fiber without a lot of calories? Can you describe what cereals are displayed on high, low, and middle shelves? The good news is that none of the cereals for which we collected data had any cholesterol, and manufacturers rarely use artificial sweeteners and colors, nowadays. However, there is still a lot of data for the consumer to understand while choosing a good breakfast cereal."
Further details on the variables and suggested analyses are available at https://community.amstat.org/jointscsg-section/dataexpo/dataexpo1993
The abbreviations for manufacturer, mfr, stand for:
A: American Home Food Products
G: General Mills
K: Kellogg's
N: Nabisco
P: Post
Q: Quaker Oats
R: Ralston Purina
From the American Statistical Association 1993 Statistical Graphics Exposition, 'Serial Correlation or Cereal Correlation ??', https://community.amstat.org/jointscsg-section/dataexpo/dataexpo1993.
Jean Dos Santos, Breakfast Cereals: Data Analysis and Clustering (the Kaggle link no longer works). Performs data cleaning and exploratory data analysis in R.
MASS::UScereal has a similar dataset with fewer observations and variables, but with the variables normalized to a portion of one US cup.
https://www.kaggle.com/datasets/crawford/80-cereals: essentially the same dataset.
library(dplyr)
data(cereal)
str(cereal)
# Add the full manufacturer name, keyed by the one-letter `mfr` code
mfr_names <- c(
  "A" = "American Home Food Products",
  "G" = "General Mills",
  "K" = "Kellogg's",
  "N" = "Nabisco",
  "P" = "Post",
  "Q" = "Quaker Oats",
  "R" = "Ralston Purina"
)
# recode `mfr` as `mfr_name`
cereal <- cereal |>
  mutate(mfr_name = recode(mfr, !!!mfr_names))
# density plot of ratings
library(ggplot2)
ggplot(data = cereal,
       aes(x = rating, fill = mfr_name, color = mfr_name)) +
  geom_density(alpha = 0.1) +
  theme_classic(base_size = 14) +
  theme(legend.position = "bottom")
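The nutrient variables here are recorded per serving, whereas MASS::UScereal is described above as normalized to a portion of one US cup. A minimal sketch of putting this dataset on a similar footing is to divide the per-serving amounts by cups. The helper per_cup() and the toy data frame are hypothetical and introduced only for illustration; the toy rows are made-up numbers, not values from cereal, and the choice of which columns to rescale is an assumption.

```r
# Hypothetical helper (not part of any package): rescale per-serving
# nutrient columns to a one-cup portion by dividing each by `cups`.
per_cup <- function(df,
                    cols = c("calories", "protein", "fat", "sodium",
                             "fiber", "carbo", "sugars", "potass")) {
  stopifnot(is.data.frame(df),
            "cups" %in% names(df),
            all(cols %in% names(df)))
  out <- df
  # Divide each per-serving amount by the serving size in cups
  out[cols] <- lapply(df[cols], function(x) x / df$cups)
  out
}

# Made-up rows for illustration only; these are NOT values from the dataset
toy <- data.frame(
  name     = c("Toy Flakes", "Toy Puffs"),
  calories = c(100, 120),
  protein  = c(2, 3),
  fat      = c(1, 0),
  sodium   = c(200, 150),
  fiber    = c(3, 1),
  carbo    = c(15, 20),
  sugars   = c(6, 12),
  potass   = c(90, 40),
  cups     = c(0.5, 1)
)

toy_cup <- per_cup(toy)
toy_cup[, c("name", "calories", "sugars", "cups")]
```

Once the data are loaded with data(cereal), the same call would be per_cup(cereal). The cups column itself is left unchanged so the original serving size stays visible alongside the rescaled values.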