R/anscombe_quartet.R

#' Anscombe's Quartet Data
#'
#' This dataset contains 44 observations: 11 observations from each of the four
#' datasets constructed by Francis Anscombe to demonstrate that summary
#' statistics alone cannot capture the full relationship between two variables
#' (here, `x` and `y`). Anscombe emphasized the importance of visualizing data
#' before calculating summary statistics.
#'
#'  * Dataset 1 shows a linear relationship between `x` and `y`.
#'  * Dataset 2 shows a nonlinear relationship between `x` and `y`.
#'  * Dataset 3 shows a linear relationship between `x` and `y` with a single
#'    outlier.
#'  * Dataset 4 shows no relationship between `x` and `y`, apart from a single
#'    outlier that serves as a high-leverage point.
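#'
#' The differences between datasets 2 and 4 can also be checked numerically.
#' The sketch below assumes that base R's built-in `datasets::anscombe` table
#' (wide format, columns `x1` to `x4` and `y1` to `y4`) holds the same values
#' as `anscombe_quartet`; fitting a quadratic term to dataset 2 is an
#' illustrative choice, not something this documentation prescribes. In
#' dataset 4 every `x` equals 8 except one point at `x = 19`, so that point has
#' a leverage (hat value) of 1 and the fitted line is forced through it.
#'
#' ```r
#' # Assumption: datasets::anscombe matches the values in anscombe_quartet
#' a <- datasets::anscombe
#'
#' # Dataset 2: a straight line versus a quadratic fit
#' fit_lin  <- lm(y2 ~ x2, data = a)
#' fit_quad <- lm(y2 ~ x2 + I(x2^2), data = a)
#' r2_lin  <- summary(fit_lin)$r.squared   # about 0.67, like the other datasets
#' r2_quad <- summary(fit_quad)$r.squared  # very close to 1
#'
#' # Dataset 4: leverage of each point in the straight-line fit
#' fit4 <- lm(y4 ~ x4, data = a)
#' lev4 <- hatvalues(fit4)
#' lev4[a$x4 == 19]  # leverage of 1 (up to floating-point error)
#' ```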
#'
#' In each of the four datasets, the following summary statistics hold (those
#' involving `y` only approximately, to two or three decimal places):
#'  * mean of `x`: 9
#'  * sample variance of `x`: 11
#'  * mean of `y`: 7.50
#'  * sample variance of `y`: 4.125
#'  * correlation between `x` and `y`: 0.816
#'  * least-squares regression line of `y` on `x`: `y = 3 + 0.5x`
#'  * \eqn{R^2} for the regression: 0.67
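#'
#' These summaries can be reproduced in a few lines. The sketch below again
#' assumes that base R's `datasets::anscombe` table (columns `x1` to `x4` and
#' `y1` to `y4`) holds the same values as `anscombe_quartet`; the helper
#' `summarise_dataset()` is hypothetical and exists only for this example.
#'
#' ```r
#' # Assumption: datasets::anscombe matches the values in anscombe_quartet
#' a <- datasets::anscombe
#'
#' # Hypothetical helper: summary statistics for dataset i (1 to 4)
#' summarise_dataset <- function(i) {
#'   x <- a[[paste0("x", i)]]
#'   y <- a[[paste0("y", i)]]
#'   fit <- lm(y ~ x)  # least-squares line of y on x
#'   c(
#'     mean_x    = mean(x),
#'     var_x     = var(x),  # sample variance (n - 1 denominator)
#'     mean_y    = mean(y),
#'     var_y     = var(y),
#'     cor_xy    = cor(x, y),
#'     intercept = unname(coef(fit)[1]),
#'     slope     = unname(coef(fit)[2]),
#'     r_squared = summary(fit)$r.squared
#'   )
#' }
#'
#' # One column per dataset; the columns agree to about two or three decimals
#' stats <- sapply(1:4, summarise_dataset)
#' colnames(stats) <- paste("dataset", 1:4)
#' round(stats, 3)
#' ```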
#'
#' @references Anscombe, F. J. (1973). Graphs in Statistical Analysis.
#'   *The American Statistician*, 27(1), 17–21.
#'   doi:10.1080/00031305.1973.10478966. JSTOR 2682899.
#'
#' @format A data frame with 44 rows and 3 variables:
#'
#' * `dataset`: which of the four datasets the observation belongs to
#' * `x`: the x-variable
#' * `y`: the y-variable
"anscombe_quartet"

quartets documentation built on April 14, 2023, 12:25 a.m.