anscombe_quartet | R Documentation
This dataset contains 44 observations: 11 from each of the 4 datasets
constructed by Francis Anscombe to demonstrate that statistical summary
measures alone cannot capture the full relationship between two variables
(here, x and y). Anscombe emphasized the importance of visualizing data
prior to calculating summary statistics.
anscombe_quartet
A dataframe with 44 rows and 3 variables:
dataset: the dataset the values come from
x: the x-variable
y: the y-variable
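The three-column layout above can be illustrated with a minimal sketch that rebuilds a comparable long-format data frame from base R's built-in `anscombe` data (which stores the same values in wide form as x1..x4 and y1..y4). The name `quartet_long` and the "Dataset 1" style labels are assumptions made for illustration; the labels stored in anscombe_quartet itself may differ.

```r
# Reshape base R's wide `anscombe` data into the long layout described above.
# `quartet_long` and the "Dataset i" labels are hypothetical names for this sketch.
quartet_long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    dataset = paste("Dataset", i),      # which of the four datasets the row belongs to
    x = anscombe[[paste0("x", i)]],     # the x-variable
    y = anscombe[[paste0("y", i)]]      # the y-variable
  )
}))

# Inspect the structure: 44 rows and 3 variables
str(quartet_long)
```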
Dataset 1 has a linear relationship between x and y
Dataset 2 shows a nonlinear relationship between x and y
Dataset 3 has a linear relationship between x and y with a single outlier
Dataset 4 shows no relationship between x and y, with a single
outlier that serves as a high-leverage point.
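The four shapes described above are easiest to see in a plot. The following is a rough sketch using base graphics and base R's built-in `anscombe` data rather than the anscombe_quartet object itself; the helper name `plot_quartet` and the axis limits are assumptions chosen for illustration.

```r
# Draw the four datasets in a 2 x 2 grid, each with its least-squares line.
# `plot_quartet` is a hypothetical helper, not part of any package.
plot_quartet <- function() {
  old_par <- par(mfrow = c(2, 2))   # 2 x 2 panel layout
  on.exit(par(old_par))             # restore graphics settings on exit
  fits <- lapply(1:4, function(i) {
    x <- anscombe[[paste0("x", i)]]
    y <- anscombe[[paste0("y", i)]]
    fit <- lm(y ~ x)
    # Shared axis limits make the panels directly comparable
    plot(x, y, main = paste("Dataset", i), xlim = c(3, 20), ylim = c(2, 14))
    abline(fit, col = "red")        # fitted regression line
    fit
  })
  invisible(fits)                   # return the fitted models without printing
}

fits <- plot_quartet()
```

The fitted lines look nearly identical across panels, while the point clouds differ in the ways listed above.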
In each of the datasets the following statistical summaries hold approximately:
mean of x: 9
variance of x: 11
mean of y: 7.5
variance of y: 4.125
correlation between x and y: 0.816
linear regression between x and y: y = 3 + 0.5x
R^2 for the regression: 0.67
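The summaries listed above can be checked with a short sketch. It uses base R's built-in `anscombe` data (the same values in wide form) rather than anscombe_quartet itself, and the `summaries` data frame and its column names are assumptions made for this example.

```r
# Compute the listed summary statistics separately for each of the four datasets.
# `summaries` and its column names are hypothetical, chosen for this sketch.
summaries <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)                        # simple linear regression of y on x
  data.frame(
    dataset   = i,
    mean_x    = mean(x),
    var_x     = var(x),
    mean_y    = mean(y),
    var_y     = var(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared
  )
}))

# Rounded for display; the rows agree closely but not exactly
print(round(summaries, 3))
```

The four rows match to roughly two or three decimal places, which is why the values above are stated as approximate.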
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966. JSTOR 2682899.