knitr::opts_chunk$set(fig.height=3)
## Do not delete this!
## It loads the s20x library for you. If you delete it
## your document may not compile
library(s20x)
For this question, we are getting historic. In 1886, Francis Galton presented a data set on a sample of 928 adult British children from 197 sets of parents. For each child, he had recorded their adult height and the average of their parents' heights. He then analysed the relationship between their heights.
However, for this question, we are just interested in a simpler question. How do heights of people in Britain in 1886 compare to heights of people now? We will use the sample of children's adult heights to answer this. In particular, we wish to see if the average height in 1886 in Britain is different from the average height of 70 inches, which is today's estimated average adult height in Britain.
The data on the children's heights from Galton's 1886 dataset is in the file Galton3.csv, which contains the variable:
Variable  | Description
----------|---------------------------------------
Height    | the adult height (inches) of the child
Instructions:
We are interested in seeing if the average height of these British children (when they were adults) is different from the average height of 70 inches, which is today's estimated average adult height.
load(system.file("extdata", "Galton.df.rda", package = "s20x"))
Galton.df=read.csv("Galton3.csv", header=T)
hist(Galton.df$Height)
summary(Galton.df$Height)
The heights appear to be centred around 67 and reasonably symmetric (and looking roughly normal).
Formulas: $T = \frac{\bar{y}-\mu_0}{se(\bar{y})}$ and 95\% confidence interval $\bar{y} \pm t_{df, 0.975} \times se(\bar{y})$
NOTES: The R code mean(y) calculates $\bar{y}$. The standard error is $se(\bar{y}) = \frac{s}{\sqrt{n}}$ where $s$ is the standard deviation of $y$ and is calculated by sd(y), and $n$ is the number of data-points calculated by length(y). The degrees of freedom is $df = n-1$. The $t_{df,0.975}$ multiplier is given by the R code qt(0.975, df).
ybar = mean(Galton.df$Height)
n = length(Galton.df$Height)
se.ybar = sd(Galton.df$Height)/sqrt(n)
# t-statistic for H0: mu=70 :
(ybar - 70) / se.ybar
# 95% confidence interval for the mean:
ybar - qt(0.975, n-1) * se.ybar
ybar + qt(0.975, n-1) * se.ybar
ybar + c(-1, 1) * qt(0.975, n-1) * se.ybar
t.test(Galton.df$Height, mu=70)
Note: You should get exactly the same results from the manual calculations and using the `t.test` function. Doing this was to give you practice using some R code. The `t.test` function also delivers the p-value that we did not calculate above.
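The p-value that was not calculated by hand can be obtained from the same ingredients as the t-statistic. The following is a minimal sketch, assuming simulated heights (the vector `y` and its values are hypothetical, not Galton's data) and a two-sided alternative:

```r
# Simulated heights in inches; hypothetical values, not Galton's data
set.seed(1)
y <- rnorm(50, mean = 68, sd = 2.5)

mu0 <- 70                            # hypothesised mean under H0
n <- length(y)
se.ybar <- sd(y) / sqrt(n)           # standard error of the sample mean
t.stat <- (mean(y) - mu0) / se.ybar  # t-statistic for H0: mu = mu0

# Two-sided p-value: probability of a t-statistic at least this extreme
# under H0, using a t distribution with n - 1 degrees of freedom
p.manual <- 2 * pt(-abs(t.stat), df = n - 1)

# The same p-value as reported by t.test
p.ttest <- t.test(y, mu = mu0)$p.value

# Check that the two agree up to numerical tolerance
all.equal(p.manual, p.ttest)
```

Using `-abs(t.stat)` means the same line works whether the sample mean falls below or above the hypothesised value.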
Galton.fit=lm(Height~1,data=Galton.df)
normcheck(Galton.fit)
cooks20x(Galton.fit)
summary(Galton.fit);
confint(Galton.fit)
70-confint(Galton.fit)
cf1 = as.data.frame(confint(Galton.fit))
resultConf1 = paste0(sprintf("%.1f", cf1$`2.5 %`), " and ", sprintf("%.1f", cf1$`97.5 %`))
cf2 = as.data.frame(70-confint(Galton.fit))
resultConf2 = paste0(sprintf("%.1f", cf2$`97.5 %`), " and ", sprintf("%.1f", cf2$`2.5 %`))
Having multiple children from the same family would have violated the independence assumption (and required a more complicated form of analysis).
As this data consists of one measurement (the child's height as an adult) we have applied a one-sample t-test to it, equivalent to an intercept-only linear model (null model).
We have a random sample of 197 children (who were measured when adult), and we wished to see if their average height is the same as the current average height of people, which is 70 inches. The children's heights should be independent of each other. Checking the normality of the residuals reveals no problems. There were no unduly influential points.
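The equivalence between the one-sample t-test and the intercept-only model can be checked numerically. This is a sketch on simulated data (the data frame `dat` and its values are hypothetical). Note that `summary()` of an intercept-only fit tests whether the mean is 0, so the sketch assumes the response is shifted by 70 to reproduce the test of a mean of 70:

```r
# Simulated heights in inches; hypothetical values, not Galton's data
set.seed(2)
dat <- data.frame(Height = rnorm(40, mean = 67, sd = 2.5))

fit <- lm(Height ~ 1, data = dat)   # intercept-only (null) model
tt <- t.test(dat$Height, mu = 70)   # one-sample t-test of H0: mu = 70

# The fitted intercept is the sample mean
all.equal(unname(coef(fit)), mean(dat$Height))

# The 95% confidence intervals for the mean agree
all.equal(as.numeric(confint(fit)), as.numeric(tt$conf.int))

# Shifting the response by 70 makes the intercept's t-test a test of mu = 70
fit70 <- lm(I(Height - 70) ~ 1, data = dat)
t.lm <- coef(summary(fit70))["(Intercept)", "t value"]
all.equal(t.lm, unname(tt$statistic))
```

The confidence interval does not depend on the hypothesised value, which is why `confint()` on the unshifted model can be used directly.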
Our model is: $Height_i = \mu + \epsilon_i$ where $\epsilon_i \sim iid ~ N(0,\sigma^2)$
We are interested in whether the average height of these children (as adults), as recorded in Galton's 1886 data, is different from the current population average height of 70 inches.
There was evidence to suggest that British people have got taller on average since 1886.
We estimate the height of adults in 1886 to be, on average, between `r resultConf1[1]` inches.
Thus, the average increase in height was estimated as between `r resultConf2[1]` inches since 1886.
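The interval for the increase comes from subtracting the 1886 interval from 70, which reverses the order of the endpoints. That is why resultConf2 takes the 97.5 % column first. A small sketch with made-up interval limits (the numbers in `ci.1886` are hypothetical, not the fitted values):

```r
# Hypothetical 95% confidence interval for the 1886 mean height (inches)
ci.1886 <- c(lower = 66.5, upper = 67.5)

# Subtracting from 70 flips the endpoints:
# the old upper limit gives the new lower limit
ci.raw <- 70 - ci.1886

# Put the limits back in increasing order for reporting
ci.increase <- sort(unname(ci.raw))
ci.increase
```

Reversing with `rev()` would work equally well here; `sort()` is used only because it does not rely on knowing which way the endpoints were flipped.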