```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center"
)
# directory to Lab
dirdl <- system.file("Lab8", package = "Intro2R")
# create rmd link
library(Intro2R)
```
This lab is about sampling distributions and the Central Limit Theorem or CLT for short.
The CLT is an important distributional result that helps explain why the Normal distribution is so pervasive.
If a random sample $Y_1,Y_2, Y_3, \ldots, Y_n$ is drawn from a population with finite mean $\mu$ and variance $\sigma^2$, then, when $n$ is large, the sampling distribution of the sample mean $\bar{Y}$ can be approximated by a normal density function.
The proof can be constructed using moment generating functions (MGFs): essentially, we standardize $\bar{Y}$ to $Z$ and show that the MGF of $Z$ converges to that of a Normal distribution as $n\rightarrow \infty$.
Since the CLT applies to $\bar{Y}$, it also applies to the total $T = \sum_{i=1}^{n}Y_i=n\bar{Y}$, so that $T\stackrel{n\rightarrow\infty}{\sim} N$.
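As a quick illustration of this (a sketch only; the Poisson population, the sample size of 30, and the 5000 iterations below are arbitrary choices, not part of the lab), simulated totals of counts already look roughly bell-shaped:

```{r}
# Quick illustration: simulated totals T of 30 Poisson(lambda = 2) counts
# look approximately Normal. All values here are illustrative choices.
set.seed(99)
tot <- replicate(5000, sum(rpois(30, lambda = 2)))
hist(tot, breaks = 30, main = "Simulated totals of 30 Poisson(2) draws", xlab = "T")
```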
This lab teaches and examines some basic algebraic skills related to distributional theory; make sure you master the techniques. In addition, there are some computing skills to practice (a sketch putting them together appears after the list):
- Sample from any distribution using the `r----()` random-generation functions, for example `rbinom()`, `rpois()`, `runif()`, etc. (see `?distributions` in R).
- How to create a statistic, usually the sum or the mean.
- How to store the statistic.
- How to repeat the procedure for a designated number of iterations.
- When finished, learn how to create a histogram of the statistic along with other graphs.
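The sketch below puts these computing steps together. It is only a minimal template: the choice of `runif()`, the sample size of 30, and the 5000 iterations are assumptions made for illustration, not values prescribed by the lab.

```{r}
# Template for simulating a sampling distribution (illustrative values only)
set.seed(100)
n    <- 30                      # sample size
iter <- 5000                    # number of iterations

ybar <- numeric(iter)           # storage for the statistic
for (i in 1:iter) {
  y       <- runif(n)           # sample from a distribution
  ybar[i] <- mean(y)            # create and store the statistic
}

# histogram of the simulated statistic
hist(ybar, main = "Sampling distribution of the mean", xlab = expression(bar(Y)))
```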
Suppose we have $Y_i\sim D(\ldots)$, where the population has mean $\mu$ and variance $\sigma^2$. We also assume that the random variables are independent and identically distributed: $Y_i\stackrel{iid}{\sim}D$.
Then the CLT says that, provided $n$ is sufficiently large, $\bar{Y}\sim N(\mu_{\bar{Y}}, \sigma_{\bar{Y}}^2)$.
The question is: what are the values of $\mu_{\bar{Y}}$ and $\sigma_{\bar{Y}}^2$?
The mean of a distribution is called the expected value; we need $E(\bar{Y})$. Because expectation is linear and each $E(Y_i)=\mu_Y$,

$$E(\bar{Y})=E\left(\frac{\sum_i Y_i}{n}\right)=\frac{1}{n}\sum_i E(Y_i)=\mu_Y$$
The variance of a distribution is defined as $V(Y)=E\left((Y-\mu_Y)^2\right)$. From this definition, if $L = aY + b$ then $V(L)= a^2V(Y)$, and for independent random variables $V\left(\sum_i Y_i\right)=\sum_i V(Y_i)$.
So that

$$V(\bar{Y})=V\left(\frac{\sum_i Y_i}{n}\right) = \frac{1}{n^2}\sum_i V(Y_i) = \frac{n\sigma_{Y}^2}{n^2} = \frac{\sigma_{Y}^2}{n}$$
Notice that the variance of $\bar{Y}$ is smaller than the variance of $Y$ by a factor of $n$.
This means that, regardless of the distribution of $Y$,
$$\bar{Y}\stackrel{n\rightarrow \infty}{\sim} N(\mu_Y, \frac{\sigma_Y^2}{n})$$
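As a quick numerical check of these two results (a sketch; the exponential population with rate 2, $n = 40$, and the 10000 iterations are arbitrary choices made here):

```{r}
# Check E(Ybar) = mu and V(Ybar) = sigma^2/n by simulation.
# For Y ~ Exp(rate = 2): mu = 1/2 and sigma^2 = 1/4.
set.seed(101)
n    <- 40
iter <- 10000
ybar <- replicate(iter, mean(rexp(n, rate = 2)))

mean(ybar)   # should be close to mu = 0.5
var(ybar)    # should be close to sigma^2/n = 0.25/40 = 0.00625
```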
Take an example. Suppose $Y_i\stackrel{iid}{\sim} Bern(p)$ and we wish to find the distribution of $\bar{Y}$.
$\mu_Y = p$ and $\sigma_Y^2 = pq$ (where $q = 1-p$), so
$$\bar{Y}\stackrel{n\rightarrow\infty}{\sim} N\left(p, \frac{pq}{n}\right)$$
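A simulation of this Bernoulli case, with the approximating Normal curve overlaid, might look like the sketch below (here $p = 0.3$, $n = 50$, and 10000 iterations are arbitrary choices for illustration):

```{r}
# Simulate the sampling distribution of Ybar for Bernoulli(p) data and
# overlay the Normal approximation N(p, pq/n). Values are illustrative.
set.seed(102)
p    <- 0.3
n    <- 50
iter <- 10000
ybar <- replicate(iter, mean(rbinom(n, size = 1, prob = p)))

hist(ybar, freq = FALSE, breaks = 30,
     main = "Sampling distribution of the sample proportion",
     xlab = expression(bar(Y)))
curve(dnorm(x, mean = p, sd = sqrt(p * (1 - p) / n)),
      add = TRUE, lwd = 2, col = "blue")
```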
Please start the lab by downloading the following files and placing them in the correct directories.
`r rmdfiles("Lab8", "Intro2R")`