knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "ws#>",
  fig.align = "center"
)

# directory to the Lab8 files in the installed package
dirdl <- system.file("Lab8", package = "Intro2R")

# create rmd link

library(Intro2R)

Introduction

This lab is about sampling distributions and the Central Limit Theorem (CLT).

The CLT is a key distributional result that helps explain why the Normal distribution is so important and pervasive.

The central limit theorem

If a random sample $Y_1,Y_2, Y_3, \ldots, Y_n$ is drawn from a population with finite mean $\mu$ and variance $\sigma^2$, then, when $n$ is large, the sampling distribution of the sample mean $\bar{Y}$ can be approximated by a normal density function.

The proof can be constructed using moment generating functions (MGFs). Essentially, we standardize $\bar{Y}$ to $Z$ and then show that the MGF of $Z$ converges to that of a standard Normal distribution as $n\rightarrow \infty$.
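
Concretely, the standardized quantity is

$$Z_n = \frac{\bar{Y}-\mu}{\sigma/\sqrt{n}},$$

and the argument (a sketch only, assuming the MGF of $Y$ exists) shows that $M_{Z_n}(t)\rightarrow e^{t^2/2}$ as $n\rightarrow\infty$, which is the MGF of the standard Normal distribution.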

$\bar{Y}$ and the sum

Since the CLT applies to $\bar{Y}$, it also applies to the sum $T = \sum_{i=1}^{n}Y_i = n\bar{Y}$, so that $T\stackrel{n\rightarrow\infty}{\sim} N(n\mu, n\sigma^2)$.
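
A quick simulation illustrates this for the sum (the Poisson rate, sample size, and number of replications below are arbitrary choices for demonstration, not part of the lab):

```r
# Sums of n iid Poisson(2) variables: mu = sigma^2 = 2, so the CLT suggests
# T is approximately N(n*mu, n*sigma^2) = N(80, 80) when n = 40
set.seed(7)
n <- 40
Tsum <- replicate(10000, sum(rpois(n, lambda = 2)))
c(mean = mean(Tsum), var = var(Tsum))   # both should be close to 80
```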

Lab objectives

This lab will teach and examine some basic algebraic skills related to distributional theory. Make sure you master the techniques. In addition, there are some computing skills (a minimal sketch covering all five steps appears after this list):

  1. Sample from any distribution using one of R's r____() random-generation functions, for example rbinom(), rpois(), runif(), etc. (see ?Distributions in R)
  2. How to create a statistic, usually the sum or the mean.
  3. How to store the statistic.
  4. How to repeat the procedure for a designated number of iterations.
  5. When finished, how to create a histogram of the statistic, along with other graphs.
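
Here is a minimal sketch putting steps 1 through 5 together (the sample size, number of iterations, Bernoulli success probability, and object names such as mymeans are illustrative choices, not values prescribed by the lab):

```r
iter <- 10000                 # step 4: number of iterations
n    <- 30                    # sample size
mymeans <- numeric(iter)      # step 3: storage for the statistic

for (i in 1:iter) {
  y <- rbinom(n, size = 1, prob = 0.5)   # step 1: sample from a distribution
  mymeans[i] <- mean(y)                  # step 2: compute the statistic (the mean)
}

# step 5: histogram of the stored statistic
hist(mymeans, main = "Sampling distribution of the sample mean",
     xlab = expression(bar(Y)))
```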

Algebraic skills

Suppose we have $Y_i\sim D(\ldots)$, where the population has mean $\mu$ and variance $\sigma^2$. We will also assume that the random variables are independent and identically distributed, $Y_i\stackrel{iid}{\sim}D$.

Then the CLT says that, provided $n$ is sufficiently large, $\bar{Y}\sim N(\mu_{\bar{Y}}, \sigma_{\bar{Y}}^2)$ approximately.

The question is: what are the values of $\mu_{\bar{Y}}$ and $\sigma_{\bar{Y}}^2$?

The mean of $\bar{Y}$, $\mu_{\bar{Y}}$

The mean of a distribution is called the expected value, so we need $E(\bar{Y})$. Because expectation is linear,

$$E(\bar{Y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\, n\mu_Y = \mu_Y = E(Y)$$

The variance of $\bar{Y}$, $\sigma_{\bar{Y}}^2$

The variance of a distribution is defined as $V(Y)=E\left((Y-\mu_Y)^2\right)$. From this definition, if $L = aY + b$, then $V(L)= a^2V(Y)$, and for independent random variables the variance of a sum is the sum of the variances.

So that

$$V(\bar{Y})=V\left(\frac{\sum_i Y_i}{n}\right) = \frac{1}{n^2}V\left(\sum_i Y_i\right) = \frac{1}{n^2}\sum_i V(Y_i) = \frac{n\sigma_Y^2}{n^2} = \frac{\sigma_{Y}^2}{n}$$

Notice that the variance of $\bar{Y}$ is smaller than the variance of $Y$ by a factor of $n$.
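
A quick numerical check of this result (the Uniform(0, 1) population and the simulation sizes are illustrative choices):

```r
# For Y ~ Uniform(0, 1), sigma_Y^2 = 1/12, so V(Ybar) should be about 1/(12*n)
set.seed(1)
n <- 25
ybar <- replicate(50000, mean(runif(n)))
c(simulated = var(ybar), theoretical = (1/12) / n)
```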

Conclusion

This means that, regardless of the distribution of $Y$,

$$\bar{Y}\stackrel{n\rightarrow \infty}{\sim} N(\mu_Y, \frac{\sigma_Y^2}{n})$$

Example

Suppose $Y_i\stackrel{iid}{\sim} Bern(p)$ and we wish to find the approximate distribution of $\bar{Y}$.

Here $\mu_Y = p$ and $\sigma_Y^2 = pq$, where $q = 1 - p$, so

$$\bar{Y}\stackrel{n\rightarrow\infty}{\sim} N(p, \frac{pq}{n})$$
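
This example can be checked by simulation; the values of p, n, and the number of replications below are illustrative choices, not values from the lab:

```r
set.seed(42)
p <- 0.3
n <- 50
ybar <- replicate(10000, mean(rbinom(n, size = 1, prob = p)))

# overlay the CLT approximation N(p, pq/n) on the simulated sampling distribution
hist(ybar, freq = FALSE, main = "CLT for Bernoulli(p)", xlab = expression(bar(Y)))
curve(dnorm(x, mean = p, sd = sqrt(p * (1 - p) / n)), add = TRUE, lwd = 2)
```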

Finally

Please start on the lab by downloading the following files and placing them in the correct directories.

rmdfiles("Lab8", "Intro2R")


