$\$

    # get some data and install a package that is needed
    # install.packages("latex2exp")

    # get some images that are used in this document
    SDS230::download_image("which_are_prob_densities.png")
    SDS230::download_image("area_pdf.png")
    SDS230::download_image("probability_area.png")
    SDS230::download_image("Combined_Cumulative_Distribution_Graphs.png")

    download.file("https://raw.githubusercontent.com/emeyers/SDS230/master/ClassMaterial/data/profiles_revised.csv", "profiles_revised.csv", mode = "wb")

    knitr::opts_chunk$set(echo = TRUE)
    set.seed(230)

$\$

For loops are useful when you want to repeat a piece of code many times under similar conditions.

Print the numbers from 1 to 50...

    for (i in 1:50) {
      print(i)
    }

$\$

For loops are particularly useful in combination with vectors that can store the results.

Create a vector with the squares of the numbers from 1 to 50.

    # create a loop that creates a vector with the squares of the numbers from 1 to 50.

    # plot the results
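
One possible way to fill in this exercise is sketched below. The vector name `the_squares` is a hypothetical choice, not something taken from the class code, and pre-allocating the vector with `numeric(50)` is only one of several reasonable approaches.

```r
# Pre-allocate a numeric vector with one slot per result
# (the name `the_squares` is a hypothetical choice)
the_squares <- numeric(50)

# Fill slot i with the square of i
for (i in 1:50) {
  the_squares[i] <- i^2
}

# Plot the results: the index i on the x-axis, i squared on the y-axis
plot(1:50, the_squares, xlab = "i", ylab = "i squared")
```

Starting from `NULL` and growing the vector inside the loop would also work; pre-allocating simply avoids resizing the vector on every iteration.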

$\$

Use a for loop to create a vector called `the_results` that holds the multiples of 3 from 3 to 300; i.e., `the_results` should hold the numbers 3, 6, 9, ..., 300.
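
A minimal sketch of one solution follows. It relies on the observation that there are 100 multiples of 3 between 3 and 300, so the loop index can run from 1 to 100; other loop designs would work equally well.

```r
# There are 100 multiples of 3 between 3 and 300 (300 / 3 = 100)
the_results <- numeric(100)

# The i-th multiple of 3 is 3 * i
for (i in 1:100) {
  the_results[i] <- 3 * i
}

# Peek at the first few and last few values
head(the_results)
tail(the_results)
```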

$\$

R has built-in functions to generate data from different distributions. All these functions start with the letter `r`.

We can set the random number generator **seed** to always get the same sequence of random numbers.

Let's get a sample of n = 100 random points from the uniform distribution using `runif()`.

    # set the seed to a specific number to always get the same sequence of random numbers
    set.seed(530)

    # generate n = 100 points from U(0, 1) using runif() function

    # plot a histogram of these random numbers
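
The steps in the comments above could be filled in along the following lines. The variable name `rand_data` is a hypothetical choice, and the plot labels are illustrative.

```r
# Fix the seed so the same "random" numbers come out on every run
set.seed(530)

# Generate n = 100 points from U(0, 1); min = 0 and max = 1 are the defaults
rand_data <- runif(100)

# Plot a histogram of these random numbers
hist(rand_data,
     main = "100 draws from U(0, 1)",
     xlab = "value")
```

Running the block a second time reproduces exactly the same histogram, because `set.seed()` resets the random number generator to the same state.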

There are many other distributions we can get random numbers from including:

- Normal distributions: `rnorm()`
- Exponential distributions: `rexp()`
- Binomial distributions: `rbinom()`

And many more!

The first argument to all these functions is the number of random points you want to generate (`n`), and then there are additional arguments that can be used to control the shape of the distribution (i.e., that set the "parameters" of the distribution).

    # generate n = 1000 points from standard normal distribution N(0, 1)

    # plot a histogram of these random numbers
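
As a sketch of how the argument pattern described above plays out for `rnorm()`: the first argument is `n`, while `mean` and `sd` set the parameters of the distribution. The variable name `norm_data`, the seed, and the number of histogram bins are illustrative assumptions.

```r
set.seed(230)

# Generate n = 1000 points from N(0, 1); mean = 0 and sd = 1 are the
# defaults, but they are written out here to show the parameter arguments
norm_data <- rnorm(1000, mean = 0, sd = 1)

# Plot a histogram of these random numbers
hist(norm_data,
     breaks = 30,
     main = "1000 draws from N(0, 1)",
     xlab = "value")
```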

$\$

$\$

Probability density functions can be used to model random events. All **probability density functions**, *f(x)*, have these properties:

- The function is always non-negative.
- The total area under the function is 1 (i.e., the function integrates to 1).

Which of the following are probability density functions?

$\$

For continuous (quantitative) data, we use a density function f(x) to find the probability (e.g., the long-run frequency) that a random number X is between two values *a* and *b* using:

$P(a < X < b) = \int_{a}^{b}f(x)dx$
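
To make the integral concrete, the sketch below numerically integrates a density with base R's `integrate()` function. The choice of the standard normal density `dnorm()` and of the interval from -1 to 1 is purely illustrative.

```r
# P(-1 < X < 1) for a standard normal X, computed as the area under the
# density f(x) = dnorm(x) between a = -1 and b = 1
prob_area <- integrate(dnorm, lower = -1, upper = 1)

# The numerical value of the integral (roughly 0.68)
prob_area$value

# Sanity check on the second property of a density function:
# the total area under the curve should be 1
integrate(dnorm, lower = -Inf, upper = Inf)$value
```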

$\$

$\$

If we want to plot the true probability density function for the standard uniform distribution U(0, 1), we can use the `dunif()` function. All density functions in base R start with `d`.

    # the x-value domain for the density function f(x)

    # plot the probability density function
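
One way to fill in these two steps is sketched below. The domain from -0.2 to 1.2 and the names `x_vals` and `density_vals` are assumptions, chosen so the plot shows the density dropping to 0 outside the interval [0, 1].

```r
# The x-value domain for the density function f(x); it extends a little
# beyond [0, 1] so the regions where the density is 0 are visible
x_vals <- seq(-0.2, 1.2, by = 0.001)

# Evaluate the U(0, 1) density at each x value
density_vals <- dunif(x_vals)

# Plot the probability density function as a line
plot(x_vals, density_vals,
     type = "l",
     xlab = "x",
     ylab = "f(x)")
```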

**Question:** Can you create a density plot for the standard normal distribution?
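
A possible answer, sketched under the assumption that a domain from -4 to 4 covers nearly all of the probability mass, uses `dnorm()` in the same way `dunif()` is used for the uniform distribution. The names `z_vals` and `normal_density_vals` are hypothetical.

```r
# The x-value domain; nearly all of the standard normal's mass lies in [-4, 4]
z_vals <- seq(-4, 4, by = 0.01)

# Evaluate the N(0, 1) density at each value
normal_density_vals <- dnorm(z_vals)

# Plot the bell-shaped density curve
plot(z_vals, normal_density_vals,
     type = "l",
     xlab = "x",
     ylab = "f(x)")
```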

$\$

Cumulative probability distribution functions give us the probability of getting a random number X that is less than (or equal to) a particular value x; i.e., they give us $P(X \le x)$. For example, they could be used to give us the probability that a random number will be less than 2: $P(X \le 2)$.

Cumulative probability distribution functions are obtained by integrating a probability density function:

$P(X \le x) = F_X(x) = \int_{-\infty}^x f(t)dt$

where *f* is a probability density function, *t* is the variable of integration, and $F_X(x)$ is the cumulative distribution function.

$\$

To get the probability that a random number X is less than (or equal to) a particular value x using R, we can use a series of functions that start with the letter `p`.

For example, to get the probability that a random number X generated from the standard uniform distribution U(0, 1) will be less than .25, i.e., $P(X \le .25)$, we can use `punif()`.
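
A short sketch of cumulative probabilities in base R follows, using `punif()` for the uniform example here and, as an illustrative extra, `pnorm()` for the earlier $P(X \le 2)$ example under the assumption that X is standard normal.

```r
# P(X <= 0.25) for X ~ U(0, 1); for the standard uniform distribution
# the cumulative probability at x is x itself, so this is 0.25
uniform_prob <- punif(0.25)
uniform_prob

# P(X <= 2) for X ~ N(0, 1), which is roughly 0.977
normal_prob <- pnorm(2)
normal_prob

# The same uniform probability obtained by integrating the density
# from 0 (where the density becomes non-zero) up to 0.25
integrate(dunif, lower = 0, upper = 0.25)$value
```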

$\$

A distribution of statistics is called a **sampling distribution**.

Can you generate and plot an approximate sampling distribution for:

- sample means ($\bar{x}$'s)
- sample size n = 100
- data that come from a uniform distribution

Note the shape of the *sampling distribution* can be quite different from the shape of the data distribution (which is uniform here).

    # create a sampling distribution of the mean using data from a uniform distribution
    sampling_dist <- NULL

    # plot a histogram of the sampling distribution of these means
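
A minimal sketch of how the loop might be completed is shown below. The number of repetitions (1,000), the seed, and the name `rand_sample` are assumptions; `sampling_dist` follows the stub above.

```r
set.seed(230)

# Create a sampling distribution of the mean using data from a uniform
# distribution; each pass draws a new sample of size n = 100 and stores its mean
sampling_dist <- NULL

for (i in 1:1000) {
  rand_sample <- runif(100)
  sampling_dist[i] <- mean(rand_sample)
}

# Plot a histogram of the sampling distribution of these means
hist(sampling_dist,
     breaks = 30,
     main = "Approximate sampling distribution of the mean",
     xlab = "sample mean")
```

The histogram should look roughly bell-shaped and be centered near 0.5, even though each individual sample comes from a flat uniform distribution.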

$\$

The standard deviation of a sampling distribution is called the standard error (SE). Can you calculate (an approximate) standard error for the sampling distribution you created above?
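
As a sketch, the standard error can be approximated by taking `sd()` of the simulated sample means. The block regenerates the sampling distribution so that it runs on its own; the 1,000 repetitions and the names `SE_estimate` and `SE_theory` are assumptions. For comparison it also computes the theoretical value, using the fact that the variance of U(0, 1) is 1/12.

```r
set.seed(230)

# Rebuild the approximate sampling distribution of the mean (n = 100)
sampling_dist <- NULL
for (i in 1:1000) {
  sampling_dist[i] <- mean(runif(100))
}

# Approximate standard error: the standard deviation of the sample means
SE_estimate <- sd(sampling_dist)
SE_estimate

# Theoretical standard error: sigma / sqrt(n), where sigma = sqrt(1/12)
# for the standard uniform distribution
SE_theory <- sqrt(1 / 12) / sqrt(100)
SE_theory
```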

$\$

We can generate samples from an actual data set we have using the `sample()` function.

Let's start by generating a single sample of size n = 100 from the OkCupid users' heights and calculating the mean of this sample.

    # read in the okcupid data
    profiles <- read.csv("profiles_revised.csv")

    # get the heights for the OkCupid data

    # get one random sample of heights from 100 people

    # get the mean of this sample
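
The sketch below shows the `sample()` step in isolation. So that it runs without the CSV file, it uses a simulated stand-in vector called `heights`; this stand-in, its parameters, and the assumption that the real heights live in a column such as `profiles$height` are all hypothetical and should be replaced with the actual data.

```r
set.seed(230)

# Stand-in for the real heights (an assumption so the sketch is runnable);
# with the CSV loaded this might instead be: heights <- profiles$height
heights <- rnorm(5000, mean = 68, sd = 4)

# Get one random sample of heights from 100 people
# (sample() draws without replacement by default)
height_sample <- sample(heights, 100)

# Get the mean of this sample
mean(height_sample)
```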

$\$

We can then create an approximation of a sampling distribution from the OkCupid users' data set by repeating this many times in a for loop.

    # repeat the process 1,000 times
    sampling_dist <- NULL

    # plot a histogram of this sampling distribution
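
Repeating the single-sample step inside a for loop might look like the sketch below. As before, `heights` is a simulated stand-in (an assumption made so the block is self-contained) rather than the real OkCupid data.

```r
set.seed(230)

# Stand-in for the real heights (hypothetical; replace with the actual column)
heights <- rnorm(5000, mean = 68, sd = 4)

# Repeat the process 1,000 times: draw 100 heights and store their mean
sampling_dist <- NULL

for (i in 1:1000) {
  height_sample <- sample(heights, 100)
  sampling_dist[i] <- mean(height_sample)
}

# Plot a histogram of this sampling distribution
hist(sampling_dist,
     breaks = 30,
     main = "Approximate sampling distribution of mean height",
     xlab = "sample mean")
```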

**Question:** What would have to be true for this to be an actual sampling distribution?

$\$

The central limit theorem (CLT) establishes that (in most situations) when independent random variables are added, their (normalized) sum converges to a normal distribution.

Put another way, if we define the average of a random (i.i.d.) sample {$X_1$, $X_2$, ..., $X_n$} of size *n* as:

$S_{n}:={\frac{X_{1}+\cdots +X_{n}}{n}}$

then the CLT tells us that:

$\sqrt{n}(S_{n} - \mu) \xrightarrow{d} N(0,\sigma^{2})$

where $\mu$ and $\sigma^{2}$ are the mean and variance of each $X_i$.

You will explore this more through simulations on homework 2.
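
As a small illustration of the statement above (not the homework itself), the sketch below simulates normalized sample means from an exponential distribution with rate 1, for which the mean and the standard deviation are both 1. The sample size n = 50, the 5,000 repetitions, and all variable names are assumptions.

```r
set.seed(230)

n <- 50            # size of each sample
num_sims <- 5000   # number of simulated samples
mu <- 1            # mean of the Exponential(rate = 1) distribution

# For each simulated sample, compute sqrt(n) * (S_n - mu)
normalized_means <- numeric(num_sims)

for (i in 1:num_sims) {
  exp_sample <- rexp(n, rate = 1)
  normalized_means[i] <- sqrt(n) * (mean(exp_sample) - mu)
}

# Histogram on the density scale, with the N(0, sigma^2) density overlaid;
# sigma = 1 for the Exponential(rate = 1) distribution
hist(normalized_means,
     breaks = 50,
     freq = FALSE,
     main = "Normalized sample means",
     xlab = "sqrt(n) * (sample mean - mu)")
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, col = "red")
```

Even though the exponential distribution is strongly right-skewed, the histogram should sit close to the overlaid normal curve, with some residual skew that shrinks as n grows.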

$\$

emeyers/SDS230 documentation built on Jan. 13, 2023, 5:16 a.m.
