knitr::opts_chunk$set(echo = TRUE)
library(SDS100)

$\$

Normal distributions

Normal density curves

We can plot a normal density curve using the dnorm(x_vals, mu, sigma) function.

The arguments to the dnorm() function are:

Try plotting a normal density curve with a mean of 20 and a standard deviation of 3.

x_vals <- seq(7, 33, length.out = 1000)

y_vals <- dnorm(x_vals, 20, 3)

plot(x_vals, y_vals, type = "l")

$\$

Normal cumulative distribution functions

We can get the probability of getting a random value less than x from a normal distribution using the pnorm(x, mu, sigma) function.

The arguments to the pnorm() function are:

Try to get the probability of getting a value less than 15 from a normal distribution with a mean of 20 and a standard deviation of 3.

pnorm(15, 20, 3)


# library(mosaic)
# xpnorm(15, 20, 3)

$\$

Normal quantile function

We can get the quantile value from a normal distribution using the qnorm() function. To get a quantile value, we give probability value p that is between 0 and 1. The function returns the value x such that P(X < x) = p.

Theqnorm(p, mu, sigma) function has the following arguments:

Try to get the quantile value such that 30% of a normal normal distribution with a mean of 20 and a standard deviation of 3 is less than the value returned.

qnorm(.3, 20, 3)


# xqnorm(.3, 20, 3)

$\$

The standard normal distribution

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

We can transform any arbitrary normally distributed random variable X to a standard normal distribution using:

$$Z = \frac{(X - \mu)}{\sigma}$$

Conversely, we can transform any standard normally distributed random variable Z into an arbitrary normally distributed random variable X using:

$$X = \mu + \sigma \cdot Z$$

Let's explore this by generating a 10,000 normal random numbers in R using the rnorm(10000, mu, sigma) function and then transforming them to a standard normal distribution. Let's use a mean of 10 and a standard deviation of 3 for the numbers we generate.

# generate normally distributed random data with mean of 10 and a standard deviation of 3
rand_nums <- rnorm(10000, 10, 3)


# visualize the data
hist(rand_nums, breaks = 100)


# transform the data to a standard normal and plot it
transformed_random_data <- (rand_nums - 10)/3
hist(transformed_random_data, breaks = 100)


# look at the mean and standard deviation of the transformed data
mean(transformed_random_data)
sd(transformed_random_data)

$\$

Let's now generate standard normal data and transform it to a normal distribution with mean of 30 and a standard deviation of 5.

rand_nums <- rnorm(10000, 0, 1)

# visualize the data
hist(rand_nums, breaks = 100)


# transform the data to a standard normal and plot it
transformed_random_data <-  5 * rand_nums   + 30 
hist(transformed_random_data, breaks = 100)


# look at the mean and standard deviation of the transformed data
mean(transformed_random_data)
sd(transformed_random_data)

$\$

The Central Limit Theorem

The central limit theorem (CLT) establishes that, for identically distributed independent samples, the sample mean tends towards a normal distribution .

Let's explore this by generating random data in R that is right skewed (using the rexp()` function). We can then show that the sampling distribution of sample means is normally distributed.

library(SDS100)


# generate 100 points from an exponential distribution 
one_sample <- rexp(100)
hist(one_sample)


# take the mean of these points
mean(one_sample)



# create a sampling distribution with 10,000 statistics in it
sampling_dist <- do_it(10000) * {

  one_sample <- rexp(100)
  mean(one_sample)

}



# visualize the sampling distribution
hist(sampling_dist, breaks = 100)

$\$

Inference using normal distributions

Hypothesis test for a single proportion with a known SE

Do goalies guess the direction of a penalty shot less than 50% of the time?

From 1982 to 1994 there were 128 penalty shots in the World Cup.
Goal keepers correctly guessed the direction 41% of the time with SE* = 0.043

Step 1

$H_0$: $H_A$:

# Step 2: 
(z <- (.41 - .5)/.043)



# steps 3 and 4
pnorm(z, 0, 1)



# step 5

$\$

CI for a single mean with a known SE

A dataset of 200 ICU patients found that the average age of patients was 57.55 with a standard error of SE = 1.42

Use the normal distribution to compute a 90%, 95% and 99% CIs for the average age of patients in the ICU

57.55 - qnorm(c(.90, .95, .975, .99, .995)) * 1.42     
57.55 + qnorm(c(.90, .95, .975, .99, .995)) * 1.42 


emeyers/SDS100 documentation built on April 28, 2024, 5:07 p.m.