knitr::opts_chunk$set(echo = TRUE)
library(SDS100)

$\$

Normal distributions

Normal density curves

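We can plot a normal density curve by computing its height at a sequence of x values with the dnorm(x_vals, mu, sigma) function and then plotting those heights against the x values.

The arguments to the dnorm() function are x_vals (the values at which to evaluate the density), mu (the mean of the distribution), and sigma (the standard deviation of the distribution).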

Try plotting a normal density curve with a mean of 20 and a standard deviation of 3.

x_vals <- seq(7, 33, length.out = 1000)
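One possible way to complete this exercise is sketched below using only base R graphics. The variable name density_vals and the axis labels are illustrative choices, not part of the original worksheet.

```r
# x values covering a bit more than 4 standard deviations on each side of the mean
x_vals <- seq(7, 33, length.out = 1000)

# height of the normal density (mean 20, standard deviation 3) at each x value
density_vals <- dnorm(x_vals, 20, 3)

# draw the density curve as a line
plot(x_vals, density_vals, type = "l",
     xlab = "x", ylab = "Density",
     main = "Normal density (mean = 20, sd = 3)")
```

The curve should peak at x = 20 and be nearly flat at the edges of the plotted range.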

$\$

Normal cumulative distribution functions

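We can get the probability that a value drawn at random from a normal distribution is less than x using the pnorm(x, mu, sigma) function.

The arguments to the pnorm() function are x (the value whose cumulative probability we want), mu (the mean of the distribution), and sigma (the standard deviation of the distribution).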

Try to get the probability of getting a value less than 15 from a normal distribution with a mean of 20 and a standard deviation of 3.

# get P(X < 15; mu = 20, sigma = 3)




# library(mosaic)
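A minimal sketch of one way to answer this with base R is shown below. The name prob_less_15 is introduced only for illustration, and the second call is an optional cross-check using the z-score.

```r
# P(X < 15) for X ~ Normal(mean = 20, sd = 3)
prob_less_15 <- pnorm(15, 20, 3)
prob_less_15

# the same probability computed from the z-score (15 - 20) / 3
# using the standard normal distribution
pnorm((15 - 20) / 3)
```

Since 15 is more than one and a half standard deviations below the mean, the probability comes out just under 5%.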

$\$

Normal quantile function

We can get a quantile value from a normal distribution using the qnorm() function. To get a quantile value, we give a probability value p that is between 0 and 1. The function returns the value x such that P(X < x) = p.

The qnorm(p, mu, sigma) function has the following arguments: p (the probability below the quantile), mu (the mean of the distribution), and sigma (the standard deviation of the distribution).

Try to get the quantile value such that 30% of a normal distribution with a mean of 20 and a standard deviation of 3 is less than the value returned.

# get the quantile value for p = 0.3 from a normal with mean = 20, sigma = 3
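One way this could be done is sketched below. The name quantile_30 is illustrative, and the pnorm() call is added only as a sanity check that qnorm() and pnorm() are inverses of each other.

```r
# value x such that P(X < x) = 0.3 for X ~ Normal(mean = 20, sd = 3)
quantile_30 <- qnorm(0.3, 20, 3)
quantile_30

# check: the cumulative probability at that value should be 0.3 again
pnorm(quantile_30, 20, 3)
```

Because 0.3 is below 0.5, the returned value falls below the mean of 20.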

$\$

The standard normal distribution

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

We can transform any arbitrary normally distributed random variable X to a standard normal distribution using:

$$Z = \frac{(X - \mu)}{\sigma}$$

Conversely, we can transform any standard normally distributed random variable Z into an arbitrary normally distributed random variable X using:

$$X = \mu + \sigma \cdot Z$$

Let's explore this by generating 10,000 normal random numbers in R using the rnorm(10000, mu, sigma) function and then transforming them to a standard normal distribution. Let's use a mean of 10 and a standard deviation of 3 for the numbers we generate.

# generate normally distributed random data with mean of 10 and a standard deviation of 3
rand_nums <- rnorm(10000, 10, 3)


# visualize the data




# transform the data to a standard normal and plot it





# look at the mean and standard deviation of the transformed data
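The steps above could be filled in roughly as follows. This is a sketch using base R histograms; the seed value, the number of histogram bins, and the name z_vals are assumptions made for illustration.

```r
set.seed(100)  # arbitrary seed, used only so the sketch is reproducible

# generate normally distributed random data with mean 10 and standard deviation 3
rand_nums <- rnorm(10000, 10, 3)

# visualize the data
hist(rand_nums, breaks = 50, main = "Normal(10, 3) data")

# transform the data to a standard normal using Z = (X - mu) / sigma and plot it
z_vals <- (rand_nums - 10) / 3
hist(z_vals, breaks = 50, main = "Standardized data")

# the mean and standard deviation should be close to 0 and 1
mean(z_vals)
sd(z_vals)
```

The two histograms should have the same shape; only the location and scale of the x-axis change.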

$\$

Let's now generate standard normal data and transform it to a normal distribution with a mean of 30 and a standard deviation of 5.

# generate 10,000 points from a standard normal distribution 




# visualize the data




# transform the data into a normal distribution with mean of 30 and a standard deviation of 5 




# look at the mean and standard deviation of the transformed data
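A possible completion of these steps is sketched below, applying X = mu + sigma * Z with mu = 30 and sigma = 5. The seed and the names std_normal_data and transformed_data are illustrative assumptions.

```r
set.seed(100)  # arbitrary seed, used only so the sketch is reproducible

# generate 10,000 points from a standard normal distribution
std_normal_data <- rnorm(10000, 0, 1)

# visualize the data
hist(std_normal_data, breaks = 50, main = "Standard normal data")

# transform the data into a normal distribution with mean 30 and standard deviation 5
transformed_data <- 30 + 5 * std_normal_data
hist(transformed_data, breaks = 50, main = "Transformed data")

# the mean and standard deviation should be close to 30 and 5
mean(transformed_data)
sd(transformed_data)
```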

$\$

The Central Limit Theorem

The central limit theorem (CLT) establishes that, for independent and identically distributed observations with finite variance, the sampling distribution of the sample mean tends towards a normal distribution as the sample size grows.

Let's explore this by generating random data in R that is right skewed (using the rexp() function). We can then show that the sampling distribution of sample means is approximately normally distributed.

library(SDS100)


# generate 100 points from an exponential distribution 
one_sample <- rexp(100)
hist(one_sample)



# take the mean of these points




# create a sampling distribution with 10,000 statistics in it






# visualize the sampling distribution
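One way to carry out this simulation is sketched below. It uses base R's replicate() as a stand-in for whatever looping helper the course normally uses, and the seed and the name sampling_dist are assumptions for illustration.

```r
set.seed(100)  # arbitrary seed, used only so the sketch is reproducible

# generate 100 points from an exponential distribution (default rate = 1)
one_sample <- rexp(100)

# take the mean of these points
mean(one_sample)

# create a sampling distribution with 10,000 sample means,
# each computed from a fresh sample of 100 exponential values
sampling_dist <- replicate(10000, mean(rexp(100)))

# visualize the sampling distribution
hist(sampling_dist, breaks = 50, main = "Sampling distribution of the mean")

# center and spread of the sampling distribution
mean(sampling_dist)
sd(sampling_dist)
```

Although each individual sample is strongly right skewed, the histogram of sample means should look roughly bell shaped and be centered near 1, with a spread near 1 / sqrt(100) = 0.1.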

$\$

Inference using normal distributions

Hypothesis test for a single proportion with a known SE

Do goalies guess the direction of a penalty shot less than 50% of the time?

From 1982 to 1994 there were 128 penalty shots in the World Cup.
Goalkeepers correctly guessed the direction 41% of the time, with SE* = 0.043.

Step 1

$H_0$: $H_A$:

# Step 2: 





# steps 3 and 4





# step 5
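The sketch below shows one way the remaining steps might be carried out with pnorm(). It assumes the hypotheses are that the true proportion equals 0.5 under the null and is less than 0.5 under the alternative, and it assumes a significance level of 0.05 for the final step; neither is stated in the worksheet. The variable names are illustrative.

```r
# Step 2: observed statistic (proportion of correct guesses)
p_hat <- 0.41

# Steps 3 and 4: under the assumed null hypothesis the statistic is modeled as
# Normal(0.5, SE), and the p-value is the area in the lower tail below p_hat
null_prop <- 0.5
se <- 0.043
z_stat <- (p_hat - null_prop) / se
p_value <- pnorm(p_hat, null_prop, se)
z_stat
p_value

# Step 5: compare the p-value to the assumed significance level of 0.05
p_value < 0.05
```

The observed proportion is about two standard errors below 0.5, so the p-value is small (around 0.02) and falls below the assumed 0.05 threshold.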

$\$

CI for a single mean with a known SE

A data set of 200 ICU patients found that the average age of patients was 57.55 with a standard error of SE = 1.42.

Use the normal distribution to compute 90%, 95%, and 99% CIs for the average age of patients in the ICU.
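A minimal sketch of this calculation is given below. It takes each interval to be the sample mean plus or minus z* times SE, where z* is the standard normal quantile that leaves half of the remaining probability in each tail. The names x_bar, z_star, and ci_table are introduced for illustration.

```r
x_bar <- 57.55  # sample mean age
se <- 1.42      # standard error given in the problem

# confidence levels of interest
conf_levels <- c(0.90, 0.95, 0.99)

# critical values z* from the standard normal distribution
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# interval endpoints: x_bar +/- z* * SE
ci_table <- data.frame(
  level = conf_levels,
  lower = x_bar - z_star * se,
  upper = x_bar + z_star * se
)
ci_table
```

The intervals widen as the confidence level increases, since a larger z* is needed to capture more of the distribution.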




emeyers/SDS100 documentation built on April 28, 2024, 5:07 p.m.