In emeyers/SDS100: Tools for the class Introductory Statistics

knitr::opts_chunk$set(echo = TRUE)

library(SDS100)

$\$

Normal distributions

Normal density curves

We can plot a normal density curve using the dnorm(x_vals, mu, sigma) function.

The arguments to the dnorm() function are:

x_vals: a vector of x-values that we will get the corresponding y-values. We can use the seq(start_val, end_val, length.out) argument to create these x-vals.
mu: the mean of the normal density curve
sigma: the standard deviation of the normal density curve

Try plotting a normal density curve with a mean of 20 and a standard deviation of 3.

x_vals <- seq(7, 33, length.out = 1000)

y_vals <- dnorm(x_vals, 20, 3)

plot(x_vals, y_vals, type = "l")

$\$

Normal cumulative distribution functions

We can get the probability of getting a random value less than x from a normal distribution using the pnorm(x, mu, sigma) function.

The arguments to the pnorm() function are:

x: the value such that P(X < x)
mu: the mean of the normal density curve
sigma: the standard deviation of the normal density curve

Try to get the probability of getting a value less than 15 from a normal distribution with a mean of 20 and a standard deviation of 3.

pnorm(15, 20, 3)


# library(mosaic)
# xpnorm(15, 20, 3)

$\$

Normal quantile function

We can get the quantile value from a normal distribution using the qnorm() function. To get a quantile value, we give probability value p that is between 0 and 1. The function returns the value x such that P(X < x) = p.

Theqnorm(p, mu, sigma) function has the following arguments:

p: a value between 0 and 1 such that P(X < x) = p
mu: the mean of the normal density curve
sigma: the standard deviation of the normal density curve

Try to get the quantile value such that 30% of a normal normal distribution with a mean of 20 and a standard deviation of 3 is less than the value returned.

qnorm(.3, 20, 3)


# xqnorm(.3, 20, 3)

$\$

The standard normal distribution

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

We can transform any arbitrary normally distributed random variable X to a standard normal distribution using:

$$Z = \frac{(X - \mu)}{\sigma}$$

Conversely, we can transform any standard normally distributed random variable Z into an arbitrary normally distributed random variable X using:

$$X = \mu + \sigma \cdot Z$$

Let's explore this by generating a 10,000 normal random numbers in R using the rnorm(10000, mu, sigma) function and then transforming them to a standard normal distribution. Let's use a mean of 10 and a standard deviation of 3 for the numbers we generate.

# generate normally distributed random data with mean of 10 and a standard deviation of 3
rand_nums <- rnorm(10000, 10, 3)


# visualize the data
hist(rand_nums, breaks = 100)


# transform the data to a standard normal and plot it
transformed_random_data <- (rand_nums - 10)/3
hist(transformed_random_data, breaks = 100)


# look at the mean and standard deviation of the transformed data
mean(transformed_random_data)
sd(transformed_random_data)

$\$

Let's now generate standard normal data and transform it to a normal distribution with mean of 30 and a standard deviation of 5.

rand_nums <- rnorm(10000, 0, 1)

# visualize the data
hist(rand_nums, breaks = 100)


# transform the data to a standard normal and plot it
transformed_random_data <-  5 * rand_nums   + 30 
hist(transformed_random_data, breaks = 100)


# look at the mean and standard deviation of the transformed data
mean(transformed_random_data)
sd(transformed_random_data)

$\$

The Central Limit Theorem

The central limit theorem (CLT) establishes that, for identically distributed independent samples, the sample mean tends towards a normal distribution .

Let's explore this by generating random data in R that is right skewed (using the rexp()` function). We can then show that the sampling distribution of sample means is normally distributed.

library(SDS100)


# generate 100 points from an exponential distribution 
one_sample <- rexp(100)
hist(one_sample)


# take the mean of these points
mean(one_sample)



# create a sampling distribution with 10,000 statistics in it
sampling_dist <- do_it(10000) * {

  one_sample <- rexp(100)
  mean(one_sample)

}



# visualize the sampling distribution
hist(sampling_dist, breaks = 100)

$\$

Inference using normal distributions

Hypothesis test for a single proportion with a known SE

Do goalies guess the direction of a penalty shot less than 50% of the time?

From 1982 to 1994 there were 128 penalty shots in the World Cup.
Goal keepers correctly guessed the direction 41% of the time with SE* = 0.043

Step 1

$H_0$: $H_A$:

# Step 2: 
(z <- (.41 - .5)/.043)



# steps 3 and 4
pnorm(z, 0, 1)



# step 5

$\$

CI for a single mean with a known SE

A dataset of 200 ICU patients found that the average age of patients was 57.55 with a standard error of SE = 1.42

Use the normal distribution to compute a 90%, 95% and 99% CIs for the average age of patients in the ICU

57.55 - qnorm(c(.90, .95, .975, .99, .995)) * 1.42     
57.55 + qnorm(c(.90, .95, .975, .99, .995)) * 1.42

emeyers/SDS100 documentation built on April 28, 2024, 5:07 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

emeyers/SDS100
Tools for the class Introductory Statistics

In emeyers/SDS100: Tools for the class Introductory Statistics

Normal distributions

Normal density curves

Normal cumulative distribution functions

Normal quantile function

The standard normal distribution

The Central Limit Theorem

Inference using normal distributions

Hypothesis test for a single proportion with a known SE

CI for a single mean with a known SE

R Package Documentation

Browse R Packages

We want your feedback!

emeyers/SDS100 Tools for the class Introductory Statistics

In emeyers/SDS100: Tools for the class Introductory Statistics

Normal distributions

Normal density curves

Normal cumulative distribution functions

Normal quantile function

The standard normal distribution

The Central Limit Theorem

Inference using normal distributions

Hypothesis test for a single proportion with a known SE

CI for a single mean with a known SE

R Package Documentation

Browse R Packages

We want your feedback!

emeyers/SDS100
Tools for the class Introductory Statistics