In emeyers/SDS100: Tools for the class Introductory Statistics

knitr::opts_chunk$set(echo = TRUE)

library(SDS100)

$\$

Hypothesis test for a single proportion with a known SE

Do goalies guess the direction of a penalty shot less than 50% of the time?

From 1982 to 1994 there were 128 penalty shots in the World Cup.
Goal keepers correctly guessed the direction 41% of the time With SE* = 0.043

Step 1:

$H_0: \pi$ = .5 $H_A: \pi$ < .5

Step 2-5:

# step 2: calculate the z-statistic
z <- (.41 - .5)/.043 


# step 3-4: use the pnorm function to get the p-value
pnorm(-2.093, 0, 1)

Make a decision!

$\$

A formula for the SE for proportions

The standard error for a single proportion is given by the formula:

$$SE = \sqrt{\frac{\pi(1-\pi)}{n}} $$

When running a hypothesis test for a single proportion, we assume that $\pi$ is equal to the value specified by the null hypothesis ($\pi_0$) so we can calculate the standard error as:

$$SE = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}$$

Let's calculate the SE for the soccer example...

(SE <- sqrt((.41 * (1- .41))/128))

$\$

Another example of a hypothesis test for a single proportion

Does the AstraZeneca vaccine cause blood clots?

A study found that 79 people experienced clots after receiving a first vaccine dose. More than 20 million AstraZeneca vaccines doses had been administered across the UK by the end of March.

About four people in a million would normally be expected to develop this particular kind of blood clot - though the fact they are so rare makes the usual rate hard to estimate.

Step 1:

$H_0: \pi$ = 1/1,000,000 $H_A: \pi$ > 1/1,000,000

p_hat <-  79/(20 * 10^6) 

n <- 20 * 10^6

pi_0 <- 1/10^6

p_hat <- 79/(20 * 10^6)

SE <- sqrt(   (pi_0  * (1 - pi_0))/n    )

(z_stat <- (p_hat - pi_0)/SE)





# Are these conditions met? 
n * pi_0         
n * (1 - pi_0) 



# visualize the null distribution

x_vals <- seq(-15, 15, length.out = 1000)

y_vals <- dnorm(x_vals, 0, 1)

plot(x_vals, y_vals, type = "l", xlim = c(-15, 15))


abline(v = z_stat, col = "red")


# get the p-value
pnorm(z_stat, 0, 1, lower.tail = FALSE)


# conclusion!

$\$

CI for a single proportion

What is the probability of having a blood clot if you take the AstraZeneca vaccine?

# calculate the SE
SE <- sqrt(   (p_hat  * (1 - p_hat))/n    )


# get the critical value z* for a 95% confidence interval
z_star <- qnorm(.975)


# create the confidence interval
p_hat - z_star * SE
p_hat + z_star * SE

Is it likely you will get a blood clot?

$\$

Confidence interval for a single proportion using the bootstrap

library(tictoc)


has_clot <- rep(TRUE, 79)
no_clot <- rep(FALSE, (20 * 10^6) - 79)

data_vec <- c(has_clot, no_clot)



boot_data <- sample(data_vec, replace = TRUE)

boot_proportion <- mean(boot_data)


library(tictoc)

tic()

boot_dist <- do_it(10) * {

  boot_data <- sample(data_vec, replace = TRUE)
  boot_proportion <- mean(boot_data)

}

toc()