knitr::opts_chunk$set(echo = TRUE)
library(SDS100)

Lie detector hypothesis test


# step 1: state null and alternative hypotheses
# H0: pi = 0.6
# HA: pi > 0.6


# step 2: compute observed statistic
# (out of 48 questions, made correct detections 31 times)
(lie_phat <- 31/48)



# step 3: create the null distribution

null_dist <- do_it(10000) * {

  rflip_count(48, .6)/48

}
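
As a quick sanity check (my own addition, assuming do_it() returns a plain numeric vector, as the later hist() and sd() calls also assume), the simulated proportions should center near the null value of 0.6:

# the null distribution should be centered close to pi = 0.6
mean(null_dist)
sd(null_dist)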



# plot the null distribution
hist(null_dist, breaks = 100, 
     main = "Null distribution",
     xlab = "Proportion passing lie-detector test")

abline(v = lie_phat, col = "red")



# step 4: calculate the p-value
pnull(lie_phat, null_dist, lower.tail = FALSE)
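
As an optional cross-check (my own addition, not part of the exercise), the exact binomial tail probability of getting 31 or more correct detections out of 48 when pi = 0.6 should be close to the simulated p-value:

# P(X >= 31) for X ~ Binomial(48, 0.6); small differences from the simulation are expected
pbinom(30, size = 48, prob = 0.6, lower.tail = FALSE)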



# 5. decision?
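
One way to phrase the decision (a sketch, assuming the conventional significance level of alpha = 0.05):

# reject H0 if the p-value falls below the significance level
alpha <- 0.05
pnull(lie_phat, null_dist, lower.tail = FALSE) < alpha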

What would have happened if we had tested whether lie detector tests get it correct more than 50% of the time?
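
Here is a sketch of that alternative test, reusing the same simulation steps with a null value of 0.5; since the observed 31/48 is farther above 0.5 than it is above 0.6, the p-value should come out smaller.

# H0: pi = 0.5,  HA: pi > 0.5
null_dist_50 <- do_it(10000) * {

  rflip_count(48, .5)/48

}

pnull(lie_phat, null_dist_50, lower.tail = FALSE)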


Two-sided lie detector hypothesis test

# step 1: state null and alternative hypotheses
# H0: pi = 0.6
# HA: pi != 0.6


# step 2: compute observed statistic
# (out of 48 questions, made correct detections 31 times)
(lie_phat <- 31/48)



# step 3: create the null distribution

null_dist <- do_it(10000) * {

  rflip_count(48, .6)/48

}



# plot the null distribution
hist(null_dist, breaks = 100, 
     main = "Null distribution",
     xlab = "Proportion passing lie-detector test")


abline(v = lie_phat, col = "red")


# for the two-sided test, also mark the value that lies the same distance below the null value (.6) as the observed statistic is above it
abline(v = .6 + (.6 - lie_phat), col = "red")
abline(v = .6, col = "green")



# step 4: calculate the p-value
pval_right_tail <- pnull(lie_phat, null_dist, lower.tail = FALSE)
pval_left_tail <- pnull(.6 + (.6 - lie_phat), null_dist, lower.tail = TRUE)


(pval <- pval_right_tail + pval_left_tail)


# 5. decision?
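
The decision step works the same way as before: compare the two-tailed p-value to the chosen significance level. As an optional exact-binomial cross-check of that p-value (my own addition, mirroring the two cut-offs marked on the plot):

# P(p-hat >= 31/48) plus P(p-hat <= the mirrored value .6 - (31/48 - .6));
# small differences from the simulation are expected
pbinom(30, size = 48, prob = 0.6, lower.tail = FALSE) + pbinom(26, size = 48, prob = 0.6)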


Should you choose B on a multiple choice test?

Multiple-choice questions on Advanced Placement exams have five options: A, B, C, D, and E. A random sample of the correct answers to 400 multiple-choice questions from a variety of AP exams showed that B was the correct answer on 90 of the 400 questions. Does this provide evidence that B occurs at a rate different from what would be expected if each option were equally likely to be correct?

Please run a hypothesis test to find out!
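
Before simulating, it can help to note what equal likelihood predicts for 400 questions (simple arithmetic, not part of the original prompt):

# expected number of B's if each of the 5 options is equally likely, vs. the observed count
c(expected = 400 * (1/5), observed = 90)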


# step 1: state null and alternative hypotheses
# H0: pi = 1/5 = 0.2
# HA: pi != 1/5 = 0.2


# step 2: compute observed statistic
(b_phat <- 90/400)


# step 3: create the null distribution


null_dist <- do_it(10000) * {

  rflip_count(400, .2)/400

}


hist(null_dist, breaks = 30, 
     main = "Null distribution",
     xlab = "Proportion of B's")


abline(v = .2, col = "green")
abline(v = b_phat, col = "red")
abline(v = (.2 - (b_phat - .2)), col = "red")



# step 4: calculate the p-value

p_lower <- pnull(.2 - (b_phat - .2), null_dist, lower.tail = TRUE) 
p_upper <- pnull(b_phat, null_dist, lower.tail = FALSE) 

(p_value <- p_lower + p_upper)



# 5. decision?





# FYI: You don't need to look at the actual data to solve this problem, 
# but if you'd like it can be loaded using:
#  APMultipleChoice <- Lock5Data::APMultipleChoice



# Bonus, can you create a 95% CI for the proportion of answers that are B? 

answers <- Lock5Data::APMultipleChoice$Answer


boot_dist <- do_it(10000) * {

  boot_sample <- sample(answers, 400, replace = TRUE)
  prop.table(table(boot_sample))["B"]

}


hist(boot_dist, breaks = 30, main = "Bootstrap distribution")


SE_boot <- sd(boot_dist)

(CI <- b_phat + 2 * c(-SE_boot, SE_boot))
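
An alternative interval (my own addition) takes the middle 95% of the bootstrap distribution directly; it should be similar to the SE-based interval above:

# 95% percentile bootstrap confidence interval
quantile(boot_dist, probs = c(0.025, 0.975))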



