knitr::opts_chunk$set(echo = TRUE)
library(SDS100)
$\$
# step 1: state null and alternative hypotheses # H0: pi = 0.6 # HA: pi > 0.6 # step 2: compute observed statistic # (out of 48 questions, made correct detections 31 times) (lie_phat <- 31/48) # step 3: create the null distribution null_dist <- do_it(10000) * { rflip_count(48, .6)/48 } # plot the null distribution hist(null_dist, breaks = 100, main = "Null distribution", xlab = "Proportion passing lie-detector test") abline(v = lie_phat, col = "red") # step 4: calculate the p-value pnull(lie_phat, null_dist, lower.tail = FALSE) # 5. decision?
What would have happened if we had tested whether lie detector tests get it correct more than 50% of the time?
$\$
# step 1: state null and alternative hypotheses # H0: pi = 0.6 # HA: pi != 0.6 # step 2: compute observed statistic # (out of 48 questions, made correct detections 31 times) (lie_phat <- 31/48) # step 3: create the null distribution null_dist <- do_it(10000) * { rflip_count(48, .6)/48 } # plot the null distribution hist(null_dist, breaks = 100, main = "Null distribution", xlab = "Proportion passing lie-detector test") abline(v = lie_phat, col = "red") # abline needs to have the same deviation from the middle of the distribution (.6) in the other direction abline(v = .6 + (.6 - lie_phat), col = "red") abline(v = .6, col = "green") # step 4: calculate the p-value pval_right_tail <- pnull(lie_phat, null_dist, lower.tail = FALSE) pval_left_tail <- pnull(.6 + (.6 - lie_phat), null_dist, lower.tail = TRUE) (pval <- pval_right_tail + pval_left_tail) # 5. decision?
$\$
Multiple-choice questions on Advanced Placement exams have five options: A, B, C, D, and E. A random sample of the correct choice on 400 multiple-choice questions on a variety of AP shows that B was the correct answer 90 of the 400 questions. Does this provide evidence that B occurs at a different rate than what is expected if each question is equally likely?
Please run a hypothesis test to find out!
$\$
# step 1: state null and alternative hypotheses # H0: pi = 1/5 = 0.2 # HA: pi != 1/5 = 0.2 # step 2: compute observed statistic (b_phat <- 90/400) # step 3: create the null distribution null_dist <- do_it(10000) * { rflip_count(400, .2)/400 } hist(null_dist, breaks = 30, main = "Null distribution", xlab = "Proportion of B's") abline(v = .2, col = "green") abline(v = b_phat, col = "red") abline(v = (.2 - (b_phat - .2)), col = "red") # step 4: calculate the p-value p_lower <- pnull(.2 - (b_phat - .2), null_dist, lower.tail = TRUE) p_upper <- pnull(b_phat, null_dist, lower.tail = FALSE) (p_value <- p_lower + p_upper) # 5. decision? # FYI: You don't need to look at the actual data to solve this problem, # but if you'd like it can be loaded using: # APMultipleChoice <- Lock5Data::APMultipleChoice # Bonus, can you create a 95% CI for the proportion of answers that are B? answers <- Lock5Data::APMultipleChoice$Answer boot_dist <- do_it(10000) * { boot_sample <- sample(answers, 400, replace = TRUE) prop.table(table(boot_sample))["B"] } hist(boot_dist, breaks = 30, main = "Bootstrap distribution") SE_boot <- sd(boot_dist) (CI <- b_phat + 2 * c(-SE_boot, SE_boot))
$\$
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.