In NUstat/ISDStutorials: Tutorial Lessons for Introduction to Statistics and Data Science

library(skimr)
library(learnr)
library(tidyverse)
library(tutorialExtras)
library(gradethis)

library(moderndive)

gradethis_setup(pass = "Submitted",
                fail = "Submitted",
                error_checker.message = "Submitted",
                fail.hint = FALSE
                )

tutorialExtras_setup(is_exam = TRUE)

knitr::opts_chunk$set(echo = FALSE)

nc_births <- read_csv("data/nc_births.csv")

lock_server("lock", 
            ex = c("App1a", "App1b", "App2", "App3"), 
            ex_pts = c(1, 1, 1, 1),
            manual = c("App2-desc", "App3-desc"),
            manual_pts = c(1, 1))

# student name
question_text("Name:",
              answer_fn(function(value){
                              if(length(value) >= 1 ) {
                                return(mark_as(TRUE))
                                }
                              return(mark_as(FALSE) )
                              }),
              incorrect = "Submitted",
              allow_retry = FALSE )

Instructions

Note: this is a sample exam to provide examples of types and styles of questions that may be asked. It does not necessarily represent the exact length of the exam.

You have 50 minutes to complete this exam. The exam covers the material learned from Section 9 - Chapter 12. You are allowed one page of notes front and back.

Once you are finished:

View Submissions to make sure every question/exercise has been submitted.
Click the 'lock exam' button. You will not be able to make any changes once this is clicked.
Once the exam is locked you will be able to click on the 'download exam' button.
Submit the completed html to Canvas.

Concept

quiz(
  caption = NULL,
  question_numeric(
  "Q1) Math SAT scores are normally distributed with a mean of 500 and a standard deviation of 100. Based on the empirical rule, what percent of students score between 200 and 800.</strong>" ,
  answer(c(99.7), correct = TRUE),
  min = 0,
  max = 100,
  step = 0.1,
  allow_retry = TRUE,
  incorrect = "submitted"
  ),
  question_dropdown(
  "Q2) Math SAT scores are normally distributed with an average score of 500 and standard deviation of 100. What percent of people score higher than 600 on the Math SAT?",
  answer("qnorm(p = 1, mean = 0, sd = 1, lower.tail = FALSE)"),
  answer("pnorm(q = 400, mean = 500, sd = 100, lower.tail = FALSE)"),
  answer("pnorm(q = 600, mean = 500, sd = 100)"),
  answer("pnorm(q = 1, mean = 0, sd = 1, lower.tail = FALSE)", correct = TRUE),
  answer("qnorm(p = 600, mean = 500, sd = 100)"),
  allow_retry = TRUE,
  incorrect = "submitted"
  ),
  #Q1
  question_dropdown(
  paste0("Q3) Consider a population that has a unimodal right-skewed distribution. Which of the following pictures would best represent a sampling distribution of a sample mean where repeated samples were taken of size $n = 30$? <br>

  a) ", htmltools::img(src="images/sampling_01a.png", height = 200, width = 400) ,"<br>
  b) ", htmltools::img(src="images/sampling_01c.png", height =200, width = 200) ,"
  c) ", htmltools::img(src="images/sampling_01b.png", height = 200, width = 200)),
  answer("a"),
  answer("b", correct = TRUE),
  answer("c"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
    "Q4) Which of the following scenarios would have the smallest spread? Assuming each sampling distribution is composed of random samples from the population of interest. <br> <br>",
  answer("5,000 replicates of size n = 5"),
  answer("5,000 replicates of size n = 30"),
  answer("5,000 replicates of size n = 100", correct = TRUE),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q5) You construct a sampling distribution of 300 sample means with sample size n = 60. How will the standard error change if you instead change the sample size to n = 30?",
  answer("increases", correct = TRUE),
  answer("decreases"),
  answer("stays the same"),
  answer("impossible to tell"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q6) The Central Limit Theorem states which of the following for a sample size $n$, assuming $n$ is sufficiently large.",
  answer("the mean of the set of sample means is equal to the mean of the population", correct = TRUE),
  answer("the mean of the set of sample means is always less than the mean of the population"),
  answer("the standard deviation of sample means is equal to the standard deviation of the population"),
  answer("the standard deviation of sample means is greater than the standard deviation of the population"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q7) When constructing a confidence interval you change your level of confidence from 97% to 90%. How does your margin of error change?",
  answer("increases"),
  answer("decreases", correct = TRUE),
  answer("stays the same"),
  answer("impossible to tell"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q8) Suppose you conduct a hypothesis test at the $\\alpha = 0.02$ significance level. Your corresponding p-value is 0.043." ,
  answer("you would accept the null hypothesis because the p-value is greater than the pre-specified significance level"),
  answer("you would reject the null hypothesis because the p-value is less than the standard level of 0.05"),
  answer("you would reject the null hypothesis because the p-value is greater than the pre-specified significance level"),
  answer("you would fail to reject the null hypothesis because the p-value is greater than the pre-specified significance level", correct = TRUE),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q9) Which one of the statements below is TRUE when we decrease the sample size from 500 to 200? Assume a two-tailed hypothesis test for the mean and no other alterations.",
  answer("test statistic becomes larger"),
  answer("p-value for a hypothesis test becomes smaller"),
  answer("95% confidence interval becomes wider", correct = TRUE),
  answer("standard error becomes smaller"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q10) We are trying to determine if the proportion of US citizens over the age of 25 that have a high school diploma is equal to 90%, $H_0: \\pi = 0.9$. Theoretically, suppose we had 100 surveyers go out and each collect a random sample of size 50 and compute a sample proportion. Each surveyer evaluates the hypothesis at $\\alpha = 0.05$ <br><br> We later find out from the census bureau that the truth is that 90% of citizens over the age of 25 have a high school diploma." ,
  answer("We would expect 90 surveyers to decide the proportion is equal to 0.9"),
  answer("We would expect 5 surveyers to decide that the proportion is equal to 0.9"),
  answer("We would expect 95 surveyers to decide that the proportion is equal to 0.9", correct = TRUE),
  answer("We would expect 50 surveyers to decide that the proportion is equal to 0.9"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q11) You are conducting a hypothesis test and change your significance level from $\\alpha = 0.10$ to $\\alpha = 0.05$. How does this impact your Type II error rate?",
  answer("increases by exactly 0.05"),
  answer("decreases by exactly 0.05"),
  answer("increases but the magnitude is unknown", correct = TRUE),
  answer("decreases but the magnitude is unknown"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q12) Imagine your friend might be pregnant and takes a pregnancy test. What is the person told and what is the truth for a Type II error?" ,
  answer("the person is told they are NOT pregnant and the truth is they are NOT pregnant."),
  answer("the person is told they are pregnant and the truth is they are NOT pregnant."),
  answer("the person is told they are NOT pregnant and the truth is they are pregnant", correct = TRUE),
  answer("the person is told they are pregnant and the truth is they are pregnant"),
  answer("not enough information provided"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q13) Imagine you are conducting a hypothesis test and you reject the null hypothesis when the truth is that the null hypothesis is true. This is also known as a false positive.",
  answer("TRUE", correct = TRUE),
  answer("FALSE"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q14) In repeated sampling, the sampling distribution of a sample mean is the normal (or T) distribution.",
  answer("TRUE", correct = TRUE),
  answer("FALSE"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q15) Suppose you conducted a hypothesis test for a mean, $\\mu$, but there was disagreement over whether to use the standard normal distribution or the t-distribution. The STAT is 2.5. The corresponding p-value for the t-distribution will be larger than the corresponding p-value for the normal distribution.",
  answer("TRUE", correct = TRUE),
  answer("FALSE"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q16) A random sample of middle age adults were asked how many different jobs they have held. The 90% confidence interval was found to be [4.5, 6.9]. This means we are 90% confident that a randomly chosen middle age adult has held between 4.5 and 6.9 jobs.",
  answer("TRUE"),
  answer("FALSE", correct = TRUE),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  "Q17) Northwestern claims that the average starting salary for a Northwestern graduate is more than the nationwide average starting salary of $47,000. We want to test this claim using a hypothesis test at the $\\alpha = 0.10$ significance level. What is an appropriate null hypothesis? <br> <br>
  a) $\\bar{x}_{salary} = 47000$ <br>
  b) $\\bar{x}_{salary} > 47000$ <br>
  c) $\\mu_{salary} = 47000$ <br>
  d) $\\mu_{salary} > 47000$ <br>
  e) $\\mu_{salary} \\ge 47000$",
  answer("a"),
  answer("b"),
  answer("c", correct = TRUE),
  answer("d"),
  answer("e"),
  allow_retry = TRUE,
  incorrect = "submitted"),
  question_dropdown(
  paste0("Q18) In Question 17, the hypothesis test results are as follows.<br>" ,   htmltools::img(src="images/htest_salary.png", height = 200, width = 500)," <br> What is the decision?"),
  answer("Reject the null, the average starting salary is greater than $47,000.", correct = TRUE),
  answer("Fail to reject the null, the average starting salary is greater than $47,000."),
  answer("Reject the null, the average starting salary is NOT greater than $47,000."),
  answer("Fail to reject the null, the average starting salary is NOT greater than $47,000."),
  allow_retry = TRUE,
  incorrect = "submitted")

)

Application

The following applications use the nc_births dataset which has been preloaded for you. This dataset contains a random sample of birth records from all hospitals in North Carolina.

This data has been pre-cleaned to remove any impossible observations. Missing values have not been removed and may need to be handled.

NA values should NOT be included in the total count (n) of a proportion because we do not know what category they would fall in and it would skew our proportion.

If for some reason you are not able to complete the code to obtain the results it would be wise to still interpret your problem with 'placeholder' values for partial credit.

Application 1 (5 points)

Suppose that the weight of a baby is normally distributed with mean 7.5 pounds and $\sigma$ = 0.25. 25% of babies are born below what weight?

qnorm(p = .25, mean = 7.5, sd = 0.25)

grade_this_code(
  correct = "Submitted",
  incorrect = "Submitted"
)

Suppose you are conducting a 90% confidence interval for a t-test with degrees of freedom = 24. What are the corresponding critical values? Note: there are 2 critical values.

qt(p = .05, df =24)
qt(p = .05, df =24, lower.tail = FALSE)

grade_this_code(
  correct = "Submitted",
  incorrect = "Submitted"
)

Application 2 (5 points)

Suppose we are interested in examining if the proportion of female babies that are born over 9 pounds is less than the proportion of male babies born over 9 pounds. Construct a 90% confidence interval for the difference in proportion.

nc_births |> 
  filter(weight > 9) |> 
  count(gender)

nc_births |> 
  count(gender)

prop.test(x = c(17, 44), n = c(503, 497))

grade_this_code(
  correct = "Submitted",
  incorrect = "Submitted"
)

Interpret the confidence interval in the context of the problem. Is there a difference proportions?

question_text("", incorrect = "submitted",
              answer("ManuallyGradedEverythingWrong", 
                     correct = TRUE),
              allow_retry = TRUE,
              rows = 5)

Application 3 (5 points)

It is claimed that the national average weight of female babies is 7.5 pounds at birth. Test to see if the average weight of babies born in North Carolina with a gender of female is equal to the national average of 7.5.

Conduct a hypothesis test for the weight at the $\alpha = 0.01$ significance level.

$$H_0: \mu_{female-weight} = 7.5$$ $$H_A: \mu_{female-weight} \ne 7.5$$

nc_female <- nc_births %>% 
  filter(gender == "female")

t.test(nc_female$weight, mu = 7.5, conf.level = 0.99)

grade_this_code(
  correct = "Submitted",
  incorrect = "Submitted"
)

State your p-value and interpret your decision in the context of the problem.

question_text("", incorrect = "submitted",
              answer("ManuallyGradedEverythingWrong", 
                     correct = TRUE),
              allow_retry = TRUE,
              rows = 5)

nc_births Data

You are welcome to open the dataset in RStudio Cloud as well if the information provided below is not sufficient for you.

Variables:

father_age -- father's age in years
mother_age -- mother's age in years
mature -- maturity status of mother {younger mom, mature mom}
weeks -- length of pregnancy in weeks
premie -- whether the birth was classified as premature {premie, full term}
visits -- number of hospital visits during pregnancy
marital -- marital status at birth {married, not married}
white_mom -- whether mom is {white, not white}
gained -- weight gained by mother during pregnancy in pounds
weight -- weight of the baby at birth in pounds
low_birth_weight -- whether baby was classified as low birth weight {low, not low}
gender -- gender of baby {female, male}
habit -- smoking habit of mother {nonsmoker, smoker}

glimpse(nc_births)

skim(nc_births)

Submit

Once you are finished:

No Name = 0 grade
Click Check Submissions to check that all questions and exercises are SUBMITTED
Click the 'Lock Exam' button below. Once you click this you will not be able to make any changes to your exam!
Once the lock is pressed a Download Exam option will become available
Click Download and submit the downloaded html to Canvas.