$\\$
The purpose of this homework is to practice exploring normal distributions, and to run parametric hypothesis tests and create parametric confidence intervals for proportions. Please submit a compiled pdf with your answers to the exercises to Gradescope by 11pm on Sunday April 7th.
Functions you might find useful are pnorm() and qnorm(), along with the list of functions found on Canvas. For full credit on problems involving R, be sure to label all your figures and to "show your work" by making sure all values that are answers to questions are printed/visible in your R Markdown pdf.
As always, if you need help with any of the homework assignments, please attend the TA office hours which are listed on Canvas and/or ask questions on Ed Discussions. Also, if you have completed the homework, please help others out by answering questions on Ed Discussions. Finally, reviewing the class slides and videos will be helpful if you run into difficulties.
download.file("https://raw.githubusercontent.com/emeyers/SDS100/master/ClassMaterial/images/Lock5_5_11.png", "Lock5_5_11.png", mode = "wb")
library(SDS100)
library(knitr)
options(scipen=999)
set.seed(100)
$\\$
In the first set of exercises you will explore normal distributions using R as well as some web applications. I encourage you to review the examples in the Lock5 textbook for additional practice. For all your answers be sure to "show your work" by printing the output of your calculations in the R chunks.
$\\$
\pagebreak
Exercise 1.1 (4 points): In Figure 1 below, the percentage of the population that is more than 30 is closest to which value: 4%, 27%, 50%, 73%, or 95%?
Answer:
$\\$
Exercise 1.2 (9 points): Use the R chunk below and the pnorm() function to find the areas on the standard normal density curve N(0, 1) listed below. You can check your answers using the apps at: (https://ethan-meyers-test.shinyapps.io/normal_area/) and (https://ethan-meyers-test.shinyapps.io/normal_range/)
a. The area below z = 0.8
b. The area above z = -1.5
c. The area between z = -1.75 and z = 0.5
Note: if you install and load the mosaic package (using install.packages("mosaic"), then library(mosaic)), you can also use the mosaic::xpnorm() function. The mosaic::xpnorm() function works the same way as the pnorm() function, but it also displays a visualization of the distribution, which is useful for confirming that you have the correct result.
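As a reminder of how pnorm() maps onto the three kinds of areas, here is a minimal sketch. The z-values and variable names are made up for illustration and are not the ones asked for in the exercise.

```r
# Illustrative z-values only (hypothetical, not the exercise's values).
# pnorm(z) returns the area to the LEFT of z under N(0, 1).

area_below   <- pnorm(1)              # area below z = 1
area_above   <- 1 - pnorm(-1)         # area above z = -1
area_between <- pnorm(1) - pnorm(-1)  # area between z = -1 and z = 1

area_below
area_above
area_between
```

The area above a value can equivalently be computed with pnorm(z, lower.tail = FALSE).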
Answers
a.
b.
c.
# a) The area below z = 0.8
# b) The area above z = -1.5
# c) The area between z = -1.75 and z = 0.5
$\\$
Exercise 1.3 (9 points): Find the 'endpoints' (quantile values) for the specified area under the specified normal distribution. To do this use the qnorm() function. You can then check your results using this app: (https://ethan-meyers-test.shinyapps.io/normal_quantile/)
a. For a normal curve of N(25, 8), what is the endpoint value such that the area under the curve to the right of the endpoint is 0.25?
b. For a normal curve of N(500, 80), what is the endpoint value such that the area under the curve to the left of the endpoint is 0.02?
c. For a normal curve of N(10, 3), what are the endpoint values such that the symmetric area under the middle of the curve is 0.95?
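The sketch below shows one way qnorm() can be used for each type of question. It assumes the notation N(mean, sd), i.e., that the second number is the standard deviation, and it uses a hypothetical N(100, 15) curve with made-up areas rather than the values above.

```r
# Hypothetical distribution and areas for illustration only.
mu_example    <- 100
sigma_example <- 15

# qnorm(p, mean, sd) returns the value with area p to its LEFT.

# Area of 0.10 to the RIGHT means area 0.90 to the left
right_endpoint <- qnorm(1 - 0.10, mu_example, sigma_example)

# Area of 0.05 to the LEFT
left_endpoint <- qnorm(0.05, mu_example, sigma_example)

# Symmetric middle area of 0.90 leaves 0.05 in each tail
middle_endpoints <- qnorm(c(0.05, 0.95), mu_example, sigma_example)

right_endpoint
left_endpoint
middle_endpoints
```

Passing each endpoint back through pnorm() with the same mean and standard deviation is a quick way to confirm the area.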
Answers:
a.
b.
c.
# a) # b) # c)
$\\$
Exercise 1.4 (14 points): A Statistics professor designed an exam so that the grades would be roughly normally distributed with mean $\mu$ = 75 and standard deviation $\sigma$ = 10. Unfortunately, the Canvas website that the students were taking the exam on crashed with ten minutes to go in the exam, which made it difficult for some students to finish. When the instructor graded the exams, he found that the grades were roughly normally distributed, but the mean grade was 62 and the standard deviation was 18. To be fair, he decides to "curve" the scores to match the desired N(75, 10) distribution. To do this, he standardizes the actual scores to z-scores using the N(62, 18) distribution and then "unstandardizes" those z-scores to convert to N(75, 10). What is the new grade assigned to a student whose original score was 47? What about a student whose original score was 90? Hint: you can use the R chunks below to convert the two original scores to z-scores and then convert them back to the new scores from the N(75, 10) distribution.
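The standardize-then-unstandardize idea can be sketched as follows. The distributions, the score, and the variable names here are hypothetical and chosen only to make the arithmetic easy to follow; they are not the values from this exercise.

```r
# Hypothetical example: convert a score from N(60, 20) to N(80, 5).
old_mean <- 60
old_sd   <- 20
new_mean <- 80
new_sd   <- 5

original_score <- 40

# Step 1: standardize, z = (x - mean) / sd
z_score <- (original_score - old_mean) / old_sd

# Step 2: unstandardize onto the new scale, x_new = mean_new + z * sd_new
curved_score <- new_mean + z_score * new_sd

z_score
curved_score
```

A score one standard deviation below the old mean ends up one standard deviation below the new mean, so the student's relative standing is unchanged.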
Answers:
$\\$
Exercise 1.5 (9 points): Find the p-value based on the standard normal distribution for each of the following standardized test statistics. By an upper-tailed test, we mean a test where the alternative hypothesis is framed as $H_A:param > x$ while a two-tailed test is framed as $H_A:param \ne x$. Please show the code you used to get your answers below, or explain how you got your answers.
a. z = 0.84 for an upper-tailed test for a difference in two proportions.
b. z = -2.38 for a lower-tailed test for a difference in two means.
c. z = 2.225 for a two-tailed test for a correlation.
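For reference, here is a minimal sketch of how the three kinds of p-values can be computed from a z-statistic with pnorm(). The z-values and variable names are made up for illustration and differ from the ones above.

```r
# Illustrative z-statistics only (hypothetical, not the exercise's values).

# Upper-tailed test: area to the right of z
p_upper <- pnorm(1.5, lower.tail = FALSE)

# Lower-tailed test: area to the left of z
p_lower <- pnorm(-1.5)

# Two-tailed test: double the area beyond |z|
p_two_tailed <- 2 * pnorm(abs(1.96), lower.tail = FALSE)

p_upper
p_lower
p_two_tailed
```

Using abs() in the two-tailed case means the same line works whether the observed z is positive or negative.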
Answers:
a.
b.
c.
# a)
# b)
# c)
$\\$
In the next set of exercises you will practice computing confidence intervals and hypothesis tests for a single proportion using parametric methods based on the normal distribution. Again, I encourage you to review the examples in the Lock5 textbook for more practice. Also, for all your answers, be sure to "show your work" by printing the output of your calculations in the R chunks.
$\\$
Exercise 2.1 (12 points): A recent study found that, of the 1771 participants aged 12 to 19 in the National Health and Nutrition Examination Survey, 19.5% had some hearing loss (defined as loss of 15 decibels in at least one ear). This is a dramatic increase from a decade ago. The sample size is large enough to use the normal distribution, and a bootstrap distribution shows that the standard error for the proportion is SE = 0.009. Find and interpret a 90% confidence interval for the proportion of teenagers with some hearing loss. Use the code chunk below to show your work.
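A normal-based confidence interval has the form statistic $\pm$ $z^*$ $\cdot$ SE. The sketch below illustrates that formula with a hypothetical sample proportion, standard error, and confidence level; none of these values come from the exercise.

```r
# Hypothetical values for illustration only.
p_hat_example <- 0.40
SE_example    <- 0.02
conf_level    <- 0.95

# z* leaves (1 - conf_level) / 2 in each tail of N(0, 1)
z_star <- qnorm(1 - (1 - conf_level) / 2)

# statistic +/- z* x SE
CI_example <- p_hat_example + c(-1, 1) * z_star * SE_example

z_star
CI_example
```

Changing conf_level changes only z_star; the rest of the calculation stays the same.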
Answer:
$\\$
Exercise 2.2 (12 points): Suppose we have a population proportion $\pi$ = 0.27. Let's create the sampling distribution for n = 50 observations of a categorical variable by doing the following:
a. First, find the mean and the standard error of the sampling distribution of proportions.
b. Second, the code below plots the density curve for N(.5, .1). Modify the code below to plot the sampling distribution of proportions for $\pi$ = 0.27, n = 50. Is the sample size large enough for this normal density function to be valid to use as the sampling distribution? Why or why not?
# a) report/calculate the mean and SE of the sampling distribution
mu <- 0  # change these values
SE <- 0
mu
SE

# b) plot the density curve

# The x-values of the function - you do not need to change this!
x_vals <- seq(0, 1, by = .001)  # you do not need to change this

# Change the parameters in the dnorm() function to create the appropriate density function.
# You do not need to change the first parameter - it sets our x-values.
density_curve <- dnorm(x_vals, .5, .1)

# Plotting the function - you do not need to change this either
plot(x_vals, density_curve, type = 'l', xlab = expression(hat(p)), ylab = 'Density', main = expression(paste("Sampling distribution of ", hat(p))))
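As a hedged illustration of the quantities involved, the sketch below computes the mean and standard error of a sampling distribution of proportions, using SE = $\sqrt{\pi(1 - \pi)/n}$, for a hypothetical population proportion and sample size (not the ones in this exercise). The sample-size check uses the common rule of thumb of at least 10 expected successes and 10 expected failures; that threshold is an assumption here, so use whichever rule was given in class.

```r
# Hypothetical values for illustration only.
p_example <- 0.5
n_example <- 100

# The sampling distribution is centered at the population proportion
mu_example <- p_example

# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
SE_example <- sqrt(p_example * (1 - p_example) / n_example)

# Rule-of-thumb check (assumed threshold of 10)
large_enough <- (n_example * p_example >= 10) &&
  (n_example * (1 - p_example) >= 10)

mu_example
SE_example
large_enough
```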
Answers:
a.
b.
$\\$
Exercise 2.3 (14 points): In a survey of 1,000 US adults, twenty percent say they never exercise. This is the highest level seen in five years. Use the R chunk below to:
Answer:
$\\$
Exercise 2.4 (14 points): Home Field Advantage in Baseball in 2014. There were 2428 Major League Baseball (MLB) games played in 2014, and the home team won 1288 of those games. If we consider the games played in 2014 as a sample of all MLB games, test to see if there is evidence, at the significance level of $\alpha$ = 1%, that the home team wins more than half the games. In particular, please complete the following steps:
State the null and alternative hypotheses in symbols.
Write down $\hat{p}$, n and compute the SE of our sampling distribution under the null hypothesis.
Then create the z-statistic.
Then use pnorm() to find the probability you would get a value as big or bigger than the observed $\hat{p}$ (i.e., get the p-value).
Make a conclusion.
Show all details of your calculations in the code chunk below, and print out the relevant values to "show your work".
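The steps above can be sketched on a small made-up data set. Everything in this example (60 successes in 100 trials, a null value of 0.5, a 5% significance level, and the variable names) is hypothetical and is not the baseball data.

```r
# Hypothetical data: 60 successes out of 100 trials.
# H0: p = 0.5   vs   HA: p > 0.5
n_example <- 100
successes <- 60
p_null    <- 0.5
alpha     <- 0.05

# Observed sample proportion
p_hat_example <- successes / n_example

# SE under the null hypothesis uses the null proportion, not p-hat
SE_null <- sqrt(p_null * (1 - p_null) / n_example)

# z-statistic: how many SEs the observed p-hat is from the null value
z_stat <- (p_hat_example - p_null) / SE_null

# Upper-tailed p-value: area to the right of z under N(0, 1)
p_value <- pnorm(z_stat, lower.tail = FALSE)

p_hat_example
SE_null
z_stat
p_value
p_value < alpha  # TRUE means reject H0 at this alpha
```

Note that the standard error for the test is computed from the null proportion, which differs from the confidence-interval setting where the sample proportion is used.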
Answer:
$\\$
Please reflect on how the homework went by going to Canvas, going to the Quizzes link, and clicking on Reflection on homework 8.