$\$
The purpose of this homework is to practice running parametric hypothesis tests and computing parametric confidence intervals for one and two means. Please submit a compiled pdf with your answers to the exercises to Gradescope by 11pm on Sunday April 14th, making sure to mark all correct pages per question to avoid points being taken off your score.
Functions you might fun useful are pt()
and qt()
along with the list of functions found on Canvas. For full credit on problems involving R, be sure to label all your figures and to "show your work" by making sure all values that are answers to questions are printed/visible in your R Markdown pdf.
As always, if you need help with any of the homework assignments, please attend the TA office hours which are listed on Canvas and/or ask questions on Ed Discussions forum. Also, if you have completed the homework, please help others out by answering questions on Ed Discussions. Finally, reviewing the class slides and videos will be helpful if you run into difficulties.
library(SDS100) library(knitr) options(scipen=999) set.seed(100)
$\$
In the first set of exercises you will explore questions related to parametric inference for a single mean. I encourage you to review the examples in the Lock5 textbook for additional practice. For all your answers be sure to "show your work" by printing the output of your calculations in the R chunks.
$\$
Exercise 1.1 (5 points): Suppose you had a sampling distribution that was a t-distribution which had been created by taking samples of size n = 10. Please find the two quantile values from this sampling distribution such that 5% of your statistics are in each tail. Hint: all quantile functions in R start with the letter q, and then have the distribution name, so for quantiles of the t distribution we use the qt()
function and we specify the area we want and the degrees of freedom (df), i.e., qt(area, df = degrees_of_freedom)
.
Answer:
$\$
Exercise 1.2 (points 10): Find a 95% confidence interval for $\mu$ using the sample results $\bar{x}$ = 12.7, s = 5.6, and n = 30. Assume the underlying distribution is relatively normal so using the t-distribution is valid here. Hint: as usual, compute the t* quantile value, then compute the standard error and then the confidence interval.
Answers:
$\$
Exercise 1.3 (8 points): Run a hypothesis test to test: $H_0: \mu = 100$ vs. $H_A: \mu < 100$ using the sample results $\bar{x}$ = 91.7, s = 12.5, n = 30. Hint: compute the t-statistic and the use pt()
to find the appropriate p-value. Write down your conclusions in the answer section below.
Answers:
$\$
Exercise 1.4 (10 points): Children with autism often have a small head circumference at birth, followed by a sudden and excessive increase in head circumference during the first year of life. A recent study examined the brain tissue in autopsies of seven male children with autism between the ages of 2 and 16. The mean number of neurons in the pre-frontal cortex in male children without autism of the same age is about 1.15 billion. The pre-frontal cortex is the part of the brain most disrupted in autism, as it deals with language and social communication. In the sample of seven children with autism, the mean number of neurons in the pre-frontal cortex was 1.94 billion with a standard deviation of 0.50 billion. The values in the sample are not heavily skewed. Use the t-distribution to test whether this sample provides evidence that male children with autism have more neurons (on average) in the pre-frontal cortex than children without autism. (This study indicates that the causes of autism may be present before birth.) Write down your conclusions in the answer section below.
Hint: write down the null and alternative hypotheses, compute the statistic of interest and then calculate a p-value using the pt()
function.
Answers:
$\$
In the next set of exercises you will explore questions related to parametric inference for comparing two means.
$\$
Exercise 2.1 (8 points): Test $H_0: \mu_1 = \mu_2$ vs. $H_A: \mu_1 > \mu_2$ using the sample results $\bar{x}_1$ = 56, $s_1$ = 8.2 with $n_1$ = 30 and $\bar{x}_2$ = 51, $s_2$ = 6.9 with $n_2$ = 40. Compute the statistic of interest and then calculate a p-value using the pt()
function. Finally, write down your conclusions in the answer section below.
Answers:
$\$
Exercise 2.2 (9 points): Studies are finding that bacteria in the stomach are essential for healthy functioning of the human body. One study compared the number of unique bacterial genes in stomachs of healthy patients and those of patients with irritable bowel syndrome (IBS). For healthy patients, we have $\bar{x}$ = 564 million with s = 122 million and n = 99. For those with IBS, we have $\bar{x}$ = 425 million with s = 127 million and n = 25. Both distributions appear to be approximately normally distributed. Test to see if people with IBS have, on average, significantly fewer unique bacterial genes in their stomachs. Show all details, including giving the degrees of freedom used and write down your conclusions in the answer section below.
Answers:
$\$
In the following exercises we will use several different types of tests to examine whether students' pulse rates are higher when they are taking a quiz compared to when they are listening to a lecture (we will assume that the mean pulse rate is higher when taking a quiz compared to when listening to a lecture since quizzes are more stressful). A data set was collected from 10 students when they took a quiz and when the same 10 students were listening to a lecture. The code to download the data set is below and extracts vector of the students pulse rates during the quiz and during a lecture.
pulse_data <- read.csv('https://www.lock5stat.com/datasets3e/QuizPulse10.csv') # pulse rate data for the quiz quiz <- pulse_data$Quiz # pulse rate data for the lecture lecture <- pulse_data$Lecture
$\$
Exercise 3.1 (13 points): For the first exercise let's treat the two samples as independent. Start by stating the null and alternative hypotheses in symbols and words. Then apply the following steps to conduct a two-sample t-test:
a. compute the mean and the variance (i.e., the standard deviation squared) for the quiz pulse rates b. compute the mean and the variance (i.e., the standard deviation squared) for the lecture pulse rates c. compute the standard error for two independent samples d. compute the t-statistic (and make sure that value is printed) e. compute the p-value (and make sure that value is printed) f. draw a conclusion with $\alpha$ = .05
State what you conclude from running this hypothesis test in the answer section.
$\$
State the null and alternative hypotheses here:
In symbols:
In words:
# a. compute the mean and standard deviation (or variance) for the quiz pulse rates # b. compute the mean and standard deviation (or variance) for the lecture pulse rates # c. compute the standard error # d. compute the t-statistic # e. compute the p-value
Answer:
$\$
Exercise 3.2 (13 points): For the second exercise let's use a paired sample t-test to see if this would change our conclusion. To do this create a new vector that has for each student, the pulse rates during a quiz minus the pulse rate during the lecture and call it diff_means_vector
. Let's call the mean of this vector of differences $\bar{x_d}$.
Start by stating the null and alternative hypothesis in symbols and words and then complete the following steps to run a hypothesis test:
a. plot the difference of pulse rates using the stripchart(diff_means_vector)
function
b. compute the standard error of the (paired) difference of scores
c. compute the t-statistic (and make sure that value is printed)
d. compute the p-value (and make sure that value is printed)
e. draw a conclusion with $\alpha$ = .05
Be sure to describe your conclusions in the answer section.
$\$
State the null and alternative hypotheses here:
In symbols:
In words:
# create a vector of the difference in pulse rate for quiz vs. lecture # a. plot the data using the stripchart() function: # b. compute the standard error of the difference of scores # c. compute the t-statistic # d. compute the p-value
Answer:
$\$
Exercise 3.3 (13 points): One other method we have used to test whether two means are different is simulation/randomization methods. Let's again treat the two samples as if they were independent and run a simulation/randomization test to assess whether there are differences in the quiz vs. lecture pulse rates by doing the following steps:
compute the observed statistic
create a null distribution: i.e., combine the data, then repeat the following 10,000 times: shuffle the combined data, separate into groups, calculate null statistic.
plot the null distribution and compute the p-value
draw a conclusion with $\alpha$ = .05
Be sure to describe your conclusions in the answer section.
Answer:
$\$
Exercise 3.4 (8 points): R actually has built in function to do t-tests which is called t.test()
. If we have two samples called sample1
and sample2
, we can do a two independent sample t-test using: t.test(sample1, sample2)
. The resulting p-value is for a two-tailed test. If we want the p-value for a right-tailed test we can set the argument to the function alternative = "greater" i.e., t.test(sample1, sample2, alternative = "greater")
. We can run a paired t-test by setting the the argument to the function paired = TRUE
.
Use R's t.test()
function to calculate p-values on the pulse data using an independent sample t-test and a paired sample t-test. Compare the results to the results you got using your own t-test. Do you notice any differences? Describe what might be causing the difference in results in the answer section below. Hint: R's t.test()
function is doing something slightly more sophisticated, so look carefully at the output of running this function to see if any of the values that R's t.test()
function has computed differ from the values you computed in the above exercises.
Answer:
$\$
Please state the question you will address in your final project and where you will get the data from. Also, load the data into R in the chunk below to verify that you can get the data into R.
Note: This question is listed as 0 points for this homework. However this same question will be asked again and listed as worth 5 points on the next homework, so please make sure you have a question and the relevant data ready by next week!
$\$
Please reflect on how the homework went by going to Canvas, going to the Quizzes link, and clicking on Reflection on homework 9.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.