library(learnr)
library(gradethis)
library(umdpsyc)
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(exercise.checker = gradethis::grade_learnr)
learnr::tutorial_options(exercise.timelimit = 60)
One of the basic inferential analyses you can do with data is a hypothesis test. R's built-in functions make it easy to run several kinds of t-tests, including one-sample and two-sample tests.
We will use the exam_scores dataset in this tutorial to see how to run some basic t-tests in R.
data("exam_scores")
head(exam_scores)
As a reminder, the steps of a hypothesis test are:
1) State your hypotheses.
2) Set the criterion.
3) Collect sample data.
4) Calculate the test statistic.
5) Evaluate and interpret.
Most of these steps happen outside of R, and in general they can only be done outside of R. R is most useful for the computational part, step 4. We won't go over every step here and will focus mostly on the R part, so keep in mind that the code is only one small portion of the hypothesis testing process.
Suppose a professor tried to calibrate an exam so that it had an average grade of 80, and they want to know whether the mean score of all students who would take the exam is 80. Assuming that the 90 students in the exam_scores dataset are an independent and random sample, how might the professor test whether the exam was calibrated correctly (that is, whether the mean is 80)?
The hypotheses would look like this.
$$H_0: \mu = 80$$
$$H_1: \mu \ne 80$$
In the examples in this tutorial, we will use an alpha level of 0.05. In general, the R code doesn't depend on the alpha level you choose, though it does change if you want a different confidence level for confidence intervals.
t.test()
We can do the t-test by hand by calculating the test statistic and finding the critical value, but R also has a convenient way of doing it using just the data: the t.test() function. You still need to go through all of the steps of a hypothesis test, though. The t.test() function will perform the test for you, but you need to provide the appropriate arguments.
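For comparison, here is a minimal sketch of the by-hand calculation. It uses simulated scores as a stand-in for exam_scores$exam1 (the values, seed, and variable names are assumptions for illustration, not the real data), so the numbers will differ from the tutorial's output.

```r
# Simulated stand-in for exam_scores$exam1 (hypothetical values, not the real data)
set.seed(1)
scores <- rnorm(90, mean = 78, sd = 10)

mu0   <- 80    # null value from H0
alpha <- 0.05  # significance level

n      <- length(scores)
se     <- sd(scores) / sqrt(n)       # estimated standard error of the mean
t_stat <- (mean(scores) - mu0) / se  # test statistic
df     <- n - 1                      # degrees of freedom

t_crit <- qt(1 - alpha / 2, df)      # two-sided critical value
p_val  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

# The hand calculation should agree with t.test()
res <- t.test(scores, mu = mu0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
```

Rejecting when the absolute value of t_stat exceeds t_crit is equivalent to rejecting when p_val is below alpha.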
Let's look at doing a simple two-sided test, with a null hypothesis that $\mu = 80$. We provide the vector of data points, as well as an argument providing the null value. In this case, the null value is 80, so we use mu = 80.
t.test(exam_scores$exam1, mu = 80)
The output provides the test statistic, degrees of freedom, and the p-value in the top line. In this case, our p-value was 0.03033, so we would reject the null hypothesis (remember, our alpha level was 0.05). Below, we get a few more pieces of information, such as the actual sample mean.
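The printed values can also be pulled out of the result programmatically, since t.test() returns a list of class "htest". The sketch below uses simulated scores as a stand-in for exam_scores$exam1 (an assumption for illustration), so its numbers will not match the p-value quoted above.

```r
# Simulated stand-in for exam_scores$exam1 (hypothetical values, not the real data)
set.seed(1)
scores <- rnorm(90, mean = 78, sd = 10)

res <- t.test(scores, mu = 80)

res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # p-value
res$estimate   # sample mean
res$conf.int   # confidence interval (95% by default)
```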
By default, this does a two-sided test. What if we wanted to do a one-sided test? We can use the alternative argument to change it.
# These are how to do one-sided tests.
# Note: This is just to show that it's possible! Don't
# change the sidedness after the fact like this!
t.test(exam_scores$exam1, mu = 80, alternative = "greater")
t.test(exam_scores$exam1, mu = 80, alternative = "less")
Note that R doesn't state the conclusion of the test for you. You need to draw the correct conclusion from these results: take the p-value and compare it to the alpha level, as before.
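The comparison of the p-value to alpha can be written out in code. This is only a sketch: decide() is a hypothetical helper written for this example, not part of R or of this tutorial's packages, and the scores are simulated rather than taken from exam_scores.

```r
# Simulated stand-in for exam_scores$exam1 (hypothetical values, not the real data)
set.seed(1)
scores <- rnorm(90, mean = 78, sd = 10)
alpha  <- 0.05

# Hypothetical helper: turn a test result and an alpha level into a conclusion
decide <- function(test, alpha) {
  if (test$p.value < alpha) {
    "Reject the null hypothesis"
  } else {
    "Fail to reject the null hypothesis"
  }
}

two_sided <- t.test(scores, mu = 80)
greater   <- t.test(scores, mu = 80, alternative = "greater")
less      <- t.test(scores, mu = 80, alternative = "less")

decide(two_sided, alpha)
decide(greater, alpha)
decide(less, alpha)

# The two one-sided p-values sum to 1, which is one reason the direction
# has to be chosen before looking at the data
greater$p.value + less$p.value
```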
Lastly, notice that the output gives you information about the confidence interval. You can use the t.test() function to create confidence intervals as well. Just make sure you use the conf.level argument to specify the confidence level you want.
t.test(exam_scores$exam1, conf.level = 0.99)
If you're just making a confidence interval, you can ignore the rest of the information. Here, I ran t.test() without anything for mu. This gave me a test comparing the mean to 0, but I don't need that information since all I need is the confidence interval.
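If only the interval is needed, it can be extracted from the result directly. The sketch below uses simulated scores as a stand-in for exam_scores$exam1 (an assumption for illustration) and checks the interval against the usual formula, the sample mean plus or minus the critical t value times the standard error.

```r
# Simulated stand-in for exam_scores$exam1 (hypothetical values, not the real data)
set.seed(1)
scores <- rnorm(90, mean = 78, sd = 10)

res99 <- t.test(scores, conf.level = 0.99)
ci    <- res99$conf.int
ci                      # lower and upper bounds
attr(ci, "conf.level")  # the confidence level that was requested

# Same interval by hand: mean +/- t critical value * standard error
n       <- length(scores)
se      <- sd(scores) / sqrt(n)
by_hand <- mean(scores) + c(-1, 1) * qt(0.995, df = n - 1) * se
all.equal(as.numeric(ci), by_hand)
```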
An independent samples t-test can be run in one of two ways. If you have the data from the two samples in two vectors, you can simply use those vectors to run the t-test. In a data frame, this would mean that the two samples are stored as two separate columns.
If we wanted to run a t-test to see if there was a significant difference between exam 1 and exam 2, then we could use the following code.
t.test(exam_scores$exam1, exam_scores$exam2)
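One detail worth knowing about the two-vector call: by default, t.test() does not assume equal variances and runs Welch's t-test. The sketch below uses two simulated samples (hypothetical values and names) to show the default next to the pooled-variance version requested with var.equal = TRUE.

```r
# Two simulated independent samples (hypothetical values, not the real data)
set.seed(2)
x <- rnorm(40, mean = 80, sd = 8)
y <- rnorm(45, mean = 84, sd = 12)

welch  <- t.test(x, y)                    # default: Welch's t-test
pooled <- t.test(x, y, var.equal = TRUE)  # classic pooled-variance t-test

welch$method      # names the test that was run
welch$parameter   # Welch degrees of freedom, usually not a whole number
pooled$parameter  # pooled degrees of freedom: length(x) + length(y) - 2
```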
What if the data were stored in a way such that all the values were in one column, with the corresponding groups in another column? That is, for example, what if we wanted to do an independent samples t-test for exam scores of students who took notes by paper compared to students who took notes by computer?
We can use formula notation in this case.
t.test(exam1 ~ note_mode, data = exam_scores)
On the left of the tilde (~), we have the numerical variable whose mean we want to compare. On the right side, we put the categorical variable that denotes which person is in which group. The data = exam_scores argument denotes which data frame to pull these columns from.
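The formula call is equivalent to splitting the numeric column by group and passing the two pieces as vectors. The sketch below shows this on a small simulated data frame. The column names mirror the tutorial, but the values and the group labels "computer" and "paper" are assumptions, since the actual coding of note_mode is not shown here.

```r
# Simulated long-format data (hypothetical values and group labels)
set.seed(3)
dat <- data.frame(
  exam1     = c(rnorm(30, mean = 78, sd = 10), rnorm(30, mean = 82, sd = 10)),
  note_mode = rep(c("computer", "paper"), each = 30)
)

# Formula interface: numeric outcome ~ grouping variable
by_formula <- t.test(exam1 ~ note_mode, data = dat)

# Same test with the two groups split into separate vectors
computer   <- dat$exam1[dat$note_mode == "computer"]
paper      <- dat$exam1[dat$note_mode == "paper"]
by_vectors <- t.test(computer, paper)

all.equal(by_formula$p.value, by_vectors$p.value)
```

The grouping variable must have exactly two groups for the formula interface to run a two-sample test.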
Above, we used an independent samples t-test to look at the difference between exam 1 and exam 2 scores. This isn't quite how we should approach this, though. Remember, each student took both exams, so this is actually a paired t-test, since the scores on each exam are paired according to which student took them. We can do a paired t-test by adding the argument paired = TRUE to the t.test() function.
t.test(exam_scores$exam1, exam_scores$exam2, paired = TRUE)
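A paired t-test is the same as a one-sample t-test on the within-student differences, tested against 0. The sketch below checks this on simulated paired scores (hypothetical values, not the real exam data).

```r
# Simulated paired scores: each position is one student (hypothetical values)
set.seed(4)
exam1 <- rnorm(90, mean = 78, sd = 10)
exam2 <- exam1 + rnorm(90, mean = 3, sd = 5)

paired <- t.test(exam1, exam2, paired = TRUE)

# Equivalent: one-sample test on the differences with a null value of 0
diffs <- t.test(exam1 - exam2, mu = 0)

all.equal(unname(paired$statistic), unname(diffs$statistic))
all.equal(paired$p.value, diffs$p.value)
```

Because the differences are computed position by position, the two vectors must have the same length and be in the same student order.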
The professor designed exam 2 to be a little bit easier, and expects the average grade for students taking the exam to be 85. However, the professor is worried that the exam might be too easy and wants to see if the average grade on the exam would be higher. Is the mean score on exam 2 significantly greater than 85? Use an alpha level of 0.05 and perform the test using R.
grade_result(
  pass_if(~ identical(.result, t.test(exam_scores$exam2, mu = 85, alternative = 'greater')), "Good job!"),
  fail_if(~ TRUE, "Not quite. Try again.")
)
learnr::question("What would be the conclusion from the hypothesis test in Exercise 1?",
  answer("Reject the null hypothesis.", correct = FALSE),
  answer("Fail to reject the null hypothesis.", correct = TRUE, message = "Good job!"),
  allow_retry = TRUE,
  random_answer_order = TRUE
)
Suppose you are interested in whether there is a relationship between the method of note-taking and the score on exam 2. That is, in words, your null hypothesis is that the mean score on exam 2 for students who took notes by paper is the same as the mean score on exam 2 for students who took notes by computer. Your alternative hypothesis is that they are not the same. Use an alpha level of 0.05 and perform the test using R.
grade_result(
  pass_if(~ identical(.result, t.test(exam2 ~ note_mode, data = exam_scores)), "Good job!"),
  fail_if(~ TRUE, "Not quite. Try again.")
)
learnr::question("What would be the conclusion from the hypothesis test in Exercise 3?",
  answer("Fail to reject the null hypothesis.", correct = FALSE),
  answer("Reject the null hypothesis.", correct = TRUE, message = "Good job!"),
  allow_retry = TRUE,
  random_answer_order = TRUE
)
submission_code <- function(UID){
  httr::sha1_hash('ttest123', as.character(UID))
}
Generate your submission code by passing your UID to the function below. For example, if your UID is 2, then your code should look like submission_code(UID = 2).
# Replace the number below with your UID
submission_code(2)