library(learnr)
library(ggplot2)
library(bst290)
library(dplyr)
library(magrittr)
knitr::opts_chunk$set(echo = FALSE)
ess <- bst290::ess
In this set of exercises, you will practice calculating and interpreting confidence intervals.
To keep things simple, we will work with the small `ess` practice dataset from the `bst290` package and use the `weight` variable, which measures people's body weight (in kg). This means we want to estimate the average weight in the entire Norwegian population using our sample data.
The variable is already numeric (so no transformations are needed!) and is distributed as shown below:
ess %>% ggplot(aes(x = weight)) + geom_histogram(color = "black", bins = 30) + labs(x = "Body weight (kg)", y = "Observations") + theme_bw()
Since we want the confidence interval for the sample mean, we should first find out what the sample mean is. Can you add code below to calculate the mean of the `weight` variable in the `ess` dataset?
You just needed to use the `mean()` function, and because the variable contains some missing values (`NA`), you needed to set the `na.rm` option to `TRUE`:
mean(ess$weight, na.rm = TRUE)
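To see why `na.rm = TRUE` matters, here is a small sketch with a made-up vector. The object `weights` and its values are hypothetical and are not taken from `ess`.

```r
# Hypothetical body weights in kg, with one missing value
weights <- c(72.5, 80.0, NA, 65.3, 90.1)

# Without na.rm, a single NA makes the whole result NA
mean(weights)

# With na.rm = TRUE, the missing value is dropped before averaging
mean(weights, na.rm = TRUE)
```

The first call returns `NA`, which is R's way of saying that the mean cannot be known if one of the values is unknown.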
Now that we have the mean value, we can calculate a confidence interval for it. But let's do a quick quiz before we get to coding:
question("Which function would you use to let `R` calculate a confidence interval for a sample mean?",
  answer("`conf.int()`"),
  answer("`t.test()`", correct = TRUE, message = "This might be a bit confusing, but the correct function is indeed the `t.test()` function that you also use to calculate *t*-tests. You will understand better why that is when we talk about the *t*-test in a few days."),
  answer("`ci()`"),
  random_answer_order = TRUE,
  allow_retry = TRUE)
Since you know what the right function is (or your memory has been refreshed now), can you use it to calculate a 95% confidence interval for the sample mean of the `weight` variable?
This is also straightforward: You just use the `t.test()` function and specify the `weight` variable from the `ess` dataset as your `x` variable. The function calculates a 95% confidence interval by default, so you don't need to do anything about that.
t.test(x = ess$weight)
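If you are curious where the numbers come from, the following sketch reproduces the interval by hand as mean ± critical *t* value × standard error. It uses simulated data as a stand-in for `ess$weight`: the object `weight_sim`, its values, and the seed are assumptions made for illustration only.

```r
# Simulated stand-in for ess$weight (hypothetical values, fixed seed)
set.seed(290)
weight_sim <- c(rnorm(140, mean = 75, sd = 15), rep(NA, 3))

x <- weight_sim[!is.na(weight_sim)]  # t.test() also drops missing values
n <- length(x)
se <- sd(x) / sqrt(n)                # standard error of the mean
t_crit <- qt(0.975, df = n - 1)      # critical t value for a 95% interval

# Manual interval: mean plus/minus critical value times standard error
manual_ci <- mean(x) + c(-1, 1) * t_crit * se

# Interval as reported by t.test()
builtin_ci <- t.test(x = weight_sim)$conf.int

manual_ci
builtin_ci
```

Both results should show the same lower and upper bound, which is why `t.test()` can double as a confidence interval calculator.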
Can you now change the confidence level to 99%?
t.test(x = ess$weight)
All you needed to do was to set the `conf.level` option to `0.99`:
t.test(x = ess$weight, conf.level = 0.99)
Can you also set the confidence level to 83%?
t.test(x = ess$weight, conf.level = 0.99)
Also easy: You just adjust the `conf.level` option:
t.test(x = ess$weight, conf.level = 0.83)
This was just to show you that you can set that level to really any number between 0 and 1 (i.e., 0 and 100 percent).
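What changes when you change the level is the width of the interval: the more confident you want to be, the wider it gets. A minimal sketch, again with simulated data (`weight_sim` is a hypothetical stand-in for `ess$weight`):

```r
# Simulated stand-in for ess$weight (hypothetical values, fixed seed)
set.seed(290)
weight_sim <- rnorm(140, mean = 75, sd = 15)

# Compute the interval width for each confidence level
levels <- c(0.83, 0.95, 0.99)
widths <- sapply(levels, function(lvl) {
  ci <- t.test(x = weight_sim, conf.level = lvl)$conf.int
  ci[2] - ci[1]
})

data.frame(conf.level = levels, width = widths)
```

The `width` column should increase from the 83% row to the 99% row.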
Let's take another look at the 95% confidence interval:
t.test(x = ess$weight, conf.level = 0.95)
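If you only need the two bounds rather than the full printout, you can pull them out of the object that `t.test()` returns. The sketch below assumes simulated data (`weight_sim` and `result` are hypothetical names); with the real data you would pass `ess$weight` instead.

```r
# Simulated stand-in for ess$weight (hypothetical values, fixed seed)
set.seed(290)
weight_sim <- rnorm(140, mean = 75, sd = 15)

# Store the test result instead of printing it
result <- t.test(x = weight_sim, conf.level = 0.95)

result$conf.int                         # both bounds, with the level stored as an attribute
round(result$conf.int[1], digits = 2)   # lower bound
round(result$conf.int[2], digits = 2)   # upper bound
attr(result$conf.int, "conf.level")     # confidence level used
```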
# Compute the rounded 95% interval once and reuse it in the answer texts
ci <- round(t.test(ess$weight)$conf.int, digits = 2)
question("What is the correct interpretation of this confidence interval? (Only one of these is true!)",
  answer(paste0("95% of our observations lie within the confidence interval from ", ci[1], " to ", ci[2], "."),
    message = "Maybe you thought about the ''68-95-99.7 rule'', which predicts that 95% of the data of a normally distributed variable are within about 2 standard deviations of the mean? That rule is of course true, but: In this particular context here, we apply this rule not to the sample data (''our observations'') but to the distribution of all possible measurements of the sample mean - the ''sampling distribution''. This is an important difference, and is explained in Kellstedt & Whitten (2018, p. 152)."),
  answer(paste0("The confidence interval from ", ci[1], " to ", ci[2], " includes the true population mean with a probability of 95%."),
    correct = TRUE,
    message = "If you could really see that this, and only this, is the correct interpretation: Congrats, very well done! Interpreting confidence intervals is really tricky. Just always keep in mind: We are saying something about the confidence interval, this interval, and nothing else."),
  answer(paste0("With a 95% probability, the true population mean lies within the interval from ", ci[1], " to ", ci[2], "."),
    message = "Don't worry: This is a very common misinterpretation, and the issue is a really tiny nuance in the formulation. But nuances matter a lot here! When we interpret a confidence interval, we are always focusing on the interval itself: This interval includes the true mean with a given probability - and that is all we can say. With the statement here, however, we are trying to say something about the value of the true population mean, and that is strictly speaking not correct because we will never know the value of the true mean, not even with a probability."),
  allow_retry = TRUE,
  random_answer_order = TRUE)
This might be tricky, but if you get it right (and you also understand why), then you really understand confidence intervals. If you are unsure, you can look at the confidence interval simulation in the Practice Statistics dashboard (`bst290::practiceStatistics()`).
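The general idea behind a simulation like this can be sketched in a few lines of base R. This is a minimal illustration and not the dashboard's actual code: the "population" here is a hypothetical normal distribution with a known mean of 75, and the sample size, seed, and object names are assumptions.

```r
set.seed(290)
true_mean <- 75    # known only because we simulate the population ourselves
n_samples <- 2000  # number of repeated samples

# Draw many samples, compute a 95% interval for each,
# and record whether the interval includes the true mean
covers <- replicate(n_samples, {
  s <- rnorm(50, mean = true_mean, sd = 15)
  ci <- t.test(x = s)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Share of intervals that include the true mean
mean(covers)
```

The share should come out close to 0.95. Each sample produces a different interval while the true mean stays fixed, which is why the interpretation is a statement about the interval and not about the mean.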