In cknotz/bst290: An R Package To Complement the BST290 (UiS) Course

library(learnr)
library(ggplot2)
library(bst290)
library(dplyr)
library(magrittr)
knitr::opts_chunk$set(echo = FALSE)

ess <- bst290::ess

Introduction {data-progressive=TRUE}

In this set of exercises, you practice calculating and interpreting confidence intervals.

To keep things simple, we will work with the small ess practice dataset from the bst290 package, and use the weight variable that measures people's body weight (in kg). This means we want to estimate the average weight in the entire Norwegian population using our sample data.

The variable is already a numeric one (so no transformations are needed!) and distributed as shown below:

ess %>% 
  ggplot(aes(x = weight)) +
    geom_histogram(color = "black", bins = 30) +
    labs(x = "Body weight (kg)",
         y = "Observations") +
    theme_bw()

Analysis {data-progressive=TRUE}

Since we want the confidence interval for the sample mean, we should first find out what the sample mean is. Can you add code below to calculate the mean of the weight variable in the ess dataset?

Obviously, you just needed to use the mean() function --- and because the variable contains some missing values (NA), you needed to set the na.rm option to TRUE:

mean(ess$weight, na.rm = TRUE)

Now that we have the mean value, we can calculate a confidence interval for it. But let's do a quick quiz before we get to coding:

question("Which function would you use to let `R` calculate a confidence interval for a sample mean?",
         answer("`conf.int()`"),
         answer("`t.test()`", correct = T,
                message = "This might be a bit confusing, but the correct function is indeed the `t.test()` function that you also use to calculate *t*-tests. You will understand better why that is when we talk about the *t*-test in a few days."),
         answer("`ci()`"),
         random_answer_order = T, allow_retry = T)

Since you know what the right function is (or your memory has been refreshed now), can you use the correct function to calculate a 95% confidence interval for the sample mean of the weight variable?

This is also straightforward: You just use the t.test() function and specify the weight variable from the ess dataset as your x-variable. The function calculates a 95% confidence interval by default, so you don't need to do anything about that.

t.test(x = ess$weight)

Can you now change the confidence level to 99%?

t.test(x = ess$weight)

Obviously, all you needed to do was to set the conf.level option to 0.99:

t.test(x = ess$weight, conf.level = 0.99)

Can you also set the confidence level to 83%?

t.test(x = ess$weight, conf.level = 0.99)

Also easy: You just adjust the conf.level option:

t.test(x = ess$weight, conf.level = 0.83)

This was just to show you that you can set that level to really any number between 0 and 1 (i.e., 0 and 100 percent).

Let's take another look at the 95% percent confidence interval:

t.test(x = ess$weight, conf.level = 0.95)

question("What is the correct interpretation of this confidence interval? (Only one of these is true!)",
         answer(paste0("95% of our observations lie within the confidence interval from ",round(t.test(ess$weight)$conf.int[1],digits = 2)," to ",round(t.test(ess$weight)$conf.int[2],digits = 2),"."),
                message = "Maybe you thought about the ''68-95-99 rule'', which predicts that 95% of the data of a normally distributed variable are within 2 standard deviations from the mean? That rule is of course true, but: In this particular context here, we apply this rule not to the sample data (''our observations'') but to the distribution of all possible measurements of the sample mean - the ''sampling distribution''. This is an important difference, and is explained in Kellstedt & Whitten (2018, p. 152)."),
         answer(paste0("The confidence interval from ",round(t.test(ess$weight)$conf.int[1],digits = 2)," to ",round(t.test(ess$weight)$conf.int[2],digits = 2)," includes the true population mean with a probability of 95%."), correct = T,
                message = "If you could really see that this, and only this, is the correct interpretation: Congrats, very well done! Interpreting confidence intervals is really tricky. Just always keep in mind: We are saying something about the confidence interval, this interval, and nothing else."),
         answer(paste0("With a 95% probability, the true population mean lies within the interval from ",round(t.test(ess$weight)$conf.int[1],digits = 2)," to ",round(t.test(ess$weight)$conf.int[2],digits = 2),"."),
                message = "Don't worry: This is a very common misinterpretation, and the issue is a really tiny nuance in the formulation. But nuances matter a lot here! When we interpret a confidence interval, we are always focusing on the interval itself: This interval includes the true mean with a given probability - and that is all we can say. With the statement here, however, we are trying to say something about the value of the true population mean and that is strictly speaking not correct because we will never know the value of the true mean, not even with a probability."),
         allow_retry = T, random_answer_order = T)

This might be tricky --- but if you get it right (and you also understand why), then you really understand confidence intervals. If you are unsure, you can look at the confidence interval simulation in the Practice Statistics dashboard (bst290::practiceStatistics()).

cknotz/bst290 documentation built on Nov. 11, 2023, 1 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com