knitr::opts_chunk$set(
  echo = TRUE,
  message = TRUE,
  collapse = TRUE
)

Read along with this tutorial, and run the R code in RStudio as you go (copy and paste, or type directly into R).

Start by loading packages:

library(ECICourse)
library(tidyverse)

The data for this tutorial describe the same fictional field experiment, conducted by an educational software company in cooperation with a local university, that was described in a previous tutorial. To recap: Students in the same degree program were randomized into two conditions before entering a required, core course. In the treatment condition, students received access to the educational software. In the control condition, students did not gain access to the software.

To obtain the data describing this experiment, run the following code:

fieldexp <- get_ECI_data('module-4-tutorial')

As in the previous tutorial, the variable fieldexp is a data frame with 320 rows, one for each student. The column D indicates whether the student was assigned to the treatment condition (D == 1) or to the control condition (D == 0). The column Y records the student's grade at the end of the course.

Next, repeat a few preliminary steps to get (re-)acquainted with the data set. First, look at the first few observations:

fieldexp

Next, notice that half the students were assigned to the treatment and half to the control condition:

fieldexp |>
  count(D)

And finally, look at the distribution of grades in the two conditions:

ggplot(fieldexp, aes(x = factor(D), y = Y)) +
  geom_jitter(width = .1, height = .1) +
  labs(x = 'Treatment condition, D',
       y = 'Grade, Y') +
  theme_bw()

Average treatment effect

We will estimate the average treatment effect (ATE) for this experiment as the difference between the average value of Y when D == 1 and the average value of Y when D == 0. These averages are:

fieldexp |>
  group_by(D) |>
  summarise(Y_avg = mean(Y))

The code below 1) calculates the average value of Y for each value of D, 2) converts the rows of output into columns (using pivot_wider), producing a column called Y_0 and another called Y_1, and 3) calculates their difference (subtracting Y_0 from Y_1). These steps produce a single number: an estimate of the ATE.

fieldexp |>
  group_by(D) |>
  summarise(Y_avg = mean(Y), .groups = 'drop') |>
  pivot_wider(names_from = 'D', values_from = 'Y_avg',
              names_prefix = 'Y_') |>
  mutate(ATE_est = Y_1 - Y_0) |>
  getElement('ATE_est')

Note: the .groups = 'drop' argument in the call to summarise prevents the message "summarise() ungrouping output (override with .groups argument)" from being printed.

Later, it will be useful to have this estimation procedure encapsulated in a single function. We will create that function now:

get_ATE <- function(df) {
  df |>
    group_by(D) |>
    summarise(Y_avg = mean(Y), .groups = 'drop') |>
    pivot_wider(names_from = 'D', values_from = 'Y_avg',
                names_prefix = 'Y_') |>
    mutate(ATE_est = Y_1 - Y_0) |>
    getElement('ATE_est')
}
ate <- 
  fieldexp |>
  get_ATE()
ate
sate <- round(ate, 3)

Randomization inference

Should we trust that the estimated ATE we just calculated reflects a true difference in grades caused by access to the software, or should we be concerned that this estimate is non-zero simply due to random sampling variation? We will use randomization inference to address this question.

We start by noting that there are many ways the experiment could have assigned the 320 students into two groups of 160 each. In fact, there are 320! / (160! 160!) different ways to assign students into these two groups. That's a huge number.

choose(320, 160)

This number is so large that R cannot represent it exactly as an integer (it is almost 100 digits long); instead, R represents it approximately as a floating-point number. It is infeasible to consider every one of these possible random assignments, so we will rely on randomization inference.
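
As an optional aside, you can check the claim about precision yourself using only base R: the code below counts the digits in this number and compares it to 2^53, the largest value a double can represent with exact integer precision.

# number of digits in choose(320, 160)
floor(log10(choose(320, 160))) + 1
# doubles are only exact for integers up to 2^53, which has far fewer digits
floor(log10(2^53)) + 1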

The idea behind randomization inference is this: We start with the working assumption that the sharp null hypothesis is correct. That is, we assume that the individual treatment effect (ITE) is exactly 0 for every student. This is the same as assuming that each student's potential outcomes under D == 0 and D == 1 are identical (if Y(1) - Y(0) = 0, then Y(1) = Y(0)). Hence, if the sharp null hypothesis is true, whatever value of Y is observed in the data for a given student is equal to both that student's Y(1) and Y(0).
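
To make this concrete, here is a small illustrative sketch: under the sharp null, both potential outcomes for every student can be filled in by copying the observed grade. The column names Y_0_null and Y_1_null are labels invented for this example; they are not part of the data set.

fieldexp |>
  mutate(Y_0_null = Y,  # potential outcome under control
         Y_1_null = Y)  # potential outcome under treatment: identical under the sharp null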

Under the working assumption that the sharp null hypothesis is correct, the estimated ATE we calculated differs from 0 only because of the way students were randomized into the treatment and control groups.

This leads to the question: If this estimate is due entirely to sampling variation, how "typical" is the estimate we obtained? We cannot answer this question exactly, because the number of ways to randomize the students is too large, but we can approximate the answer.

We will build this approximation in steps. Let's first consider a single, hypothetical assignment of students to treatment and control. This assignment (probably) differs from the one we obtained in the actual experiment, but it was equally likely to have occurred.

set.seed(1)             # so your results match mine
new_D <- sample(rep(0:1, 160), 320, replace = FALSE)
new_D
# verify that exactly half of the students are assigned to each condition
sum(new_D == 1)
sum(new_D == 0)

Had this particular random assignment actually occurred, and if the sharp null hypothesis were in fact true, then we would have obtained the following estimate of the ATE:

fieldexp |>
  # replace the random assignment from the experiment
  # with our new, hypothetical alternative assignment:
  mutate(D = new_D) |>
  # estimate the ATE for this alternative assignment
  get_ATE()

Combining this operation into a single function:

get_random_ATE_under_sharp_null <- function() {
  new_D <- sample(rep(0:1, 160), 320, replace = FALSE)
  fieldexp |>
    mutate(D = new_D) |>
    get_ATE()
}

Each call to this function yields an estimate of the ATE that, if the sharp null hypothesis were true, would have been just as likely to occur as the estimate we actually obtained.

get_random_ATE_under_sharp_null()
get_random_ATE_under_sharp_null()
get_random_ATE_under_sharp_null()

If we simulate many such estimates of the ATE, we can see how the estimate we initially obtained compares to them. So we will call this function many times (1000 in this example) to generate a sample of 1000 hypothetical estimated ATEs from our experiment under the working assumption that the sharp null hypothesis is true:

random_ATEs_under_sharp_null <-  
  map(1:1000, ~get_random_ATE_under_sharp_null()) |>
  unlist()

We can now compare the estimate of the ATE obtained from the experiment to these simulated estimates generated under the assumption that the sharp null hypothesis is true.

ggplot(NULL) +
  stat_bin(aes(x = random_ATEs_under_sharp_null), bins = 50) +
  geom_vline(aes(xintercept = ate), colour = 'orange') +
  theme_bw()

The vertical line represents the estimated ATE produced by the actual experiment, and the histogram shows the estimates of the ATE that might have occurred if the sharp null hypothesis were true. Only a few of the 1000 simulated ATEs are larger than the one we actually obtained. We can calculate the fraction of simulated ATEs that exceed it:

p <- mean(ate < random_ATEs_under_sharp_null)
p

This proportion is a statistic that we call a p-value. For this experiment, it is the estimated probability of obtaining an estimated ATE greater than the one we actually got, were the sharp null hypothesis true. The p-value for our estimated ATE under the sharp null hypothesis is the value of p printed above.
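
Equivalently, as a quick sanity check on this definition, the same proportion can be computed by counting how many simulated ATEs exceed our estimate and dividing by the number of simulations:

# count simulated ATEs larger than the estimate from the experiment,
# then divide by the number of simulations
sum(random_ATEs_under_sharp_null > ate) / length(random_ATEs_under_sharp_null)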

Using the coin package

The code above is meant to be instructive: it has the dual responsibility of conducting a permutation test while also explaining how a permutation test works. Because permutation tests are important for interpreting the results of randomized experiments, there are R packages that can run them more efficiently than our code.

One such package is called coin. If you don't have this package installed already, you can install it now:

if (!require(coin)) {
  install.packages('coin')
}
library(coin)

The following code runs a permutation test that generates a p-value that is conceptually the same as the one we just produced.

perm_test <- 
  independence_test(Y ~ D, 
                    alternative = "greater", 
                    distribution = "approximate", 
                    data = fieldexp)
pvalue(perm_test)

The result of this permutation test is shown above. Because the permutation test uses a random subset of all possible assignments to treatment and control, the p-value itself is subject to sampling variation, which is why the output above also shows a confidence interval around the p-value (notice that, in this case, the p-value we simulated earlier falls within this range).
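
If you want a narrower confidence interval around the p-value, you can base the approximation on more resamples. The sketch below assumes a recent version of coin, in which the number of Monte Carlo resamples is set with approximate(nresample = ...); older versions of the package used the argument name B instead.

perm_test_precise <-
  independence_test(Y ~ D,
                    alternative = "greater",
                    # use 100,000 resamples instead of the default
                    distribution = approximate(nresample = 100000),
                    data = fieldexp)
pvalue(perm_test_precise)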


