ECICourse::upload_images_to_canvas()
knitr::opts_chunk$set(
  echo = TRUE,
  message = TRUE,
  collapse = TRUE
)

Read along with this tutorial, and run the R code in RStudio as you go (copy and paste, or type directly into R).

Start by loading packages:

library(ECICourse)
library(tidyverse)

The data for this tutorial describe a fictional field experiment conducted by an educational software company in cooperation with a local university. Students (all in the same degree program) were randomized into two conditions before enrolling in a required, core course. In the treatment condition, students received access to the educational software. In the control condition, students did not gain access to the software.

In this tutorial, you will go through the steps needed to calculate the average treatment effect (ATE) of having access to the educational software on students' grades in the course.

Obtain the data

To obtain the data describing this experiment, run the following code:

fieldexp <- get_ECI_data('module-2-tutorial')

The variable fieldexp is a data frame with r ncol(fieldexp) columns and r nrow(fieldexp) rows. Each row describes a student. The column D indicates whether the student was assigned to the treatment condition (D == 1) or to the control condition (D == 0). The column Y reflects the student's grade at the end of the course.

Look at the first few observations:

fieldexp

Half the students were assigned to the treatment and half to the control condition:

fieldexp |> 
  count(D)

Next, look at the distribution of grades in the two conditions:

ggplot(fieldexp, aes(x = factor(D), y = Y)) +
  geom_jitter(width = .1, height = .1) +
  labs(x = 'Treatment condition, D',
       y = 'Grade, Y') +
  theme_bw()

There are no obvious problems with the dataโ€”most of the grades are 6 or higher, there are very few 10's, etc. In other words, nothing stands out as problematic. So let's continue...

Average grades in the treatment and control conditions

What was the average grade among students who had access to the software (the treatment group, those with D == 1)? To answer this, we need to calculate the average value of Y among students who had D == 1, or in other words, the mean of Y, conditional on D == 1.

To calculate the average outcome (grade) among students who received access to the software (students with D == 1), we first filter the data frame, keeping only rows with D == 1, and then we calculate mean(Y):

fieldexp |> 
  filter(D == 1) |> 
  summarise(Y = mean(Y))

Next, to calculate the average grade among students who did not receive the software (the control group, those with D == 0), we follow an analogous procedure:

fieldexp |>
  filter(D == 0) |>
  summarise(Y = mean(Y))

Alternatively, we can use group_by to calculate both of these averages at the same time. In the code below, we summarize the mean of Y separately for each of the two groups defined by the two values of D. We will also use mutate to assign descriptive labels to the values in D (renaming D == 0 to 'Control' and D == 1 to 'Treatment').

fieldexp |>
  group_by(D) |>
  summarise(Y = mean(Y)) |>
  mutate(D = case_when(D == 0 ~ 'Control',
                       D == 1 ~ 'Treatment'))

Finally, we can use pivot_wider to transform the output, creating two new columns based on the contents of each row:

fieldexp |>
  group_by(D) |>
  summarise(Y = mean(Y)) |>
  mutate(D = case_when(D == 0 ~ 'Control',
                       D == 1 ~ 'Treatment')) |>
  pivot_wider(names_from = 'D', values_from = 'Y')

Average treatment effect

The average treatment effect (ATE) is ๐”ผ[Y(1) - Y(0)], or equivalently, ๐”ผ[Y(1)] - ๐”ผ[Y(0)]. We don't observe the potential outcomes Y(1) and Y(0). Thus, to estimate the ATE, we work under the assumption that the conditional averages of Y given D == 1 and Y given D == 0 (which we just calculated above) are consistent estimators of ๐”ผ[Y(1)] and ๐”ผ[Y(0)] respectively. This assumption (which is reasonable since we randomized students into treatment and control) allows us to estimate of the average treatment effect by taking the difference between the average values of Y when D is either 1 or 0.

fieldexp |>
  group_by(D) |>
  summarise(Y = mean(Y)) |>
  mutate(D = case_when(D == 0 ~ 'Control',
                       D == 1 ~ 'Treatment')) |>
  pivot_wider(names_from = 'D', values_from = 'Y') |>
  mutate(ATE = Treatment - Control)

The average treatment effect is the value in the column labeled ATE. Therefore, the ATE for this experiment is r round(lm(Y ~ D, data = fieldexp)$coef[2], 3). This is the expected difference in students' grades caused by having access to the educational software.

An alternative way to calculate the treatment effect is to perform a linear regression of Y on D:

fit <- lm(Y ~ D, data = fieldexp)
coef(fit)

In this regression, the coefficient for D (r coef(fit)['D']) is equal to the ATE, the intercept is equal to the average value of Y when D == 0 (r coef(fit)['(Intercept)']), and the sum of the intercept and the coefficient for D is equal to the average value of Y when D == 1 (r sum(coef(fit))). That is,

Compare these values to the linear regression coefficients and convince yourself this is true.



jasonmtroos/ECICourse documentation built on Sept. 8, 2023, 9:18 p.m.