ECICourse::upload_images_to_canvas() knitr::opts_chunk$set( echo = TRUE, message = TRUE, collapse = TRUE )
Read along with this tutorial, and run the R code in RStudio as you go (copy and paste, or type directly into R).
Start by loading packages:
library(ECICourse) library(tidyverse)
The data for this tutorial describe a fictional field experiment conducted by an educational software company in cooperation with a local university. Students (all in the same degree program) were randomized into two conditions before enrolling in a required, core course. In the treatment condition, students received access to the educational software. In the control condition, students did not gain access to the software.
In this tutorial, you will go through the steps needed to calculate the average treatment effect (ATE) of having access to the educational software on students' grades in the course.
To obtain the data describing this experiment, run the following code:
fieldexp <- get_ECI_data('module-2-tutorial')
The variable fieldexp
is a data frame with r ncol(fieldexp)
columns and r nrow(fieldexp)
rows. Each row describes a student. The column D
indicates whether the student was assigned to the treatment condition (D == 1
) or to the control condition (D == 0
). The column Y
reflects the student's grade at the end of the course.
Look at the first few observations:
fieldexp
Half the students were assigned to the treatment and half to the control condition:
fieldexp |> count(D)
Next, look at the distribution of grades in the two conditions:
ggplot(fieldexp, aes(x = factor(D), y = Y)) + geom_jitter(width = .1, height = .1) + labs(x = 'Treatment condition, D', y = 'Grade, Y') + theme_bw()
There are no obvious problems with the dataโmost of the grades are 6 or higher, there are very few 10's, etc. In other words, nothing stands out as problematic. So let's continue...
What was the average grade among students who had access to the software (the treatment group, those with D == 1
)? To answer this, we need to calculate the average value of Y
among students who had D == 1
, or in other words, the mean of Y
, conditional on D == 1
.
To calculate the average outcome (grade) among students who received access to the software (students with D == 1
), we first filter the data frame, keeping only rows with D == 1
, and then we calculate mean(Y)
:
fieldexp |> filter(D == 1) |> summarise(Y = mean(Y))
Next, to calculate the average grade among students who did not receive the software (the control group, those with D == 0
), we follow an analogous procedure:
fieldexp |> filter(D == 0) |> summarise(Y = mean(Y))
Alternatively, we can use group_by
to calculate both of these averages at the same time. In the code below, we summarize the mean of Y
separately for each of the two groups defined by the two values of D
. We will also use mutate
to assign descriptive labels to the values in D
(renaming D == 0
to 'Control'
and D == 1
to 'Treatment'
).
fieldexp |> group_by(D) |> summarise(Y = mean(Y)) |> mutate(D = case_when(D == 0 ~ 'Control', D == 1 ~ 'Treatment'))
Finally, we can use pivot_wider
to transform the output, creating two new columns based on the contents of each row:
fieldexp |> group_by(D) |> summarise(Y = mean(Y)) |> mutate(D = case_when(D == 0 ~ 'Control', D == 1 ~ 'Treatment')) |> pivot_wider(names_from = 'D', values_from = 'Y')
The average treatment effect (ATE) is ๐ผ[Y(1) - Y(0)], or equivalently, ๐ผ[Y(1)] - ๐ผ[Y(0)]. We don't observe the potential outcomes Y(1) and Y(0). Thus, to estimate the ATE, we work under the assumption that the conditional averages of Y
given D == 1
and Y
given D == 0
(which we just calculated above) are consistent estimators of ๐ผ[Y(1)] and ๐ผ[Y(0)] respectively. This assumption (which is reasonable since we randomized students into treatment and control) allows us to estimate of the average treatment effect by taking the difference between the average values of Y
when D
is either 1 or 0.
fieldexp |> group_by(D) |> summarise(Y = mean(Y)) |> mutate(D = case_when(D == 0 ~ 'Control', D == 1 ~ 'Treatment')) |> pivot_wider(names_from = 'D', values_from = 'Y') |> mutate(ATE = Treatment - Control)
The average treatment effect is the value in the column labeled ATE
. Therefore, the ATE for this experiment is r round(lm(Y ~ D, data = fieldexp)$coef[2], 3)
. This is the expected difference in students' grades caused by having access to the educational software.
An alternative way to calculate the treatment effect is to perform a linear regression of Y
on D
:
fit <- lm(Y ~ D, data = fieldexp) coef(fit)
In this regression, the coefficient for D
(r coef(fit)['D']
) is equal to the ATE, the intercept is equal to the average value of Y
when D == 0
(r coef(fit)['(Intercept)']
), and the sum of the intercept and the coefficient for D
is equal to the average value of Y
when D == 1
(r sum(coef(fit))
). That is,
ATE = ๐ผ[Y(1) - Y(0)] = r coef(fit)['D']
,
๐ผ[Y(0)] = r coef(fit)['(Intercept)']
, and
๐ผ[Y(1)] = ๐ผ[Y(0)] + ATE = r coef(fit)['(Intercept)']
+ r coef(fit)['D']
= r sum(coef(fit))
.
Compare these values to the linear regression coefficients and convince yourself this is true.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.