Read along with this tutorial, and run the R code in RStudio as you go (copy and paste, or type directly into R).
Start by loading packages:
```r
library(ECICourse)
library(tidyverse)
```
Also load the DeclareDesign package:
```r
library(DeclareDesign)
```
This tutorial introduces the DeclareDesign package. This package (really a family of packages including randomizr, fabricatr, and estimatr) provides a standardized interface for defining and evaluating the properties of an experimental design.
The package has a web site with a lot of introductory materials (link). The package is extremely flexible and powerful. It is also somewhat abstract, and for that reason it can be a bit confusing at first. The goal of this tutorial, therefore, is to help you get acquainted with the most basic features of the package, using a running example based on a simple experiment with blocked random assignment.
The DeclareDesign package is based on a philosophy that splits an experimental study into four parts: a model of the world (M), an inquiry about that model (I), a data strategy (D), and an answer strategy (A). I will explain these briefly here. (For more information, you might want to look at the authors' article describing the MIDA approach [Blair et al., linked under Module 4 Reading].)
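As a rough roadmap (my own summary, not part of the package documentation), the declarations used in this tutorial map onto the four MIDA parts as follows:

```r
# M (model):           declare_population(), declare_potential_outcomes()
# I (inquiry):         declare_inquiry()
# D (data strategy):   declare_sampling(), declare_assignment(), declare_measurement()
# A (answer strategy): declare_estimator()
```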
Imagine that we are designing an experiment to understand a population of 250 students, all of whom are enrolled in the same MSc program. Students will be randomized into one of two conditions, labeled "treatment" and "control." Students in the treatment group will watch a 10-minute video about recycling, whereas the control group will browse the internet for 10 minutes. The outcome of interest is students' attitudes towards recycling, which is measured on a scale of 1 to 100. We want to measure how much the video changes attitudes (if at all).
Gender plays a role in the study, and we know that 62% of students in this program identify as female. Although we do not expect the treatment effect to differ by gender, we do expect baseline attitudes towards recycling to differ. For that reason, the experiment will make use of a blocked randomization scheme. (Note: gender is often used as a blocking or control variable because it is observed in data, and because it often correlates with latent variables that are relevant to the study. In this example experiment, gender is not related to the main purpose of the experiment, but block randomization by gender is useful because block randomization by baseline attitudes---which are not observed---is not possible.)
The first step in using DeclareDesign is to model the population of interest. Ideally, our model of the population will be a good description of the population we are trying to study. In practice, it may be a rather stylized depiction.
```r
population <- declare_population(
  N = 250,                        # number of students
  female =                        # 0 - not female; 1 - female
    complete_rs(N, prob = .62),   # proportion of female students = .62
  attitude_toward_recycling =
    pmin(100,                     # max rating is 100
         10 * (female == 1) +     # higher among female students
           sample(
             x = seq_len(100),    # ratings between 1 and 100
             size = N,
             replace = TRUE
           ))
)
```
This code defines an object called `population`. There are `N = 250` people in this population, of which 62% have `female == 1`. There is also a latent variable called `attitude_toward_recycling`, drawn as a uniform random variable between 1 and 100, which represents an unmeasured, prior attitude towards recycling. Based on a review of the relevant academic literature, we expect attitudes towards recycling to be about 10 points higher (i.e., more positive) among female students, on average. Therefore, we include that difference in the model. (Note: the `pmin` function limits the maximum value of the attitude measure to 100.)
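If `pmin` is new to you, here is a quick illustration of how it caps values element-wise (output shown as a comment):

```r
pmin(100, c(42, 99, 107, 115))
#> [1]  42  99 100 100
```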
The resulting object, called `population`, can be called like a function. If we do so, it will generate data for a random population based on our model. We can use this to check that (simulated) attitudes towards recycling are in fact higher among female students, and that none of the attitude measures are greater than 100:
```r
population() |>
  group_by(female) |>
  summarise(across(attitude_toward_recycling,
                   list(mean = mean, max = max)))
```
The next step in using DeclareDesign is to define the potential outcomes under treatment and control. As mentioned, the treatment group will watch a 10-minute video about recycling, and the control group will browse the internet for 10 minutes. The potential outcomes represent the students' attitudes towards recycling, measured on a scale of 1 to 100, after watching the recycling video (treatment) or after browsing the internet (control).
For the purpose of this simulation, we assume that the potential outcome after browsing the internet (i.e., Y(0)) is equal to the latent variable `attitude_toward_recycling`. We expect the potential outcome after watching the video (i.e., Y(1)) to be higher. Let's say that we think that in the treatment group, attitudes could be as much as 5 points higher than in the control group. We reflect these expectations in our specification for the potential outcomes:
```r
potential_outcomes <- declare_potential_outcomes(
  Y_Z_0 = attitude_toward_recycling,
  Y_Z_1 = pmin(100, attitude_toward_recycling + 5))
```
In the DeclareDesign package, the treatment variable is, by default, called `Z`. You can change the name of this variable, but for this tutorial, we will stick with `Z` as the variable indicating assignment to the control group (`Z == 0`) or treatment group (`Z == 1`). (Also, notice that we've again restricted the maximum value of the attitude rating to 100 using the `pmin` function in the code above.)
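For the curious, here is a sketch (not used in this tutorial) of what renaming the treatment variable to `D` might look like; the potential outcome names change to match:

```r
po_alt  <- declare_potential_outcomes(
  Y_D_0 = attitude_toward_recycling,
  Y_D_1 = pmin(100, attitude_toward_recycling + 5))
asn_alt <- declare_assignment(D = conduct_ra(N = N, prob = .5))
out_alt <- declare_measurement(Y = reveal_outcomes(Y ~ D))
```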
Next, we define what we want to measure about this population. In this example, we care about the average treatment effect (ATE) due to watching the recycling video (versus browsing the internet). This is the mean difference between Y(1) (denoted `Y_Z_1` in DeclareDesign) and Y(0) (denoted `Y_Z_0`):
```r
ate <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
```
It is possible to define other measures of interest, such as the ATT or ATC, but for this tutorial we will focus on estimating the ATE. In the code above, we are naming this estimand---the thing we want to estimate---`ATE`. We will use that name to refer back to this estimand later in the tutorial.
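As an aside, an inquiry like the ATT (the average treatment effect among the treated) could be declared in a similar way. Note that such an inquiry would have to be added to the design after the assignment step, so that `Z` exists when the inquiry is evaluated. A sketch:

```r
# ATT: mean effect among units assigned to treatment (Z == 1)
att <- declare_inquiry(ATT = mean(Y_Z_1[Z == 1] - Y_Z_0[Z == 1]))
```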
We will now model the sampling strategy, and how we plan to assign students to the experimental conditions. To start, let's say we plan to work with a complete random sample of 100 students. In practice, such a sample is extremely difficult to obtain. But for this tutorial, just assume we can do it.
```r
exp_sample <- declare_sampling(S = complete_rs(N, n = 100))
```
In the code above, `complete_rs` is a function that performs complete random sampling, where `n` is the size of the sample, and `S` is the name of a variable that will indicate whether an individual from the population is included in the sample. (`N` was defined previously as the size of the population.)
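To see what `complete_rs` returns, you can run it outside of a design; it produces a 0/1 sampling indicator with exactly `n` ones:

```r
s <- complete_rs(N = 250, n = 100)
table(s)  # exactly 150 zeros and 100 ones
```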
Next, we define a strategy for assigning students into the treatment and control groups. Our plan is to block assignment by the (observed) variable `female`, such that within each block (`female == 0` or `female == 1`), half of the participants are randomly assigned to treatment, and half are assigned to control.
```r
assignment <- declare_assignment(
  Z = conduct_ra(blocks = female,         # blocking variable
                 block_prob = c(.5, .5))  # probability of assignment to
)                                         # treatment for each block
```
In the code above, the `conduct_ra` function performs random assignment to conditions, with the `blocks` argument specifying which variable to use for block randomization, and `block_prob` indicating the probability of assignment to treatment within each block. As mentioned previously, `Z` is the name of the variable indicating treatment status.
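Here, too, you can run the assignment function on its own to see what it does. In this illustration (a made-up blocking variable, not our design), assignment is balanced within each block:

```r
blocks <- rep(c(0, 1), times = c(40, 60))
z <- conduct_ra(blocks = blocks, block_prob = c(.5, .5))
table(blocks, z)  # half of each block assigned to treatment
```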
The final step in defining the data strategy is to simulate how the observed outcome (`Y`) is related to the potential outcomes (`Y_Z_0` and `Y_Z_1`) and the treatment variable (`Z`). We specify this in the following way:
```r
outcomes <- declare_measurement(Y = reveal_outcomes(Y ~ Z))
```
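`reveal_outcomes(Y ~ Z)` implements the switching equation: each unit reveals `Y_Z_1` if treated and `Y_Z_0` otherwise. An equivalent declaration written by hand (a sketch, not used below) would be:

```r
outcomes_manual <- declare_measurement(
  Y = Z * Y_Z_1 + (1 - Z) * Y_Z_0)  # same as reveal_outcomes(Y ~ Z)
```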
At this point, we might be curious what the data for this experiment look like. We can combine all of the objects created so far and pass them to a function called `draw_data`. Each time we do this, we will obtain a different, random data set.
```r
set.seed(1234)  # so your results match mine...
a_simulated_experiment <-
  (population + potential_outcomes + ate +
     exp_sample + assignment + outcomes) |>
  draw_data() |>
  as_tibble()
a_simulated_experiment
```
The columns in the data frame are:
- `ID` is an identifier for each person in the population. Since the population has 250 people, the maximum value of `ID` is 250. Because we have specified a random sample of 100, some values of `ID` are not present in the simulated data.
- `female` and `attitude_toward_recycling` are the background variables we constructed to describe individuals in this population based on our prior expectations.
- `Y_Z_0` and `Y_Z_1` are the (true, but partially unobserved) potential outcomes under control and treatment, respectively.
- `S` indicates whether the individual was sampled. This data frame excludes anybody who was not sampled, so `S == 1` for all rows.
- `Z` is the treatment assignment variable, which was block randomized.
- `Y` is the observed outcome, equal to `Y_Z_1` if `Z == 1` and `Y_Z_0` if `Z == 0`.

Regarding the block randomization---we can see that within the treatment and control groups, about 67% of participants are female (compared to 62% in the population).
```r
a_simulated_experiment |>
  group_by(Z) |>
  summarise(mean(female))
```
The discrepancy between the actual and expected proportions is a result of working with a somewhat small sample of 100 students.
The next step is to define how we will evaluate the data from the experiment. Let's say that our plan is to estimate the ATE using a linear regression of `Y` on `Z` and `female`, fit with the `lm_robust` function from the estimatr package. We define such an estimator in the following way:
```r
ate_estimator <- declare_estimator(Y ~ Z + female,
                                   .method = estimatr::lm_robust,
                                   inquiry = "ATE")
```
Note that the argument to `inquiry` is the name "ATE", which we used earlier when defining the true average treatment effect. By setting `inquiry = "ATE"` we are linking what we want to estimate (the true ATE) to what we are able to estimate (a coefficient in a linear regression).
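Multiple answer strategies can be linked to the same inquiry. For example (a sketch, not used below), we could also declare a simple difference-in-means estimator and compare the two in the same diagnosis:

```r
dim_estimator <- declare_estimator(Y ~ Z,
                                   .method = estimatr::difference_in_means,
                                   inquiry = "ATE")
```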
We have now described our planned experiment, including: the population of interest, our beliefs about the responses and treatment effect, what we plan to estimate, our sampling strategy and assignment procedures, and our planned analysis of the resulting data. We can combine all of these into a single object called `design`:
```r
design <- population + potential_outcomes + ate +
  exp_sample + assignment + outcomes + ate_estimator
```
Now...what do we do with this? Well, as noted above, we can draw a random data set from this design:
```r
design |>
  draw_data() |>
  as_tibble()
```
But that is just a single random outcome based on the planned experiment. The real value of the DeclareDesign package comes from its ability to simulate the entire experiment and analysis, and to report on how well we have recovered what we hoped to measure. For example, if we "print" the value of the object called `design`, it will perform all of these steps and report the results of any estimators we have defined:
```r
design
```
But that's just a single random simulation. What we would like to do is simulate the experiment many times, and understand how well our experiment performs in terms of measuring the causal effect(s) we care about.
```r
diagnosis <- diagnose_design(design)
```
The `diagnose_design` function simulates the full experiment---everything from creating the population and drawing a random sample of 100 students, through random assignment and estimation of the average treatment effect---500 times. It then compares the true treatment effect (the true ATE) to the value it estimates via regression (the coefficient on `Z` in the linear regression `Y ~ Z + female`). Now we can print out a summary of the results.
```r
diagnosis
```
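(An aside: 500 simulations is the package default. If you want more precise diagnosands and don't mind a longer run time, the number of simulations can be set explicitly, as in this optional sketch:)

```r
diagnosis_more_sims <- diagnose_design(design, sims = 1000)  # slower, more precise
```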
The estimated `Bias` for the experiment is the estimated difference between the "true" (simulated) ATE and the ATE estimated via regression (`lm_robust`). In this experiment, the bias is small relative to its standard error, meaning we do not find strong evidence that the estimate from linear regression is biased for the ATE. That's good news, because it means we can potentially estimate the treatment effect.
The expected `Power` for the experiment, however, is very low (just `r round(diagnosis$diagnosands_df$power, 2)`). This means that if everything in the real-world experiment is exactly as we have coded it---if the treatment effect is truly about 5 points on a scale from 1 to 100, if we sample 100 people from a population of 250, etc.---then the probability of obtaining a p-value less than .05 and rejecting the null of zero effect should be about `r round(diagnosis$diagnosands_df$power, 2)`. Stated differently, even if the treatment effect is `r round(diagnosis$diagnosands_df$mean_estimand, 2)`, the chance of detecting this effect with this version of the experiment is only `r round(diagnosis$diagnosands_df$power, 2)`. For this experiment, I think that is too low, so we need to modify the experiment or cancel it. Running it as currently planned wouldn't be worth the cost.
What if we were to double the sample size from 100 to 200? To answer that question, we will need to modify the design, and then diagnose this new design.
```r
exp_sample_bigger <- declare_sampling(S = complete_rs(N, n = 200))  # bigger sample
design_with_bigger_sample <- design |>
  replace_step(exp_sample, exp_sample_bigger)  # swap in the new sample size
diagnosis_bigger_sample <- diagnose_design(design_with_bigger_sample)  # diagnose
diagnosis_bigger_sample
```
The estimated `Power` is now higher (`r round(diagnosis_bigger_sample$diagnosands_df$power, 2)` versus `r round(diagnosis$diagnosands_df$power, 2)`), but still unacceptably low: with this version of the experiment, we would only have about a 1 in 5 chance of detecting a `r round(diagnosis_bigger_sample$diagnosands_df$mean_estimand, 2)` point increase on the 100-point attitude scale.
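Rather than trying sample sizes one at a time, we could swap in a range of sampling steps and diagnose each resulting design. A sketch (the candidate sample sizes here are arbitrary choices for illustration):

```r
# build one design per candidate sample size, then diagnose each
designs_by_n <- lapply(c(100, 150, 200, 250), function(n) {
  design |>
    replace_step(exp_sample, declare_sampling(S = complete_rs(N, n = n)))
})
diagnoses_by_n <- lapply(designs_by_n, diagnose_design)  # slow: 500 sims each
```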
What if we change the experiment to have a within-subjects design? First, this would entail a change in procedure, because we would need to measure attitudes before and after treatment. The outcome variable would then be defined as the difference between the two attitude measurements.
To simulate this design, we will assume that `attitude_toward_recycling` is measured (and thus observed) as part of the experiment. We will also redefine the potential outcomes to match this modified design. Moreover, it is unrealistic that students would provide exactly the same attitude rating twice, so we will also simulate ±15 points of random noise and add this to each student's second response (while using `pmin` and `pmax` to keep the simulated responses within the range of 1 to 100, which I'm now doing with a function I wrote called `clamp`).
```r
# a function to limit values between min and max (1 and 100 by default)
clamp <- function(x, min = 1, max = 100) {
  pmax(min, pmin(max, x))
}

potential_outcomes_within_student <- declare_potential_outcomes(
  Y_Z_0 = clamp(attitude_toward_recycling                  # initial rating
                + sample(seq(-15, 15), N, replace = TRUE)) # noise
          - attitude_toward_recycling,
  Y_Z_1 = clamp(attitude_toward_recycling                  # initial rating
                + 5                                        # treatment effect
                + sample(seq(-15, 15), N, replace = TRUE)) # noise
          - attitude_toward_recycling)
```
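As a quick sanity check that `clamp` behaves as intended (output shown as a comment):

```r
clamp(c(-10, 0, 50, 120))
#> [1]   1   1  50 100
```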
These potential outcomes are different from what we defined earlier. Now, `Y` is the difference between two ratings:

- `attitude_toward_recycling`, which we defined previously as a latent (unobserved) variable, but which is now assumed to be measured prior to treatment; and
- a second rating, taken after treatment, equal to the initial rating plus noise (plus the 5-point treatment effect in the case of `Y_Z_1`). The resulting amount must be between 1 and 100, so it is passed through the `clamp` function.

Because we are now conducting a within-subjects experiment, the value of block randomization (which we used to account for differences in baseline attitudes towards recycling) may now be rather limited. To emphasize this point, we will assume random assignment without blocking on `female`:
```r
assignment_no_blocking <- declare_assignment(
  Z = conduct_ra(N = N, prob = .5))  # probability of assignment to treatment
```
The outcome variable `Y` is the value of the second rating minus the first. The modified design now becomes:
```r
design_within_student <- design |>
  replace_step(potential_outcomes, potential_outcomes_within_student) |>
  replace_step(assignment, assignment_no_blocking)
diagnosis_within_student <- diagnose_design(design_within_student)
diagnosis_within_student
```
Notice that power is much higher (`r round(diagnosis_within_student$diagnosands_df$power, 2)`) with this design. If we choose to run this experiment, we will probably want to run this version, based on the difference of two measurements for each individual.
Regarding block randomization: We can also compare the experiment defined above (within-subjects measurement, no block randomization) to an alternative that uses block randomization based on gender (as in the earlier experimental design):
```r
design_within_student_blocked <- design_within_student |>
  replace_step(assignment_no_blocking, assignment)
diagnosis_within_student_blocked <- diagnose_design(design_within_student_blocked)
diagnosis_within_student_blocked
```
Notice that the power for this experiment is about the same as in the version without block randomization. Again, this is because in this within-subjects design, we are now planning to measure the variable that motivated the use of block randomization in the first place.
This next example is meant to give you an overview of the entire process with a simpler experiment and less discussion, so everything appears all in one place.
```r
pop <- declare_population(N = 10000)
smp <- declare_sampling(S = complete_rs(N, n = 500))
asn <- declare_assignment(Z = complete_ra(N = N, prob = .5))
pos <- declare_potential_outcomes(
  Y_Z_0 = draw_ordered(x = rnorm(N),
                       breaks = c(-2, -1.5, -.5, 1)),
  Y_Z_1 = pmin(Y_Z_0 + 1 * (runif(N) < .25), 5))  # increase by 1 point
                                                  # for 25% of customers
out <- declare_measurement(Y = reveal_outcomes(Y ~ Z))
ate <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
est <- declare_estimator(Y ~ Z,
                         .method = estimatr::difference_in_means,
                         inquiry = 'ATE')
```
Here is the complete design:
```r
design <- pop + smp + pos + ate + asn + out + est
diagnose_design(design)
```