Big idea

This part introduces the big ideas: why we should do this, what data look like, and what's going on under the hood. It closes with basic package functionality.

Motivation

Gaps across social categories like race, class, and gender are important to understand. We would like to know whether there is anything we can do to close these gaps. What if we intervened to reduce incarceration or increase access to education? Would those interventions close gaps across categories of race, class, or gender?

These types of questions are at the core of a growing literature in epidemiology addresses these questions with techniques for causal decomposition analysis (@vanderweele2014, @jackson2018, @jackson2020). This package provides software to support inquiry into gap-closing estimands, as discussed in @lundberg2021.

A guiding principle is to distinguish human tasks from software tasks that can be automated.

As a human, you will:

Then package will:

Data structure

In a data frame data, we have a gap-defining category such as race, gender, or class. We have a binary treatment variable that could have counterfactually been different for any individual. We want to know the degree to which an intervention to change the treatment would close gaps across the categories.

These boxes will present an example.

Example. Suppose we have the following data.

  • $X$ (category): Category of interest, taking values {A, B, C}
  • $T$ (treatment): Binary treatment variable, taking values 0 and 1
  • $L$ (confounder): A continuous confounding variable, Uniform(-1,1)
  • $Y$ (outcome): A continuous outcome variable, conditionally normal
set.seed(08544)
library(gapclosing)
library(dplyr)
library(ggplot2)
simulated_data <- generate_simulated_data(n = 1000)
head(simulated_data)

Coding from scratch

With the most simple models, you can carry out a gap-closing analysis without the software package. First, fit any prediction function for the outcome as a function of the category of interest, confounders, and treatment.

example_ols <- lm(outcome ~ category*treatment + confounder,
                  data = simulated_data)

For everyone in the sample, predict under a counterfactual treatment value (e.g., treatment = 1).

fitted <- simulated_data %>%
  mutate(outcome_under_treatment_1 = predict(example_ols,
                                             newdata = simulated_data %>%
                                               mutate(treatment = 1)))

Average the counterfactual estimates within each category.

fitted %>%
  # Group by the category of interest
  group_by(category) %>%
  # Take the average prediction
  summarize(factual = mean(outcome),
            counterfactual = mean(outcome_under_treatment_1))

The software package supports these steps as well as more complex things you might want:

Basic package functionality

The gapclosing() function estimates gaps across categories and the degree to which they would close under the specified counterfactual_assignments of the treatment.

estimate <- gapclosing(
  data = simulated_data,
  counterfactual_assignments = 1,
  outcome_formula = formula(outcome ~ confounder + category*treatment),
  treatment_formula = formula(treatment ~ confounder + category),
  category_name = "category",
  se = TRUE,
  # Setting bootstrap_samples very low to speed this tutorial
  # Should be set higher in practice
  bootstrap_samples = 20,
  # You can process the bootstrap in parallel with as many cores as available
  parallel_cores = 1
)

By default, this function will do the following:

In this example, the plot(estimate) function produces the following visualization. The factual outcomes are unequal across categories, but the counterfactual outcomes are roughly equal. In this simulated setting, the intervention almost entirely closes the gaps across the categories.

plots <- plot(estimate, return_plots = TRUE)
plot(estimate)
print(plots[[1]] +
        ggtitle("First result of a call to plot()"))

The disparityplot() function lets us zoom in on the factual and counterfactual disparity between two categories, of interest. In this case, we see that the intervention lifts outcomes in category A to be more comparable to category B. A disparityplot is a ggplot2 object and can be customized by passing additional layers.

disparityplot(estimate, category_A = "A", category_B = "B") +
  ggtitle("A disparityplot()")

The summary function will print estimates, standard errors, and confidence intervals for all of these results.

old <- options()
options(width = 300)
summary(estimate)
options(old)


ilundberg/gapclosing documentation built on Dec. 12, 2024, 9:31 a.m.