In ilundberg/gapclosing: Estimate Gaps Under an Intervention

Big idea

This part introduces the big ideas: why we should do this, what data look like, and what's going on under the hood. It closes with basic package functionality.

Motivation

Gaps across social categories like race, class, and gender are important to understand. We would like to know whether there is anything we can do to close these gaps. What if we intervened to reduce incarceration or increase access to education? Would those interventions close gaps across categories of race, class, or gender?

These types of questions are at the core of a growing literature in epidemiology addresses these questions with techniques for causal decomposition analysis (@vanderweele2014, @jackson2018, @jackson2020). This package provides software to support inquiry into gap-closing estimands, as discussed in @lundberg2021.

A guiding principle is to distinguish human tasks from software tasks that can be automated.

As a human, you will:

define the intervention
make causal assumptions for identification
specify a treatment model and/or an outcome model

Then package will:

estimate models
produce doubly-robust estimates
sample split to improve convergence
estimate standard errors by bootstrapping
visualize the result

Data structure

In a data frame data, we have a gap-defining category such as race, gender, or class. We have a binary treatment variable that could have counterfactually been different for any individual. We want to know the degree to which an intervention to change the treatment would close gaps across the categories.

These boxes will present an example.

Example. Suppose we have the following data.

$X$ (category): Category of interest, taking values {A, B, C}

$T$ (treatment): Binary treatment variable, taking values 0 and 1

$L$ (confounder): A continuous confounding variable, Uniform(-1,1)

$Y$ (outcome): A continuous outcome variable, conditionally normal

set.seed(08544)
library(gapclosing)
library(dplyr)
library(ggplot2)

simulated_data <- generate_simulated_data(n = 1000)
head(simulated_data)

Coding from scratch

With the most simple models, you can carry out a gap-closing analysis without the software package. First, fit any prediction function for the outcome as a function of the category of interest, confounders, and treatment.

example_ols <- lm(outcome ~ category*treatment + confounder,
                  data = simulated_data)

For everyone in the sample, predict under a counterfactual treatment value (e.g., treatment = 1).

fitted <- simulated_data %>%
  mutate(outcome_under_treatment_1 = predict(example_ols,
                                             newdata = simulated_data %>%
                                               mutate(treatment = 1)))

Average the counterfactual estimates within each category.

fitted %>%
  # Group by the category of interest
  group_by(category) %>%
  # Take the average prediction
  summarize(factual = mean(outcome),
            counterfactual = mean(outcome_under_treatment_1))

The software package supports these steps as well as more complex things you might want:

three estimation strategies
- outcome prediction
- treatment prediction
- doubly robust estimation
machine learning prediction functions
counterfactual treatments that differ across units
bootstrapping for standard errors
easy visualization

Basic package functionality

The gapclosing() function estimates gaps across categories and the degree to which they would close under the specified counterfactual_assignments of the treatment.

estimate <- gapclosing(
  data = simulated_data,
  counterfactual_assignments = 1,
  outcome_formula = formula(outcome ~ confounder + category*treatment),
  treatment_formula = formula(treatment ~ confounder + category),
  category_name = "category",
  se = TRUE,
  # Setting bootstrap_samples very low to speed this tutorial
  # Should be set higher in practice
  bootstrap_samples = 20,
  # You can process the bootstrap in parallel with as many cores as available
  parallel_cores = 1
)

By default, this function will do the following:

Fit logistic regression to predict treatment assignment
Fit OLS regression to predict outcomes
Combine the two in a doubly-robust estimator estimated on a single sample
Return a gapclosing object which supports summary, print, and plot functions.

In this example, the plot(estimate) function produces the following visualization. The factual outcomes are unequal across categories, but the counterfactual outcomes are roughly equal. In this simulated setting, the intervention almost entirely closes the gaps across the categories.

plots <- plot(estimate, return_plots = TRUE)

plot(estimate)

print(plots[[1]] +
        ggtitle("First result of a call to plot()"))

The disparityplot() function lets us zoom in on the factual and counterfactual disparity between two categories, of interest. In this case, we see that the intervention lifts outcomes in category A to be more comparable to category B. A disparityplot is a ggplot2 object and can be customized by passing additional layers.

disparityplot(estimate, category_A = "A", category_B = "B") +
  ggtitle("A disparityplot()")

The summary function will print estimates, standard errors, and confidence intervals for all of these results.

old <- options()
options(width = 300)

summary(estimate)

options(old)

ilundberg/gapclosing documentation built on Dec. 12, 2024, 9:31 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ilundberg/gapclosing
Estimate Gaps Under an Intervention

In ilundberg/gapclosing: Estimate Gaps Under an Intervention

Big idea

Motivation

Data structure

Coding from scratch

Basic package functionality

R Package Documentation

Browse R Packages

We want your feedback!

ilundberg/gapclosing Estimate Gaps Under an Intervention

In ilundberg/gapclosing: Estimate Gaps Under an Intervention

Big idea

Motivation

Data structure

Coding from scratch

Basic package functionality

R Package Documentation

Browse R Packages

We want your feedback!

ilundberg/gapclosing
Estimate Gaps Under an Intervention