This part introduces the big ideas: why we should do this, what data look like, and what's going on under the hood. It closes with basic package functionality.
Gaps across social categories like race, class, and gender are important to understand. We would like to know whether there is anything we can do to close these gaps. What if we intervened to reduce incarceration or increase access to education? Would those interventions close gaps across categories of race, class, or gender?
These types of questions are at the core of a growing literature in epidemiology addresses these questions with techniques for causal decomposition analysis (@vanderweele2014, @jackson2018, @jackson2020). This package provides software to support inquiry into gap-closing estimands, as discussed in @lundberg2021.
A guiding principle is to distinguish human tasks from software tasks that can be automated.
As a human, you will:
Then package will:
In a data frame data
, we have a gap-defining category such as race, gender, or class. We have a binary treatment variable that could have counterfactually been different for any individual. We want to know the degree to which an intervention to change the treatment would close gaps across the categories.
These boxes will present an example.
Example. Suppose we have the following data.
- $X$ (
category
): Category of interest, taking values {A, B, C}- $T$ (
treatment
): Binary treatment variable, taking values 0 and 1- $L$ (
confounder
): A continuous confounding variable, Uniform(-1,1)- $Y$ (
outcome
): A continuous outcome variable, conditionally normal
set.seed(08544) library(gapclosing) library(dplyr) library(ggplot2)
simulated_data <- generate_simulated_data(n = 1000) head(simulated_data)
With the most simple models, you can carry out a gap-closing analysis without the software package. First, fit any prediction function for the outcome as a function of the category of interest, confounders, and treatment.
example_ols <- lm(outcome ~ category*treatment + confounder, data = simulated_data)
For everyone in the sample, predict under a counterfactual treatment value (e.g., treatment = 1
).
fitted <- simulated_data %>% mutate(outcome_under_treatment_1 = predict(example_ols, newdata = simulated_data %>% mutate(treatment = 1)))
Average the counterfactual estimates within each category.
fitted %>% # Group by the category of interest group_by(category) %>% # Take the average prediction summarize(factual = mean(outcome), counterfactual = mean(outcome_under_treatment_1))
The software package supports these steps as well as more complex things you might want:
The gapclosing()
function estimates gaps across categories and the degree to which they would close under the specified counterfactual_assignments
of the treatment.
estimate <- gapclosing( data = simulated_data, counterfactual_assignments = 1, outcome_formula = formula(outcome ~ confounder + category*treatment), treatment_formula = formula(treatment ~ confounder + category), category_name = "category", se = TRUE, # Setting bootstrap_samples very low to speed this tutorial # Should be set higher in practice bootstrap_samples = 20, # You can process the bootstrap in parallel with as many cores as available parallel_cores = 1 )
By default, this function will do the following:
gapclosing
object which supports summary
, print
, and plot
functions.In this example, the plot(estimate)
function produces the following visualization. The factual outcomes are unequal across categories, but the counterfactual outcomes are roughly equal. In this simulated setting, the intervention almost entirely closes the gaps across the categories.
plots <- plot(estimate, return_plots = TRUE)
plot(estimate)
print(plots[[1]] + ggtitle("First result of a call to plot()"))
The disparityplot()
function lets us zoom in on the factual and counterfactual disparity between two categories, of interest. In this case, we see that the intervention lifts outcomes in category A to be more comparable to category B. A disparityplot
is a ggplot2
object and can be customized by passing additional layers.
disparityplot(estimate, category_A = "A", category_B = "B") + ggtitle("A disparityplot()")
The summary
function will print estimates, standard errors, and confidence intervals for all of these results.
old <- options() options(width = 300)
summary(estimate)
options(old)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.