In almost-matching-exactly/R-FLAME: Interpretable Matching for Causal Inference

FLAME: Fast, interpretable, and accurate methods for estimating causal effects from observational data

The Methodology

library(RefManageR)
BibOptions(check.entries = FALSE,
           bib.style = "authoryear",
           cite.style = "authoryear",
           style = "markdown",
           hyperlink = FALSE,
           dashed = FALSE)
myBib <- ReadBib("./biblio.bibtex", check = FALSE)

Causal Inference

Goal: quantify effect of treatment $T \in \{0, 1\}$ on outcome $Y$

`r NoCite(myBib, 'DAME', 'FLAME')`

Two potential outcomes for each unit: $\{Y_i(1), Y_i(0)\}$, denoting the response under treatment and under control, respectively

Only one -- denoted $Y_i$ -- is actually observed

The other, the counterfactual, must be estimated from the data

In observational data, covariates $\mathbf{X} \in \mathbb{R}^p$ might be confounders

??? Can't naively use the average outcome of the control units as an estimate of a treated unit's control counterfactual
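The point about naive comparisons can be illustrated with a small simulation. This is a minimal sketch on made-up data, assuming a single binary confounder `x` that raises both the chance of treatment and the outcome; the variable names and effect sizes are illustrative assumptions, not taken from the package or the natality data.

```r
set.seed(1)
n <- 10000
x <- rbinom(n, 1, 0.5)                          # binary confounder
treat <- rbinom(n, 1, ifelse(x == 1, 0.8, 0.2)) # x raises the chance of treatment
y <- 2 * treat + 5 * x + rnorm(n)               # true treatment effect is 2

# Naive difference in means: mixes the treatment effect with the effect of x
naive <- mean(y[treat == 1]) - mean(y[treat == 0])

# Compare treated and control units within each level of x,
# then average over the distribution of x
within <- sapply(0:1, function(v) {
  mean(y[treat == 1 & x == v]) - mean(y[treat == 0 & x == v])
})
adjusted <- sum(within * c(mean(x == 0), mean(x == 1)))
```

Here `naive` lands far from the true effect of 2, while `adjusted`, which only compares units sharing the same value of `x`, stays close to it.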

(Exact) Matching

One approach to estimating counterfactuals under confounding: matching

If, for treated unit $i$, there existed control unit $k$ such that $\mathbf{x}_k = \mathbf{x}_i$, then:

- $Y_k$ (observed) is a good estimate of $Y_i(0)$ (unobserved)

But exact matches are unlikely in high-dimensional settings

Settle for $\mathbf{x}_k \approx \mathbf{x}_i$
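Exact matching can be sketched in a few lines of base R. The toy data frame and its column names (`x1`, `x2`, `treated`, `outcome`) are assumptions made for illustration; each treated unit's control counterfactual is estimated by the mean outcome of control units with identical covariates, and is left missing when no such unit exists.

```r
df <- data.frame(
  x1      = c(0, 0, 1, 1, 1, 0),
  x2      = c(1, 1, 0, 0, 1, 0),
  treated = c(1, 0, 1, 0, 0, 1),
  outcome = c(5, 3, 9, 6, 4, 7)
)
covs <- c("x1", "x2")

# One string per covariate profile, so equal keys mean equal covariates
key <- do.call(paste, c(df[covs], sep = "_"))

treated_idx <- which(df$treated == 1)
cf_hat <- sapply(treated_idx, function(i) {
  k <- which(df$treated == 0 & key == key[i])   # exactly matching controls
  if (length(k) == 0) NA_real_ else mean(df$outcome[k])
})
```

The last treated unit has no control with the same covariate values, which is the situation that motivates relaxing exact matching.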

Almost Matching Exactly

Given a unit $i$, covariate weights $\mathbf{w}$, and a covariate selection vector $\boldsymbol{\theta}$, define the AME problem:

$$\overbrace{\text{argmax}_{\boldsymbol{\theta} \in \{0, 1\}^p}\;\boldsymbol{\theta}^T\mathbf{w}}^{\text{most important covariate set}}\quad\text{s.t.}\quad \exists k\;\:\text{with}\;\: \underbrace{\mathbf{x}_{k} \circ \boldsymbol{\theta} = \mathbf{x}_{i} \circ \boldsymbol{\theta}}_{\text{exact matching on }\boldsymbol{\theta}} \;\:\text{and}\;\: \underbrace{T_{k} = 1 -T_i}_{\text{opposite treatment}}$$

Implicitly defines a distance metric that:

1. Prioritizes matches on relevant covariates

2. Matches exactly when possible

Iterate over covariate sets, starting with more important ones

??? We're going to try to solve the AME problem for each unit. In practice, we pick a theta, starting with a theta of all 1s, which corresponds to exact matching (the best we can possibly do), and match all possible units. Then we choose another theta and match those units. Because in practice we don't have fixed covariate weights, for each of these thetas, ..
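For a single unit and fixed weights, the AME problem can be solved by brute force: enumerate every $\boldsymbol{\theta} \in \{0, 1\}^p$, visit them in decreasing order of $\boldsymbol{\theta}^T\mathbf{w}$, and stop at the first one that admits an exact match with a unit of the opposite treatment. The sketch below assumes a numeric covariate matrix `X` and a hypothetical helper name `solve_ame`; it is exponential in $p$ and only meant to make the definition concrete, not to reflect how the package computes matches.

```r
# Hypothetical helper: brute-force solution of the AME problem for unit i
solve_ame <- function(X, treated, w, i) {
  p <- ncol(X)
  # All 2^p selection vectors theta, one per row
  thetas <- as.matrix(expand.grid(rep(list(0:1), p)))
  # Visit them from the most to the least important covariate set
  thetas <- thetas[order(thetas %*% w, decreasing = TRUE), , drop = FALSE]
  opposite <- which(treated == 1 - treated[i])
  for (r in seq_len(nrow(thetas))) {
    keep <- thetas[r, ] == 1
    # Units of the opposite treatment agreeing with i on the selected covariates
    agree <- vapply(opposite,
                    function(k) all(X[k, keep] == X[i, keep]),
                    logical(1))
    if (any(agree)) {
      return(list(theta = unname(thetas[r, ]), matches = opposite[agree]))
    }
  }
  NULL  # no unit of the opposite treatment exists
}

X <- matrix(c(1, 0, 1,
              1, 0, 0,
              0, 0, 1,
              1, 1, 1), ncol = 3, byrow = TRUE)
treated <- c(1, 0, 0, 0)
w <- c(4, 2, 1)   # assumed weights, chosen so no two covariate sets tie
res <- solve_ame(X, treated, w, i = 1)
```

In this toy example unit 1 has no exact match on all three covariates, so the best it can do is match unit 2 on the first two.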

Almost Matching Exactly: The Algorithms

DAME (Dynamic Almost Matching Exactly)

Solves the AME problem exactly for each unit

Efficient solution via downward closure property

FLAME (Fast, Large-scale Almost Matching Exactly)

Approximates the exact solution via backward stepwise selection

At each iteration, eliminate an entire covariate

??? Given all this background, it's now very natural and easy to explain two of our methods
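The backward-elimination idea behind FLAME can be sketched as a loop: match every unmatched unit exactly on the current covariate set, then drop one covariate and repeat. The version below is a deliberate simplification under stated assumptions: it uses fixed weights `w` to decide which covariate to drop (rather than a predictive error computed on a holdout set), matches without replacement, and uses the hypothetical name `flame_sketch`.

```r
# Hypothetical helper: simplified FLAME-style backward elimination
flame_sketch <- function(X, treated, w) {
  active <- seq_len(ncol(X))            # covariates still used for matching
  unmatched <- rep(TRUE, nrow(X))
  matched_on <- vector("list", nrow(X)) # covariate set each unit was matched on
  while (any(unmatched)) {
    key <- if (length(active) > 0) {
      apply(X[, active, drop = FALSE], 1, paste, collapse = "_")
    } else {
      rep("", nrow(X))
    }
    for (g in unique(key[unmatched])) {
      members <- which(unmatched & key == g)
      # A valid matched group needs both a treated and a control unit
      if (length(unique(treated[members])) == 2) {
        matched_on[members] <- list(active)
        unmatched[members] <- FALSE
      }
    }
    if (length(active) == 0) break
    # Eliminate the least important remaining covariate
    active <- active[-which.min(w[active])]
  }
  matched_on
}

X <- matrix(c(1, 0, 1,
              1, 0, 0,
              0, 0, 1,
              1, 1, 1), ncol = 3, byrow = TRUE)
treated <- c(1, 0, 0, 0)
res <- flame_sketch(X, treated, w = c(4, 2, 1))
```

Units 1 and 2 are matched once the third covariate is dropped; units 3 and 4 are both controls with no treated unit left to pair with, so they stay unmatched (`NULL` entries).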

Almost Matching Exactly: Dynamic Weights

In practice, we often have no a priori measure of covariate importance $\mathbf{w}$

At every iteration, run an ML algorithm on a separate holdout set to model how well a covariate set predicts the outcome

The Predictive Error ( $\mathtt{PE}$ ) measures the error in doing so and determines which covariate set to match on next
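The role of $\mathtt{PE}$ can be sketched with a plain linear model. The example below is an assumption-laden illustration rather than the package's procedure: it simulates a holdout set, defines a hypothetical helper `pe` that returns the in-sample mean squared error of `lm`, and picks the covariate whose removal hurts prediction least.

```r
set.seed(2)
n <- 500
holdout <- data.frame(x1 = rbinom(n, 1, 0.5),
                      x2 = rbinom(n, 1, 0.5),
                      x3 = rbinom(n, 1, 0.5))
# x1 matters a lot, x2 a little, x3 not at all
holdout$outcome <- 5 * holdout$x1 + 1 * holdout$x2 + rnorm(n)

# Hypothetical helper: error in predicting the outcome from a covariate set
pe <- function(covs) {
  fit <- lm(reformulate(covs, response = "outcome"), data = holdout)
  mean(residuals(fit)^2)
}

# PE of each candidate set obtained by dropping one covariate
all_covs <- c("x1", "x2", "x3")
pe_drop <- sapply(all_covs, function(v) pe(setdiff(all_covs, v)))

# Drop the covariate whose removal leaves the smallest predictive error
next_to_drop <- names(which.min(pe_drop))
```

Dropping `x3` costs almost nothing in predictive error, so the next covariate set to match on would be `x1` and `x2`.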

Other Distance Metrics

- Propensity score matching: match on estimates of $\mathrm{P}(T_i = 1 \mid \mathbf{X} = \mathbf{x}_i)$
- Prognostic score matching: match on estimates of $Y_i(0)$
- Coarsened exact matching: coarsen covariates and do exact matching
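For contrast with AME, the first of these metrics can be sketched in base R. This is a minimal illustration on simulated data, assuming a logistic regression for the propensity score and one-to-one nearest-neighbor matching with replacement; none of it is part of the FLAME package.

```r
set.seed(3)
n <- 200
x <- rnorm(n)
treated <- rbinom(n, 1, plogis(x))

# Estimated propensity scores P(T = 1 | X = x)
ps <- fitted(glm(treated ~ x, family = binomial))

# For each treated unit, the control unit with the closest propensity score
ctrl <- which(treated == 0)
nearest <- sapply(which(treated == 1), function(i) {
  ctrl[which.min(abs(ps[ctrl] - ps[i]))]
})
```

Matched units here are close on a one-dimensional summary but need not share any covariate values, which is what makes such matches harder to interpret than exact matches on a covariate subset.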

The Package

Overview of `FLAME`

FLAME and DAME are the workhorses of the package

Match input data under a wide variety of specifications

Efficient bit-vector routine for making matches

Return S3 objects of class ame with print, plot, and summary methods
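One way a bit-vector matching routine can work is sketched below; this is an illustration of the general idea, not the package's internal code. Each unit's covariate values are encoded as a single number `b`, and a second number `bp` additionally encodes the treatment. A unit belongs to a valid matched group exactly when more units share its `b` than share its `bp`, meaning some unit with the same covariates has the opposite treatment. All variable names are assumptions.

```r
X <- matrix(c(1, 0, 1,
              1, 0, 1,
              0, 1, 1,
              0, 1, 1,
              1, 1, 0), ncol = 3, byrow = TRUE)
treated <- c(1, 0, 1, 1, 0)

base <- max(X) + 1   # covariates take values 0, ..., base - 1
# One number per covariate profile
b <- as.vector(X %*% base^(seq_len(ncol(X)) - 1))
# Same encoding with the treatment indicator appended
bp <- 2 * b + treated

# Number of units sharing each unit's b and bp values
n_b  <- ave(b,  b,  FUN = length)
n_bp <- ave(bp, bp, FUN = length)

# Matched iff the covariate profile is shared with an opposite-treatment unit
matched <- n_b != n_bp
```

Units 1 and 2 share a profile and differ in treatment, so they are matched; units 3 and 4 share a profile but are both treated, and unit 5 is alone, so none of those three is matched.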

Installation

CRAN

install.packages('FLAME')

GitHub

library(devtools)

install_github('https://github.com/vittorioorlandi/FLAME')
# Or (mirror of the above)
install_github('https://github.com/almost-matching-exactly/R-FLAME')

Natality Data

library(FLAME)

natality_out <- readRDS('../natality/natality_out_500k_lm.rds')

US 2010 Natality Data `r Citep(myBib, 'natality2010')`.

Data on neonatal health outcomes in Neonatal Intensive Care Unit (NICU)

Effect of "extreme smoking" ( $\geq 10$ cigarettes a day during pregnancy) on birth weight `r Citep(myBib, 'kondracki2020')`.

Subset of ~500k observations with 16 covariates, including the infant's sex, the parents' races, and previous Cesarean deliveries.

Missing Data

missing_data: how missing values in the data to be matched are handled

- drop: effectively drop units with missingness from the data

- impute: impute missing values and match on the complete dataset

- keep: keep missing values but do not match on them

missing_holdout is analogous, with impute and keep options
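What these options mean for the data can be sketched in base R. The toy data frame, the mode-based imputation, and the helper name `mode_of` are illustrative assumptions; the package's own imputation may well be more sophisticated.

```r
df <- data.frame(x1 = c(1, NA, 0, 1),
                 x2 = c(0, 1, NA, 1),
                 treated = c(1, 0, 1, 0))
covs <- c("x1", "x2")

# "drop": only units with fully observed covariates remain
dropped <- df[complete.cases(df[covs]), ]

# "impute": fill each missing value, here with the covariate's most common value
mode_of <- function(v) {
  tab <- table(v)                       # NAs are excluded by default
  as.numeric(names(tab)[which.max(tab)])
}
imputed <- df
for (v in covs) {
  imputed[[v]][is.na(imputed[[v]])] <- mode_of(df[[v]])
}

# "keep": a unit can only be matched on covariates it actually has
usable <- !is.na(df[covs])
```

Under "keep", the second unit in this example could still be matched, but only on covariate sets that exclude `x1`.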

Computing Predictive Error

Two implemented options for computing $\mathtt{PE}$:

- glmnet::cv.glmnet with 5-fold cross-validation (default)
- xgboost::xgb.cv with 5-fold cross-validation

Supply your own function:

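# User-supplied PE function: regress Y on all columns of X with a linear
# model and return the fitted values
my_PE_lm <- function(X, Y) {
  df <- as.data.frame(cbind(X, Y = Y))
  return(lm(Y ~ ., df)$fitted.values)
}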
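To see what such a function returns, it can be tried on simulated data before handing it to the package. The data below and the mean-squared-error summary are assumptions for illustration; how the package turns the fitted values into $\mathtt{PE}$ is not shown here.

```r
my_PE_lm <- function(X, Y) {
  df <- as.data.frame(cbind(X, Y = Y))
  return(lm(Y ~ ., df)$fitted.values)
}

set.seed(4)
X <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
Y <- 1 + 2 * X$x1 + rnorm(100)

# One fitted value per observation
fitted_vals <- my_PE_lm(X, Y)

# A simple error summary: mean squared difference from the observed outcome
mse <- mean((Y - fitted_vals)^2)
```

The function takes the covariates and the outcome and returns one fitted value per row, so any model with that interface could be substituted for `lm`.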

Calling FLAME and DAME