In microsoft/CausalGrid: Analysis of Subgroups

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "ragg_png"
)

library(CausalGrid)
library(gridExtra)

Some common parameters

set.seed(1337)
N = 1000
K = 3
err_sd = 1

Simple Example

Let's get some fake data

X = matrix(runif(N*K), ncol=K) #Features for splitting partition
d = rbinom(N, 1, 0.5) #treatment assignment
tau = as.integer(X[,1]>.5)*2-1 #true treatment effect (just heterogeneous across X1)
y = d*tau + rnorm(N, 0, err_sd) #outcome

est_part0 = fit_estimate_partition(y, X, d, cv_folds=2)
get_desc_df(est_part0)

We typically want a high-level partition for "human" consumption. To save time, avoid cells with too few observations, and reduce the chance of splitting from running many noisy tests, it's common to only look for a few splits per dimension. If we don't specify this, the function will try every possible split across each dimension.

# With just a scalar, we will split at points equal across the quantile-distribution for each feature.
breaks = 5 
#Otherwise we can explicitly list the potential splits to evaluate.
breaks = rep(list(seq(breaks)/(breaks+1)), K)
est_part = fit_estimate_partition(y, X, d, breaks_per_dim=breaks, cv_folds=2)

plot(est_part)

get_desc_df(est_part)

We can manually estimate this simple model given the partition

est_df = data.frame(y=y, d=d, f=predict(est_part$partition, X))
summary(lm(y~0+f+d:f, data=est_df[-est_part$index_tr,]))

Sometimes we want a different level of complexity than that picked by CV. Either we can pre-specify which partition in the sequence that we want (using the partition_i parameter), or we can look at the sequence of objective function values and see where additional splits only provide marginal improvements.

print(paste("In-sample Objective function values: ", paste(est_part$is_obj_val_seq, collapse=" ")))

Compare this with the average treatment effect for the whole and estimation-only samples

est_part$full_stat_df

How important are each of the dimensions of X for the objective function? We refit the model without each dimension and see the change in the objective function

est_part$importance_weights

The first feature is the only one that is useful.

Are there any interactions between the importances? (That is if we remove X1, does the importance of X2 change? This is done by dropping pairs of featurs at a time and see how they differ from single-feature droppings)

est_part$interaction_weights

Essentially no.

Get the observation-level estimated treatment effects.

tau_hat = predict(est_part, new_X=X)

With many estimates, we may wish to account for multiple testing when checking if "there are any negative (or positive) effects"

any_neg = test_any_sign_effect(est_part, check_negative=T)
print(paste("Adjusted 1-side p-values testing if negative:", paste(any_neg$pval1s_fdr, collapse=", ")))

Now let's look at a case where there's hereogeneity across all three dimensions.

tau_3 = (as.integer(X[,1]>0.5)*2-1) + (as.integer(X[,2]>0.5)*2-1)*2 + (as.integer(X[,3]>0.5)*2-1)*3
y_3 = d*tau_3 + rnorm(N, 0, err_sd)
est_part_3 = fit_estimate_partition(y_3, X, d, breaks_per_dim=5, partition_i=4)
get_desc_df(est_part_3)

One benefit of grid-based partitions is that you can view easily view 2D slices of full heterogeneity space.

plts = plot(est_part_3)

grid.arrange(plts[[1]], plts[[2]], ncol=2)

Improving the partition

We can improve the partition by controlling for X's (either local-linearly or global-flexibly) and using bootstrap "bumping"

est_part_l = fit_estimate_partition(y, X, d, breaks_per_dim=5, ctrl_method = "LassoCV", bump_samples = 20, partition_i=2)

LassoCV is a local-linear approach and we can use the global-flexible approach by setting ctrl_method="RF" for a random forest.

Parallel-processing

Parallel-processing the outer-loops

#library(parallel)
#cl <- makeCluster(getOption("cl.cores", default=3)) #see also detectCores()
#fit_res = fit_estimate_partition(..., pr_cl=cl)
#stopCluster(cl)

Multiple core estimates

We can generate a single partition that works across multiple core estimates. We have three options.

1) Multiple outcomes, but same sample (single treatment)

tau2 = as.integer(X[,2]>0.5)*2-1
y2_yM = d*tau2 + rnorm(N, 0, err_sd)
y_yM = cbind(y, y2_yM)
est_part_yM = fit_estimate_partition(y_yM, X, d, breaks_per_dim=5, partition_i = 3)
get_desc_df(est_part_yM)

2) Multiple treatments, but same sample (single outcome)

d2 = rbinom(N, 1, 0.5)
d_dM = cbind(d, d2)
y_dM =  d*tau + d2*tau2 + rnorm(N, 0, err_sd)
est_part_dM = fit_estimate_partition(y_dM, X, d_dM, breaks_per_dim=5, partition_i = 3)
get_desc_df(est_part_dM)

3) Multiple separate samples, each having a single outcome and treatment

y2_MM = d2*tau2 + rnorm(N, 0, err_sd)
y_MM = list(y, y2_MM)
d_MM = list(d, d2)
X_MM = list(X, X)
est_part_MM = fit_estimate_partition(y_MM, X_MM, d_MM, breaks_per_dim=5, partition_i = 3)
get_desc_df(est_part_MM)

Mean-outcome prediction

alpha = as.integer(X[,1]>0.5)*2-1 #true average outcome effect (just heterogeneous across X1)
y_y = alpha + rnorm(N, 0, err_sd) #outcome
est_part_y = fit_estimate_partition(y_y, X, breaks_per_dim=5, partition_i=2)
get_desc_df(est_part_y)