Get Started
In civ: Categorical Instrumental Variables

This article is a brief introduction to civ.

library(civ)
library(AER)
set.seed(517938)

To illustrate civ with a simple example, consider the data-generating process from the simulation in Wiemann (2023). The code snippet below draws a sample of size $n=800$.

# Set seed
set.seed(51944)
# Sample parameters
nobs <- 800 # sample size
C <- 0.858 # first stage coefficient
sgm_V <- sqrt(0.81) # first stage error
tau_X <- c(-0.5, 0.5) + 1 # second stage effects
# Sample controls and instrument
X <- sample(1:2, nobs, replace = TRUE)
Z <- model.matrix(~ 0 + as.factor(sample(1:20, nobs, replace = TRUE)):as.factor(X))
Z <- Z %*% c(1:ncol(Z))
# Create the low-dimensional latent instrument
Z0 <- Z %% 2 # underlying latent instrument
# Draw first and second stage errors
U_V <- matrix(rnorm(2 * nobs, 0, 1), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, sgm_V), 2, 2))
# Draw treatment and outcome variables
D <- Z0 * C + U_V[, 2]
y <- D * tau_X[X] + U_V[, 1]
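The correlated errors above are obtained by multiplying independent standard normal draws by a Cholesky factor. The sketch below, which uses only base R and an arbitrary seed chosen for illustration, shows why this works: chol() returns an upper-triangular matrix R with t(R) %*% R equal to the target matrix, so the rows of E %*% R have that matrix as their covariance. Note that the (2,2) entry of the matrix is sgm_V = 0.9, so under this construction the first-stage error has variance 0.9. The names Sigma, R, E, and U are introduced here for illustration only.

```r
# Target covariance matrix of the two errors, matching the snippet above
Sigma <- matrix(c(1, 0.6, 0.6, sqrt(0.81)), 2, 2)

# chol() returns the upper-triangular factor R with t(R) %*% R == Sigma
R <- chol(Sigma)

# Rows of E are independent standard normal pairs, so the rows of
# E %*% R have covariance t(R) %*% R = Sigma
set.seed(1) # arbitrary seed, for this illustration only
E <- matrix(rnorm(2 * 1e5), ncol = 2)
U <- E %*% R

# The factorization reproduces Sigma up to floating-point error
print(all.equal(crossprod(R), Sigma))

# In a large sample, the empirical covariance is close to Sigma
print(round(cov(U), 2))
```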

In the generated sample, the observed instrument takes 40 values (20 categories interacted with the 2 values of X), with a varying number of observations per value. Using only the observed instrument Z, the goal is to estimate the in-sample average treatment effect:

mean(tau_X[X])

## [1] 1.0325
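The structure of the observed instrument described above can be checked directly. The following sketch regenerates only X, Z, and Z0 with the same seed and the same order of sample() calls as the data-generating snippet, then tabulates the instrument. It assumes the default random number generator of a recent R version; the vector names z and z0 are introduced here for illustration.

```r
# Regenerate the controls and the instrument exactly as in the snippet above
set.seed(51944)
nobs <- 800
X <- sample(1:2, nobs, replace = TRUE)
Z <- model.matrix(~ 0 + as.factor(sample(1:20, nobs, replace = TRUE)):as.factor(X))
Z <- Z %*% c(1:ncol(Z))

# Work with plain vectors for tabulation
z <- as.vector(Z)
z0 <- z %% 2 # underlying latent instrument

# Number of distinct values taken by the observed instrument
print(length(unique(z)))

# Smallest and largest number of observations per instrument value
print(range(table(z)))

# The latent instrument only takes the values 0 and 1
print(table(z0))
```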

The code snippet below computes the CIV estimator with the first stage restricted to K=2 support points. The AER package is loaded to compute heteroskedasticity-robust standard errors.

# Compute CIV with K=2 and conduct inference
civ_fit <- civ(y = y, D = D, Z = Z, X = as.factor(X), K = 2)
civ_res <- summary(civ_fit, vcov = vcovHC(civ_fit$iv_fit, type = "HC1"))

The CIV estimate and the corresponding standard error are shown below. The associated 95% confidence interval covers the true effect, as indicated by the t-statistic for the difference between the estimate and the true effect being less than 1.96.

c(Estimate = civ_res$coef[2, 1], "Std. Error" = civ_res$coef[2, 2],
  "t-val." = abs(civ_res$coef[2, 1]-mean(tau_X[X]))/civ_res$coef[2, 2])

##   Estimate Std. Error     t-val. 
##  1.0063143  0.1086868  0.2409285
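The link between the t-value and interval coverage can be made explicit with a short calculation. The sketch below plugs in the numbers reported above and assumes a normal approximation, so that the critical value is qnorm(0.975), roughly 1.96. The variable names est, se, ate, crit, ci, and t_val are introduced here for illustration.

```r
# Values copied from the output above
est <- 1.0063143 # CIV estimate
se <- 0.1086868  # heteroskedasticity-robust standard error
ate <- 1.0325    # in-sample average treatment effect

# Two-sided 95% critical value under the normal approximation
crit <- qnorm(0.975)

# Confidence interval: estimate plus/minus critical value times standard error
ci <- c(lower = est - crit * se, upper = est + crit * se)
print(ci)

# t-value for the distance between the estimate and the true effect
t_val <- abs(est - ate) / se
print(t_val)

# The interval covers the true effect exactly when t_val is below crit
print(t_val < crit)
```

The two checks are equivalent: the true effect lies inside the interval whenever its distance from the estimate is less than crit standard errors.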

CIV uses a K-Conditional-Means (KCMeans) estimator in a first step to estimate the optimal instrument. To understand the estimated mapping from the observed instrument values to the support points of the latent instrument, it is useful to print the cluster_map attribute of the first-stage kcmeans_fit object (see also kcmeans for details). The code snippet below prints the results for the first 10 values of the instrument. Here, x denotes the value of the observed instrument, while cluster_x denotes the support point of the estimated optimal instrument to which that value is assigned.

t(head(civ_fit$kcmeans_fit$cluster_map[, c(1, 4)], 10))

##           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x           26   20   10   32   23   12    7   25   33    21
## cluster_x    1    1    1    1    2    1    2    2    2     2
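To build intuition for the mapping printed above, the following base R sketch mimics the idea of grouping instrument values by their conditional means for K=2. It is a simplified illustration on hypothetical toy data, not the kcmeans implementation: it computes the mean of the treatment within each instrument value, sorts these means, and searches over all splits into two contiguous groups for the one with the smallest weighted within-group sum of squares. The toy data and all names other than x and cluster_x are assumptions made for this example.

```r
# Hypothetical toy data: 10 instrument values, latent instrument is z %% 2
set.seed(1) # arbitrary seed, for this illustration only
n <- 2000
z <- sample(1:10, n, replace = TRUE)
d <- 0.858 * (z %% 2) + rnorm(n, 0, 0.5)

# Step 1: conditional mean of d and cell size for each instrument value
cell_mean <- tapply(d, z, mean)
cell_n <- tapply(d, z, length)

# Step 2: sort the cell means and evaluate every split into two contiguous
# groups by its weighted within-group sum of squares
ord <- order(cell_mean)
m <- cell_mean[ord]
w <- cell_n[ord]
wss <- function(m, w) sum(w * (m - weighted.mean(m, w))^2)
cost <- sapply(1:(length(m) - 1), function(s) {
  wss(m[1:s], w[1:s]) + wss(m[-(1:s)], w[-(1:s)])
})
best <- which.min(cost)

# Step 3: map each instrument value to its estimated cluster
# (cluster 1 holds the lower conditional means, cluster 2 the higher ones)
cluster_map <- data.frame(
  x = as.integer(names(m)),
  cluster_x = ifelse(seq_along(m) <= best, 1L, 2L)
)
cluster_map <- cluster_map[order(cluster_map$x), ]
print(cluster_map)
```

With one-dimensional means, the best two-group partition is always contiguous in sorted order, which is why checking every split point suffices in this toy setting. In the toy data, even instrument values should land in one cluster and odd values in the other, mirroring the latent instrument.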