This article is a brief introduction to `civ`.
```r
library(civ)  # CIV estimator
library(AER)  # ivreg and, via the attached sandwich package, vcovHC
set.seed(517938)
```
To illustrate `civ` on a simple example, consider the data generating process from the simulation in Wiemann (2023). The code snippet below draws a sample of size $n=800$.
```r
# Set seed
set.seed(51944)

# Sample parameters
nobs <- 800                  # sample size
C <- 0.858                   # first stage coefficient
sgm_V <- sqrt(0.81)          # first stage error parameter (enters the error covariance matrix)
tau_X <- c(-0.5, 0.5) + 1    # second stage effects

# Sample controls and instrument
X <- sample(1:2, nobs, replace = TRUE)
Z <- model.matrix(~ 0 + as.factor(sample(1:20, nobs, replace = TRUE)):as.factor(X))
Z <- Z %*% c(1:ncol(Z))

# Create the low-dimensional latent instrument
Z0 <- Z %% 2                 # underlying latent instrument

# Draw first and second stage errors
U_V <- matrix(rnorm(2 * nobs, 0, 1), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, sgm_V), 2, 2))

# Draw treatment and outcome variables
D <- Z0 * C + U_V[, 2]
y <- D * tau_X[X] + U_V[, 1]
```
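The instrument construction above interacts 20 random categories with the two values of `X` and then replaces each row's indicator column by its column index. A toy version with only 3 hypothetical categories makes the mechanics visible: the interaction-only formula without an intercept produces one indicator per combination, multiplying by the column indices collapses them into a single categorical instrument, and `%% 2` keeps only its parity as the latent binary instrument.

```r
# Toy version of the instrument construction in the simulation:
# 3 hypothetical categories (instead of 20) crossed with the 2 values of X.
grid <- expand.grid(g = 1:3, X = 1:2)

# Without an intercept, the interaction-only formula yields one indicator
# column per (g, X) combination, i.e. 3 * 2 = 6 columns.
M <- model.matrix(~ 0 + as.factor(g):as.factor(X), data = grid)

# Each row has exactly one 1, so multiplying by 1:ncol(M) returns the index
# of that row's column: a single categorical instrument with 6 values.
Z <- drop(M %*% seq_len(ncol(M)))

# The latent binary instrument is the parity of that index.
Z0 <- Z %% 2
cbind(grid, Z, Z0)
```

With 20 categories and 2 values of `X` the same steps give 40 instrument values, each mapped to one of two latent values.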
In the generated sample, the observed instrument takes 40 values, with varying numbers of observations per value. Using only the observed instrument `Z`, the goal is to estimate the in-sample average treatment effect:
```r
# In-sample average treatment effect (the estimand)
mean(tau_X[X])
```

```
## [1] 1.0325
```
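Because `tau_X <- c(-0.5, 0.5) + 1` equals $(0.5, 1.5)$, the in-sample estimand depends only on the share of observations with $X_i = 2$. Writing $n_x$ for the number of observations with $X_i = x$ and $n = n_1 + n_2 = 800$:

$$
\frac{1}{n}\sum_{i=1}^{n} \tau_{X_i} = \frac{0.5\, n_1 + 1.5\, n_2}{n} = 0.5 + \frac{n_2}{n}.
$$

The printed value $1.0325$ therefore corresponds to $n_2/n = 0.5325$, i.e. $n_2 = 426$. This is why the in-sample effect differs slightly from the population average of $0.5 \cdot 0.5 + 0.5 \cdot 1.5 = 1$.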
The code snippet below estimates CIV with the first stage restricted to $K=2$ support points. The `AER` package is used to compute heteroskedasticity-robust (HC1) standard errors.
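```r
# Compute CIV with K = 2 support points for the estimated optimal instrument
civ_fit <- civ(y = y, D = D, Z = Z, X = as.factor(X), K = 2)

# Conduct inference with heteroskedasticity-robust (HC1) standard errors
civ_res <- summary(civ_fit, vcov = vcovHC(civ_fit$iv_fit, type = "HC1"))
```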
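The `vcovHC(..., type = "HC1")` call requests a heteroskedasticity-robust sandwich variance. As a rough illustration of what such a standard error computes, the sketch below runs a just-identified 2SLS regression by hand in base R on hypothetical simulated data with a known binary instrument, and forms the HC1 variance $\frac{n}{n-k}(Z'X)^{-1}\left(\sum_i \hat u_i^2 z_i z_i'\right)(X'Z)^{-1}$. It deliberately omits the estimated first stage and the controls used by CIV, so it is not the `civ` implementation.

```r
# Minimal 2SLS with HC1 standard errors in base R (illustrative sketch,
# hypothetical data; not the civ internals).
set.seed(2)
n <- 2000
z <- rbinom(n, 1, 0.5)                       # hypothetical binary instrument
errs <- matrix(rnorm(2 * n), n, 2)
u <- errs[, 1]                               # structural error
v <- 0.6 * errs[, 1] + 0.8 * errs[, 2]       # first-stage error, correlated with u
d <- 0.858 * z + v                           # endogenous treatment
y <- 1.0 * d + u                             # true effect = 1

Xm <- cbind(1, d)                            # regressors
Zm <- cbind(1, z)                            # instruments (just identified)
ZX_inv <- solve(crossprod(Zm, Xm))           # (Z'X)^{-1}
beta <- drop(ZX_inv %*% crossprod(Zm, y))    # IV estimate

res <- drop(y - Xm %*% beta)                 # structural residuals
meat <- crossprod(Zm * res)                  # sum_i u_i^2 z_i z_i'
k <- ncol(Xm)
V_hc1 <- n / (n - k) * ZX_inv %*% meat %*% t(ZX_inv)
se <- sqrt(diag(V_hc1))
rbind(estimate = beta, std_error = se)
```

In this just-identified case the slope equals the Wald ratio `cov(y, z) / cov(d, z)`; the sandwich "meat" is what makes the standard error robust to heteroskedasticity, and the factor $n/(n-k)$ is the HC1 small-sample adjustment.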
The CIV estimate and its standard error are shown below, together with a t-value for the distance between the estimate and the true in-sample effect. Because this t-value is below 1.96, the associated 95% confidence interval covers the true effect.
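```r
# Coefficient on D (row 2 of the coefficient table), its standard error, and
# the t-value of the distance between the estimate and the true effect
c(Estimate = civ_res$coef[2, 1],
  "Std. Error" = civ_res$coef[2, 2],
  "t-val." = abs(civ_res$coef[2, 1] - mean(tau_X[X])) / civ_res$coef[2, 2])
```

```
##  Estimate Std. Error    t-val.
## 1.0063143  0.1086868 0.2409285
```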
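Since the 95% interval is $\hat\tau \pm 1.96\,\widehat{\mathrm{SE}}$, it covers the true effect $\bar\tau$ exactly when $|\hat\tau - \bar\tau| / \widehat{\mathrm{SE}} < 1.96$. Plugging in the printed numbers, with the rounded value 1.0325 standing in for the true in-sample effect, gives the interval and the t-value directly:

```r
# Values printed above (tau_bar is the rounded printed value of mean(tau_X[X]))
est     <- 1.0063143
se      <- 0.1086868
tau_bar <- 1.0325

z_crit <- qnorm(0.975)                   # about 1.96
ci     <- est + c(-1, 1) * z_crit * se   # 95% normal-approximation interval
t_val  <- abs(est - tau_bar) / se        # distance to the true effect in SE units

ci
t_val
```

The interval is roughly [0.793, 1.219], which contains 1.0325, and the t-value is about 0.241, matching the printed value up to the rounding of the true effect.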
CIV uses a K-Conditional-Means (KCMeans) estimator in a first step to estimate the optimal instrument. To understand the estimated mapping of observed instrument values to the support points of the latent instrument, it is useful to print the `cluster_map` element of the first-stage `kcmeans_fit` object (see also `kcmeans` for details). The code snippet below prints the first 10 rows of this mapping. Here, `x` denotes a value of the observed instrument, while `cluster_x` denotes the support point of the estimated optimal instrument to which that value is assigned.
```r
# First 10 rows of the estimated mapping; columns 1 and 4 hold x and cluster_x
t(head(civ_fit$kcmeans_fit$cluster_map[, c(1, 4)], 10))
```

```
##           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x           26   20   10   32   23   12    7   25   33    21
## cluster_x    1    1    1    1    2    1    2    2    2     2
```
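The idea behind such a mapping can be sketched in a few lines of base R. Choosing a first-stage fit that takes at most $K$ distinct values to minimize the squared first-stage residuals reduces, for a categorical instrument, to a count-weighted clustering of the per-value conditional means of the treatment. The toy example below assumes $K=2$, hypothetical simulated data with 10 instrument values, and an exhaustive search over split points; it illustrates that reduction and is not the `kcmeans` implementation, whose algorithm and options may differ.

```r
# Hypothetical toy sketch of a K = 2 "conditional means" first stage.
# Assumption: this brute-force split search only handles K = 2 and is NOT
# the algorithm used by the kcmeans package.
set.seed(1)
n  <- 400
z  <- sample(1:10, n, replace = TRUE)   # observed categorical instrument
z0 <- z %% 2                            # latent binary instrument
d  <- 0.858 * z0 + rnorm(n)             # treatment

# Conditional mean and count of D for each observed instrument value
lev <- sort(unique(z))
m_z <- sapply(lev, function(v) mean(d[z == v]))
n_z <- sapply(lev, function(v) sum(z == v))

# Sort the instrument values by their conditional mean
ord <- order(m_z)
ms <- m_z[ord]; ns <- n_z[ord]; xs <- lev[ord]

# Count-weighted within-group sum of squares of the conditional means
wss <- function(idx) sum(ns[idx] * (ms[idx] - weighted.mean(ms[idx], ns[idx]))^2)

# In one dimension the optimal groups are contiguous in sorted order,
# so for K = 2 it suffices to try every split point.
split_loss <- sapply(seq_len(length(ms) - 1),
                     function(s) wss(seq_len(s)) + wss((s + 1):length(ms)))
s_star <- which.min(split_loss)

# Map each observed value x to its estimated support point (1 = lower mean)
cluster_map <- data.frame(x = xs,
                          cluster_x = ifelse(seq_along(ms) <= s_star, 1L, 2L))
cluster_map[order(cluster_map$x), ]
```

Labelling the clusters by increasing mean is a choice made for this sketch. With more support points, the split search would need something like dynamic programming over the sorted means instead of trying every split.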
Wiemann T (2023). "Optimal Categorical Instruments." https://arxiv.org/abs/2311.17021