Translating Zelig to clarify

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  fig.width = 6.5,
  fig.height = 2.75
)

Introduction

In this document, we demonstrate some common uses of Zelig [@imaiCommonFrameworkStatistical2008a] and how the same tasks can be performed using clarify. We'll include examples for computing predictions at representative values (i.e., setx() and sim() in Zelig), the rare-events logit model, estimating the average treatment effect (ATT) after matching, and combining estimates after multiple imputation.

The usual workflow in Zelig is to fit a model using zelig(), specify quantities of interest to simulate using setx() on the zelig() output, and then simulate those quantities using sim(). clarify uses a similar approach, except that the model is fit outside clarify using functions in a different R package. In addition, clarify's sim_apply() allows for the computation of any arbitrary quantity of interest. Unlike Zelig, clarify follows the recommendations of @raineyCarefulConsiderationCLARIFY2023 to use the estimates computed from the original model coefficients rather than the average of the simulated draws. We'll demonstrate how to replicate a standard Zelig analysis using clarify step-by-step. Because simulation-based inference involves randomness and some of the algorithms may not perfectly align, one shouldn't expect results to be identical, though in most cases, they should be similar.

## library("Zelig")
library("clarify")
set.seed(100)

Note that both Zelig and clarify have a function called "sim()", so we will always make it clear which package's sim() is being used.

Predictions at representative values

Here we'll use the lalonde dataset in {MatchIt} and fit a linear model for re78 as a function of the treatment treat and covariates.

data("lalonde", package = "MatchIt")

We'll be interested in the predicted values of the outcome for a typical unit at each level of treatment and their first difference.

Zelig workflow

In Zelig, we fit the model using zelig():

fit <- zelig(re78 ~ treat + age + educ + married + race +
               nodegree + re74 + re75, data = lalonde,
             model = "ls", cite = FALSE)

Next, we use setx() and setx1() to set our values of treat:

fit <- setx(fit, treat = 0)
fit <- setx1(fit, treat = 1)

Next we simulate the values using sim():

fit <- Zelig::sim(fit)

Finally, we can print and plot the predicted values and first differences:

fit
plot(fit)

clarify workflow

In clarify, we fit the model using functions outside clarify, like stats::lm()or fixest::feols().

fit <- lm(re78 ~ treat + age + educ + married + race +
            nodegree + re74 + re75, data = lalonde)

Next, we simulate the model coefficients using clarify::sim():

s <- clarify::sim(fit)

Next, we use sim_setx() to set our values of the predictors:

est <- sim_setx(s, x = list(treat = 0), x1 = list(treat = 1),
                verbose = FALSE)

Finally, we can summarize and plot the predicted values:

summary(est)

plot(est)

Rare-events logit

Zelig uses a special method for logistic regression with rare events as described in @kingLogisticRegressionRare2001. This is the primary implementation of the method in R. However, newer methods have been developed that perform similarly to or better than the method of King and Zeng [@puhrFirthLogisticRegression2017] and are implemented in R packages that are compatible with clarify, such as logistf and brglm2.

Here, we'll use the lalonde dataset with a constructed rare outcome variable to demonstrate how to perform a rare events logistic regression in Zelig and in clarify.

data("lalonde", package = "MatchIt")

#Rare outcome: 1978 earnings over $20k; ~6% prevalence
lalonde$re78_20k <- lalonde$re78 >= 20000

Zelig workflow

In Zelig, we fit a rare events logistic model using zelig() with model = "relogit".

fit <- zelig(re78_20k ~ treat + age + educ + married + race +
               nodegree + re74 + re75, data = lalonde,
             model = "relogit", cite = FALSE)

fit

We can compute predicted values at representative values using setx() and Zelig::sim() as above.

fit <- setx(fit, treat = 0)
fit <- setx1(fit, treat = 1)

fit <- Zelig::sim(fit)

fit
plot(fit)

clarify workflow

Here, we'll use logistf::logistif() with flic = TRUE, which performs a variation on Firth's logistic regression with a correction for bias in the intercept [@puhrFirthLogisticRegression2017].

fit <- logistf::logistf(re78_20k ~ treat + age + educ + married + race +
                          nodegree + re74 + re75, data = lalonde,
                        flic = TRUE)

summary(fit)

We can compute predictions at representative values using clarify::sim() and sim_setx().

s <- clarify::sim(fit)

est <- sim_setx(s, x = list(treat = 0), x1 = list(treat = 1),
                verbose = FALSE)

summary(est)
plot(est)

Estimating the ATT after matching

Here we'll use the lalonde dataset and perform propensity score matching and then fit a linear model for re78 as a function of the treatment treat, the covariates, and their interaction. From this model, we'll compute the ATT of treat using Zelig and clarify.

data("lalonde", package = "MatchIt")

m.out <- MatchIt::matchit(treat ~ age + educ + married + race +
                            nodegree + re74 + re75, data = lalonde,
                          method = "nearest")

Zelig workflow

In Zelig, we fit the model using zelig() directly on the matchit object:

fit <- zelig(re78 ~ treat * (age + educ + married + race +
                               nodegree + re74 + re75),
             data = m.out, model = "ls", cite = FALSE)

Next, we use ATT() to request the ATT of treat and simulate the values:

fit <- ATT(fit, "treat")
fit
plot(fit)

clarify workflow

In clarify, we need to extract the matched dataset and fit a model outside clarify using another package.

m.data <- MatchIt::match.data(m.out)

fit <- lm(re78 ~ treat * (age + educ + married + race +
                            nodegree + re74 + re75),
          data = m.data)

Next, we simulate the model coefficients using clarify::sim(). Because we performed pair matching, we will request a cluster-robust standard error:

s <- clarify::sim(fit, vcov = ~subclass)

Next, we use sim_ame() to request the average marginal effect of treat within the subset of treated units:

est <- sim_ame(s, var = "treat", subset = treat == 1,
               contrast = "diff", verbose = FALSE)

Finally, we can summarize and plot the ATT:

summary(est)

plot(est)

Combining results after multiple imputation

Here we'll use the africa dataset in {Amelia} to demonstrate combining estimates after multiple imputation. This analysis is also demonstrated using clarify at the end of vignette("clarify").

library(Amelia)
data("africa", package = "Amelia")

First we multiply impute the data using amelia() using the specification in the {Amelia} documentation.

# Multiple imputation
a.out <- amelia(x = africa, m = 10, cs = "country",
                ts = "year", logs = "gdp_pc", p2s = 0)

Zelig workflow

With Zelig, we can supply the amelia object directly to the data argument of zelig() to fit a model in each imputed dataset:

fit <- zelig(gdp_pc ~ infl * trade, data = a.out,
             model = "ls", cite = FALSE)

Summarizing the coefficient estimates after the simulation can be done using summary():

summary(fit)

We can use Zelig::sim() and setx() to compute predictions at specified values of the predictors:

fit <- setx(fit, infl = 0, trade = 40)
fit <- setx1(fit, infl = 0, trade = 60)

fit <- Zelig::sim(fit)

Zelig does not allow you to combine predicted values across imputations.

fit
plot(fit)

clarify workflow

clarify does not combine coefficients, unlike zelig(); instead, the models should be fit using Amelia::with(). To view the combined coefficient estimates, use Amelia::mi.combine().

#Use Amelia functions to model and combine coefficients
fits <- with(a.out, lm(gdp_pc ~ infl * trade))

mi.combine(fits)

Derived quantities can be computed using clarify::misim() and sim_apply() or its wrappers on the with() output, which is a list of regression model fits:

#Simulate coefficients, 100 in each of 10 imputations
s <- misim(fits, n = 100)

#Compute predictions at specified values
est <- sim_setx(s, x = list(infl = 0, trade = 40),
                x1 = list(infl = 0, trade = 60),
                verbose = FALSE)

summary(est)

plot(est)

References



Try the clarify package in your browser

Any scripts or data that you put into this service are public.

clarify documentation built on June 22, 2024, 12:09 p.m.