```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  fig.width = 6.5,
  fig.height = 2.75
)
```
In this document, we demonstrate some common uses of `Zelig` [@imaiCommonFrameworkStatistical2008a] and how the same tasks can be performed using `clarify`. We'll include examples for computing predictions at representative values (i.e., `setx()` and `sim()` in `Zelig`), the rare-events logit model, estimating the average treatment effect on the treated (ATT) after matching, and combining estimates after multiple imputation.
The usual workflow in `Zelig` is to fit a model using `zelig()`, specify quantities of interest to simulate using `setx()` on the `zelig()` output, and then simulate those quantities using `sim()`. `clarify` uses a similar approach, except that the model is fit outside `clarify` using functions in a different R package. In addition, `clarify`'s `sim_apply()` allows for the computation of any arbitrary quantity of interest. Unlike `Zelig`, `clarify` follows the recommendations of @raineyCarefulConsiderationCLARIFY2023 to use the estimates computed from the original model coefficients rather than the average of the simulated draws. We'll demonstrate how to replicate a standard `Zelig` analysis using `clarify` step by step. Because simulation-based inference involves randomness and some of the algorithms may not perfectly align, one shouldn't expect results to be identical, though in most cases they should be similar.
## library("Zelig") library("clarify") set.seed(100)
Note that both `Zelig` and `clarify` have a function called `sim()`, so we will always make it clear which package's `sim()` is being used.
## Predictions at representative values

Here we'll use the `lalonde` dataset in {MatchIt} and fit a linear model for `re78` as a function of the treatment `treat` and covariates.
data("lalonde", package = "MatchIt")
We'll be interested in the predicted values of the outcome for a typical unit at each level of treatment and their first difference.
### Zelig workflow

In `Zelig`, we fit the model using `zelig()`:
```r
fit <- zelig(re78 ~ treat + age + educ + married + race + nodegree + re74 + re75,
             data = lalonde, model = "ls", cite = FALSE)
```
Next, we use `setx()` and `setx1()` to set our values of `treat`:
```r
fit <- setx(fit, treat = 0)
fit <- setx1(fit, treat = 1)
```
Next, we simulate the values using `Zelig::sim()`:
```r
fit <- Zelig::sim(fit)
```
Finally, we can print and plot the predicted values and first differences:
```r
fit

plot(fit)
```
### clarify workflow

In `clarify`, we fit the model using functions outside `clarify`, like `stats::lm()` or `fixest::feols()`.
```r
fit <- lm(re78 ~ treat + age + educ + married + race + nodegree + re74 + re75,
          data = lalonde)
```
Next, we simulate the model coefficients using `clarify::sim()`:
```r
s <- clarify::sim(fit)
```
Next, we use `sim_setx()` to set our values of the predictors:
```r
est <- sim_setx(s, x = list(treat = 0), x1 = list(treat = 1),
                verbose = FALSE)
```
Finally, we can summarize and plot the predicted values:
```r
summary(est)

plot(est)
```
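`sim_setx()` is one of several wrappers around `sim_apply()`, which, as noted above, can compute any arbitrary quantity of interest from the simulated coefficients. As a minimal sketch (this example is ours and not part of the original workflow), the same `s` object can be used with a user-written function to compute average adjusted predictions under each treatment level and their difference:

```r
# Sketch: sim_apply() with a user-supplied quantity of interest. The function
# receives the model fit with its coefficients set to each simulated draw.
qoi <- function(fit) {
  d0 <- d1 <- lalonde
  d0$treat <- 0
  d1$treat <- 1
  p0 <- mean(predict(fit, newdata = d0))
  p1 <- mean(predict(fit, newdata = d1))
  c("E[Y(0)]" = p0, "E[Y(1)]" = p1, "FD" = p1 - p0)
}

est2 <- sim_apply(s, FUN = qoi, verbose = FALSE)
summary(est2)
```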
## Rare events logistic regression

`Zelig` uses a special method for logistic regression with rare events as described in @kingLogisticRegressionRare2001. This is the primary implementation of the method in R. However, newer methods have been developed that perform similarly to or better than the method of King and Zeng [@puhrFirthLogisticRegression2017] and are implemented in R packages that are compatible with `clarify`, such as `logistf` and `brglm2`.
Here, we'll use the `lalonde` dataset with a constructed rare outcome variable to demonstrate how to perform a rare events logistic regression in `Zelig` and in `clarify`.
data("lalonde", package = "MatchIt") #Rare outcome: 1978 earnings over $20k; ~6% prevalence lalonde$re78_20k <- lalonde$re78 >= 20000
### Zelig workflow

In `Zelig`, we fit a rare events logistic model using `zelig()` with `model = "relogit"`.
```r
fit <- zelig(re78_20k ~ treat + age + educ + married + race + nodegree + re74 + re75,
             data = lalonde, model = "relogit", cite = FALSE)

fit
```
We can compute predicted values at representative values using `setx()` and `Zelig::sim()` as above.
```r
fit <- setx(fit, treat = 0)
fit <- setx1(fit, treat = 1)
fit <- Zelig::sim(fit)

fit

plot(fit)
```
### clarify workflow

Here, we'll use `logistf::logistf()` with `flic = TRUE`, which performs a variation on Firth's logistic regression with a correction for bias in the intercept [@puhrFirthLogisticRegression2017].
```r
fit <- logistf::logistf(re78_20k ~ treat + age + educ + married + race + nodegree + re74 + re75,
                        data = lalonde, flic = TRUE)

summary(fit)
```
We can compute predictions at representative values using `clarify::sim()` and `sim_setx()`.
```r
s <- clarify::sim(fit)

est <- sim_setx(s, x = list(treat = 0), x1 = list(treat = 1),
                verbose = FALSE)

summary(est)

plot(est)
```
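The same workflow applies to the other compatible packages mentioned above. As a sketch (ours, not part of the original analysis, and assuming `brglm2` is installed), a bias-reduced logistic regression can be fit through `glm()` with `method = "brglmFit"` and passed to `clarify::sim()` like any other `glm` fit:

```r
# Sketch: bias-reduced logistic regression via brglm2 as an alternative to logistf;
# the clarify steps afterward are unchanged
library(brglm2)

fit_br <- glm(re78_20k ~ treat + age + educ + married + race + nodegree + re74 + re75,
              data = lalonde, family = binomial("logit"), method = "brglmFit")

s_br <- clarify::sim(fit_br)
est_br <- sim_setx(s_br, x = list(treat = 0), x1 = list(treat = 1),
                   verbose = FALSE)
summary(est_br)
```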
## The ATT after matching

Here we'll use the `lalonde` dataset, perform propensity score matching, and then fit a linear model for `re78` as a function of the treatment `treat`, the covariates, and their interaction. From this model, we'll compute the ATT of `treat` using `Zelig` and `clarify`.
data("lalonde", package = "MatchIt") m.out <- MatchIt::matchit(treat ~ age + educ + married + race + nodegree + re74 + re75, data = lalonde, method = "nearest")
### Zelig workflow

In `Zelig`, we fit the model using `zelig()` directly on the `matchit` object:
```r
fit <- zelig(re78 ~ treat * (age + educ + married + race + nodegree + re74 + re75),
             data = m.out, model = "ls", cite = FALSE)
```
Next, we use `ATT()` to request the ATT of `treat` and simulate the values:
```r
fit <- ATT(fit, "treat")

fit

plot(fit)
```
### clarify workflow

In `clarify`, we need to extract the matched dataset and fit a model outside `clarify` using another package.
```r
m.data <- MatchIt::match.data(m.out)

fit <- lm(re78 ~ treat * (age + educ + married + race + nodegree + re74 + re75),
          data = m.data)
```
Next, we simulate the model coefficients using `clarify::sim()`. Because we performed pair matching, we will request a cluster-robust standard error, clustering on pair membership (`subclass`):
```r
s <- clarify::sim(fit, vcov = ~subclass)
```
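Supplying a formula to `vcov` requests a cluster-robust covariance for the coefficients. Equivalently (a sketch of ours, assuming the `sandwich` package is installed), a pre-computed covariance matrix can be supplied instead:

```r
# Sketch: pass a pre-computed cluster-robust covariance matrix instead of a formula
s2 <- clarify::sim(fit, vcov = sandwich::vcovCL(fit, cluster = ~subclass))
```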
Next, we use `sim_ame()` to request the average marginal effect of `treat` within the subset of treated units:
```r
est <- sim_ame(s, var = "treat", subset = treat == 1,
               contrast = "diff", verbose = FALSE)
```
Finally, we can summarize and plot the ATT:
```r
summary(est)

plot(est)
```
## Combining estimates after multiple imputation

Here we'll use the `africa` dataset in {Amelia} to demonstrate combining estimates after multiple imputation. This analysis is also demonstrated using `clarify` at the end of `vignette("clarify")`.
```r
library(Amelia)

data("africa", package = "Amelia")
```
First, we multiply impute the data with `amelia()`, using the specification given in the {Amelia} documentation.
```r
# Multiple imputation
a.out <- amelia(x = africa, m = 10, cs = "country", ts = "year",
                logs = "gdp_pc", p2s = 0)
```
### Zelig workflow

With `Zelig`, we can supply the `amelia` object directly to the `data` argument of `zelig()` to fit a model in each imputed dataset:
```r
fit <- zelig(gdp_pc ~ infl * trade, data = a.out,
             model = "ls", cite = FALSE)
```
Summarizing the coefficient estimates combined across the imputed datasets can be done using `summary()`:
```r
summary(fit)
```
We can use `Zelig::sim()` and `setx()` to compute predictions at specified values of the predictors:
```r
fit <- setx(fit, infl = 0, trade = 40)
fit <- setx1(fit, infl = 0, trade = 60)
fit <- Zelig::sim(fit)
```
`Zelig` does not allow you to combine predicted values across imputations.

```r
fit

plot(fit)
```
### clarify workflow

Unlike `zelig()`, `clarify` does not combine the coefficients itself; instead, the models should be fit using `Amelia::with()`. To view the combined coefficient estimates, use `Amelia::mi.combine()`.
```r
# Use Amelia functions to model and combine coefficients
fits <- with(a.out, lm(gdp_pc ~ infl * trade))

mi.combine(fits)
```
Derived quantities can be computed using `clarify::misim()` and `sim_apply()` or its wrappers on the `with()` output, which is a list of regression model fits:
```r
# Simulate coefficients, 100 in each of 10 imputations
s <- misim(fits, n = 100)

# Compute predictions at specified values
est <- sim_setx(s, x = list(infl = 0, trade = 40),
                x1 = list(infl = 0, trade = 60),
                verbose = FALSE)

summary(est)

plot(est)
```
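The other `clarify` wrappers accept `misim()` output in the same way. For example (a sketch of ours, not part of the original analysis), the average marginal effect of `infl` on `gdp_pc`, combined across the imputations:

```r
# Sketch: sim_ame() on misim() output; for a continuous predictor this is the
# average of the instantaneous rate of change of the predicted outcome
est_ame <- sim_ame(s, var = "infl", verbose = FALSE)
summary(est_ame)
```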