Built using Zelig version r packageVersion("Zelig")

knitr::opts_knit$set(
    stop_on_error = 2L
)
knitr::opts_chunk$set(
    fig.height = 11,
    fig.width = 7
)

options(cite = FALSE)

Rare Events Logistic Regression for Dichotomous Dependent Variables with relogit.

The relogit procedure estimates the same model as standard logistic regression (appropriate when you have a dichotomous dependent variable and a set of explanatory variables; see ), but the estimates are corrected for the bias that occurs when the sample is small or the observed events are rare (i.e., if the dependent variable has many more 1s than 0s or the reverse). The relogit procedure also optionally uses prior correction for case-control sampling designs.

Syntax

z.out <- zelig(Y ~ X1 + X2, model = "relogit", tau = NULL,
               case.control = c("prior", "weighting"),
               bias.correct = TRUE,
               data = mydata, ...)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Arguments

The relogit procedure supports four optional arguments in addition to the standard arguments for zelig(). You may additionally use:

Note that if tau = NULL, bias.correct = FALSE, the relogit procedure performs a standard logistic regression without any correction.

Example 1: One Tau with Prior Correction and Bias Correction

rm(list=ls(pattern="\\.out"))
suppressWarnings(suppressMessages(library(Zelig)))
set.seed(1234)

Due to memory and space considerations, the data used here are a sample drawn from the full data set used in King and Zeng, 2001, The proportion of militarized interstate conflicts to the absence of disputes is $\tau = 1,042 / 303,772 \approx 0.00343$. To estimate the model,

data(mid)
z.out1 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years,
                data = mid, model = "relogit", tau = 1042/303772)

Summarize the model output:

summary(z.out1)

Set the explanatory variables to their means:

x.out1 <- setx(z.out1)

Simulate quantities of interest:

s.out1 <- sim(z.out1, x = x.out1)
summary(s.out1)
plot(s.out1)

Example 2: Tau with Weighting and Bias Correction

Suppose that we wish to perform case control correction using weighting (rather than the default prior correction). To estimate the model:

z.out2 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years,
                data = mid, model = "relogit", tau = 1042/303772,
                case.control = "weighting")

Summarize the model output:

summary(z.out2)

Set the explanatory variables to their means:

x.out2 <- setx(z.out2)

Simulate quantities of interest:

s.out2 <- sim(z.out2, x = x.out2)
summary(s.out2)

Model

$$ Y_i \; \sim \; \textrm{Bernoulli}(\pi_i), $$

where $Y_i$ is the binary dependent variable, and takes a value of either 0 or 1.

$$ \pi_i \; = \; \frac{1}{1 + \exp(-x_i \beta)}. $$

$$ \beta = \hat{\beta_0} - \ln \left[ \bigg( \frac{1 - \tau}{\tau} \bigg) \bigg( \frac{\bar{y}}{1 - \bar{y}} \bigg) \right]. $$

$$ \begin{aligned} w_1 &=& \frac{\tau}{\bar{y}} \ w_0 &=& \frac{(1 - \tau)}{(1 - \bar{y})} \ w_i &=& w_1 Y_i + w_0 (1 - Y_i) \end{aligned} $$

If $\tau$ is unknown, you may alternatively specify an upper and lower bound for the possible range of $\tau$. In this case, the relogit procedure uses "robust Bayesian" methods to generate a confidence interval (rather than a point estimate) for each quantity of interest. The nominal coverage of the confidence interval is at least as great as the actual coverage.

$$ \widehat{\beta} - \textrm{bias}(\widehat{\beta}) = \tilde{\beta} $$

The bias term is

$$ \textrm{bias}(\widehat{\beta}) = (X'WX)^{-1} X'W \xi $$

where

$$ \begin{aligned} \xi_i &=& 0.5 Q_{ii} \Big( (1 + w-1)\widehat{\pi}_i - w_1 \Big) \ Q &=& X(X'WX)^{-1} X' \ W = \textrm{diag}{\widehat{\pi}_i (1 - \widehat{\pi}_i) w_i} \end{aligned} $$

where $w_i$ and $w_1$ are given in the "weighting" section above.

Quantities of Interest

$$ E(Y) = \pi_i = \frac{1}{1 + \exp(-x_i \beta)}, $$

given draws of $\beta$ from its posterior.

$$ \textrm{FD} = \Pr(Y = 1 \mid x_1, \tau) - \Pr(Y = 1 \mid x, \tau). $$

$$ \textrm{RR} = \Pr(Y = 1 \mid x_1, \tau) \ / \ \Pr(Y = 1 \mid x, \tau). $$

$$ \frac{1}{\sum_{i=1}^n t_i}\sum_{i:t_i=1}^n \left{ Y_i(t_i=1) - E[Y_i(t_i=0)] \right}, $$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i=1$) and control ($t_i=0$) groups. Variation in the simulations are due to uncertainty in simulating $E[Y_i(t_i=0)]$, the counterfactual expected value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i=0$.

$$ \frac{1}{\sum_{i=1}^n t_i}\sum_{i:t_i=1}^n \left{ Y_i(t_i=1) - \widehat{Y_i(t_i=0)} \right}, $$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i=1$) and control ($t_i=0$) groups. Variation in the simulations are due to uncertainty in simulating $\widehat{Y_i(t_i=0)}$, the counterfactual predicted value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i=0$.

Output Values

The Zelig object stores fields containing everything needed to rerun the Zelig output, and all the results and simulations as they are generated. In addition to the summary commands demonstrated above, some simply utility functions (known as getters) provide easy access to the raw fields most commonly of use for further investigation.

In the example above z.out1$get_coef() returns the estimated coefficients, z.out1$get_vcov() returns the estimated covariance matrix, and z.out1$get_predict() provides predicted values for all observations in the dataset from the analysis.

Differences with Stata Version

The Stata version of ReLogit and the R implementation differ slightly in their coefficient estimates due to differences in the matrix inversion routines implemented in R and Stata. Zelig uses orthogonal-triangular decomposition (through lm.influence()) to compute the bias term, which is more numerically stable than standard matrix calculations.

See also

z5 <- zrelogit$new()
z5$references()


IQSS/Zelig documentation built on Dec. 11, 2023, 1:51 a.m.