Robust Empirical Bayes Confidence Intervals
In ebci: Robust Empirical Bayes Confidence Intervals

library("knitr")
knitr::opts_knit$set(self.contained = FALSE)
knitr::opts_chunk$set(tidy = TRUE, collapse=TRUE, comment = "#>",
                      tidy.opts=list(blank=FALSE, width.cutoff=55))

The package ebci implements robust empirical Bayes confidence intervals (EBCIs) proposed by @akp20 for inference in a normal means model $Y_i\sim N(\theta_i, \sigma^2_i)$, $i=1, \dotsc, n$.

Setup

Suppose we use an empirical Bayes estimator of $\theta_i$ that shrinks toward the predictor based on the regression of $\theta_i$ onto $X_i$ (equivalently, regression of $Y_i$ onto $X_i$), \begin{equation}\label{Bayes-estimator} \hat{\theta}{i} = X_i'\delta+w{EB, i}(Y_i-X_i'\delta), \end{equation}

where $\delta=E[X_i X_i']^{-1}E[X_i\theta_i]$, $w_{EB, i}=\frac{\mu_{2}}{\mu_{2}+\sigma_{i}^{2}}$ is the degree of shrinkage, and \begin{equation}\label{mu2-independence} \mu_{2}=E[(\theta_i-X_i'\delta)^{2} \mid X_{i}, \sigma_i]. \end{equation} is the second moment of the regression residual. We assume that $\mu_2$ doesn't depend on $\sigma_i$. Under this setup, @morris83 proposes to use the parametric EBCI \begin{equation} \hat{\theta}{i} \pm \frac{z{1-\alpha/2}}{\sqrt{w_{EB, i}}}w_{EB, i}\sigma_i. \end{equation}

The critical value $z_{1-\alpha/2}/\sqrt{w_{EB, i}}$ is larger than the usual critical value $z_{1-\alpha/2}=$qnorm(1-alpha/2) if the estimator was unbiased conditional on $\theta_i$. This CI is justified if we strengthen the assumption (\ref{mu2-independence}) by making the normality assumption $\theta_i\mid X_{i}, \sigma_{i} \sim N(X_i'\delta, \mu_2)$. However, if the residual $\theta_i-X_i'\delta$ is not normally distributed, this CI will undercover. @akp20 derive a robust EBCI that is only uses (\ref{mu2-independence}) and not the normality assumption. The EBCI takes the form \begin{equation}\label{robust-ebci} \hat{\theta}{i} \pm cva{\alpha}(m_{2i}, \infty) w_{EB, i}\sigma_i, \,\, m_{2i}=(1-1/w_{EB, i})^2\mu_2/\sigma^2_i=\sigma^{2}{i}/\mu{2}, \end{equation} where the critical value $cva_{\alpha}$ is derived in @akp20. Here $m_{2i}$ is the second moment of the conditional bias of $\hat{\theta}i$, scaled by the standard error (normalized bias, henceforth). This critical value imposes a constraint (\ref{mu2-independence}) on the second moment of $\theta_i$, but no constraints on higher moments. We can make the critical value smaller by also imposing a constraint on the kurtosis of $\theta_i$ (or equivalently, the kurtosis of the normalized bias) \begin{equation}\label{kappa-independence} \kappa=E[(\theta_i-X_i'\delta)^{4} \mid X{i}, \sigma_i]/\mu_{2}^2. \end{equation} In analogy to (\ref{mu2-independence}), we assume here that the conditional fourth moment of $\theta_i-X_i'\delta$ doesn't depend on $(X_i,\sigma_i)$. In this case, the robust EBCI takes the form \begin{equation} \hat{\theta}{i} \pm cva{\alpha}(m_{2i},\kappa) w_{EB, i}\sigma_i. \end{equation}

These critical values are implemented in the package by the cva function:

library("ebci")
## If m_2=0, then we get the usual critical value
cva(m2=0, kappa=Inf, alpha=0.05)$cv
## Otherwise the critical value is larger:
cva(m2=4, kappa=Inf, alpha=0.05)$cv
## Imposing a constraint on kurtosis tightens it
cva(m2=4, kappa=3, alpha=0.05)$cv

In practice, the parameters $\delta, \mu_2$, and $\kappa$ are unknown. To implement the EBCI, the package replaces them with consistent estimates, following the baseline implementation in @akp20. We illustrate this in the next section.

Example

Here we illustrate the use of the package using a dataset from @ChHe18ii (CH hereafter). The dataset is included in the package as the list cz. Run ?cz for a full description of the dataset. As in @ChHe18ii, we use precision weights proportional to the inverse of the squared standard error to compute $(\delta,\mu_2,\kappa)$.

## As Y_i, use fixed effect estimate theta25 of causal effect of neighborhood
## for children with parents at the 25th percentile of income distribution. The
## standard error for this estimate is se25. As predictors use average outcome
## for permanent residents (stayers), stayer25. Let us use 90% CIs, as in
## Armstrong et al
r <- ebci(formula=theta25~stayer25, data=cz, se=se25, weights=1/se25^2,
          alpha=0.1)

For shrinkage toward the grand mean, or toward zero, use the specification theta25 ~ 1, and theta25 ~ 0, respectively, in the formula argument of ebci.

The return value contains (see ?ebci for full description)

The least squares estimate of $\delta$: r r$delta
Estimates of $\mu_2$ and $\kappa$. The estimate used for EBCI calculations (estimate) is obtained by applying a finite-sample correction to an initial method of moments estimate (uncorrected_estimate). This correction ensures that we don't shrink all the way to zero (or past zero) if the method-of-moments estimate of $\mu_2$ is negative (see @akp20 for details): r c(r$mu2, r$kappa)
The parameter $\alpha$ determining the confidence level, r$alpha, and the matrix of regressors, r$X.
A data frame with columns: r names(r$df)

The columns of the data frame refer to:

w_eb Empirical Bayes shrinkage factor $w_{EB, i}=\mu_2/(\mu_2+\sigma_i^2)$.
th_eb Empirical Bayes estimator $\hat{\theta_i}$ given in (\ref{Bayes-estimator})
len_eb Half-length $cva_{\alpha}(m_2, \kappa)w_i\sigma_i$ of the robust EBCI, so that the lower endpoint of the EBCIs are given by th_eb-len_eb, and the upper endpoint by th_eb+len_eb. Let us verify this for the first observation: r cva(r$df$se[1]^2/r$mu2[1], r$kappa[1], alpha=0.1)$cv*r$df$w_eb[1]*r$df$se[1] r$df$len_eb[1]
len_pa Half-length $z_{1-\alpha/2}\sqrt{w_i}\sigma_i$ of the parametric EBCI.
w_opt Shrinkage factor that optimizes the length of the resulting confidence interval. In other words, instead of using $w_{EB, i}$ in (\ref{robust-ebci}), we use shrinkage $w_i$ that minimizes $cva_{\alpha}((1-1/w_{EB, i})^2\mu_2/\sigma^2_i, \kappa) w_{i}\sigma_i$. See @akp20 for details. The vector is missing here since the default option, wopt=FALSE, is to skip computation of the length-optimal CIs to speed up the calculations.
th_op Estimator based on the length-optimal shrinkage factor w_opt (missing here since the default is wopt=FALSE)
len_op Half-length $cva_{\alpha}((1-1/w_{EB, i})^2\mu_2/\sigma^2_i, \kappa) w_{i}\sigma_i$ of the length-optimal EBCI (missing here since we specified wopt=FALSE).
th_us The unshrunk estimate $Y_i$, as specified in the formula argument of the function ebci.
len_us Half-length $z_{1-\alpha/2}\sigma_i$ of the CI based on the unshrunk estimate
se The standard error $\sigma_i$, as specified by the argument se of the ebci function.
ncov_pa maximal non-coverage of the parametric EBCI.

Using the data frame, we can give a table summarizing the results. Let us show the results for the CZ in New York:

df <- (cbind(cz[!is.na(cz$se25), ], r$df))
df <- df[df$state=="NY", ]

knitr::kable(data.frame(cz=df$czname, unshrunk_estimate=df$theta25,
             estimate=df$th_eb,
             lower_ci=df$th_eb-df$len_eb, upper_ci=df$th_eb+df$len_eb),
             digits=3)

@akp20 present the same information as a figure.

Finally, let us compute some summary statistics as in Table 3 in @akp20. Average half-length of the robust, parametric, and unshrunk CI:

mean(r$df$len_eb)
mean(r$df$len_pa)
mean(r$df$len_us)

The efficiency of the parametric and unshrunk CI relative to the robust EBCI is given by

mean(r$df$len_eb)/mean(r$df$len_pa)
mean(r$df$len_eb)/mean(r$df$len_us)

While the parametric EBCI is shorter on average, it yields CIs that may violate the 90% coverage requirement. In particular, the average maximal non-coverage probability at the estimated value of $(\mu_{2},\kappa)$ is given by