library("knitr") knitr::opts_knit$set(self.contained = FALSE) knitr::opts_chunk$set(tidy = TRUE, collapse=TRUE, comment = "#>", tidy.opts=list(blank=FALSE, width.cutoff=55)) oldoptions <- options(digits=3)
This package implements small-sample degrees-of-freedom adjustments to robust and cluster-robust standard errors in linear regression, as discussed in @ImKo16.[^1] The implementation can handle models with fixed effects, and cases with a large number of observations or clusters.
[^1]: We thank Bruce Hansen for comments and Ulrich Müller for suggesting to us a version of Lemma 2 below.
```r
library(dfadjust)
```
To give some examples, let us construct an artificial dataset with 11 clusters:
```r
set.seed(7)
d1 <- data.frame(y = rnorm(1000),
                 x1 = c(rep(1, 3), rep(0, 997)),
                 x2 = c(rep(1, 150), rep(0, 850)),
                 x3 = rnorm(1000),
                 cl = as.factor(c(rep(1:10, each = 50), rep(11, 500))))
```
Let us first run a regression of `y` on `x1`. This is a case in which, in spite of the moderate sample size, the effective number of observations is small, since there are only three treated units:
```r
r1 <- lm(y ~ x1, data = d1)
## No clustering
dfadjustSE(r1)
```
We can see that the usual robust standard errors (`HC1 se`) are much smaller than the effective standard errors (`Adj. se`), which are computed by taking the HC2 standard errors and applying a degrees-of-freedom adjustment.
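For reference, the adjusted standard error and degrees of freedom can be combined into a confidence interval by hand. The sketch below assumes (check with `str()`) that the object returned by `dfadjustSE` stores the coefficient table as a matrix in its `coefficients` component, with columns `"Estimate"`, `"Adj. se"`, and `"df"`; it is an illustration, not a description of the package's internals.

```r
# Hedged sketch: a 95% confidence interval for the x1 coefficient based on the
# adjusted standard error and a t critical value with the adjusted df.
# Assumes out$coefficients has columns "Estimate", "Adj. se", and "df".
out <- dfadjustSE(r1)
est <- out$coefficients["x1", "Estimate"]
se  <- out$coefficients["x1", "Adj. se"]
df  <- out$coefficients["x1", "df"]
est + qt(c(0.025, 0.975), df = df) * se
```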
Now consider a cluster-robust regression of `y` on `x2`. There are only 3 treated clusters, so the effective number of observations is again small:
```r
r1 <- lm(y ~ x2, data = d1)
# Default Imbens-Kolesár method
dfadjustSE(r1, clustervar = d1$cl)
# Bell-McCaffrey method
dfadjustSE(r1, clustervar = d1$cl, IK = FALSE)
```
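For comparison, here is a sketch of the usual Liang-Zeger cluster-robust sandwich variance computed by hand, with a Stata-style small-sample factor $\frac{S}{S-1}\cdot\frac{n-1}{n-p}$. It only illustrates what the conventional standard error does; it is not the code path used by `dfadjustSE`, and the scaling factor is an assumption about how the `HC1 se` column is normalized.

```r
# Sketch: conventional (LZ) cluster-robust sandwich with a Stata-style
# small-sample factor, for comparison with the HC1 column above.
X <- model.matrix(r1)
u <- residuals(r1)
n <- nrow(X); p <- ncol(X); S <- nlevels(d1$cl)
bread <- solve(crossprod(X))
meat <- Reduce(`+`, lapply(split(seq_len(n), d1$cl), function(idx) {
  g <- crossprod(X[idx, , drop = FALSE], u[idx])   # X_s' u_s
  tcrossprod(g)
}))
V <- S / (S - 1) * (n - 1) / (n - p) * bread %*% meat %*% bread
sqrt(diag(V))
```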
Now, let us run a regression of `y` on `x3`, with fixed effects. Since we are only interested in `x3`, we specify that we only want inference on the second element of the coefficient vector:
```r
r1 <- lm(y ~ x3 + cl, data = d1)
dfadjustSE(r1, clustervar = d1$cl, ell = c(0, 1, rep(0, r1$rank - 2)))
dfadjustSE(r1, clustervar = d1$cl, ell = c(0, 1, rep(0, r1$rank - 2)), IK = FALSE)
```
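An equivalent way to build the selector vector `ell` is to match on coefficient names rather than counting positions. This is only a convenience sketch and assumes no coefficients were dropped from the fit:

```r
# Put a 1 in the position of the x3 coefficient and zeros elsewhere.
ell <- as.numeric(names(coef(r1)) == "x3")
dfadjustSE(r1, clustervar = d1$cl, ell = ell)
```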
Finally, an example in which the clusters are large. We have 500,000 observations:
```r
d2 <- do.call("rbind", replicate(500, d1, simplify = FALSE))
d2$y <- rnorm(length(d2$y))
r2 <- lm(y ~ x2, data = d2)
summary(r2)
# Default Imbens-Kolesár method
dfadjustSE(r2, clustervar = d2$cl)
# Bell-McCaffrey method
dfadjustSE(r2, clustervar = d2$cl, IK = FALSE)
```
This section describes the implementation of the @ImKo16 and @BeMc02 degrees of freedom adjustments.
There are $S$ clusters, and we observe $n_{s}$ observations in cluster $s$, for a total of $n=\sum_{s=1}^{S}n_{s}$ observations. We handle the case with independent observations by letting each observation be in its own cluster, with $S=n$. Consider the linear regression of a scalar outcome $Y_{i}$ onto a $p$-vector of regressors $X_{i}$,
\begin{equation}
Y_{i}=X_{i}'\beta+u_{i},\qquad E[u_{i}\mid X_{i}]=0.
\end{equation}
We're interested in inference on $\ell'\beta$ for some fixed vector $\ell\in\mathbb{R}^{p}$. Let $X$, $u$, and $Y$ denote the design matrix, and error and outcome vectors, respectively. For any $n\times k$ matrix $M$, let $M_{s}$ denote the $n_{s}\times k$ block corresponding to cluster $s$, so that, for instance, $Y_{s}$ corresponds to the outcome vector in cluster $s$. For a positive semi-definite matrix $M$, let $M^{1/2}$ be a matrix satisfying ${M^{1/2}}'M^{1/2}=M$, such as its symmetric square root or its Cholesky decomposition.
Assume that
\begin{equation}
E[u_{s}u_{s}'\mid X]=\Omega_{s},\quad\text{and}\quad
E[u_{s}u_{t}'\mid X]=0\quad\text{if $s\neq t$}.
\end{equation}
Denote the conditional variance matrix of $u$ by $\Omega$, so that $\Omega_{s}$
is the block of $\Omega$ corresponding to cluster $s$. We estimate $\ell'\beta$
using OLS. In `R`, the OLS estimator is computed via a QR decomposition, $X=QR$,
where $Q' Q=I$ and $R$ is upper-triangular, so we can write the estimator as
\begin{equation}
\ell'\hat{\beta}=\ell'\left(\sum_{s}X_{s}'X_{s}\right)^{-1}\sum_{s}X_{s}'Y_{s}
=\tilde{\ell}'\sum_{s}Q_{s}'Y_{s},\qquad \tilde{\ell}={R^{-1}}'\ell.
\end{equation}
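As a quick numerical check of this representation (a sketch using base R's `qr()`, not the package's internal code), one can verify that $\tilde{\ell}'\sum_{s}Q_{s}'Y_{s}$ reproduces $\ell'\hat{\beta}$ for the `y ~ x1` regression from the examples:

```r
# Verify ell' betahat = elltilde' Q'Y with elltilde = R^{-1}' ell.
fit1 <- lm(y ~ x1, data = d1)
qr1 <- qr(model.matrix(fit1))
Q1 <- qr.Q(qr1)
R1 <- qr.R(qr1)
ell <- c(0, 1)                               # picks out the x1 coefficient
elltilde1 <- solve(t(R1), ell)               # elltilde = R^{-1}' ell
c(drop(crossprod(elltilde1, crossprod(Q1, d1$y))),   # elltilde' Q'Y
  unname(coef(fit1)["x1"]))                          # ell' betahat
```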
The estimator $\ell'\hat{\beta}$ has variance
\begin{equation}
V:= \var(\ell'\hat{\beta}\mid X)
=\ell'\left(X' X\right)^{-1}
\sum_{s}X_{s}'\Omega_{s}X_{s}\left(X' X\right)^{-1}\ell
=\tilde{\ell}'\sum_{s}Q_{s}'\Omega_{s}Q_{s}
\tilde{\ell}.
\end{equation}
We estimate $V$ using a variance estimator that generalizes the HC2 variance estimator to clustering. Relative to the LZ2 estimator described in @ImKo16, we use a slight modification that allows for fixed effects:
\begin{equation}
\hat{V}=\ell'(X' X)^{-1}\sum_{s}X_{s}'A_{s}\hat{u}_{s}\hat{u}_{s}'A_{s}'X_{s}(X' X)^{-1}\ell
=\ell' R^{-1}\sum_{s}Q_{s}'A_{s}\hat{u}_{s}\hat{u}_{s}'A_{s}'Q_{s}{R'}^{-1}\ell
=\sum_{s=1}^{S}(\hat{u}_{s}'a_{s})^{2},
\end{equation}
where
\begin{equation}
\hat{u}_{s}:=Y_{s}-X_{s}\hat{\beta}
=u_{s}-Q_{s}Q' u,\qquad a_{s}=A_{s}'Q_{s}\tilde{\ell},
\end{equation}
and $A_s$ is a generalized inverse of the symmetric square root of $I-Q_s Q_s'$, where $Q_s Q_s'$ is the block of the hat matrix corresponding to cluster $s$. In the presence of cluster-specific fixed effects, $I-Q_s Q_s'$ is not generally invertible, which necessitates taking a generalized inverse. So long as the vector $\ell$ doesn't load on these fixed effects, $\hat{V}$ will be unbiased under homoskedasticity, as the next result, which slightly generalizes Theorem 1 in @PuTi18, shows.
\begin{lemma} Suppose that $X=(W,L)$ is full rank, and suppose that the vector $\ell$ loads only on elements of $W$. Let $\ddot{W}$ denote the residual from projecting $W$ onto $L$, and suppose that for each cluster $s$, (i) $L_s'\ddot{W}_s=0$ and that (ii) $\sum_{k=1}^S I(k\neq s)\ddot{W}_k'\ddot{W}_k$ is full rank. Then $\hat{V}$ is unbiased under homoskedasticity. \end{lemma}
The proof is given in the last section. By definition of projection, $L$ and $\ddot{W}$ are orthogonal. Condition (i) of the lemma strengthens this requirement to orthogonality within each cluster. It holds if $L$ corresponds to a vector of cluster fixed effects, or more generally if $L$ contains cluster-specific variables. Condition (ii) ensures that after partialling out $L$, it is feasible to run leave-one-cluster-out regressions. Without clustering, the condition is equivalent to the requirement that the partial leverages associated with $\ddot{W}$ are smaller than one.[^2]
[^2]: To see this, let $H=\ddot{W}(\ddot{W}'\ddot{W})^{-1}\ddot{W}'$ denote the partial projection matrix. Since $H=H^{2}$, \begin{equation}H_{ii}-H_{ii}^{2}=\sum_{j\neq i}H_{ij}H_{ji}=\ddot{W}_{i}'(\ddot{W}'\ddot{W})^{-1}\Bigl[\sum_{j\neq i}\ddot{W}_{j}\ddot{W}_{j}'\Bigr](\ddot{W}'\ddot{W})^{-1}\ddot{W}_{i},\end{equation} so that $H_{ii}=1$ iff $\sum_{j\neq i}\ddot{W}_{j}\ddot{W}_{j}'$ is reduced rank.
If the observations are independent, the vector of leverages $(Q_{1}'Q_{1}, \dotsc, Q_{n}'Q_{n})$ can be computed directly using the `stats::hatvalues` function. In this case, we use this function to compute $A_{i}=1/\sqrt{1-Q_{i}'Q_{i}}$ directly, and we then compute $a_{i}=A_{i}Q_{i}'\tilde{\ell}$ using vector operations.
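The following minimal sketch carries out this computation for the `y ~ x1` regression, continuing `fit1`, `Q1`, and `elltilde1` from the QR sketch above; its square root can be compared with the HC2 standard error for `x1` reported by `dfadjustSE`.

```r
# Sketch: a_i and the HC2-type estimate of V in the independent case.
A1 <- 1 / sqrt(1 - hatvalues(fit1))          # A_i = (1 - Q_i'Q_i)^{-1/2}
a1 <- A1 * drop(Q1 %*% elltilde1)            # a_i = A_i Q_i' elltilde
Vhat <- sum((a1 * residuals(fit1))^2)        # sum_i (uhat_i a_i)^2
sqrt(Vhat)                                   # compare with the HC2 se
```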
For the case with clustering, computing an inverse of $I-Q_{s}Q_{s}'$ can be expensive or even infeasible if the cluster size $n_s$ is large. We therefore use the following result, which allows us to compute $a_{s}$ by computing a spectral decomposition of a $p\times p$ matrix.
\begin{lemma} Let $Q_{s}'Q_{s}=\sum_{i=1}^{p}\lambda_{is}r_{is}r_{is}'$ be the spectral decomposition of $Q_{s}'Q_{s}$. Then $a_s=Q_{s}D_{s}\tilde{\ell}$, where $D_{s}=\sum_{i\colon \lambda_{is}\neq 1}(1-\lambda_{is})^{-1/2}r_{is}r_{is}'$. \end{lemma}
The lemma follows from the fact that $I-Q_{s}Q_{s}'$ has eigenvalues $1-\lambda_{is}$ and eigenvectors $Q_{s}r_{is}$. More precisely, let $Q_{s}=\sum_{i}\lambda_{is}^{1/2}u_{is}r_{is}'$ denote the singular value decomposition of $Q_{s}$, so that $I-Q_{s}Q_{s}'=\sum_{i}(1-\lambda_{is})u_{is}u_{is}'$, and we can take $A_{s}=\sum_{i\colon \lambda_{is}\neq 1}(1-\lambda_{is})^{-1/2}u_{is}u_{is}'$. Then,
\begin{equation}
A_{s}'Q_{s}=\sum_{i\colon \lambda_{is}\neq 1} (1-\lambda_{is})^{-1/2}\lambda_{is}^{1/2} u_{is} r_{is}'
=Q_{s}\sum_{i\colon \lambda_{is}\neq 1}(1-\lambda_{is})^{-1/2}r_{is} r_{is}',
\end{equation}
where the second equality uses $Q_{s}r_{is}=\lambda_{is}^{1/2}u_{is}$.
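A sketch of this computation for a single cluster, using the `y ~ x2` regression and the cluster variable from the examples (again illustrative, not the package's implementation):

```r
# Compute a_s from the spectral decomposition of the p x p matrix Q_s'Q_s,
# avoiding any n_s x n_s inverse.
fit2 <- lm(y ~ x2, data = d1)
qr2 <- qr(model.matrix(fit2))
Q2 <- qr.Q(qr2)
elltilde2 <- solve(t(qr.R(qr2)), c(0, 1))    # ell picks out the x2 coefficient
a_cluster <- function(s) {
  Qs <- Q2[d1$cl == s, , drop = FALSE]
  ev <- eigen(crossprod(Qs), symmetric = TRUE)
  keep <- ev$values < 1 - 1e-7               # eigenvalues with lambda_is != 1
  Ds <- ev$vectors[, keep, drop = FALSE] %*%
    ((1 - ev$values[keep])^(-1/2) * t(ev$vectors[, keep, drop = FALSE]))
  drop(Qs %*% Ds %*% elltilde2)              # a_s = Q_s D_s elltilde
}
a_s1 <- a_cluster("1")                       # a_s for the first cluster
```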
Let $G$ be an $n\times S$ matrix with columns $(I-QQ')_{s}'a_{s}$. Then the
@BeMc02 adjustment sets the degrees of freedom to
\begin{equation}
f_{\text{BM}}=\frac{\trace(G' G)^{2}}{\trace((G' G)^{2})}.
\end{equation}
Since $(G' G)_{st}=a_{s}'(I-QQ')_{s}(I-QQ')_{t}'a_{t}=a_{s}'(\1{s=t}-Q_{s}Q_{t}')a_{t}$,
the matrix $G' G$ can be efficiently computed as
\begin{equation}
G' G=\diag(a_{s}'a_{s})-BB',\qquad B_{s k}=a_{s}'Q_{s k}.
\end{equation}
Note that $B$ is an $S\times p$ matrix, so that computing the degrees of freedom
adjustment only involves $p\times p$ matrices:
\begin{align}
f_{\text{BM}}=\frac{(\sum_{s}a_{s}'a_{s}-\sum_{s,k}B_{s k}^{2})^{2}}{
\sum_{s}(a_{s}'a_{s})^{2}-2\sum_{s,k}(a_{s}'a_{s})B_{s k}^{2}+\sum_{s,t}(B_{s}'B_{t})^{2}
}.
\end{align}
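Continuing the sketch above, the Bell-McCaffrey degrees of freedom for the clustered `y ~ x2` example can be computed from cluster-level objects only; the result can be compared with the `df` reported by `dfadjustSE(fit2, clustervar = d1$cl, IK = FALSE)`.

```r
# Sketch: f_BM from a_s'a_s, the S x p matrix B, and the S x S matrix G'G.
cls <- levels(d1$cl)
a <- lapply(cls, a_cluster)
aa <- vapply(a, function(x) sum(x^2), numeric(1))       # a_s'a_s
B <- t(vapply(seq_along(cls), function(j) {
  drop(crossprod(Q2[d1$cl == cls[j], , drop = FALSE], a[[j]]))  # B_sk = a_s'Q_sk
}, numeric(ncol(Q2))))
GG <- diag(aa) - tcrossprod(B)                           # G'G
fBM <- sum(diag(GG))^2 / sum(GG^2)                       # trace(G'G)^2 / trace((G'G)^2)
fBM
```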
If the observations are independent, we compute $B$ directly as `B <- a*Q`, and since $a_{i}$ is a scalar, we have
\begin{equation}
f_{\text{BM}}=\frac{(\sum_{i}a_{i}^{2}-\sum_{i,k}B_{i k}^{2})^{2}}{
\sum_{i}a_{i}^{4}-2\sum_{i}a_{i}^{2}B_{i}'B_{i}+\sum_{i, j}(B_{i}'B_{j})^{2}}.
\end{equation}
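In the independent case, the same quantity can be computed with vector operations; the sketch below continues `a1` and `Q1` from the earlier sketches.

```r
# Sketch: f_BM without clustering; B has one row per observation.
B1 <- a1 * Q1                                       # B_ik = a_i Q_ik
num <- (sum(a1^2) - sum(B1^2))^2
den <- sum(a1^4) - 2 * sum(a1^2 * rowSums(B1^2)) + sum(crossprod(B1)^2)
num / den                                           # compare with the df for x1
```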
The @ImKo16 degrees of freedom adjustment instead sets \begin{equation} f_{\text{IK}}=\frac{\trace({G}'\hat{\Omega} G)^{2}}{\trace(({G}'\hat{\Omega} G)^{2})}, \end{equation}
where $\hat{\Omega}$ is an estimate of $\Omega$ based on the @moulton86 model, under which $\Omega_{s}=\sigma_{\epsilon}^{2}I_{n_{s}}+\rho\iota_{n_{s}}\iota_{n_{s}}'$. Using simple algebra, one can show that in this case,
\begin{equation}
G'\Omega G=\sigma_{\epsilon}^{2}\diag(a_{s}'a_{s})
-\sigma_{\epsilon}^{2}BB'+\rho (D-BF')(D-BF')',
\end{equation}
where
\begin{equation}
F_{s k}=\iota_{n_{s}}'Q_{s k},\qquad D=\diag(a_{s}'\iota_{n_{s}}),
\end{equation}
which can again be computed even if the clusters are large. The estimate $\hat{\Omega}$ replaces $\sigma_{\epsilon}^{2}$ and $\rho$ with analog estimates.
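The sketch below evaluates these expressions for the clustered example, continuing `Q2`, `aa`, `B`, and `a_cluster()` from the sketches above. The values of `sigma2` and `rho` are placeholders: the package replaces them with estimates based on the residuals, which are not reproduced here.

```r
# Sketch: f_IK from the Moulton-model expression for G' Omega G, using
# placeholder (not estimated) values of sigma_eps^2 and rho.
sigma2 <- 1
rho <- 0.1
Fmat <- rowsum(Q2, d1$cl)                           # F_sk = iota_{n_s}' Q_sk
a_all <- numeric(nrow(Q2))
for (s in levels(d1$cl)) a_all[d1$cl == s] <- a_cluster(s)
D <- diag(drop(rowsum(a_all, d1$cl)))               # D = diag(a_s' iota_{n_s})
GOG <- sigma2 * diag(aa) - sigma2 * tcrossprod(B) +
  rho * tcrossprod(D - B %*% t(Fmat))
sum(diag(GOG))^2 / sum(GOG^2)                       # f_IK
```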
```r
options(oldoptions)
```
The estimator of the block of $V$ associated with $W$ implied by $\hat{V}$ is given by
\begin{equation}
(\ddot{W}'\ddot{W})^{-1}\sum_{s}\ddot{W}_{s}'A_{s}(I-QQ')_{s}uu'(I-QQ')_{s}'A_{s}'\ddot{W}_{s}(\ddot{W}'\ddot{W})^{-1},
\end{equation}
which is unbiased under homoskedasticity if for each $s$,
\begin{equation}\label{equation:1}
\ddot{W}_{s}'A_{s}(I-Q_{s}Q_{s}')A_{s}'\ddot{W}_{s}=\ddot{W}_{s}'\ddot{W}_{s}.
\end{equation}
We will show that \eqref{equation:1} holds. To this end, we first claim that under conditions (i) and (ii), $\ddot{W}_{s}$ is in the column space of $I-Q_{s}Q_{s}'$ (a claim that's trivial if this matrix is full rank). Decompose $I-QQ'=I-H_{\ddot{W}}-H_{L}$, where $H_{\ddot{W}}$ and $H_{L}$ are hat matrices associated with $\ddot{W}$ and $L$. The block associated with cluster $s$ can thus be written as $I-Q_{s}Q_{s}'=I-L_{s}(L' L)^{-1}L_{s}'-\ddot{W}_{s}(\ddot{W}'\ddot{W})^{-1}\ddot{W}_{s}'$. Let $B_{s}=\ddot{W}_{s}(\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s})^{-1}\ddot{W}'\ddot{W}$, which is well-defined under condition (ii). Then, using condition (i), we get
\begin{equation}
\begin{split}
(I-Q_{s}Q_{s}')B_{s}
&=(I-\ddot{W}_{s}(\ddot{W}'\ddot{W})^{-1}\ddot{W}_{s}')B_{s}\\
&=\ddot{W}_{s}(I-(\ddot{W}'\ddot{W})^{-1}\ddot{W}_{s}'\ddot{W}_{s})(\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s})^{-1}\ddot{W}'\ddot{W}=\ddot{W}_{s},
\end{split}
\end{equation}
proving the claim. Letting $C$ denote the symmetric square root of $I-Q_{s}Q_{s}'$, the left-hand side of \eqref{equation:1} can therefore be written as
\begin{equation}
\ddot{W}_{s}'A_{s}(I-Q_{s}Q_{s}')A_{s}'\ddot{W}_{s}=B_{s}'CCA_{s}CCA_{s}'CCB_{s}=\ddot{W}_{s}'\ddot{W}_{s},
\end{equation}
where the second equality follows by the definition of a generalized inverse.