library("knitr") knitr::opts_knit$set(self.contained = FALSE) knitr::opts_chunk$set(tidy = TRUE, collapse=TRUE, comment = "#>", tidy.opts=list(blank=FALSE, width.cutoff=55)) oldoptions <- options(digits=3)
This package implements small-sample degrees of freedom adjustments to robust and cluster-robust standard errors in linear regression, as discussed in @ImKo16.[^1] The implementation can handle models with fixed effects, as well as cases with a large number of observations or clusters.
[^1]: We thank Bruce Hansen for comments and Ulrich Müller for suggesting to us a version of Lemma 2 below.
```r
library(dfadjust)
```
To give some examples, let us construct an artificial dataset with 11 clusters:
```r
set.seed(7)
d1 <- data.frame(y=rnorm(1000), x1=c(rep(1, 3), rep(0, 997)),
                 x2=c(rep(1, 150), rep(0, 850)), x3=rnorm(1000),
                 cl=as.factor(c(rep(1:10, each=50), rep(11, 500))))
```
Let us first run a regression of `y` on `x1`. This is a case in which, in spite of the moderate data size, the effective number of observations is small, since there are only three treated units:
```r
r1 <- lm(y~x1, data=d1)
## No clustering
dfadjustSE(r1)
```
We can see that the usual robust standard errors (`HC1 se`) are much smaller than the effective standard errors (`Adj. se`), which are computed by taking the HC2 standard errors and applying a degrees of freedom adjustment.
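In particular, the adjusted standard error rescales the HC2 standard error so that standard normal critical values reproduce the confidence interval of a $t$-test with the fitted degrees of freedom. A minimal sketch of this relation, under the assumption that the object returned by `dfadjustSE` carries a `coefficients` matrix with the column names shown in the printed output:

```r
## Sketch: relation between the printed columns (assumes the returned
## `coefficients` matrix has columns "HC2 se", "df", and "Adj. se")
r <- dfadjustSE(r1)
r$coefficients[, "HC2 se"] * qt(0.975, df=r$coefficients[, "df"]) / qnorm(0.975)
## should reproduce r$coefficients[, "Adj. se"]
```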
Now consider a cluster-robust regression of `y` on `x2`. There are only 3 treated clusters, so the effective number of observations is again small:
```r
r1 <- lm(y~x2, data=d1)
# Default Imbens-Kolesár method
dfadjustSE(r1, clustervar=d1$cl)
# Bell-McCaffrey method
dfadjustSE(r1, clustervar=d1$cl, IK=FALSE)
```
Now, let us run a regression of `y` on `x3`, with fixed effects. Since we're only interested in `x3`, we specify that we only want inference on the second element:
```r
r1 <- lm(y~x3+cl, data=d1)
dfadjustSE(r1, clustervar=d1$cl, ell=c(0, 1, rep(0, r1$rank-2)))
dfadjustSE(r1, clustervar=d1$cl, ell=c(0, 1, rep(0, r1$rank-2)), IK=FALSE)
```
Finally, here is an example in which the clusters are large. We have 500,000 observations:
```r
d2 <- do.call("rbind", replicate(500, d1, simplify=FALSE))
d2$y <- rnorm(length(d2$y))
r2 <- lm(y~x2, data=d2)
summary(r2)
# Default Imbens-Kolesár method
dfadjustSE(r2, clustervar=d2$cl)
# Bell-McCaffrey method
dfadjustSE(r2, clustervar=d2$cl, IK=FALSE)
```
This section describes the implementation of the @ImKo16 and @BeMc02 degrees of freedom adjustments.
There are $S$ clusters, and we observe $n_{s}$ observations in cluster $s$, for a total of $n=\sum_{s=1}^{S}n_{s}$ observations. We handle the case with independent observations by letting each observation be in its own cluster, with $S=n$. Consider the linear regression of a scalar outcome $Y_{i}$ onto a $p$-vector of regressors $X_{i}$, \begin{equation} Y_{i}=X_{i}'\beta+u_{i},\qquad E[u_{i}\mid X_{i}]=0. \end{equation} We're interested in inference on $\ell'\beta$ for some fixed vector $\ell\in\mathbb{R}^{p}$. Let $X$, $u$, and $Y$ denote the design matrix and the error and outcome vectors, respectively. For any $n\times k$ matrix $M$, let $M_{s}$ denote the $n_{s}\times k$ block corresponding to cluster $s$, so that, for instance, $Y_{s}$ corresponds to the outcome vector in cluster $s$. For a positive semi-definite matrix $M$, let $M^{1/2}$ be a matrix satisfying ${M^{1/2}}'M^{1/2}=M$, such as its symmetric square root or its Cholesky decomposition.
Assume that
\begin{equation}
E[u_{s}u_{s}'\mid X]=\Omega_{s},\quad\text{and}\quad
E[u_{s}u_{t}'\mid X]=0\quad\text{if $s\neq t$}.
\end{equation}
Denote the conditional variance matrix of $u$ by $\Omega$, so that $\Omega_{s}$
is the block of $\Omega$ corresponding to cluster $s$. We estimate $\ell'\beta$
using OLS. In `R`, the OLS estimator is computed via a QR decomposition, $X=QR$,
where $Q' Q=I$ and $R$ is upper-triangular, so we can write the estimator as
\begin{equation}
\ell'\hat{\beta}=\ell'\left(\sum_{s}X_{s}'X_{s}\right)^{-1}\sum_{s}X_{s}'Y_{s}
=\tilde{\ell}'\sum_{s}Q_{s}'Y_{s},\qquad \tilde{\ell}={R^{-1}}'\ell.
\end{equation}
It has variance
\begin{equation}
V:= \var(\ell'\hat{\beta}\mid X)
=\ell'\left(X' X\right)^{-1}
\sum_{s}X_{s}'\Omega_{s}X_{s}\left(X' X\right)^{-1}\ell
=\tilde{\ell}'\sum_{s}Q_{s}'\Omega_{s}Q_{s}
\tilde{\ell}.
\end{equation}
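As an aside, $\tilde{\ell}$ is cheap to obtain, since `lm` already stores the QR decomposition of $X$. A minimal sketch, using the first regression above and picking out the `x1` coefficient:

```r
## Sketch: ell-tilde = (R^{-1})' ell via the QR decomposition stored by lm()
## (assumes the QR pivot is the identity, as it is here)
fit <- lm(y~x1, data=d1)
R <- qr.R(fit$qr)                               # upper-triangular factor R
ell <- c(0, 1)                                  # loads on the x1 coefficient
ell.tilde <- backsolve(R, ell, transpose=TRUE)  # solves R'x = ell
```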
We estimate $V$ using a variance estimator that generalizes the HC2 variance estimator to clustering. Relative to the LZ2 estimator described in @ImKo16, we use a slight modification that allows for fixed effects: \begin{equation} \hat{V}=\ell'(X' X)^{-1}\sum_{s}X_{s}'A_{s}\hat{u}_{s}\hat{u}_{s}'A_{s}'X_{s}(X' X)^{-1}\ell =\ell' R^{-1}\sum_{s}Q_{s}'A_{s}\hat{u}_{s}\hat{u}_{s}'A_{s}'Q_{s}{R'}^{-1}\ell =\sum_{s=1}^{S}(\hat{u}_{s}'a_{s})^{2}, \end{equation} where \begin{equation} \hat{u}_{s}:=Y_{s}-X_{s}\hat{\beta} =u_{s}-Q_{s}Q' u,\qquad a_{s}=A_{s}'Q_{s}\tilde{\ell}, \end{equation}
and $A_s$ is a generalized inverse of the symmetric square root of $I-Q_s Q_s'$, where $Q_s Q_s'$ is the block of the hat matrix $QQ'$ corresponding to cluster $s$. In the presence of cluster-specific fixed effects, $I-Q_s Q_s'$ is generally not invertible, which necessitates taking a generalized inverse. So long as the vector $\ell$ doesn't load on these fixed effects, $\hat{V}$ will be unbiased under homoskedasticity, as the next result, which slightly generalizes Theorem 1 in @PuTi18, shows.
\begin{lemma} Suppose that $X=(W,L)$ is full rank, and suppose that the vector $\ell$ loads only on elements of $W$. Let $\ddot{W}$ denote the residual from projecting $W$ onto $L$, and suppose that for each cluster $s$, (i) $L_{s}'\ddot{W}_{s}=0$ and (ii) $\sum_{k=1}^{S} I(k\neq s)\ddot{W}_{k}'\ddot{W}_{k}$ is full rank. Then $\hat{V}$ is unbiased under homoskedasticity. \end{lemma}
The proof is given in the last section. By definition of projection, $L$ and $\ddot{W}$ are orthogonal. Condition (i) of the lemma strengthens this requirement to orthogonality within each cluster. It holds if $L$ corresponds to a vector of cluster fixed effects, or more generally if $L$ contains cluster-specific variables. Condition (ii) ensures that after partialling out $L$, it is feasible to run leave-one-cluster-out regressions. Without clustering, the condition is equivalent to the requirement that the partial leverages associated with $\ddot{W}$ are smaller than one.[^2]
[^2]: To see this, let $H=\ddot{W}(\ddot{W}'\ddot{W})^{-1}\ddot{W}'$ denote the partial projection matrix. Since $H=H^{2}$, \begin{equation}H_{ii}-H_{ii}^{2}=\sum_{j\neq i}H_{ij}H_{ji}=\ddot{W}_{i}'(\ddot{W}'\ddot{W})^{-1} \Bigl[\sum_{j\neq i}\ddot{W}_{j}\ddot{W}_{j}'\Bigr] (\ddot{W}'\ddot{W})^{-1}\ddot{W}_{i},\end{equation} so that $H_{ii}=1$ if and only if $\sum_{j\neq i}\ddot{W}_{j}\ddot{W}_{j}'$ is rank deficient.
If the observations are independent, the vector of leverages $(Q_{1}'Q_{1}, \dotsc, Q_{n}'Q_{n})$ can be computed directly using the `stats::hatvalues` function. In this case, we use this function to compute $A_{i}=1/\sqrt{1-Q_{i}'Q_{i}}$ directly, and we then compute $a_{i}=A_{i}Q_{i}'\tilde{\ell}$ using vector operations. For the case with
clustering, computing an inverse of $I-Q_{s}Q_{s}'$ can be expensive or even
infeasible if the cluster size $n_s$ is large. We therefore use the following
result, which allows us to compute $a_{s}$ by computing a spectral decomposition
of a $p\times p$ matrix.
\begin{lemma} Let $Q_{s}'Q_{s}=\sum_{i=1}^{p}\lambda_{is}r_{is}r_{is}'$ be the spectral decomposition of $Q_{s}'Q_{s}$. Then $a_s=Q_{s}D_{s}\tilde{\ell}$, where $D_{s}=\sum_{i\colon \lambda_{is}\neq 1}(1-\lambda_{is})^{-1/2}r_{is}r_{is}'$. \end{lemma}
The lemma follows from the fact that $I-Q_{s}Q_{s}'$ has eigenvalues $1-\lambda_{is}$ and eigenvectors $Q_{s}r_{is}$. More precisely, let $Q_{s}=\sum_{i}\lambda_{is}^{1/2}u_{is}r_{is}'$ denote the singular value decomposition of $Q_{s}$, so that $I-Q_{s}Q_{s}'=\sum_{i}(1-\lambda_{is})u_{is}u_{is}'$, and we can take $A_{s}=\sum_{i\colon \lambda_{is}\neq 1}(1-\lambda_{is})^{-1/2}u_{is}u_{is}'$. Then, \begin{equation} A_{s}'Q_{s}=\sum_{i\colon \lambda_{is}\neq 1} (1-\lambda_{is})^{-1/2}\lambda_{is}^{1/2} u_{is} r_{is}' =Q_{s}\sum_{i\colon \lambda_{is}\neq 1}(1-\lambda_{is})^{-1/2}r_{is} r_{is}', \end{equation} where the second equality uses $Q_{s}r_{is}=\lambda_{is}^{1/2}u_{is}$.
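To make the lemma concrete, here is a sketch (our own illustration, not the package's internal code) that computes $a_s$ for the first cluster in the regression of `y` on `x2`:

```r
## Sketch of Lemma 2: a_s from the spectral decomposition of Q_s'Q_s
fit <- lm(y~x2, data=d1)
Q <- qr.Q(fit$qr)
ell.tilde <- backsolve(qr.R(fit$qr), c(0, 1), transpose=TRUE)
Qs <- Q[d1$cl == 1, , drop=FALSE]              # block for cluster s = 1
ev <- eigen(crossprod(Qs), symmetric=TRUE)     # Q_s'Q_s = sum_i lambda_is r_is r_is'
keep <- abs(1 - ev$values) > 1e-7              # keep eigenvalues lambda_is != 1
r.is <- ev$vectors[, keep, drop=FALSE]
Ds <- r.is %*% ((1 - ev$values[keep])^(-1/2) * t(r.is))
a.s <- drop(Qs %*% (Ds %*% ell.tilde))         # a_s = Q_s D_s ell-tilde
```

Only the $p\times p$ matrix $Q_{s}'Q_{s}$ is ever decomposed, so the computation remains feasible even when $n_s$ is large.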
Let $G$ be an $n\times S$ matrix with columns $(I-QQ')_{s}'a_{s}$. Then the
@BeMc02 adjustment sets the degrees of freedom to
\begin{equation}
f_{\text{BM}}=\frac{\trace(G' G)^{2}}{\trace((G' G)^{2})}.
\end{equation}
Since $(G' G)_{st}=a_{s}'(I-QQ')_{s}(I-QQ')_{t}'a_{t}=a_{s}'(\1{s=t}-Q_{s}Q_{t}')a_{t}$,
the matrix $G' G$ can be efficiently computed as
\begin{equation}
G' G=\diag(a_{s}'a_{s})-BB',\qquad B_{s k}=a_{s}'Q_{s k}.
\end{equation}
Note that $B$ is an $S\times p$ matrix, so that computing the degrees of freedom
adjustment only involves $p\times p$ matrices:
\begin{align}
f_{\text{BM}}=\frac{(\sum_{s}a_{s}'a_{s}-\sum_{s,k}B_{s k}^{2})^{2}}{
\sum_{s}(a_{s}'a_{s})^{2}-2\sum_{s,k}(a_{s}'a_{s})B_{s k}^{2}+\sum_{s,t}(B_{s}'B_{t})^{2}
}.
\end{align}
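In code, once the $S\times p$ matrix $B$ and the scalars $a_{s}'a_{s}$ are available, the computation is short; for modest $S$ one can simply form the $S\times S$ matrix $G'G$ itself. A sketch, assuming `B` and a vector `aa` with entries $a_{s}'a_{s}$ have been built as described:

```r
## Sketch: Bell-McCaffrey df from B (S x p) and aa (entries a_s'a_s)
GG <- diag(aa) - tcrossprod(B)       # G'G = diag(a_s'a_s) - BB'
fBM <- sum(diag(GG))^2 / sum(GG^2)   # sum(GG^2) = trace((G'G)^2) by symmetry
```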
If the observations are independent, we compute $B$ directly as `B <- a*Q`,
and since $a_{i}$ is a scalar, we have
\begin{equation}
f_{\text{BM}}=\frac{(\sum_{i}a_{i}^{2}-\sum_{i,k}B_{i k}^{2})^{2}}{
\sum_{i}a_{i}^{4}-2\sum_{i}a_{i}^{2}B_{i}'B_{i}+\sum_{i, j}(B_{i}'B_{j})^{2}}.
\end{equation}
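As a concrete check of these independent-observations formulas, here is a sketch using the first regression above (an illustration only, not the package internals):

```r
## Worked check of the no-clustering formulas
fit <- lm(y~x1, data=d1)
Q <- qr.Q(fit$qr)
ell.tilde <- backsolve(qr.R(fit$qr), c(0, 1), transpose=TRUE)
a <- drop(Q %*% ell.tilde) / sqrt(1 - rowSums(Q^2))  # a_i = A_i Q_i' ell-tilde
B <- a * Q
fBM <- (sum(a^2) - sum(B^2))^2 /
  (sum(a^4) - 2*sum(a^2 * rowSums(B^2)) + sum(tcrossprod(B)^2))
fBM   # compare with the df reported by dfadjustSE(fit)
```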
The @ImKo16 degrees of freedom adjustment instead sets \begin{equation} f_{\text{IK}}=\frac{\trace({G}'\hat{\Omega} G)^{2}}{\trace(({G}'\hat{\Omega} G)^{2})}, \end{equation}
where $\hat{\Omega}$ is an estimate of the @moulton86 model of the covariance matrix, under which $\Omega_{s}=\sigma_{\epsilon}^{2}I_{n_{s}}+\rho\iota_{n_{s}}\iota_{n_{s}}'$. Using simple algebra, one can show that in this case, \begin{equation} G'\Omega G=\sigma_{\epsilon}^{2}\diag(a_{s}'a_{s}) -\sigma_{\epsilon}^{2}BB'+\rho (D-BF')(D-BF')', \end{equation} where \begin{equation} F_{s k}=\iota_{n_{s}}'Q_{s k},\qquad D=\diag(a_{s}'\iota_{n_{s}}), \end{equation} which can again be computed even if the clusters are large. The estimate $\hat{\Omega}$ replaces $\sigma_{\epsilon}^{2}$ and $\rho$ with analog estimates.
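Given analog estimates of $\sigma_{\epsilon}^{2}$ and $\rho$, the IK degrees of freedom then require only a few more operations. A sketch, with `aa`, `B`, and the diagonal matrix `D` as above, `Fmat` standing in for $F$ (avoiding R's built-in `F`), and `sigma2`, `rho` denoting the assumed estimates:

```r
## Sketch: IK df from the Moulton-model decomposition above
## (sigma2, rho, aa, B, D, Fmat are assumed to be available)
M <- D - tcrossprod(B, Fmat)                               # D - BF'
GOG <- sigma2 * (diag(aa) - tcrossprod(B)) + rho * tcrossprod(M)
fIK <- sum(diag(GOG))^2 / sum(GOG^2)
```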
We conclude with the proof of Lemma 1. The estimator of the block of $V$ associated with $W$ implied by $\hat{V}$ is given by \begin{equation} (\ddot{W}'\ddot{W})^{-1} \sum_{s}\ddot{W}_{s}'A_{s} (I-QQ')_{s}\, u u'(I-QQ')_{s}'A_{s}'\ddot{W}_{s} (\ddot{W}'\ddot{W})^{-1}, \end{equation} which is unbiased under homoskedasticity if for each $s$, \begin{equation}\label{equation:1} \ddot{W}_{s}'A_{s}(I-Q_{s}Q_{s}')A_{s}'\ddot{W}_{s} = \ddot{W}_{s}'\ddot{W}_{s}. \end{equation} We will show that \eqref{equation:1} holds. To this end, we first claim that under conditions (i) and (ii), $\ddot{W}_{s}$ is in the column space of $I-Q_{s}Q_{s}'$ (a claim that's trivial if this matrix is full rank). Decompose $I-QQ'=I-H_{\ddot{W}}-H_{L}$, where $H_{\ddot{W}}$ and $H_{L}$ are the hat matrices associated with $\ddot{W}$ and $L$. The block associated with cluster $s$ can thus be written as $I-Q_{s}Q_{s}'=I-L_{s}(L' L)^{-1}L_{s}'-\ddot{W}_{s}(\ddot{W}'\ddot{W})^{-1}\ddot{W}_{s}'$. Let $B_{s}=\ddot{W}_{s}(\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s})^{-1}\ddot{W}'\ddot{W}$, which is well-defined under condition (ii). Then, using condition (i), we get \begin{equation} \begin{split} (I-Q_{s}Q_{s}')B_{s} &=(I-\ddot{W}_{s}(\ddot{W}'\ddot{W})^{-1}\ddot{W}_{s}')B_{s}\\ &=\ddot{W}_{s}(I-(\ddot{W}'\ddot{W})^{-1}\ddot{W}_{s}'\ddot{W}_{s}) (\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s})^{-1}\ddot{W}'\ddot{W}=\ddot{W}_{s}, \end{split} \end{equation} proving the claim. Letting $C$ denote the symmetric square root of $I-Q_{s}Q_{s}'$, the left-hand side of \eqref{equation:1} can therefore be written as \begin{equation} \ddot{W}_{s}'A_{s}(I-Q_{s}Q_{s}')A_{s}'\ddot{W}_{s} = B_{s}'C C A_{s} C C A_{s}' C C B_{s}=\ddot{W}_{s}'\ddot{W}_{s}, \end{equation} where the second equality follows from the definition of a generalized inverse.