set.seed(42)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(digits = 2)

This document provides the mathematical notes for each of the estimators in estimatr. The most up-to-date version of this can be found on the DeclareDesign website here.

Estimators

The current estimators we provide are:

lm_robust notes

The lm_robust method uses the C++ library Eigen, via the RcppEigen package, to estimate the coefficients, variance-covariance matrix, and, in some cases, the degrees of freedom of linear models.

The default estimators have been selected for efficiency in large samples and low bias in small samples as well as for their similarities to design-based randomization estimators [@samiiaronow2012]. This section outlines the various kinds of variance estimators one can employ within lm_robust.

Coefficient estimates

[ \widehat{\beta} =(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y} ]

Our algorithm solves the least squares problem using a rank-revealing column-pivoting QR factorization that eliminates the need to invert $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ explicitly and behaves much like the default lm function in R. However, when $\mathbf{X}$ is rank deficient, there are certain conditions under which the QR factorization algorithm we use, from the Eigen C++ library, drops different coefficients from the output than the default lm function. In general, users should avoid specifying models with rank-deficiencies. In fact, if users are certain their data are not rank deficient, they can improve the speed of lm_robust by setting try_cholesky = TRUE. This replaces the QR factorization with a Cholesky factorization that is only guaranteed to work $\mathbf{X}$ is of full rank.

Weights

If weights are included, we transform the data as below and then proceed as normal, following advice from @romanowolf2017 that this weighted estimator has attractive properties. We do so by first scaling all of the weights so that they sum to one. Then we multiply each row of the design matrix $\mathbf{X}$ by the square root each unit's weight, $\mathbf{x}_i \sqrt{w_i}$, and then do the same to the outcome, $\mathbf{y}_i \sqrt{w_i}$. This results in our coefficients being estimated as follows, where $\mathbf{W}$ is a diagonal matrix with the scaled weights on the diagonal.

Weighted: [ \widehat{\beta} =(\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{W}\mathbf{y} ]

The transformed data are then used in the analysis below, where $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ is now $(\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}$ and $\mathbf{X}$ is now $\mathbf{X} \mathrm{sqrt}[W]$, where $\mathrm{sqrt}[.]$ is an operator that applies a square root to the coefficients of some matrix.

We should note that this transformation yields the same standard errors as specifying weights using aweight in Stata for the "classical", "HC0", and "HC1" ("stata") variance estimators. Furthermore, in the clustered case, our weighted estimator for the "stata" cluster-robust variance also matches Stata. Thus Stata's main robust standard error estimators, "HC1" and their clustered estimator, match our package when weights are applied. Nonetheless, Stata uses a slightly different Hat matrix and thus "HC2" and "HC3" estimates in Stata when weights are specified may differ from our estimates---more on that here.

Variance

In addition to solving for OLS coefficients faster than lm, we provide a variety of robust variance estimators. Below we outline them for the non-clustered and clustered cases. You can see some simulations about the unbiasedness of the classical variance estimators with homoskedasticity and the consistency of the HC2 estimators with heteroskedasticity in these simulations.

Heteroskedasticity-Robust Variance and Degrees of Freedom

The default variance estimator without clusters is the HC2 variance, first proposed by @mackinnonwhite1985. This estimator has the advantage of being equivalent to a conservative randomization-based "Neyman" estimator of the variance [@samiiaronow2012]. Furthermore, while it is somewhat less efficient than the HC1 variance estimator, the default in Stata, it tends to perform better in small samples (evidence for that can be found in our simulations here).

| se_type = | Variance Estimator ($\widehat{\mathbb{V}}[\widehat{\beta}]$) | Degrees of Freedom | Notes | |----------|--------------------------|------|----------------------------| | "classical" | $\frac{\mathbf{e}^\top\mathbf{e}}{N-K} (\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | | | "HC0" | $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathrm{diag}\left[e_i^2\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | | | "HC1", "stata"| $\frac{N}{N-K}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathrm{diag}\left[e_i^2\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | Often called the Eicker-Huber-White variance (or similar) | | "HC2" (default) | $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathrm{diag}\left[\frac{e_i^2}{1-h_{ii}}\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | | | "HC3" | $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}{\top}\mathrm{diag}\left[\frac{e_i^2}{(1-h_{ii})^2}\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | | $\mathbf{x}_i$ is the $i$th row of $\mathbf{X}$. $h_{ii} = \mathbf{x}_i(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{x}^{\top}_i$ $e_i = y_i - \mathbf{x}_i\widehat{\beta}$ $\mathrm{diag}[.]$ is an operator that creates a diagonal matrix from a vector $N$ is the number of observations $K$ is the number of elements in $\beta$.

Cluster-Robust Variance and Degrees of Freedom

For cluster-robust inference, we provide several estimators that are essentially analogs of the heteroskedastic-consistent variance estimators for the clustered case. Our default is the CR2 variance estimator, analogous to HC2 standard errors, and perform quite well in small samples without sacrificing much in the way of efficiency in larger samples. This estimator was originally proposed in @bellmccaffrey2002, although we implement a generalized version of the algorithm outlined in @pustejovskytipton2016; these authors provide an R package for CR2 variance estimation, clubSandwich, that applies these standard errors to a wide variety of models. For a good overview of the different cluster-robust variance estimators and simulations of their accuracy in small samples, again users can see @imbenskolesar2016. For an overview of when to use cluster-robust estimators, especially in an experimental setting, see @abadieetal2017.

| se_type = | Variance Estimator ($\widehat{\mathbb{V}}[\widehat{\beta}]$) | Degrees of Freedom | Notes | |-------------------|--------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | "CR0" | $(\mathbf{X}^{\top}\mathbf{X})^{-1} \sum^S_{s=1} \left[\mathbf{X}^\top_s \mathbf{e}s\mathbf{e}^\top_s \mathbf{X}_s \right] (\mathbf{X}^{\top}\mathbf{X})^{-1}$ | $S - 1$ | | | "stata" | $\frac{N-1}{N-K}\frac{S}{S-1} (\mathbf{X}^{\top}\mathbf{X})^{-1} \sum^S{s=1} \left[\mathbf{X}^\top_s \mathbf{e}s\mathbf{e}^\top_s \mathbf{X}_s \right] (\mathbf{X}^{\top}\mathbf{X})^{-1}$ | $S - 1$ | The Stata variance estimator is the same as the CR0 estimate but with a special finite-sample correction. | | "CR2" (default) | $(\mathbf{X}^{\top}\mathbf{X})^{-1} \sum^S{s=1} \left[\mathbf{X}^\top_s \mathbf{A}s \mathbf{e}_s\mathbf{e}^\top_s \mathbf{A}^\top_s \mathbf{X}_s \right] (\mathbf{X}^{\top}\mathbf{X})^{-1}$ | $\frac{\left(\sum^S{i = 1} \mathbf{p}^\top_i \mathbf{p}i \right)^2}{\sum^S{i=1}\sum^S_{j=1} \left(\mathbf{p}^\top_i \mathbf{p}_j \right)^2}$ | These estimates of the variance and degrees of freedom come from @pustejovskytipton2016, which is an extension to certain models with a particular set of dummy variables of the method proposed by @bellmccaffrey2002. Note that the degrees of freedom can vary for each coefficient. See below for more complete notation. | $S$ is the number of clusters $\mathbf{X}_s$ is the rows of $\mathbf{X}$ that belong to cluster $s$ $I_n$ is an identity matrix of size $n\times n$ $\mathbf{e}_s$ is the elements of the residual matrix $\mathbf{e}$ in cluster $s$, or $\mathbf{e}_s = \mathbf{y}_s - \mathbf{X}_s \widehat{\beta}$ * $\mathbf{A}_s$ and $\mathbf{p}$ are defined in the notes below

Further notes on CR2: The variance estimator we implement is shown in equations (4) and (5) in @pustejovskytipton2016 and equation (11), where we set $\mathbf{\Phi}$ to be $I$, following @bellmccaffrey2002. Further note that the @pustejovskytipton2016 CR2 estimator and the @bellmccaffrey2002 estimator are identical when $\mathbf{B_s}$ is full rank. It could be rank-deficient if there were dummy variables, or fixed effects, that were also your clusters. In this case, the original @bellmccaffrey2002 estimator could not be computed. You can see the simpler @bellmccaffrey2002 estimator written up plainly on page 709 of @imbenskolesar2016 along with the degrees of freedom denoted as $K_{BM}$.

In the CR2 variance calculation, we get $\mathbf{A}_s$ as follows:

$$ \begin{aligned} \mathbf{H} &= \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^\top \\\ \mathbf{B}s &= (I{N} - \mathbf{H})s (I{N} - \mathbf{H})^\top_s \\\ \mathbf{A}_s &= \mathbf{B}^{+1/2}_s \end{aligned} $$

where $\mathbf{B}^{+1/2}_s$ is the symmetric square root of the Moore–Penrose inverse of $\mathbf{B}_s$ and $(I - \mathbf{H})_s$ are the $N_s$ columns that correspond to cluster $s$. To get the corresponding degrees of freedom, note that

[ \mathbf{p}s = (I_N - \mathbf{H})^\top_s \mathbf{A}_s \mathbf{X}_s (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{z}{k} ] where $\mathbf{z}_{k}$ is a vector of length $K$, the number of coefficients, where the $k$th element is 1 and all other elements are 0. The $k$ signifies the coefficient for which we are computing the degrees of freedom.

Confidence intervals and hypothesis testing

If $\widehat{\mathbb{V}}_k$ is the $k$th diagonal element of $\widehat{\mathbb{V}}$, we build confidence intervals using the user specified $\alpha$ as:

[ \mathrm{CI}^{1-\alpha} = \left(\widehat{\beta_k} + t^{df}{\alpha/2} \sqrt{\widehat{\mathbb{V}}_k}, \widehat{\beta_k} + t^{df}{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}_k}\right) ]

We also provide two-sided p-values using a t-distribution with the aforementioned significance level $\alpha$ and degrees of freedom $df$.

lm_lin notes

The lm_lin estimator is a data pre-processor for lm_robust that implements the regression method for covariate adjustment suggested by @lin2013.

This estimator works by taking the outcome and treatment variable as the main formula (formula) and takes a right-sided formula of all pre-treatment covariates (covariates). These pre-treatment covariates are then centered to be mean zero and interacted with the treatment variable before being added to the formula and passed to lm_robust. In other words, instead of fitting a simple model adjusting for pre-treatment covariates such as

[ y_i = \tau z_i + \mathbf{\beta}^\top \mathbf{x}_i + \epsilon_i ]

with the following model

[ y_i = \tau z_i + \mathbf{\beta}^\top \mathbf{x}^c_i + \mathbf{\gamma}^\top \mathbf{x}^c_i z_i + \epsilon_i ]

where $\mathbf{x}^c_i$ is a vector of pre-treatment covariates for unit $i$ that have been centered to have mean zero and $z_i$ is an indicator for the treatment group. @lin2013 proposed this estimator in response to the critique by @freedman2008 that using regression to adjust for pre-treatment covariates could bias estimates of treatment effects.

The estimator lm_lin also works for multi-valued treatments by creating a full set of dummies for each treatment level and interacting each with the centered pre-treatment covariates. The rest of the options for the function and corresponding estimation is identical to lm_robust.

iv_robust notes

Our iv_robust estimator uses a two-stage least squares estimation.

Coefficient estimates

[ \widehat{\beta}_{2SLS} =(\mathbf{X}^{\top}\mathbf{P_z}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{P_z}\mathbf{y}, ]

where $\mathbf{X}$ are the endogenous regressors, $\mathbf{P_Z} = \mathbf{Z}(\mathbf{Z}^{\top}\mathbf{Z})^{-1}\mathbf{Z}^\top$, and $\mathbf{Z}$ are the instruments. This is equivalent to estimating the first stage regression,

[ \mathbf{X} = \mathbf{Z}\beta_{FS} + \mathbf{\zeta}, ]

and using the first stage predicted values in the second stage regression,

$$ \begin{aligned} \widehat{\mathbf{X}} &= \mathbf{Z}\widehat{\beta}{FS} \ \mathbf{y} &= \widehat{\mathbf{X}}\beta{2SLS} + \mathbf{\epsilon}. \end{aligned} $$

Weighting

When weights are applied, we use the same estimation strategy as in lm_robust where we first transform the data by the square root of the weights and proceed with estimation as usual.

Variance

The variances estimates for iv_robust are the same as the estimates for lm_robust although two changes are made. First, we replace $\mathbf{X}$ with the second stage regressors, $\widehat{\mathbf{X}}$, and we replace the residuals, $e_i$, with $\mathbf{y} - \mathbf{X} \beta_{2SLS}$. That is, we use the residuals from the final coefficients and the endogenous, uninstrumented regressors $\mathbf{X}$.

Because Stata does not default to using finite sample corrections and tests with its ivregress 2sls estimator, the correspondence between our instrumental variables estimator and theirs can be a bit unclear. The following table shows the options in Stata that correspond to our estimators.

| estimatr | Stata | Notes | |----------------------------------------------------------------------|-------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------| | N/A | ivregress 2sls y (x = z) | Stata's default has no finite sample correction (i.e., $\widehat{\sigma}^2 = \mathbf{e}^\top \mathbf{e}$). Stata here also uses z-tests. | | iv_robust(y ~ x | z, se_type = "classical") | ivregress 2sls y (x = z), small | $\widehat{\sigma}^2 = \frac{\mathbf{e}^\top \mathbf{e}}{N - k}$. | | iv_robust(y ~ x | z, se_type = "HC0") | ivregress 2sls y (x = z), rob | Stata uses z-tests here. | | iv_robust(y ~ x | z, se_type = "HC1") | ivregress 2sls y (x = z), rob small | | | iv_robust(y ~ x | z, se_type = "HC2") (default) | N/A | | | iv_robust(y ~ x | z, se_type = "HC3") | N/A | | | iv_robust(y ~ x | z, clusters = clust, se_type = "CR0") | ivregress 2sls y (x = z), vce(cl clust) | Stata uses z-tests here. | | iv_robust(y ~ x | z, clusters = clust, se_type = "stata") | ivregress 2sls y (x = z), vce(cl clust) small | | | iv_robust(y ~ x | z, clusters = clust, se_type = "CR2") (default) | N/A | |

Confidence intervals and hypothesis testing

If $\widehat{\mathbb{V}}_k$ is the $k$th diagonal element of $\widehat{\mathbb{V}}$, we build confidence intervals using the user specified $\alpha$ as:

[ \mathrm{CI}^{1-\alpha} = \left(\widehat{\beta_{2SLS, k}} + t^{df}{\alpha/2} \sqrt{\widehat{\mathbb{V}}_k}, \widehat{\beta{2SLS, k}} + t^{df}_{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}_k}\right) ]

We also provide two-sided p-values using a t-distribution with the aforementioned significance level $\alpha$ and degrees of freedom $df$, which come from the second-stage regression. As mentioned in the table above, these results will be different from Stata in certain cases as Stata uses z-tests when small is not specified.

difference_in_means notes

There are six kinds of experimental designs for which our difference_in_means estimator can estimate treatment effects, standard errors, confidence intervals, and provide p-values. We list the different designs here along with how the software learns the design:

There are two subsets of blocked designs that we also consider:

Note: if there are blocks of size two and blocks greater than size two, we default to the matched-pairs estimators described below.

For each design, our estimator is informed by the recent statistical literature on the analysis of experimental data.

Estimates

Any unblocked design [ \widehat{\tau} = \frac{1}{N} \sum^N_{i=1} z_i y_i - (1 - z_i) y_i ] where $z_i$ is the treatment variable, $y_i$ is the outcome, and $N$ is the total number of units.

Blocked design (including matched-pairs designs) [ \widehat{\tau} = \sum^J_{j=1} \frac{N_j}{N} \widehat{\tau_j} ] where $J$ is the number of blocks, $N_j$ is the size of those blocks, and $\widehat{\tau_j}$ is the estimated difference-in-means in block $j$.

Weighting

If the user specifies weights, treatment effects (or block-level treatment effects) and their standard errors are estimated by lm_robust. There are three exceptions. First, we still compute the degrees of freedom as in the below table. Second, if the design is blocked, a weighted treatment effect and variance estimate are computed within each block using lm_robust and then combined as below. Third, specifying weights with a matched-pairs estimator in difference_in_means is not supported at the moment.

Variance and Degrees of Freedom

| Design type | Variance $\widehat{\mathbb{V}}[\widehat{\tau}]$ | Degrees of Freedom | Notes | |---------------------------------|--------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | No blocks or clusters (standard) | $\frac{\widehat{\mathbb{V}}[y_{i,0}]}{N_0} + \frac{\widehat{\mathbb{V}}[y_{i,1}]}{N_1}$ | $\widehat{\mathbb{V}}[\widehat{\tau}]^2 \left(\frac{(\widehat{\mathbb{V}}[y_{i,1}]/ N_1)^2}{N_1 - 1} + \frac{(\widehat{\mathbb{V}}[y_{i,0}]/ N_0)^2}{N_0 - 1}\right)$ | Where $\widehat{\mathbb{V}}[y_{i,k}]$ is the Bessel-corrected variance of all units where $z_i = k$ and $N_k$ is the number of units in condition $k$. This is equivalent to the variance and Welch–Satterthwaite approximation of the degrees of freedom used by R's t.test. | | Blocked | $\sum^J_{j=1} \left(\frac{N_j}{N}\right)^2 \widehat{\mathbb{V}}[\widehat{\tau_j}]$ | $N - 2 * J$ | Where $\widehat{\mathbb{V}}[\widehat{\tau_j}]$ is the variance of the estimated difference-in-means in block $j$. See footnote 17 on page 74 of [@gerbergreen2012] for a reference. The degrees of freedom are equivalent to a regression with a full set of block specific treatment effects. | | Clusters | Same as lm_robust CR2 estimator | Same as lm_robust CR2 estimator | This variance is the same as that recommended by @gerbergreen2012 in equation 3.23 on page 83 when the clusters are even sizes. | | Blocked and clustered | $\sum^J_{j=1} \left(\frac{N_j}{N}\right)^2 \widehat{\mathbb{V}}[\widehat{\tau_j}]$ | $S - 2 * J$ | Where $\widehat{\mathbb{V}}[\widehat{\tau_j}]$ is the variance of the estimated difference-in-means in block $j$ and S is the number of clusters. See footnote 17 on page 74 of @gerbergreen2012 for a reference. The degrees of freedom are equivalent to a regression on data collapsed by cluster with a full set of block specific treatment effects. | | Matched pairs | $\frac{1}{J(J-1)} \sum^J_{j=1} \left(\widehat{\tau_j} - \widehat{\tau}\right)^2$ | $J - 1$ | See equation 3.16 on page 77 of @gerbergreen2012 for a reference. | | Matched pair cluster randomized | $\frac{J}{(J-1)N^2} \sum^J_{j=1} \left(N_j \widehat{\tau_j} - \frac{N \widehat{\tau}}{J}\right)^2$ | $J - 1$ | See the variance for the SATE defined in equation 6 on page 36 of [@imaietal2009] and the suggested degrees of freedom on page 37. |

Confidence intervals and hypothesis testing

We build confidence intervals using the user specified $\alpha$ as:

[ \mathrm{CI}^{1-\alpha} = \left(\widehat{\tau} + t^{df}{\alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]},\widehat{\tau}] + t^{df}{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]}\right) ]

We also provide two-sided p-values using a t-distribution with the aforementioned significance level $\alpha$ and degrees of freedom $df$.

horvitz_thompson notes

We provide Horvitz-Thompson estimators for two-armed trials and can be used to estimate unbiased treatment effects when the randomization is known. Horvitz-Thompson estimators require information about the probability each unit is in treatment and control, as well as the joint probability each unit is in the treatment, in the control, and in opposite treatment conditions.

The estimator we implement here, horvitz_thompson(), can be told the design of an experiment in several ways, and the reference page is a good place to see some of those examples. Users can see a description of the estimator and its properties in @aronowmiddleton2013, @middletonaronow2015, and @aronowsamii2017.

Some definitions used below:

Estimates

Simple, complete, clustered

[ \widehat{\tau} = \frac{1}{N} \sum^N_{i=1} z_i \frac{y_i}{\pi_{1i}} - (1 - z_i) \frac{y_i}{\pi_{0i}} ]

Blocked

[ \widehat{\tau} = \sum^J_{j=1} \frac{N_j}{N} \widehat{\tau_j} ] where $J$ is the number of blocks, $N_j$ is the size of those blocks, and $\widehat{\tau_j}$ is the Horvitz-Thompson estimate in block $j$.

Variance

Currently we provide variance estimates that rely on two separate assumptions:

Young's inequality

For designs that that are not clustered we use the following variance:

$$ \begin{aligned} \widehat{\mathbb{V}}{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^N{i=1} \Bigg[& z_i \left(\frac{y_i}{\pi_{1i}}\right)^2 + (1 - z_i) \left(\frac{y_i}{\pi_{0i}}\right)^2 + \sum_{j \neq i} \bigg(\frac{z_i z_j}{\pi_{1i1j} + \epsilon_{1i1j}}(\pi_{1i1j} - \pi_{1i}\pi_{1j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{1j}} \\\ & + \frac{(1-z_i) (1-z_j)}{\pi_{0i0j} + \epsilon_{0i0j}}(\pi_{0i0j} - \pi_{0i}\pi_{0j})\frac{y_i}{\pi_{0i}}\frac{y_j}{\pi_{0j}} - 2 \frac{z_i (1-z_j)}{\pi_{1i0j} + \epsilon_{1i0j}}(\pi_{1i0j} - \pi_{1i}\pi_{0j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{0j}} \\\ & + \sum_{\forall j \colon \pi_{1i1j} = 0} \left( z_i \frac{y^2_i}{2\pi_{1i}} + z_j \frac{y^2_j}{\pi_{1j}}\right) + \sum_{\forall j \colon \pi_{0i0j} = 0} \left( (1-z_i) \frac{y^2_i}{2\pi_{0i}} + (1-z_j) \frac{y^2_j}{\pi_{0j}}\right) \Bigg] \end{aligned} $$

There are some simplifications of the above for simpler designs that follow algebraically from the above. For example, if there are no two units for which the joint probability of being in either condition is 0, which is the case for most experiments that are not matched-pair experiments, then we get:

$$ \begin{aligned} \widehat{\mathbb{V}}{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^N{i=1} \Bigg[& z_i \left(\frac{y_i}{\pi_{1i}}\right)^2 + (1 - z_i) \left(\frac{y_i}{\pi_{0i}}\right)^2 + \sum_{j \neq i} \bigg(\frac{z_i z_j}{\pi_{1i1j}}(\pi_{1i1j} - \pi_{1i}\pi_{1j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{1j}} \\\ & + \frac{(1-z_i) (1-z_j)}{\pi_{0i0j}}(\pi_{0i0j} - \pi_{0i}\pi_{0j})\frac{y_i}{\pi_{0i}}\frac{y_j}{\pi_{0j}} - 2 \frac{z_i (1-z_j)}{\pi_{1i0j}}(\pi_{1i0j} - \pi_{1i}\pi_{0j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{0j}} \Bigg] \end{aligned} $$

If we further simplify to the case where there is simple random assignment, and there is absolutely no dependence among units (i.e., $\pi_{ziwj} = \pi_{zi}\pi_{wj} \;\;\forall\;\;z,w,i,j$), we get:

$$ \begin{aligned} \widehat{\mathbb{V}}{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^N{i=1} \Bigg[& z_i \left(\frac{y_i}{\pi_{1i}}\right)^2 + (1 - z_i) \left(\frac{y_i}{\pi_{0i}}\right)^2\Bigg] \end{aligned} $$

Clustered designs

For clustered designs, we use the following collapsed estimator by setting collapsed = TRUE. Here, $M$ is the total number of clusters, $y_k$ is the total of the outcomes $y_i$ for all $i$ units in cluster $k$, $\pi_zk$ is the marginal probability of cluster $k$ being in condition $z \in {0, 1}$, and $z_k$ and $\pi_{zkwl}$ are defined analogously. Warning! If one passes condition_pr_mat to horvitz_thompson for a clustered design, but not clusters, the function will not use the collapsed estimator the the variance estimate will be inaccurate.

$$ \begin{aligned} \widehat{\mathbb{V}}{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^M{k=1} \Bigg[& z_k \left(\frac{y_k}{\pi_{1k}}\right)^2 + (1 - z_k) \left(\frac{y_k}{\pi_{0k}}\right)^2 + \sum_{l \neq k} \bigg(\frac{z_k z_l}{\pi_{1k1l} + \epsilon_{1k1l}}(\pi_{1k1l} - \pi_{1k}\pi_{1l})\frac{y_k}{\pi_{1k}}\frac{y_l}{\pi_{1l}} \\\ & + \frac{(1-z_k) (1-z_l)}{\pi_{0k0l} + \epsilon_{0k0l}}(\pi_{0k0l} - \pi_{0k}\pi_{0l})\frac{y_k}{\pi_{0k}}\frac{y_l}{\pi_{0l}} - 2 \frac{z_k (1-z_l)}{\pi_{1k0l} + \epsilon_{1k0l}}(\pi_{1k0l} - \pi_{1k}\pi_{0l})\frac{y_k}{\pi_{1k}}\frac{y_l}{\pi_{0l}} \\\ & + \sum_{\forall l \colon \pi_{1k1l} = 0} \left( z_k \frac{y^2_k}{2\pi_{1k}} + z_l \frac{y^2_l}{\pi_{1l}}\right) + \sum_{\forall l \colon \pi_{0k0l} = 0} \left( (1-z_k) \frac{y^2_k}{2\pi_{0k}} + (1-z_l) \frac{y^2_l}{\pi_{0l}}\right) \Bigg] \end{aligned} $$

Constant effects

Alternatively, one can assume constant treatment effects and, under that assumption, estimate the variance that is consistent under that assumption but less conservative. Again, this estimator is only implemented for the simple randomized case.

$$ \begin{aligned} \widehat{\mathbb{V}}{C}[\widehat{\tau}] = \frac{1}{N^2} \sum^N{i=1} \Bigg[& (1 - \pi_{0i}) \pi_{0i} \left(\frac{y_{0i}}{\pi_{0i}}\right)^2 + (1 - \pi_{1i}) \pi_{1i} \left(\frac{y_{1i}}{\pi_{1i}}\right)^2 - 2 y_{1i} y_{0i} \\\ & + \sum_{j \neq i} \Big( (\pi_{0i0j} - \pi_{0i} \pi_{0j}) \frac{y_{0i}}{\pi_{0i}} \frac{y_{0j}}{\pi_{0j}} + (\pi_{1i1j} - \pi_{1i} \pi_{1j}) \frac{y_{1i}}{\pi_{1i}} \frac{y_{1j}}{\pi_{1j}} \\\ &- 2 (\pi_{1i0j} - \pi_{1i} \pi_{0j}) \frac{y_{1i}}{\pi_{1i}} \frac{y_{0j}}{\pi_{0j}} \Big)\Bigg] \end{aligned} $$

Confidence intervals and hypothesis testing

Theory on hypothesis testing with the Horvitz-Thompson estimator is yet to be developed. We rely on a normal approximation and construct confidence intervals in the following way: [ \mathrm{CI}^{1-\alpha} = \left(\widehat{\tau} + z_{\alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]}, \widehat{\tau} + z_{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]}\right) ]

The associated p-values for a two-sided null hypothesis test are computed using a normal distribution and the aforementioned significance level $\alpha$.

References



graemeblair/DDestimate documentation built on Sept. 10, 2019, 7:38 p.m.