set.seed(42) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(digits = 2)

This document provides the mathematical notes for each of the estimators in `estimatr`

. The most up-to-date version of this can be found on the DeclareDesign website here.

The current estimators we provide are:

`lm_robust`

- for fitting linear models with heteroskedasticity/cluster-robust standard errors`lm_lin`

- a wrapper for`lm_robust()`

to simplify interacting centered pre-treatment covariates with a treatment variable`iv_robust`

- two stage least squares estimation of instrumental variables regression`difference_in_means`

- for estimating differences in means with appropriate standard errors for unit-randomized, cluster-randomized, block-randomized, matched-pair randomized, and matched-pair clustered designs`horvitz_thompson`

- for estimating average treatment effects taking into consideration treatment probabilities or sampling probabilities for simple and cluster randomized designs

`lm_robust`

notesThe `lm_robust`

method uses the C++ library Eigen, via the `RcppEigen`

package, to estimate the coefficients, variance-covariance matrix, and, in some cases, the degrees of freedom of linear models.

The default estimators have been selected for efficiency in large samples and low bias in small samples as well as for their similarities to design-based randomization estimators [@samiiaronow2012]. This section outlines the various kinds of variance estimators one can employ within `lm_robust`

.

[ \widehat{\beta} =(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y} ]

Our algorithm solves the least squares problem using a rank-revealing column-pivoting QR factorization that eliminates the need to invert $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ explicitly and behaves much like the default `lm`

function in R. However, when $\mathbf{X}$ is rank deficient, there are certain conditions under which the QR factorization algorithm we use, from the Eigen C++ library, drops different coefficients from the output than the default `lm`

function. In general, users should avoid specifying models with rank-deficiencies. In fact, if users are certain their data are not rank deficient, they can improve the speed of `lm_robust`

by setting `try_cholesky = TRUE`

. This replaces the QR factorization with a Cholesky factorization that is only guaranteed to work $\mathbf{X}$ is of full rank.

If weights are included, we transform the data as below and then proceed as normal, following advice from @romanowolf2017 that this weighted estimator has attractive properties. We do so by first scaling all of the weights so that they sum to one. Then we multiply each row of the design matrix $\mathbf{X}$ by the square root each unit's weight, $\mathbf{x}_i \sqrt{w_i}$, and then do the same to the outcome, $\mathbf{y}_i \sqrt{w_i}$. This results in our coefficients being estimated as follows, where $\mathbf{W}$ is a diagonal matrix with the scaled weights on the diagonal.

Weighted: [ \widehat{\beta} =(\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{W}\mathbf{y} ]

The transformed data are then used in the analysis below, where $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ is now $(\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}$ and $\mathbf{X}$ is now $\mathbf{X} \mathrm{sqrt}[W]$, where $\mathrm{sqrt}[.]$ is an operator that applies a square root to the coefficients of some matrix.

We should note that this transformation yields the same standard errors as specifying weights using `aweight`

in Stata for the "classical", "HC0", and "HC1" ("stata") variance estimators. Furthermore, in the clustered case, our weighted estimator for the "stata" cluster-robust variance also matches Stata. Thus Stata's main robust standard error estimators, "HC1" and their clustered estimator, match our package when weights are applied. Nonetheless, Stata uses a slightly different Hat matrix and thus "HC2" and "HC3" estimates in Stata when weights are specified may differ from our estimates---more on that here.

In addition to solving for OLS coefficients faster than `lm`

, we provide a variety of robust variance estimators. Below we outline them for the non-clustered and clustered cases. You can see some simulations about the unbiasedness of the classical variance estimators with homoskedasticity and the consistency of the HC2 estimators with heteroskedasticity in these simulations.

The default variance estimator without clusters is the HC2 variance, first proposed by @mackinnonwhite1985. This estimator has the advantage of being equivalent to a conservative randomization-based "Neyman" estimator of the variance [@samiiaronow2012]. Furthermore, while it is somewhat less efficient than the HC1 variance estimator, the default in Stata, it tends to perform better in small samples (evidence for that can be found in our simulations here).

| `se_type =`

| Variance Estimator ($\widehat{\mathbb{V}}[\widehat{\beta}]$) | Degrees of Freedom | Notes |
|----------|--------------------------|------|----------------------------|
| `"classical"`

| $\frac{\mathbf{e}^\top\mathbf{e}}{N-K} (\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | |
| `"HC0"`

| $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathrm{diag}\left[e_i^2\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | |
| `"HC1"`

, `"stata"`

| $\frac{N}{N-K}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathrm{diag}\left[e_i^2\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | Often called the Eicker-Huber-White variance (or similar) |
| `"HC2"`

(default) | $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathrm{diag}\left[\frac{e_i^2}{1-h_{ii}}\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | |
| `"HC3"`

| $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}{\top}\mathrm{diag}\left[\frac{e_i^2}{(1-h_{ii})^2}\right]\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}$ | N - K | |
* $\mathbf{x}_i$ is the $i$th row of $\mathbf{X}$.
* $h_{ii} = \mathbf{x}_i(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{x}^{\top}_i$
* $e_i = y_i - \mathbf{x}_i\widehat{\beta}$
* $\mathrm{diag}[.]$ is an operator that creates a diagonal matrix from a vector
* $N$ is the number of observations
* $K$ is the number of elements in $\beta$.

For cluster-robust inference, we provide several estimators that are essentially analogs of the heteroskedastic-consistent variance estimators for the clustered case. Our default is the CR2 variance estimator, analogous to HC2 standard errors, and perform quite well in small samples without sacrificing much in the way of efficiency in larger samples. This estimator was originally proposed in @bellmccaffrey2002, although we implement a generalized version of the algorithm outlined in @pustejovskytipton2016; these authors provide an R package for CR2 variance estimation, clubSandwich, that applies these standard errors to a wide variety of models. For a good overview of the different cluster-robust variance estimators and simulations of their accuracy in small samples, again users can see @imbenskolesar2016. For an overview of when to use cluster-robust estimators, especially in an experimental setting, see @abadieetal2017.

| `se_type =`

| Variance Estimator ($\widehat{\mathbb{V}}[\widehat{\beta}]$) | Degrees of Freedom | Notes |
|-------------------|--------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `"CR0"`

| $(\mathbf{X}^{\top}\mathbf{X})^{-1} \sum^S_{s=1} \left[\mathbf{X}^\top_s \mathbf{e}*s\mathbf{e}^\top_s \mathbf{X}_s \right] (\mathbf{X}^{\top}\mathbf{X})^{-1}$ | $S - 1$ | |
| "stata" | $\frac{N-1}{N-K}\frac{S}{S-1} (\mathbf{X}^{\top}\mathbf{X})^{-1} \sum^S*{s=1} \left[\mathbf{X}^\top_s \mathbf{e}

`"CR2"`

(default) | $(\mathbf{X}^{\top}\mathbf{X})^{-1} \sum^S**Further notes on CR2:** The variance estimator we implement is shown in equations (4) and (5) in @pustejovskytipton2016 and equation (11), where we set $\mathbf{\Phi}$ to be $I$, following @bellmccaffrey2002. Further note that the @pustejovskytipton2016 CR2 estimator and the @bellmccaffrey2002 estimator are identical when $\mathbf{B_s}$ is full rank. It could be rank-deficient if there were dummy variables, or fixed effects, that were also your clusters. In this case, the original @bellmccaffrey2002 estimator could not be computed. You can see the simpler @bellmccaffrey2002 estimator written up plainly on page 709 of @imbenskolesar2016 along with the degrees of freedom denoted as $K_{BM}$.

In the CR2 variance calculation, we get $\mathbf{A}_s$ as follows:

$$
\begin{aligned}
\mathbf{H} &= \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^\top \\\
\mathbf{B}*s &= (I*{N} - \mathbf{H})*s (I*{N} - \mathbf{H})^\top_s \\\
\mathbf{A}_s &= \mathbf{B}^{+1/2}_s
\end{aligned}
$$

where $\mathbf{B}^{+1/2}_s$ is the symmetric square root of the Mooreâ€“Penrose inverse of $\mathbf{B}_s$ and $(I - \mathbf{H})_s$ are the $N_s$ columns that correspond to cluster $s$. To get the corresponding degrees of freedom, note that

[
\mathbf{p}*s = (I_N - \mathbf{H})^\top_s \mathbf{A}_s \mathbf{X}_s (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{z}*{k}
]
where $\mathbf{z}_{k}$ is a vector of length $K$, the number of coefficients, where the $k$th element is 1 and all other elements are 0. The $k$ signifies the coefficient for which we are computing the degrees of freedom.

If $\widehat{\mathbb{V}}_k$ is the $k$th diagonal element of $\widehat{\mathbb{V}}$, we build confidence intervals using the user specified $\alpha$ as:

[
\mathrm{CI}^{1-\alpha} = \left(\widehat{\beta_k} + t^{df}*{\alpha/2} \sqrt{\widehat{\mathbb{V}}_k}, \widehat{\beta_k} + t^{df}*{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}_k}\right)
]

We also provide two-sided p-values using a t-distribution with the aforementioned significance level $\alpha$ and degrees of freedom $df$.

`lm_lin`

notesThe `lm_lin`

estimator is a data pre-processor for `lm_robust`

that implements the regression method for covariate adjustment suggested by @lin2013.

This estimator works by taking the outcome and treatment variable as the main formula (`formula`

) and takes a right-sided formula of all pre-treatment covariates (`covariates`

). These pre-treatment covariates are then centered to be mean zero and interacted with the treatment variable before being added to the formula and passed to `lm_robust`

. In other words, instead of fitting a simple model adjusting for pre-treatment covariates such as

[ y_i = \tau z_i + \mathbf{\beta}^\top \mathbf{x}_i + \epsilon_i ]

with the following model

[ y_i = \tau z_i + \mathbf{\beta}^\top \mathbf{x}^c_i + \mathbf{\gamma}^\top \mathbf{x}^c_i z_i + \epsilon_i ]

where $\mathbf{x}^c_i$ is a vector of pre-treatment covariates for unit $i$ that have been centered to have mean zero and $z_i$ is an indicator for the treatment group. @lin2013 proposed this estimator in response to the critique by @freedman2008 that using regression to adjust for pre-treatment covariates could bias estimates of treatment effects.

The estimator `lm_lin`

also works for multi-valued treatments by creating a full set of dummies for each treatment level and interacting each with the centered pre-treatment covariates. The rest of the options for the function and corresponding estimation is identical to `lm_robust`

.

`iv_robust`

notesOur `iv_robust`

estimator uses a two-stage least squares estimation.

[ \widehat{\beta}_{2SLS} =(\mathbf{X}^{\top}\mathbf{P_z}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{P_z}\mathbf{y}, ]

where $\mathbf{X}$ are the endogenous regressors, $\mathbf{P_Z} = \mathbf{Z}(\mathbf{Z}^{\top}\mathbf{Z})^{-1}\mathbf{Z}^\top$, and $\mathbf{Z}$ are the instruments. This is equivalent to estimating the first stage regression,

[ \mathbf{X} = \mathbf{Z}\beta_{FS} + \mathbf{\zeta}, ]

and using the first stage predicted values in the second stage regression,

$$
\begin{aligned}
\widehat{\mathbf{X}} &= \mathbf{Z}\widehat{\beta}*{FS} \
\mathbf{y} &= \widehat{\mathbf{X}}\beta*{2SLS} + \mathbf{\epsilon}.
\end{aligned}
$$

When weights are applied, we use the same estimation strategy as in `lm_robust`

where we first transform the data by the square root of the weights and proceed with estimation as usual.

The variances estimates for `iv_robust`

are the same as the estimates for `lm_robust`

although two changes are made. First, we replace $\mathbf{X}$ with the second stage regressors, $\widehat{\mathbf{X}}$, and we replace the residuals, $e_i$, with $\mathbf{y} - \mathbf{X} \beta_{2SLS}$. That is, we use the residuals from the final coefficients and the endogenous, uninstrumented regressors $\mathbf{X}$.

Because Stata does not default to using finite sample corrections and tests with its `ivregress 2sls`

estimator, the correspondence between our instrumental variables estimator and theirs can be a bit unclear. The following table shows the options in Stata that correspond to our estimators.

| `estimatr`

| Stata | Notes |
|----------------------------------------------------------------------|-------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| N/A | `ivregress 2sls y (x = z)`

| Stata's default has no finite sample correction (i.e., $\widehat{\sigma}^2 = \mathbf{e}^\top \mathbf{e}$). Stata here also uses z-tests. |
| `iv_robust(y ~ x | z, se_type = "classical")`

| `ivregress 2sls y (x = z), small`

| $\widehat{\sigma}^2 = \frac{\mathbf{e}^\top \mathbf{e}}{N - k}$. |
| `iv_robust(y ~ x | z, se_type = "HC0")`

| `ivregress 2sls y (x = z), rob`

| Stata uses z-tests here. |
| `iv_robust(y ~ x | z, se_type = "HC1")`

| `ivregress 2sls y (x = z), rob small`

| |
| `iv_robust(y ~ x | z, se_type = "HC2")`

(default) | N/A | |
| `iv_robust(y ~ x | z, se_type = "HC3")`

| N/A | |
| `iv_robust(y ~ x | z, clusters = clust,`

`se_type = "CR0")`

| `ivregress 2sls y (x = z), vce(cl clust)`

| Stata uses z-tests here. |
| `iv_robust(y ~ x | z, clusters = clust,`

`se_type = "stata")`

| `ivregress 2sls y (x = z), vce(cl clust) small`

| |
| `iv_robust(y ~ x | z, clusters = clust,`

`se_type = "CR2")`

(default) | N/A | |

If $\widehat{\mathbb{V}}_k$ is the $k$th diagonal element of $\widehat{\mathbb{V}}$, we build confidence intervals using the user specified $\alpha$ as:

[
\mathrm{CI}^{1-\alpha} = \left(\widehat{\beta_{2SLS, k}} + t^{df}*{\alpha/2} \sqrt{\widehat{\mathbb{V}}_k}, \widehat{\beta*{2SLS, k}} + t^{df}_{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}_k}\right)
]

We also provide two-sided p-values using a t-distribution with the aforementioned significance level $\alpha$ and degrees of freedom $df$, which come from the second-stage regression. As mentioned in the table above, these results will be different from Stata in certain cases as Stata uses z-tests when `small`

is not specified.

`difference_in_means`

notesThere are six kinds of experimental designs for which our `difference_in_means`

estimator can estimate treatment effects, standard errors, confidence intervals, and provide p-values. We list the different designs here along with how the software learns the design:

- Simple (both
`clusters`

and`blocks`

are unused) - Clustered (
`clusters`

is specified while`blocks`

is not) - Blocked (
`blocks`

is specified while`clusters`

is not) - Blocked and clustered (both are specified)

There are two subsets of blocked designs that we also consider:

- Matched-pairs (only
`blocks`

is specified and all blocks are size two) - Matched-pair clustered design (both names are specified and each block only has two clusters)

Note: if there are blocks of size two and blocks greater than size two, we default to the matched-pairs estimators described below.

For each design, our estimator is informed by the recent statistical literature on the analysis of experimental data.

**Any unblocked design**
[
\widehat{\tau} = \frac{1}{N} \sum^N_{i=1} z_i y_i - (1 - z_i) y_i
]
where $z_i$ is the treatment variable, $y_i$ is the outcome, and $N$ is the total number of units.

**Blocked design (including matched-pairs designs)**
[
\widehat{\tau} = \sum^J_{j=1} \frac{N_j}{N} \widehat{\tau_j}
]
where $J$ is the number of blocks, $N_j$ is the size of those blocks, and $\widehat{\tau_j}$ is the estimated difference-in-means in block $j$.

If the user specifies weights, treatment effects (or block-level treatment effects) and their standard errors are estimated by `lm_robust`

. There are three exceptions. First, we still compute the degrees of freedom as in the below table. Second, if the design is blocked, a weighted treatment effect and variance estimate are computed within each block using `lm_robust`

and then combined as below. Third, specifying weights with a matched-pairs estimator in `difference_in_means`

is not supported at the moment.

| Design type | Variance $\widehat{\mathbb{V}}[\widehat{\tau}]$ | Degrees of Freedom | Notes |
|---------------------------------|--------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| No blocks or clusters (standard) | $\frac{\widehat{\mathbb{V}}[y_{i,0}]}{N_0} + \frac{\widehat{\mathbb{V}}[y_{i,1}]}{N_1}$ | $\widehat{\mathbb{V}}[\widehat{\tau}]^2 \left(\frac{(\widehat{\mathbb{V}}[y_{i,1}]/ N_1)^2}{N_1 - 1} + \frac{(\widehat{\mathbb{V}}[y_{i,0}]/ N_0)^2}{N_0 - 1}\right)$ | Where $\widehat{\mathbb{V}}[y_{i,k}]$ is the Bessel-corrected variance of all units where $z_i = k$ and $N_k$ is the number of units in condition $k$. This is equivalent to the variance and Welchâ€“Satterthwaite approximation of the degrees of freedom used by R's `t.test`

. |
| Blocked | $\sum^J_{j=1} \left(\frac{N_j}{N}\right)^2 \widehat{\mathbb{V}}[\widehat{\tau_j}]$ | $N - 2 * J$ | Where $\widehat{\mathbb{V}}[\widehat{\tau_j}]$ is the variance of the estimated difference-in-means in block $j$. See footnote 17 on page 74 of [@gerbergreen2012] for a reference. The degrees of freedom are equivalent to a regression with a full set of block specific treatment effects. |
| Clusters | Same as `lm_robust`

CR2 estimator | Same as `lm_robust`

CR2 estimator | This variance is the same as that recommended by @gerbergreen2012 in equation 3.23 on page 83 when the clusters are even sizes. |
| Blocked and clustered | $\sum^J_{j=1} \left(\frac{N_j}{N}\right)^2 \widehat{\mathbb{V}}[\widehat{\tau_j}]$ | $S - 2 * J$ | Where $\widehat{\mathbb{V}}[\widehat{\tau_j}]$ is the variance of the estimated difference-in-means in block $j$ and S is the number of clusters. See footnote 17 on page 74 of @gerbergreen2012 for a reference. The degrees of freedom are equivalent to a regression on data collapsed by cluster with a full set of block specific treatment effects. |
| Matched pairs | $\frac{1}{J(J-1)} \sum^J_{j=1} \left(\widehat{\tau_j} - \widehat{\tau}\right)^2$ | $J - 1$ | See equation 3.16 on page 77 of @gerbergreen2012 for a reference. |
| Matched pair cluster randomized | $\frac{J}{(J-1)N^2} \sum^J_{j=1} \left(N_j \widehat{\tau_j} - \frac{N \widehat{\tau}}{J}\right)^2$ | $J - 1$ | See the variance for the SATE defined in equation 6 on page 36 of [@imaietal2009] and the suggested degrees of freedom on page 37. |

We build confidence intervals using the user specified $\alpha$ as:

[
\mathrm{CI}^{1-\alpha} = \left(\widehat{\tau} + t^{df}*{\alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]},\widehat{\tau}] + t^{df}*{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]}\right)
]

We also provide two-sided p-values using a t-distribution with the aforementioned significance level $\alpha$ and degrees of freedom $df$.

`horvitz_thompson`

notesWe provide Horvitz-Thompson estimators for two-armed trials and can be used to estimate unbiased treatment effects when the randomization is known. Horvitz-Thompson estimators require information about the probability each unit is in treatment and control, as well as the joint probability each unit is in the treatment, in the control, and in opposite treatment conditions.

The estimator we implement here, `horvitz_thompson()`

, can be told the design of an experiment in several ways, and the reference page is a good place to see some of those examples. Users can see a description of the estimator and its properties in @aronowmiddleton2013, @middletonaronow2015, and @aronowsamii2017.

Some definitions used below:

- $\pi_{zi}$ is the marginal probability of being in condition $z \in {0, 1}$ for unit i
- $\pi_{ziwj}$ is the joint probability of unit $i$ being in condition $z$ and unit $j$ being in condition $w \in {0, 1}$
- $\epsilon_{ziwj}$ is the indicator function $\mathbb{1}\left(\pi_{ziwj} = 0\right)$

**Simple, complete, clustered**

[ \widehat{\tau} = \frac{1}{N} \sum^N_{i=1} z_i \frac{y_i}{\pi_{1i}} - (1 - z_i) \frac{y_i}{\pi_{0i}} ]

**Blocked**

[ \widehat{\tau} = \sum^J_{j=1} \frac{N_j}{N} \widehat{\tau_j} ] where $J$ is the number of blocks, $N_j$ is the size of those blocks, and $\widehat{\tau_j}$ is the Horvitz-Thompson estimate in block $j$.

Currently we provide variance estimates that rely on two separate assumptions:

`"youngs"`

which implements a conservative variance estimate using Young's inequality, described in equation 35 on page 147 of @aronowmiddleton2013 and in @aronowsamii2017 on pages 11-15.`"constant"`

which assumes constant treatment effects across all units but is less conservative. We only provide this estimator for simple randomized experiments.

**Young's inequality**

For designs that that are not clustered we use the following variance:

$$
\begin{aligned}
\widehat{\mathbb{V}}*{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^N*{i=1} \Bigg[& z_i \left(\frac{y_i}{\pi_{1i}}\right)^2 + (1 - z_i) \left(\frac{y_i}{\pi_{0i}}\right)^2 + \sum_{j \neq i} \bigg(\frac{z_i z_j}{\pi_{1i1j} + \epsilon_{1i1j}}(\pi_{1i1j} - \pi_{1i}\pi_{1j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{1j}} \\\
& + \frac{(1-z_i) (1-z_j)}{\pi_{0i0j} + \epsilon_{0i0j}}(\pi_{0i0j} - \pi_{0i}\pi_{0j})\frac{y_i}{\pi_{0i}}\frac{y_j}{\pi_{0j}} - 2 \frac{z_i (1-z_j)}{\pi_{1i0j} + \epsilon_{1i0j}}(\pi_{1i0j} - \pi_{1i}\pi_{0j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{0j}} \\\
& + \sum_{\forall j \colon \pi_{1i1j} = 0} \left( z_i \frac{y^2_i}{2\pi_{1i}} + z_j \frac{y^2_j}{\pi_{1j}}\right) + \sum_{\forall j \colon \pi_{0i0j} = 0} \left( (1-z_i) \frac{y^2_i}{2\pi_{0i}} + (1-z_j) \frac{y^2_j}{\pi_{0j}}\right)
\Bigg]
\end{aligned}
$$

There are some simplifications of the above for simpler designs that follow algebraically from the above. For example, if there are no two units for which the joint probability of being in either condition is 0, which is the case for most experiments that are not matched-pair experiments, then we get:

$$
\begin{aligned}
\widehat{\mathbb{V}}*{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^N*{i=1} \Bigg[& z_i \left(\frac{y_i}{\pi_{1i}}\right)^2 + (1 - z_i) \left(\frac{y_i}{\pi_{0i}}\right)^2 + \sum_{j \neq i} \bigg(\frac{z_i z_j}{\pi_{1i1j}}(\pi_{1i1j} - \pi_{1i}\pi_{1j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{1j}} \\\
& + \frac{(1-z_i) (1-z_j)}{\pi_{0i0j}}(\pi_{0i0j} - \pi_{0i}\pi_{0j})\frac{y_i}{\pi_{0i}}\frac{y_j}{\pi_{0j}} - 2 \frac{z_i (1-z_j)}{\pi_{1i0j}}(\pi_{1i0j} - \pi_{1i}\pi_{0j})\frac{y_i}{\pi_{1i}}\frac{y_j}{\pi_{0j}}
\Bigg]
\end{aligned}
$$

If we further simplify to the case where there is simple random assignment, and there is absolutely no dependence among units (i.e., $\pi_{ziwj} = \pi_{zi}\pi_{wj} \;\;\forall\;\;z,w,i,j$), we get:

$$
\begin{aligned}
\widehat{\mathbb{V}}*{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^N*{i=1} \Bigg[& z_i \left(\frac{y_i}{\pi_{1i}}\right)^2 + (1 - z_i) \left(\frac{y_i}{\pi_{0i}}\right)^2\Bigg]
\end{aligned}
$$

**Clustered designs**

For clustered designs, we use the following collapsed estimator by setting `collapsed = TRUE`

. Here, $M$ is the total number of clusters, $y_k$ is the total of the outcomes $y_i$ for all $i$ units in cluster $k$, $\pi_zk$ is the marginal probability of cluster $k$ being in condition $z \in {0, 1}$, and $z_k$ and $\pi_{zkwl}$ are defined analogously. Warning! If one passes `condition_pr_mat`

to `horvitz_thompson`

for a clustered design, but not `clusters`

, the function will not use the collapsed estimator the the variance estimate will be inaccurate.

$$
\begin{aligned}
\widehat{\mathbb{V}}*{Y}[\widehat{\tau}] = \frac{1}{N^2} \sum^M*{k=1} \Bigg[& z_k \left(\frac{y_k}{\pi_{1k}}\right)^2 + (1 - z_k) \left(\frac{y_k}{\pi_{0k}}\right)^2 + \sum_{l \neq k} \bigg(\frac{z_k z_l}{\pi_{1k1l} + \epsilon_{1k1l}}(\pi_{1k1l} - \pi_{1k}\pi_{1l})\frac{y_k}{\pi_{1k}}\frac{y_l}{\pi_{1l}} \\\
& + \frac{(1-z_k) (1-z_l)}{\pi_{0k0l} + \epsilon_{0k0l}}(\pi_{0k0l} - \pi_{0k}\pi_{0l})\frac{y_k}{\pi_{0k}}\frac{y_l}{\pi_{0l}} - 2 \frac{z_k (1-z_l)}{\pi_{1k0l} + \epsilon_{1k0l}}(\pi_{1k0l} - \pi_{1k}\pi_{0l})\frac{y_k}{\pi_{1k}}\frac{y_l}{\pi_{0l}} \\\
& + \sum_{\forall l \colon \pi_{1k1l} = 0} \left( z_k \frac{y^2_k}{2\pi_{1k}} + z_l \frac{y^2_l}{\pi_{1l}}\right) + \sum_{\forall l \colon \pi_{0k0l} = 0} \left( (1-z_k) \frac{y^2_k}{2\pi_{0k}} + (1-z_l) \frac{y^2_l}{\pi_{0l}}\right)
\Bigg]
\end{aligned}
$$

**Constant effects**

Alternatively, one can assume constant treatment effects and, under that assumption, estimate the variance that is consistent under that assumption but less conservative. Again, this estimator is only implemented for the simple randomized case.

- $y_{zi}$ is the potential outcome for condition $z$ for unit $i$. This is either observed if $z_i = z$ or estimated using the constant effects assumption if $z_i \neq z$, where $z_i$ is the condition for unit $i$. To be precise $y_{1i} = z_i y_{i} + (1 - z_i) (y_{i} + \widehat{\tau})$ and $y_{0i} = z_i (y_{i} - \widehat{\tau}) + (1 - z_i) y_{i}$, where $\widehat{\tau}$ is the estimated treatment effect.

$$
\begin{aligned}
\widehat{\mathbb{V}}*{C}[\widehat{\tau}] = \frac{1}{N^2} \sum^N*{i=1} \Bigg[& (1 - \pi_{0i}) \pi_{0i} \left(\frac{y_{0i}}{\pi_{0i}}\right)^2 + (1 - \pi_{1i}) \pi_{1i} \left(\frac{y_{1i}}{\pi_{1i}}\right)^2 - 2 y_{1i} y_{0i} \\\
& + \sum_{j \neq i} \Big( (\pi_{0i0j} - \pi_{0i} \pi_{0j}) \frac{y_{0i}}{\pi_{0i}} \frac{y_{0j}}{\pi_{0j}} + (\pi_{1i1j} - \pi_{1i} \pi_{1j}) \frac{y_{1i}}{\pi_{1i}} \frac{y_{1j}}{\pi_{1j}} \\\
&- 2 (\pi_{1i0j} - \pi_{1i} \pi_{0j}) \frac{y_{1i}}{\pi_{1i}} \frac{y_{0j}}{\pi_{0j}}
\Big)\Bigg]
\end{aligned}
$$

Theory on hypothesis testing with the Horvitz-Thompson estimator is yet to be developed. We rely on a normal approximation and construct confidence intervals in the following way: [ \mathrm{CI}^{1-\alpha} = \left(\widehat{\tau} + z_{\alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]}, \widehat{\tau} + z_{1 - \alpha/2} \sqrt{\widehat{\mathbb{V}}[\widehat{\tau}]}\right) ]

The associated p-values for a two-sided null hypothesis test are computed using a normal distribution and the aforementioned significance level $\alpha$.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.