View source: R/estimatr_difference_in_means.R

difference_in_means | R Documentation |

Difference-in-means estimator that selects the appropriate point estimate, standard errors, and degrees of freedom for a variety of designs: unit randomized, cluster randomized, block randomized, block-cluster randomized, matched-pairs, and matched-pair cluster randomized designs.

```
difference_in_means(
formula,
data,
blocks,
clusters,
weights,
subset,
se_type = c("default", "none"),
condition1 = NULL,
condition2 = NULL,
ci = TRUE,
alpha = 0.05
)
```

`formula` |
An object of class formula, as in `lm`. |

`data` |
A `data.frame`. |

`blocks` |
An optional bare (unquoted) name of the block variable. Use for blocked designs only. |

`clusters` |
An optional bare (unquoted) name of the variable that corresponds to the clusters in the data; used for cluster randomized designs. For blocked designs, clusters must nest within blocks. |

`weights` |
The bare (unquoted) name of the weights variable in the supplied data. |

`subset` |
An optional bare (unquoted) expression specifying a subset of observations to be used. |

`se_type` |
An optional string that can be one of `"default"` or `"none"`. |

`condition1` |
The value in the treatment vector of the condition to be the control. Effects are estimated with `condition1` as the control and `condition2` as the treatment. |

`condition2` |
The value in the treatment vector of the condition to be the treatment. See `condition1`. |

`ci` |
logical. Whether to compute and return p-values and confidence intervals, TRUE by default. |

`alpha` |
The significance level, 0.05 by default. |

This function implements a difference-in-means estimator, with support for blocked, clustered, matched-pairs, block-clustered, and matched-pair clustered designs. Specify the design by passing the block and cluster variables in your data, and the function chooses the most appropriate estimator.

If you pass only `blocks` and all blocks are of size two, we will infer that the design is a matched-pairs design. If they are all size four or larger, we will infer that it is a regular blocked design. If you pass both `blocks` and `clusters`, we will similarly infer whether it is a matched-pairs clustered design or a block-clustered design from the number of clusters per block. If you pass only `clusters`, we will infer that the design was cluster-randomized. If you specify neither `blocks` nor `clusters`, a regular Welch's t-test will be performed.
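The inference rules described here can be sketched in base R. This is only an illustration of the prose, not the package's internal code: `infer_design` is a hypothetical helper, its return labels are made up for this example, and it returns `NA` for any case the description does not cover (for example, blocks of size three or blocks of mixed sizes).

```r
# Hypothetical helper that mirrors the rules in the text above.
# It is NOT part of estimatr and does not reproduce its internals.
infer_design <- function(blocks = NULL, clusters = NULL) {
  if (is.null(blocks)) {
    if (is.null(clusters)) {
      return("unit randomized (Welch's t-test)")
    }
    return("cluster randomized")
  }

  if (is.null(clusters)) {
    # Number of units in each block
    sizes <- as.vector(table(blocks))
    labels <- c("matched-pairs", "blocked")
  } else {
    # Number of distinct clusters in each block
    sizes <- as.vector(
      tapply(clusters, blocks, function(x) length(unique(x)))
    )
    labels <- c("matched-pair clustered", "block-clustered")
  }

  if (all(sizes == 2)) {
    return(labels[1])
  }
  if (all(sizes >= 4)) {
    return(labels[2])
  }
  # Case not covered by the description above
  NA_character_
}

# Blocks of size two
infer_design(blocks = rep(1:50, each = 2))
# Blocks of size ten
infer_design(blocks = rep(1:10, each = 10))
# Two clusters per block
infer_design(blocks = rep(1:25, each = 4), clusters = rep(1:50, each = 2))
# Five clusters per block
infer_design(blocks = rep(1:5, each = 10), clusters = rep(1:50, each = 2))
```

The block and cluster vectors in the calls are the same layouts used in the examples at the end of this page.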

Importantly, if you specify weights, the estimation is handed off to `lm_robust` with the appropriate robust standard errors, as weighted difference-in-means estimators are not implemented here. More details about each of the estimators can be found in the mathematical notes.

Returns an object of class `"difference_in_means"`.

The post-estimation functions `summary` and `tidy` return results in a `data.frame`. To get useful data out of the return, you can use these data frames, you can use the resulting list directly, or you can use the generic accessor functions `coef` and `confint`.
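As a brief sketch of the three access routes, the following fits a unit-randomized design on simulated data and pulls results out each way. The data frame `dat` and its columns `Y` and `Z` are made up for illustration, and the snippet assumes the estimatr package is installed.

```r
library(estimatr)

# Simulated data for illustration only: 50 control units, 50 treated units
set.seed(1)
dat <- data.frame(
  Y = rnorm(100),
  Z = rep(c(0, 1), each = 50)
)

fit <- difference_in_means(Y ~ Z, data = dat)

# 1. Data frame of results
td <- tidy(fit)
td

# 2. Components of the returned list
fit$design
fit$df

# 3. Generic accessor functions
coef(fit)
confint(fit)
```

The data frame route is usually the most convenient when collecting estimates from several models into one table.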

An object of class `"difference_in_means"` is a list containing at least the following components:

`coefficients` |
the estimated difference in means |

`std.error` |
the estimated standard error |

`statistic` |
the t-statistic |

`df` |
the estimated degrees of freedom |

`p.value` |
the p-value from a two-sided t-test using `coefficients`, `std.error`, and `df` |

`conf.low` |
the lower bound of the `1 - alpha` percent confidence interval |

`conf.high` |
the upper bound of the `1 - alpha` percent confidence interval |

`term` |
a character vector of coefficient names |

`alpha` |
the significance level specified by the user |

`N` |
the number of observations used |

`outcome` |
the name of the outcome variable |

`design` |
the name of the design learned from the arguments passed |
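To make the relationship between these components concrete, here is a base R sketch for the simplest case, a unit-randomized design analyzed with Welch's t-test. It uses standard Welch formulas rather than the package's code, the variable names are chosen for this example, and the simulated data are illustrative only. The result is checked against `stats::t.test`, which performs Welch's test by default.

```r
# Simulated outcome and treatment indicator (illustration only)
set.seed(42)
y <- rnorm(100)
z <- rep(c(0, 1), times = c(60, 40))
alpha <- 0.05

y1 <- y[z == 1]
y0 <- y[z == 0]

# Variance of each group mean
v1 <- var(y1) / length(y1)
v0 <- var(y0) / length(y0)

estimate  <- mean(y1) - mean(y0)   # difference in means
std_error <- sqrt(v1 + v0)         # Welch standard error
statistic <- estimate / std_error  # t-statistic

# Welch-Satterthwaite degrees of freedom
df <- (v1 + v0)^2 /
  (v1^2 / (length(y1) - 1) + v0^2 / (length(y0) - 1))

# Two-sided p-value and (1 - alpha) confidence interval
p_value <- 2 * pt(abs(statistic), df = df, lower.tail = FALSE)
ci <- estimate + c(-1, 1) * qt(1 - alpha / 2, df = df) * std_error

# Compare with base R's Welch test (var.equal = FALSE is the default)
welch <- t.test(y1, y0, conf.level = 1 - alpha)
rbind(
  by_hand = c(statistic, df, p_value, ci),
  t.test  = c(welch$statistic, welch$parameter, welch$p.value, welch$conf.int)
)
```

The other designs use different standard error and degrees-of-freedom formulas, so this sketch applies only when neither `blocks` nor `clusters` is supplied.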

Gerber, Alan S, and Donald P Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton.

Imai, Kosuke, Gary King, and Clayton Nall. 2009. "The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation." Statistical Science 24 (1). Institute of Mathematical Statistics: 29-53. doi:10.1214/08-STS274.

`lm_lin`

```
library(estimatr)
library(fabricatr)
library(randomizr)
# Get appropriate standard errors for unit-randomized designs
# ----------
# 1. Unit randomized
# ----------
dat <- fabricate(
N = 100,
Y = rnorm(100),
Z_comp = complete_ra(N, prob = 0.4)
)
table(dat$Z_comp)
difference_in_means(Y ~ Z_comp, data = dat)
# ----------
# 2. Cluster randomized
# ----------
# Accurate estimates and standard errors for clustered designs
dat$clust <- sample(20, size = nrow(dat), replace = TRUE)
dat$Z_clust <- cluster_ra(dat$clust, prob = 0.6)
table(dat$Z_clust, dat$clust)
summary(difference_in_means(Y ~ Z_clust, clusters = clust, data = dat))
# ----------
# 3. Block randomized
# ----------
dat$block <- rep(1:10, each = 10)
dat$Z_block <- block_ra(dat$block, prob = 0.5)
table(dat$Z_block, dat$block)
difference_in_means(Y ~ Z_block, blocks = block, data = dat)
# ----------
# 4. Block cluster randomized
# ----------
# Learns this design from the number of clusters per block (five here)
dat$small_clust <- rep(1:50, each = 2)
dat$big_blocks <- rep(1:5, each = 10)
dat$Z_blcl <- block_and_cluster_ra(
blocks = dat$big_blocks,
clusters = dat$small_clust
)
difference_in_means(
Y ~ Z_blcl,
blocks = big_blocks,
clusters = small_clust,
data = dat
)
# ----------
# 5. Matched-pairs
# ----------
# Matched-pair estimates and standard errors are also accurate
# Specified same as blocked design, function learns that
# it is matched pair from size of blocks!
dat$pairs <- rep(1:50, each = 2)
dat$Z_pairs <- block_ra(dat$pairs, prob = 0.5)
table(dat$pairs, dat$Z_pairs)
difference_in_means(Y ~ Z_pairs, blocks = pairs, data = dat)
# ----------
# 6. Matched-pair cluster randomized
# ----------
# Learns this design if there are two clusters per block
dat$small_clust <- rep(1:50, each = 2)
dat$cluster_pairs <- rep(1:25, each = 4)
table(dat$cluster_pairs, dat$small_clust)
dat$Z_mpcl <- block_and_cluster_ra(
blocks = dat$cluster_pairs,
clusters = dat$small_clust
)
difference_in_means(
Y ~ Z_mpcl,
blocks = cluster_pairs,
clusters = small_clust,
data = dat
)
# ----------
# Other examples
# ----------
# Also works with multi-valued treatments if users specify
# comparison of interest
dat$Z_multi <- simple_ra(
nrow(dat),
conditions = c("Treatment 2", "Treatment 1", "Control"),
prob_each = c(0.4, 0.4, 0.2)
)
# Only need to specify which condition is treated `condition2` and
# which is control `condition1`
difference_in_means(
Y ~ Z_multi,
condition1 = "Treatment 2",
condition2 = "Control",
data = dat
)
difference_in_means(
Y ~ Z_multi,
condition1 = "Treatment 1",
condition2 = "Control",
data = dat
)
# Specifying weights will result in estimation via lm_robust()
dat$w <- runif(nrow(dat))
difference_in_means(Y ~ Z_comp, weights = w, data = dat)
lm_robust(Y ~ Z_comp, weights = w, data = dat)
```
