lm_lin: Linear regression with the Lin (2013) covariate adjustment


View source: R/estimatr_lm_lin.R

Description

This function is a wrapper for lm_robust that is useful for estimating treatment effects with pre-treatment covariate data. It implements the method described by Lin (2013) to reduce the bias of such estimation.

Usage

lm_lin(formula, covariates, data, weights, subset, clusters, se_type = NULL,
  ci = TRUE, alpha = 0.05, coefficient_name = NULL, return_vcov = TRUE,
  try_cholesky = FALSE)

Arguments

formula

an object of class formula, as in lm, such as Y ~ Z, with only one variable on the right-hand side: the treatment.

covariates

a right-sided formula with pre-treatment covariates on the right-hand side, such as ~ x1 + x2 + x3.

data

A data.frame

weights

the bare (unquoted) name of the weights variable in the supplied data.

subset

An optional bare (unquoted) expression specifying a subset of observations to be used.

clusters

An optional bare (unquoted) name of the variable that corresponds to the clusters in the data.

se_type

The type of standard error to compute. Without clustering, the permissible values are "HC0", "HC1" (or "stata", the equivalent), "HC2" (default), "HC3", or "classical". With clustering, they are "CR0", "CR2" (default), or "stata".

ci

A boolean for whether to compute and return p-values and confidence intervals, TRUE by default.

alpha

The significance level, 0.05 by default.

coefficient_name

a character or character vector that indicates which coefficients should be reported. If left unspecified, all coefficients are returned. Specifying a coefficient of interest may improve speed, especially for models with clustering where only one coefficient is of interest.

return_vcov

a boolean for whether to return the variance-covariance matrix for later usage, TRUE by default.

try_cholesky

a boolean for whether to try using a Cholesky decomposition to solve least squares instead of a QR decomposition, FALSE by default. Using a Cholesky decomposition may result in speed gains, but should only be used if users are sure their model is full-rank (i.e., there is no perfect multicollinearity).
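
The trade-off behind try_cholesky can be illustrated in base R. The sketch below is not the internal code of lm_lin or lm_robust; it assumes a small simulated design matrix (the names X, y, and X_bad are made up for illustration) and solves the same least-squares problem both ways.

```r
# Illustration only: base R, not the internals of lm_lin() or lm_robust().
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))          # full-rank design: intercept + one covariate
y <- 1 + 2 * X[, 2] + rnorm(n)

# QR route: numerically stable and able to detect rank deficiency
beta_qr <- qr.coef(qr(X), y)

# Cholesky route: factor X'X = R'R, then solve two triangular systems
R <- chol(crossprod(X))
beta_chol <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))

# Both routes should agree on a full-rank problem
all.equal(as.numeric(beta_qr), as.numeric(beta_chol))

# With a perfectly collinear column, QR reports the reduced rank;
# X'X is then singular, which the Cholesky route cannot handle safely
X_bad <- cbind(X, X[, 2])
qr(X_bad)$rank   # rank of the design
ncol(X_bad)      # number of columns
```

When the rank is smaller than the number of columns, as in X_bad, the QR default is the safer choice.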

Details

This function is simply a wrapper for lm_robust. It pre-processes the data by taking the covariates specified in the `covariates` argument, centering them by subtracting from each covariate its mean, and interacting them with the treatment. If the treatment has multiple values, a series of dummies for each value is created and each of those is interacted with the demeaned covariates. More details can be found in the Getting Started vignette and the technical notes.
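
The pre-processing described above can be mimicked by hand. The following is a minimal sketch using base R's lm on simulated data, not the code lm_lin runs; the variable names (x_c, fit_centered, fit_raw, z3, fit_multi) are made up for illustration, and lm's factor coding is an assumption that may differ from how lm_lin builds its treatment dummies.

```r
# Illustration only: mimics the centering-and-interaction step with lm();
# lm_lin() additionally reports robust standard errors via lm_robust().
set.seed(42)
n <- 40
x <- rnorm(n, mean = 2.3)
z <- rep(c(0, 1), times = n / 2)         # hypothetical binary treatment
y <- x + 0.35 * z + rnorm(n)

x_c <- x - mean(x)                       # center the covariate at its mean
fit_centered <- lm(y ~ z + x_c + z:x_c)  # treatment, covariate, interaction
coef(fit_centered)

# Because x_c has mean zero, the coefficient on z is the adjusted effect at
# the average covariate value. The uncentered fit gives the same number
# only after adding the interaction slope times mean(x).
fit_raw <- lm(y ~ z + x + z:x)
unname(coef(fit_raw)["z"] + coef(fit_raw)["z:x"] * mean(x))

# Multi-valued treatment: lm() creates one dummy per non-baseline level,
# each interacted with the centered covariate
z3 <- rep(1:3, length.out = n)
fit_multi <- lm(y ~ factor(z3) * x_c)
names(coef(fit_multi))
```

Centering is what lets the treatment coefficient be read directly as the covariate-adjusted effect, without post-hoc arithmetic on the interaction terms.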

Value

lm_lin returns an object of class "lm_robust".

The functions summary and tidy can be used to get the results as a data.frame. To extract results, you can use these data frames, work with the returned list directly, or use the generic accessor functions coef, vcov, confint, and predict.

An object of class "lm_robust" is a list containing at least the following components:

est

the estimated coefficients

se

the estimated standard errors

df

the estimated degrees of freedom

p

the p-values from the t-test using est, se, and df

ci_lower

the lower bound of the (1 - alpha) * 100 percent confidence interval

ci_upper

the upper bound of the (1 - alpha) * 100 percent confidence interval

coefficient_name

a character vector of coefficient names

alpha

the significance level specified by the user

res_var

the residual variance, used for uncertainty when using predict

N

the number of observations used

k

the number of columns in the design matrix (includes linearly dependent columns!)

rank

the rank of the fitted model

vcov

the fitted variance-covariance matrix

weighted

whether or not weights were applied

scaled_center

the means of each of the covariates used for centering them

We also return terms and contrasts, used by predict.
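
The p, ci_lower, and ci_upper components are described above as coming from a t-test on est, se, and df. A minimal sketch of that arithmetic follows, assuming the usual two-sided t-test and a symmetric interval; the numbers are invented for illustration and are not output from lm_lin.

```r
# Hypothetical values standing in for one coefficient's est, se, and df
est   <- 0.35
se    <- 0.15
df    <- 36
alpha <- 0.05

t_stat   <- est / se
p        <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-sided p-value
crit     <- qt(1 - alpha / 2, df = df)                        # t critical value
ci_lower <- est - crit * se
ci_upper <- est + crit * se

c(p = p, ci_lower = ci_lower, ci_upper = ci_upper)
```

Under these assumptions the interval excludes zero exactly when p is below alpha.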

Examples

library(estimatr)  # provides lm_lin() and tidy()
library(fabricatr)
library(randomizr)
dat <- fabricate(
  N = 40,
  x = rnorm(N, mean = 2.3),
  x2 = rpois(N, lambda = 2),
  x3 = runif(N),
  y0 = rnorm(N) + x,
  y1 = rnorm(N) + x + 0.35
)

dat$z <- simple_ra(N = nrow(dat))
dat$y <- ifelse(dat$z == 1, dat$y1, dat$y0)

# Same specification as `lm_robust()` with one additional argument
lmlin_out <- lm_lin(y ~ z, covariates = ~ x, data = dat)
tidy(lmlin_out)

# Works with multiple pre-treatment covariates
lm_lin(y ~ z, covariates = ~ x + x2, data = dat)

# Also centers data AFTER evaluating any functions in formula
lm_lin(y ~ z, covariates = ~ x + log(x3), data = dat)

# Works easily with clusters
dat$clusterID <- rep(1:20, each = 2)
dat$z_clust <- cluster_ra(clusters = dat$clusterID)

lm_lin(y ~ z_clust, covariates = ~ x, data = dat, clusters = clusterID)

# Works with multi-valued treatments
dat$z_multi <- sample(1:3, size = nrow(dat), replace = TRUE)
lm_lin(y ~ z_multi, covariates = ~ x, data = dat)

DeclareDesign/DDestimate documentation built on Jan. 23, 2018, 7:01 a.m.