std_comp: Generate standardized complete cases
In dbw: Doubly Robust Distribution Balancing Weighting Estimation

Description

Standardizes the covariates used for propensity score estimation before running distribution balancing weighting (`dbw`) with regularization.

Usage

std_comp(
  formula_y,
  formula_ps,
  estimand = "ATE",
  method_y = "wls",
  data,
  std = TRUE,
  weights = NULL
)

Arguments

`formula_y`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the potential outcome model to be fitted. To use non-DR type estimators, include only "1" on the right-hand side of the formula for the Hajek estimator, or only "0" for the Horvitz-Thompson estimator. See the examples for `dbw` for more details.
`formula_ps`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the propensity score model to be fitted. For entropy balancing weighting (`method = "eb"`), the variables on the right-hand side of the formula will be mean balanced.
`estimand`	a character string specifying the parameter of interest. Choose "ATT" for estimating the average treatment effect on the treated, "ATE" for the average treatment effect, "ATC" for the average treatment effect on the controls, and "AO" for average outcomes in missing outcome cases. You can choose "ATEcombined" for combined estimation of the average treatment effect when using covariate balancing weighting (`method = "cb"`).
`method_y`	a character string specifying a method for potential outcome prediction. Choose "wls" for the linear model, "logit" for the logistic regression, "gam" for the generalized additive model for the continuous outcome, and "gambinom" for the generalized additive model for the binary outcome.
`data`	a data frame (or one that can be coerced to that class) containing the outcomes and the variables in the model.
`std`	a logical parameter indicating whether to standardize the covariates for propensity score estimation.
`weights`	an optional vector of ‘prior weights’ (e.g. sampling weights) to be used in the fitting process. Should be NULL or a numeric vector.
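
As a small illustration of the `formula_y` conventions above, the following base-R sketch shows what the three kinds of right-hand side look like once expanded into design matrices. The data frame `toy` and its columns are hypothetical, and the sketch does not show how `dbw` uses these formulas internally.

```r
# Hypothetical toy data; the names are for illustration only
toy <- data.frame(y  = c(1.2, 0.7, 3.1, 2.4),
                  x1 = c(0.5, -1.0, 0.3, 1.1))

f_dr    <- y ~ x1  # outcome model with covariates (DR type)
f_hajek <- y ~ 1   # intercept only (Hajek)
f_ht    <- y ~ 0   # no terms at all (Horvitz-Thompson)

X_dr    <- stats::model.matrix(f_dr, data = toy)
X_hajek <- stats::model.matrix(f_hajek, data = toy)
X_ht    <- stats::model.matrix(f_ht, data = toy)

colnames(X_dr)     # intercept plus x1
colnames(X_hajek)  # intercept only
ncol(X_ht)         # no columns
```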

Details

std_comp first extracts the cases that are complete for both propensity score estimation and outcome model estimation. It then standardizes the covariates for propensity score estimation, taking the provided weights into account. The returned data frame is the "design matrix", which contains a set of dummy variables (depending on the contrasts) for factors and similarly expanded interaction terms.
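
The steps described above can be sketched in base R as follows. This is not the package's implementation: the toy data, the weights `w`, the dropped intercept column, and the divisor used in the weighted standard deviation are all assumptions made for illustration.

```r
# Hypothetical data with one missing covariate value and a factor
toy <- data.frame(
  treat = c(1, 0, 1, 0, 1, 0),
  y     = c(2.0, 1.5, 3.2, 0.9, 2.8, 1.1),
  x1    = c(0.2, -0.4, NA, 1.3, 0.7, -1.1),
  g     = factor(c("a", "b", "a", "c", "b", "c"))
)
w <- c(1, 2, 1, 1, 2, 1)  # hypothetical prior weights

# Step 1: keep rows that are complete for every model variable
keep   <- stats::complete.cases(toy)
toy_cc <- toy[keep, ]
w_cc   <- w[keep]

# Step 2: expand factors into dummy columns and drop the intercept
X <- stats::model.matrix(treat ~ x1 + g, data = toy_cc)[, -1, drop = FALSE]

# Step 3: weighted mean and weighted standard deviation per column
# (divisor sum(w) is an assumption; the package may use another one)
mean_x <- colSums(X * w_cc) / sum(w_cc)
Xc     <- sweep(X, 2, mean_x)
sd_x   <- sqrt(colSums(Xc^2 * w_cc) / sum(w_cc))

# Step 4: standardize each column
X_std <- sweep(Xc, 2, sd_x, "/")
```

Under these assumptions, each column of `X_std` has weighted mean 0 and weighted variance 1.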

For AO estimation, NA values of the outcome variable are not deleted for missing cases (those where the response variable equals "0"). For this processing to work, the outcome variable name must not contain spaces.
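
The AO rule can be illustrated with a short base-R sketch. The data frame and the column names `resp`, `y`, and `x1` are hypothetical, and dropping responders whose outcome is NA is an assumption about the intended behavior, not something taken from the package source.

```r
# Hypothetical data: resp = 1 means the outcome is observed, 0 means missing
toy <- data.frame(
  resp = c(1, 0, 1, 0, 1),
  y    = c(2.0, NA, NA, NA, 3.1),  # row 3: responder with an NA outcome
  x1   = c(0.2, -0.4, 0.9, NA, 0.7)
)

# Rows must be complete in the response and the covariates
ok_ps <- stats::complete.cases(toy[, c("resp", "x1")])

# An NA outcome is tolerated only when the response variable is 0
ok_y <- toy$resp == 0 | !is.na(toy$y)

toy_ao <- toy[ok_ps & ok_y, ]
toy_ao  # rows 1, 2, and 5 remain; row 2 keeps its NA outcome
```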

Value

`data`	the complete-case data frame containing the outcome variable, the response (treatment) variable, and the standardized covariates for propensity score estimation.
`weights`	the initially supplied weights for the complete cases, or a vector of 1s if none were supplied.
`formula_ps`	the automatically-created formula for propensity score estimation, which includes expanded factors and interactions.
`mean_x`	the weighted mean of each covariate.
`sd_x`	the weighted standard deviation of each covariate.
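
One possible use of `mean_x` and `sd_x` is mapping standardized covariates back to their original scale. The sketch below uses hypothetical stand-in vectors rather than real output, and it assumes the two vectors are ordered like the columns of the standardized matrix.

```r
# Hypothetical stand-ins for the mean_x and sd_x components
mean_x <- c(x1 = 0.5, x2 = 10)
sd_x   <- c(x1 = 2, x2 = 4)

# Hypothetical standardized covariates
X_std <- cbind(x1 = c(-1, 0, 1), x2 = c(0.5, -0.5, 0))

# Undo the standardization: multiply by sd_x, then add mean_x
X_orig <- sweep(sweep(X_std, 2, sd_x, "*"), 2, mean_x, "+")
X_orig
```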

Author(s)

Hiroto Katsumata

Examples

# Simulation from Kang and Schafer (2007) and Imai and Ratkovic (2014)
# ATE estimation
# True ATE is 10
tau <- 10
set.seed(12345)
n <- 1000
X <- matrix(stats::rnorm(n * 4, mean = 0, sd = 1), nrow = n, ncol = 4)
prop <- 1 / (1 + exp(X[, 1] - 0.5 * X[, 2] + 
                     0.25 * X[, 3] + 0.1 * X[, 4]))
treat <- rbinom(n, 1, prop)
y <- 210 + 
     27.4 * X[, 1] + 13.7 * X[, 2] + 13.7 * X[, 3] + 13.7 * X[, 4] + 
     tau * treat + stats::rnorm(n = n, mean = 0, sd = 1)
ybinom <- (y > 210) + 0
df0 <- data.frame(X, treat, y, ybinom)
colnames(df0) <- c("x1", "x2", "x3", "x4", "treat", "y", "ybinom")

# Variables for a misspecified model
Xmis <- data.frame(x1mis = exp(X[, 1] / 2), 
                   x2mis = X[, 2] * (1 + exp(X[, 1]))^(-1) + 10,
                   x3mis = (X[, 1] * X[, 3] / 25 + 0.6)^3, 
                   x4mis = (X[, 2] + X[, 4] + 20)^2)

# Data frame and formulas for propensity score estimation
df <- data.frame(df0, Xmis)
formula_ps_c <- stats::as.formula(treat ~ x1 + x2 + x3 + x4)
formula_ps_m <- stats::as.formula(treat ~ x1mis + x2mis + 
                                          x3mis + x4mis)

# Formula for a misspecified outcome model
formula_y <- stats::as.formula(y ~ x1mis + x2mis + x3mis + x4mis)


# Distribution balancing weighting with regularization
# Standardization
res_std_comp <- std_comp(formula_y = formula_y, 
                         formula_ps = formula_ps_m, 
                         estimand = "ATE", method_y = "wls", 
                         data = df, std = TRUE,
                         weights = NULL)
fitdbwmr <- dbw(formula_y = formula_y, 
                formula_ps = res_std_comp$formula_ps, 
                estimand = "ATE", method = "dbw", method_y = "wls",
                data = res_std_comp$data, vcov = TRUE, 
                lambda = 0.01, weights = res_std_comp$weights, 
                clevel = 0.95)
summary(fitdbwmr)

dbw documentation built on Sept. 11, 2024, 6:50 p.m.