std_comp: Generate standardized complete cases

std_compR Documentation

Generate standardized complete cases

Description

Standardizes covariates for propensity score estimation before the distribution balancing weighting dbw with the regularization.

Usage

std_comp(
  formula_y,
  formula_ps,
  estimand = "ATE",
  method_y = "wls",
  data,
  std = TRUE,
  weights = NULL
)

Arguments

formula_y

an object of class formula (or one that can be coerced to that class): a symbolic description of the potential outcome model to be fitted. When you want to use non-DR type estimators, only include "1" in the right hand side of the formula for the Hajek estimator and only include "0" for the Horvitz-Thompson estimator. See example of dbw for more details.

formula_ps

an object of class formula (or one that can be coerced to that class): a symbolic description of the propensity score model to be fitted. For the entropy balancing weighting (method = "eb"), variables in the right hand side of the formula will be mean balanced.

estimand

a character string specifying a parameter of interest. Choose "ATT" for the average treatment effects on the treated estimation, "ATE" for the average treatment effects estimation, "ATC" for the average outcomes estimation in missing outcome cases. You can choose "ATEcombined" for the combined estimation for the average treatment effects estimation when using the covariate balancing weighting (method = "cb").

method_y

a character string specifying a method for potential outcome prediction. Choose "wls" for the linear model, "logit" for the logistic regression, "gam" for the generalized additive model for the continuous outcome, and "gambinom" for the generalized additive model for the binary outcome.

data

a data frame (or one that can be coerced to that class) containing the outcomes and the variables in the model.

std

a logical parameter indicating whether to standardize the covariates for propensity score estimation.

weights

an optional vector of ‘prior weights’ (e.g. sampling weights) to be used in the fitting process. Should be NULL or a numeric vector.

Details

std_comp first extracts complete cases for both propensity score estimation and outcome model estimation. Then it standardizes covariates for propensity score estimation by takeing the provided weights into account. The returned data frame is the "design matrix", which contains a set of dummy variables (depending on the contrasts) for factors and similarly expanded interaction terms.

For the AO estimation, NA values for the outcome variable for missing cases (the response variable taking "0") are not deleted. For this processing, the outcome variable name must not contain spaces.

Value

data

the complete-case data frame containing the outcome variable, the response (treatment) variable, and the standardized covariates for propensity score estimation.

weights

the initially-supplied weights for the complete cases, a vector of 1s if none were.

formula_ps

the automatically-created formula for propensity score estimation, which includes expanded factors and interactions.

mean_x

the weighted mean of each covariates.

sd_x

the weighted standard deviation of each covariates.

Author(s)

Hiroto Katsumata

See Also

dbw

Examples

# Simulation from Kang and Shafer (2007) and Imai and Ratkovic (2014)
# ATE estimation
# True ATE is 10
tau <- 10
set.seed(12345)
n <- 1000
X <- matrix(stats::rnorm(n * 4, mean = 0, sd = 1), nrow = n, ncol = 4)
prop <- 1 / (1 + exp(X[, 1] - 0.5 * X[, 2] + 
                     0.25 * X[, 3] + 0.1 * X[, 4]))
treat <- rbinom(n, 1, prop)
y <- 210 + 
     27.4 * X[, 1] + 13.7 * X[, 2] + 13.7 * X[, 3] + 13.7 * X[, 4] + 
     tau * treat + stats::rnorm(n = n, mean = 0, sd = 1)
ybinom <- (y > 210) + 0
df0 <- data.frame(X, treat, y, ybinom)
colnames(df0) <- c("x1", "x2", "x3", "x4", "treat", "y", "ybinom")

# Variables for a misspecified model
Xmis <- data.frame(x1mis = exp(X[, 1] / 2), 
                   x2mis = X[, 2] * (1 + exp(X[, 1]))^(-1) + 10,
                   x3mis = (X[, 1] * X[, 3] / 25 + 0.6)^3, 
                   x4mis = (X[, 2] + X[, 4] + 20)^2)

# Data frame and formulas for propensity score estimation
df <- data.frame(df0, Xmis)
formula_ps_c <- stats::as.formula(treat ~ x1 + x2 + x3 + x4)
formula_ps_m <- stats::as.formula(treat ~ x1mis + x2mis + 
                                          x3mis + x4mis)

# Formula for a misspecified outcome model
formula_y <- stats::as.formula(y ~ x1mis + x2mis + x3mis + x4mis)


# Distribution balancing weighting with regularization
# Standardization
res_std_comp <- std_comp(formula_y = formula_y, 
                         formula_ps = formula_ps_m, 
                         estimand = "ATE", method_y = "wls", 
                         data = df, std = TRUE,
                         weights = NULL)
fitdbwmr <- dbw(formula_y = formula_y, 
                formula_ps = res_std_comp$formula_ps, 
                estimand = "ATE", method = "dbw", method_y = "wls",
                data = res_std_comp$data, vcov = TRUE, 
                lambda = 0.01, weights = res_std_comp$weights, 
                clevel = 0.95)
summary(fitdbwmr)

dbw documentation built on Sept. 11, 2024, 6:50 p.m.