sbw: Stable Balancing Weights

Description

Function for finding stable weights (e.g., of minimum variance) that balance the empirical distribution of the observed covariates up to levels prespecified by the user.

Usage

sbw(data_frame, t_ind, bal_covs, bal_tols, bal_tols_sd = TRUE, target = "treated", l_norm = "l_2", w_min = 0, normalize = 1, solver, display = 0, max_iter = 100000, rel_tol = 1e-4, abs_tol = 1e-4, gap_stop = TRUE, adaptive_rho = TRUE)

Arguments

data_frame

a data frame with the treatment or nonresponse indicator and covariates organized in columns.

t_ind

a string equal to the name of the treatment or nonresponse indicator in data_frame.

bal_covs

a vector of strings with the names of the covariates in data_frame to be balanced. Note that the covariates in data_frame can be transformations of original covariates to balance higher order single dimensional moments such as variances and skewness, and multidimensional moments such as correlations. If the transformations of the covariates are indicators of the quantiles of the empirical distribution of a covariate, then balancing all these indicators will tend to balance the entire marginal distribution of the covariate. Note that if bal_covs is specified, then bal_tols needs to be specified.
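As a minimal sketch of the transformations described above, the base R code below adds a squared term, an interaction, and quartile indicators to a data frame before listing them in bal_covs. The data and the column names (age, income, and so on) are hypothetical and serve only as an illustration.

```r
# Hypothetical data: two raw covariates (names are illustrative only)
set.seed(1)
df <- data.frame(age = rnorm(100, 50, 10), income = rexp(100, 1 / 30000))

# Second-order terms: balancing the mean of a square helps balance the
# variance; balancing the mean of a product helps balance the correlation
df$age_sq <- df$age^2
df$age_x_income <- df$age * df$income

# Quantile indicators: one 0/1 column per quartile bin of age
breaks <- quantile(df$age, probs = seq(0, 1, 0.25))
bins <- cut(df$age, breaks = breaks, include.lowest = TRUE, labels = FALSE)
for (k in 1:4) df[[paste0("age_q", k)]] <- as.numeric(bins == k)

# Names to pass as bal_covs; the fourth indicator is left out here because
# the four indicators sum to one for every unit
bal_covs <- c("age", "income", "age_sq", "age_x_income", paste0("age_q", 1:3))
```

Balancing the means of the quartile indicators amounts to balancing the proportion of units in each quartile bin, which is how the marginal distribution of age is approximately balanced.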

bal_tols

a scalar or vector of scalars defining the tolerances or maximum differences in means after weighting for the covariates defined in bal_covs. Note that if bal_tols is a vector, then its length has to be equal to the number of covariates named in bal_covs.

bal_tols_sd

a logical that indicates whether the tolerances specified in bal_tols are expressed in the original units of the covariates or in standard deviations. The default is TRUE, meaning that the tolerances are expressed in standard deviations.
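To get a feel for what a tolerance in standard deviations means in the original units, one can multiply it by the standard deviation of each covariate. The sketch below does this in base R for a vector of tolerances, one per covariate. Which standard deviation sbw uses internally (for example, that of the target sample or of all units) is not stated here, so the full-sample standard deviation is an assumption, and the data are hypothetical.

```r
# Hypothetical covariates
set.seed(1)
covs <- data.frame(X1 = rnorm(200, 10, 4), X2 = runif(200))

# One tolerance per covariate, expressed in standard deviations
bal_tols <- c(X1 = 0.01, X2 = 0.05)

# ASSUMPTION: scale by the standard deviation computed over all units
sds <- sapply(covs, sd)
tols_raw <- bal_tols * sds[names(bal_tols)]
tols_raw
```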

target

a string that determines whether the weights are constructed (i) to represent or recover the covariate structure of the treated units in the weighted controls (here, the target sample is defined by the units with 1's in t_ind so the 0's in t_ind are weighted to represent the 1's; this is the option target="treated"); (ii) to represent the control units in the weighted treated units (option target="controls"); or (iii) to represent or recover the covariate structure of both the treated and control units in the weighted controls (option target="all"). Typically, (iii) will be the appropriate option for adjusting for nonresponse in sample surveys, provided the 1's in t_ind encode the nonrespondents and the 0's the respondents, so the 0's in t_ind are weighted to recover the structure of the observed covariates of all the units in t_ind. In observational studies, (i) will typically be the appropriate option for estimating the average treatment effect on the treated (ATT), (ii) for estimating the average treatment effect on the controls (ATC), and (iii) can be applied twice to estimate the average treatment effect (ATE): once to recover the covariate structure of both the treated and control units in the weighted control units, and once to recover the covariate structure of both the treated and control units in the weighted treated units.
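The three options differ only in which units define the target sample. The base R sketch below makes this concrete by computing the covariate mean that the weighted units are asked to reproduce under each option. The helper target_rows is hypothetical and is not part of the package.

```r
# Hypothetical helper: logical index of the units that form the target sample
target_rows <- function(t, target) {
  switch(target,
    treated  = t == 1,               # target: units with 1's in t_ind
    controls = t == 0,               # target: units with 0's in t_ind
    all      = rep(TRUE, length(t)), # target: every unit
    stop("unknown target: ", target))
}

# Toy data: two treated units followed by four controls
t <- c(1, 1, 0, 0, 0, 0)
x <- c(4, 6, 1, 2, 3, 2)

# Mean of x in the target sample under each option
target_means <- sapply(c("treated", "controls", "all"),
                       function(tg) mean(x[target_rows(t, tg)]))
target_means
```

With target="treated" or target="all" it is the 0's that receive weights, whereas with target="controls" it is the 1's.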

l_norm

a string that defines the norm to be used in the objective function. If l_norm = "l_2" then the "ell-2" norm is used and the variance of the weights is minimized. If l_norm = "l_1" then the "ell-1" norm is used. The default is l_norm = "l_2".
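The link between the "ell-2" norm and the variance can be checked numerically. For n weights that add up to one, the sum of squared deviations from the mean equals the sum of squared weights minus 1/n, so minimizing one minimizes the other. The sketch below verifies this identity in base R; it illustrates the relationship only and does not reproduce the exact objective coded in the package.

```r
set.seed(1)
n <- 10
w <- runif(n)
w <- w / sum(w)   # weights normalized to add up to one

# Sum of squared deviations from the mean weight (which equals 1/n)
ss_dev <- sum((w - mean(w))^2)

# Squared ell-2 norm of the weights, shifted by the constant 1/n
ss_norm <- sum(w^2) - 1 / n

all.equal(ss_dev, ss_norm)
```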

w_min

a scalar determining the minimum value of the weights. The default is 0.

normalize

a binary scalar equal to 1 if the weights are constrained to add up to one, and 0 otherwise. The default is normalize = 1.

solver

a string equal to "cplex", "gurobi", "pogs" or "quadprog". CPLEX and Gurobi are commercial solvers, but free for academic users. In contrast, "pogs" and "quadprog" are free for everyone. In our experience, "pogs" is the fastest solver option and is able to handle larger datasets, but at present it is rather difficult to install for non-Mac users. The default option is solver = "quadprog".

display

a binary scalar taking the value 1 if the output is to be displayed or 0 if not. The default is 0. This option is specific to "cplex", "gurobi" and "pogs".

max_iter

a scalar specifying the maximum number of iterations to be used with the solver option "pogs". The default is max_iter = 100000. Please see the POGS documentation.

rel_tol

a scalar specific to the solver option "pogs". The default is rel_tol = 1e-4. Please see the POGS documentation.

abs_tol

a scalar specific to the solver option "pogs". The default is abs_tol = 1e-4. Please see the POGS documentation.

gap_stop

a logical specific to the solver option "pogs". The default is gap_stop = TRUE. Please see the POGS documentation.

adaptive_rho

a logical specific to the solver option "pogs". The default is adaptive_rho = TRUE. Please see the POGS documentation.

Details

"sbw" finds stable weights, e.g., of minimum variance, that balance the empirical distribution of the observed covariates up to levels prespecified by the user. This method allows the user to directly balance the means of the observed covariates and other features of their marginal and joint distributions such as variances and correlations and also, say, the quantiles of interactions of pairs of observed covariates, thus balancing entire two-way marginals. The dual variables of the covariate balance constraints provide insight into the behavior of the variance of the optimal weights in relation to the level of covariate balance adjustment. The package also contains functions for weight diagnostics.
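The optimization problem described above can be sketched as a small quadratic program. The code below is an illustration written directly against the quadprog package, not the package's internal code: it finds nonnegative weights that add up to one, have minimum sum of squares, and bring the weighted covariate means within a tolerance of some target means. The data, the target means, and the tolerances are all hypothetical.

```r
library(quadprog)

set.seed(1)
n_c <- 50
X_c <- cbind(rnorm(n_c), rnorm(n_c))  # covariates of the units to be weighted
target_means <- c(0.2, -0.1)          # hypothetical target sample means
delta <- c(0.05, 0.05)                # tolerances in the original units

# Objective: (1/2) w' Dmat w - dvec' w = sum(w^2)
Dmat <- 2 * diag(n_c)
dvec <- rep(0, n_c)

# Constraints in solve.QP form, t(Amat) %*% w >= bvec:
#   column 1 (equality): sum(w) = 1
#   next 2 columns:      sum(w * x_j) >= target_j - delta_j
#   next 2 columns:     -sum(w * x_j) >= -(target_j + delta_j)
#   last n_c columns:    w_i >= 0
Amat <- cbind(rep(1, n_c), X_c, -X_c, diag(n_c))
bvec <- c(1, target_means - delta, -(target_means + delta), rep(0, n_c))

fit <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
w <- fit$solution

# Weighted means, to be compared with target_means
colSums(w * X_c)
```

The Lagrange multipliers returned in fit$Lagrangian play the role of the dual variables mentioned above: a multiplier is nonzero only for a constraint that is active at the optimum.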

Value

A list with the following elements:

obj_total

value of the objective function at the optimum;

time

time elapsed to find the optimal solution;

status

whether the optimal weights were found;

data_frame_weights

data frame with the optimal weights;

dual_vars

dual variables or shadow prices of the covariate balancing constraints.

Author(s)

Jose R. Zubizarreta <zubizarreta@columbia.edu>

References

Zubizarreta, J. R., “Stable Weights that Balance Covariates for Causal Inference and Estimation with Incomplete Data,” Journal of the American Statistical Association, in press.

Examples

	
	# Load MASS for mvrnorm() and sbw for sbw() and meanbal()
	library(MASS)
	library(sbw)

	# Simulate data
	kangschafer = function(n_obs) {
	#! Z are the true covariates
	#! t is the indicator for the respondents (treated)
	#! y is the outcome
	#! X are the observed covariates
	#! Returns Z, p, t, y and X, with rows sorted in increasing order of t
		Z = mvrnorm(n_obs, mu=rep(0, 4), Sigma=diag(4))
		p = 1/(1+exp(Z[, 1]-.5*Z[, 2]+.25*Z[, 3]+.1*Z[, 4]))
		t = rbinom(n_obs, 1, p)
		Zt = cbind(Z, p, t)
		Zt = Zt[order(t), ]
		Z = Zt[, 1:4]
		p = Zt[, 5]
		t = Zt[, 6]
		y = 210+27.4*Z[, 1]+13.7*Z[, 2]+13.7*Z[, 3]+13.7*Z[, 4]+rnorm(n_obs)
		X = cbind(exp(Z[, 1]/2), (Z[, 2]/(1+exp(Z[, 1])))+10, (Z[, 1]*Z[, 3]/25+.6)^3, (Z[, 2]+Z[, 4]+20)^2)	
		return(list(Z=Z, p=p, t=t, y=y, X=X)) 
	}	
	set.seed(1)
	n_obs = 200
	aux = kangschafer(n_obs)
	Z = aux$Z
	p = aux$p
	t = aux$t
	y = aux$y
	X = aux$X
		
	# Data frame	
	t_ind = t
	bal_covs = X
	data_frame = as.data.frame(cbind(t_ind, bal_covs))
	names(data_frame) = c("t_ind", "X1", "X2", "X3", "X4")	
	
	# Treatment indicator 
	t_ind = "t_ind"
	
	# Moment covariates
	bal_covs = c("X1", "X2", "X3", "X4")
	
	# Moment tolerances
	bal_tols = 1/100
	
	# Whether the moment tolerances are expressed in standard deviations
	bal_tols_sd = TRUE
	
	# Here, target = "all": the 0's in t_ind are weighted to "represent" all the units (the 0's and the 1's together), as when adjusting for nonresponse
	target = "all"
	
	# Here, the "ell-2" norm is used to minimize the variance of the weights
	l_norm = "l_2"	
	
	# Minimum value of the weights
	w_min = 0 
	
	# Here, the weights are constrained to add up to one
	normalize = 1		
	
	# Solver
	solver = "quadprog"
		
	# Output display 
	display = 0

	# Find optimal weights
	out = sbw(data_frame, t_ind, bal_covs, bal_tols, bal_tols_sd, target, l_norm, w_min, normalize, solver, display)
	
	# Check balance
	data_frame_weights = out$data_frame_weights
	t_ind = "t_ind"
	bal_covs = c("X1", "X2", "X3", "X4")	
	weights = "weights"
	target = "all"
	meanbal(data_frame_weights, t_ind, bal_covs, weights, target, digits=2)

ngreifer/sbw documentation built on May 29, 2019, 3:17 p.m.