sbw: Stable Balancing Weights
In ngreifer/sbw: Stable weights that balance covariates


Function for finding stable weights (e.g., of minimum variance) that balance the empirical distribution of the observed covariates up to levels prespecified by the user.

sbw(data_frame, t_ind, bal_covs, bal_tols, bal_tols_sd = TRUE, target = "treated", l_norm = "l_2", w_min = 0, normalize = 1, solver, display = 0, max_iter = 100000, rel_tol = 1e-4, abs_tol = 1e-4, gap_stop = TRUE, adaptive_rho = TRUE)

`data_frame`	a data frame with the treatment or nonresponse indicator and covariates organized in columns.
`t_ind`	a string equal to the name of the treatment or nonresponse indicator in `data_frame`.
`bal_covs`	a vector of strings with the names of the covariates in `data_frame` to be balanced. Note that the covariates in `data_frame` can be transformations of original covariates to balance higher order single dimensional moments such as variances and skewness, and multidimensional moments such as correlations. If the transformations of the covariates are indicators of the quantiles of the empirical distribution of a covariate, then balancing all these indicators will tend to balance the entire marginal distribution of the covariate. Note that if `bal_covs` is specified, then `bal_tols` needs to be specified.
`bal_tols`	a scalar or vector of scalars defining the tolerances or maximum differences in means after weighting for the covariates defined in `bal_covs`. Note that if `bal_tols` is a vector then its length has to be equal to the number of covariates named in `bal_covs`.
`bal_tols_sd`	a logical that indicates whether the tolerances specified in `bal_tols` are expressed in the original units of the covariates or in standard deviations. The default is TRUE, meaning that the tolerances are expressed in standard deviations.
`target`	a string that determines which sample the weights are constructed to represent: (i) `target = "treated"` recovers the covariate structure of the treated units in the weighted controls (the target sample is defined by the units with 1's in `t_ind`, so the 0's in `t_ind` are weighted to represent the 1's); (ii) `target = "controls"` represents the control units in the weighted treated units; (iii) `target = "all"` recovers the covariate structure of both the treated and control units in the weighted controls. Typically, (iii) is the appropriate option for adjusting for nonresponse in sample surveys, provided the 1's in `t_ind` encode the nonrespondents and the 0's the respondents, so that the 0's in `t_ind` are weighted to recover the structure of the observed covariates of all the units. In observational studies, (i) will typically be the appropriate option for estimating the average treatment effect on the treated (ATT) and (ii) for estimating the average treatment effect on the controls (ATC). Option (iii) can be applied twice to estimate the average treatment effect (ATE): once to recover the covariate structure of both the treated and control units in the weighted control units, and once to recover it in the weighted treated units.
`l_norm`	a string that defines the norm to be used in the objective function. If `l_norm = "l_2"` then the "ell-2" norm is used and the variance of the weights is minimized. If `l_norm = "l_1"` then the "ell-1" norm is used. The default is `l_norm = "l_2"`.
`w_min`	a scalar determining the minimum value of the weights. The default is 0.
`normalize`	a binary scalar equal to 1 if the weights are constrained to add up to one, and 0 otherwise. The default is `normalize = 1`.
`solver`	a string equal to `"cplex"`, `"gurobi"`, `"pogs"` or `"quadprog"`. CPLEX and Gurobi are commercial solvers, but free for academic users. In contrast, `"pogs"` and `"quadprog"` are free for all. In our experience, `"pogs"` is the fastest solver option and can handle larger datasets, but at present it is rather difficult to install for non-Mac users. The default option is `solver = "quadprog"`.
`display`	a binary scalar taking the value 1 if the output is to be displayed or 0 if not. The default is 0. This option is specific to `"cplex"`, `"gurobi"` and `"pogs"`.
`max_iter`	a scalar specifying the maximum number of iterations to be used with the solver option `"pogs"`. The default is `max_iter = 100000`. Please see the POGS documentation.
`rel_tol`	a scalar specific to the solver option `"pogs"`. The default is `rel_tol = 1e-4`. Please see the POGS documentation.
`abs_tol`	a scalar specific to the solver option `"pogs"`. The default is `abs_tol = 1e-4`. Please see the POGS documentation.
`gap_stop`	a logical specific to the solver option `"pogs"`. The default is `gap_stop = TRUE`. Please see the POGS documentation.
`adaptive_rho`	a logical specific to the solver option `"pogs"`. The default is `adaptive_rho = TRUE`. Please see the POGS documentation.
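
As the description of `bal_covs` notes, higher-order and joint moments are balanced by adding transformed columns to `data_frame` and naming them in `bal_covs`. The following is a minimal base-R sketch of that preparation step. The toy data and the column names (`X1_sq`, `X1_X2`, `X1_q1`, and so on) are hypothetical and chosen only for illustration.

```r
# Toy data (hypothetical): a binary indicator and two covariates
set.seed(1)
n = 100
data_frame = data.frame(t_ind = rbinom(n, 1, 0.5), X1 = rnorm(n), X2 = rnorm(n))

# Second moment of X1: balancing the mean of X1^2 along with the mean of X1
# constrains the variance of X1
data_frame$X1_sq = data_frame$X1^2

# Cross moment: balancing the mean of X1*X2 along with the means of X1 and X2
# constrains the covariance of X1 and X2
data_frame$X1_X2 = data_frame$X1 * data_frame$X2

# Quartile indicators of X1: balancing their means balances the share of
# units that fall in each quartile bin of X1
breaks = quantile(data_frame$X1, probs = seq(0, 1, 0.25))
bins = cut(data_frame$X1, breaks = breaks, include.lowest = TRUE, labels = FALSE)
for (k in 1:4) data_frame[[paste0("X1_q", k)]] = as.numeric(bins == k)

# Names that would be passed as bal_covs
bal_covs = c("X1", "X2", "X1_sq", "X1_X2", paste0("X1_q", 1:4))
print(head(data_frame[, bal_covs]))
```

Using finer quantiles (deciles, say) gives a closer approximation to the marginal distribution at the cost of more balance constraints.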

"sbw" finds stable weights, e.g., of minimum variance, that balance the empirical distribution of the observed covariates up to levels prespecified by the user. This method allows the user to directly balance the means of the observed covariates and other features of their marginal and joint distributions such as variances and correlations and also, say, the quantiles of interactions of pairs of observed covariates, thus balancing entire two-way marginals. The dual variables of the covariate balance constraints provide insight into the behavior of the variance of the optimal weights in relation to the level of covariate balance adjustment. The package also contains functions for weight diagnostics.

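For orientation, the problem solved with `l_norm = "l_2"`, `w_min = 0`, `normalize = 1` and `target = "treated"` can be sketched as below. This is a reading of the argument descriptions on this page, not a statement of the package's internal formulation. The notation is introduced here: $w_i$ is the weight of control unit $i$, $\bar{w}$ the mean weight, $x_{ip}$ the value of covariate $p$ for unit $i$, $\bar{x}^{\,t}_p$ the mean of covariate $p$ in the target sample, and $\delta_p$ the tolerance derived from `bal_tols`.

```latex
\begin{aligned}
\min_{w}\quad & \sum_{i:\,t_i = 0} \left( w_i - \bar{w} \right)^2 \\
\text{subject to}\quad
& \left| \sum_{i:\,t_i = 0} w_i \, x_{ip} - \bar{x}^{\,t}_p \right| \le \delta_p,
  \qquad p = 1, \dots, P, \\
& \sum_{i:\,t_i = 0} w_i = 1, \\
& w_i \ge 0 \quad \text{for all } i \text{ with } t_i = 0 .
\end{aligned}
```

Under this reading, the entries of `dual_vars` correspond to the multipliers on the $P$ balance constraints: a large multiplier indicates that loosening that covariate's tolerance would noticeably reduce the variance of the weights.
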
A list with the following components:

`obj_total`	value of the objective function at the optimum;
`time`	time elapsed to find the optimal solution;
`status`	whether the optimal weights were found;
`data_frame_weights`	data frame with the optimal weights;
`dual_vars`	dual variables or shadow prices of the covariate balancing constraints.
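
The returned weights can be summarized with a few lines of base R. The sketch below uses a hypothetical stand-in for `out$data_frame_weights`: the column name `weights` matches the one used in the example on this page, but the weight values are simulated rather than produced by `sbw`, and giving the treated units zero weight is an assumption of this mock. It computes the variance of the weights and Kish's effective sample size, a common summary of how far the weights are from uniform.

```r
# Hypothetical stand-in for out$data_frame_weights: in this mock, controls
# (t_ind == 0) carry weights that add up to one and treated units carry zero
set.seed(1)
t_ind = rep(c(0, 1), times = c(60, 40))
raw = ifelse(t_ind == 0, rexp(100), 0)
data_frame_weights = data.frame(t_ind = t_ind, weights = raw / sum(raw))

# Weights of the units being weighted
w = data_frame_weights$weights[data_frame_weights$t_ind == 0]

# Sample variance of the weights; with l_norm = "l_2" the variance of the
# weights is what the objective function minimizes
var_w = var(w)

# Kish's effective sample size: equals length(w) for uniform weights and
# shrinks as the weights become more dispersed
ess = sum(w)^2 / sum(w^2)

print(c(n_weighted = length(w), var_w = var_w, ess = ess))
```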

Jose R. Zubizarreta <zubizarreta@columbia.edu>

Zubizarreta, J. R., “Stable Weights that Balance Covariates for Causal Inference and Estimation with Incomplete Data,” Journal of the American Statistical Association, in press.

	
	# Load MASS for mvrnorm() and sbw for sbw() and meanbal()
	library(MASS)
	library(sbw)
	
	# Simulate data
	kangschafer = function(n_obs) {
	#! Z are the true covariates
	#! t is the indicator for the respondents (treated)
	#! y is the outcome
	#! X are the observed covariates
	#! Returns Z, p, t, y and X sorted in increasing order by t
		Z = mvrnorm(n_obs, mu=rep(0, 4), Sigma=diag(4))
		p = 1/(1+exp(Z[, 1]-.5*Z[, 2]+.25*Z[, 3]+.1*Z[, 4]))
		t = rbinom(n_obs, 1, p)
		Zt = cbind(Z, p, t)
		Zt = Zt[order(t), ]
		Z = Zt[, 1:4]
		p = Zt[, 5]
		t = Zt[, 6]
		y = 210+27.4*Z[, 1]+13.7*Z[, 2]+13.7*Z[, 3]+13.7*Z[, 4]+rnorm(n_obs)
		X = cbind(exp(Z[, 1]/2), (Z[, 2]/(1+exp(Z[, 1])))+10, (Z[, 1]*Z[, 3]/25+.6)^3, (Z[, 2]+Z[, 4]+20)^2)	
		return(list(Z=Z, p=p, t=t, y=y, X=X)) 
	}	
	set.seed(1)
	n_obs = 200
	aux = kangschafer(n_obs)
	Z = aux$Z
	p = aux$p
	t = aux$t
	y = aux$y
	X = aux$X
		
	# Data frame	
	t_ind = t
	bal_covs = X
	data_frame = as.data.frame(cbind(t_ind, bal_covs))
	names(data_frame) = c("t_ind", "X1", "X2", "X3", "X4")	
	
	# Treatment indicator 
	t_ind = "t_ind"
	
	# Moment covariates
	bal_covs = c("X1", "X2", "X3", "X4")
	
	# Moment tolerances
	bal_tols = 1/100
	
	# Whether the moment tolerances are expressed in standard deviations
	bal_tols_sd = TRUE
	
	# Here, the 0's in t_ind are weighted to "represent" all the units (both the 1's and the 0's)
	target = "all"
	
	# Here, the "ell-2" norm is used to minimize the variance of the weights
	l_norm = "l_2"	
	
	# Minimum value of the weights
	w_min = 0 
	
	# Here, the weights are constrained to add up to one
	normalize = 1		
	
	# Solver
	solver = "quadprog"
		
	# Output display 
	display = 0

	# Find optimal weights
	out = sbw(data_frame, t_ind, bal_covs, bal_tols, bal_tols_sd, target, l_norm, w_min, normalize, solver, display)
	
	# Check balance
	data_frame_weights = out$data_frame_weights
	t_ind = "t_ind"
	bal_covs = c("X1", "X2", "X3", "X4")	
	weights = "weights"
	target = "all"
	meanbal(data_frame_weights, t_ind, bal_covs, weights, target, digits=2)
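
To make explicit what a mean balance check compares, here is a base-R sketch that calls neither `sbw` nor `meanbal`. Everything in it is illustrative: the data are simulated, uniform weights on the controls stand in for optimal weights, and the difference is scaled by the standard deviation of the covariate in the full sample, which is an assumption (the package may scale differently).

```r
# Simulated data (hypothetical) with one covariate
set.seed(1)
n = 200
d = data.frame(t_ind = rbinom(n, 1, 0.4), X1 = rnorm(n, 10, 2))

# Stand-in weights: uniform over the controls, zero for the treated
d$weights = ifelse(d$t_ind == 0, 1 / sum(d$t_ind == 0), 0)

# Target mean of X1 under each option of `target`
target_mean = c(treated = mean(d$X1[d$t_ind == 1]),
                controls = mean(d$X1[d$t_ind == 0]),
                all = mean(d$X1))

# Weighted mean of X1 (treated units contribute nothing because their weight is 0)
w_mean = sum(d$weights * d$X1)

# Difference from the "all" target in standard deviation units,
# compared against a tolerance of 1/100 standard deviations
std_diff = (w_mean - target_mean[["all"]]) / sd(d$X1)
bal_tol = 1/100
within_tol = abs(std_diff) <= bal_tol

print(round(c(target_mean, weighted = w_mean, std_diff = std_diff), 3))
print(within_tol)
```

With uniform weights the weighted mean is simply the control mean, so any imbalance reported here is the unadjusted one. Weights returned by `sbw` are chosen so that this difference stays within the tolerance for every covariate in `bal_covs`.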