View source: R/optweight.fit.R
optweight.fit | R Documentation |
optweight.fit() and optweightMV.fit() perform the optimization for optweight() and optweightMV() and should, in most cases, not be used directly. Little processing of inputs is performed, so they must be given exactly as described below.
optweight.fit(
  covs,
  treat,
  tols = 0,
  estimand = "ATE",
  targets = NULL,
  s.weights = NULL,
  b.weights = NULL,
  focal = NULL,
  norm = "l2",
  std.binary = FALSE,
  std.cont = TRUE,
  min.w = 1e-08,
  verbose = FALSE,
  solver = NULL,
  ...
)
optweightMV.fit(
  covs.list,
  treat.list,
  tols.list = list(0),
  estimand = "ATE",
  targets = NULL,
  s.weights = NULL,
  b.weights = NULL,
  norm = "l2",
  std.binary = FALSE,
  std.cont = TRUE,
  min.w = 1e-08,
  verbose = FALSE,
  solver = NULL,
  ...
)
covs |
a numeric matrix of covariates to be balanced. |
treat |
a vector of treatment statuses. Non-numeric (i.e., factor or character) vectors are allowed. |
tols |
a vector of balance tolerance values for each covariate. Default is 0. |
estimand |
the desired estimand, which determines the target population. For binary treatments, can be "ATE", "ATT", "ATC", or |
targets |
an optional vector of target population mean values for each baseline covariate. The resulting weights will yield sample means within |
s.weights |
an optional vector of sampling weights. Default is a vector of 1s. |
b.weights |
an optional vector of base weights. Default is a vector of 1s. |
focal |
when multi-categorical treatments are used and the |
norm |
|
std.binary , std.cont |
|
min.w |
|
verbose |
|
solver |
string; the name of the optimization solver to use. Allowable options depend on |
... |
Options that are passed to the settings function corresponding to |
covs.list |
a list containing one numeric matrix of covariates to be balanced for each treatment. |
treat.list |
a list containing one vector of treatment statuses for each treatment. |
tols.list |
a list of balance tolerance vectors, one for each treatment, each with a value for each covariate. |
optweight.fit() and optweightMV.fit() transform the inputs into the form required by the optimization functions, namely (sparse) matrices and vectors, and then supply the outputs (the weights, dual variables, and convergence information) back to optweight() or optweightMV(). Little processing of inputs is performed, as this is normally handled by optweight() or optweightMV().
Target and balance constraints are applied to the product of the estimated weights and the sampling weights. In addition, the sum of the product of the estimated weights and the sampling weights is constrained to be equal to the sum of the product of the base weights and the sampling weights. For binary and multi-category treatments, these constraints apply within each treatment group.
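As a sketch of the sum constraint just described, write w_i for the estimated weight, s_i for the sampling weight, and b_i for the base weight of unit i. The treatment indicator A_i and the treatment level a are notation introduced here for illustration only; they do not appear in the package documentation.

```latex
% Sum constraint on the product of estimated and sampling weights
\sum_{i} s_i w_i = \sum_{i} s_i b_i

% For binary and multi-category treatments, the same constraint
% holds separately within each treatment group a
\sum_{i \,:\, A_i = a} s_i w_i = \sum_{i \,:\, A_i = a} s_i b_i
\quad \text{for each treatment level } a
```

With the defaults s_i = 1 and b_i = 1, the right-hand side is the number of units (in the group), so the estimated weights average to 1.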
norm
The objective function for the optimization problem is f\left(w_i, b_i, s_i\right), where w_i is the estimated weight for unit i, s_i is the sampling weight for unit i (supplied by s.weights), and b_i is the base weight for unit i (supplied by b.weights). The norm argument determines f(.,.,.), as detailed below:
when norm = "l2", f\left(w_i, b_i, s_i\right) = \frac{1}{n} \sum_i {s_i(w_i - b_i)^2}
when norm = "l1", f\left(w_i, b_i, s_i\right) = \frac{1}{n} \sum_i {s_i \vert w_i - b_i \vert}
when norm = "linf", f\left(w_i, b_i, s_i\right) = \max_i {\vert w_i - b_i \vert}
when norm = "entropy", f\left(w_i, b_i, s_i\right) = \frac{1}{n} \sum_i {s_i w_i \log \frac{w_i}{b_i}}
when norm = "log", f\left(w_i, b_i, s_i\right) = \frac{1}{n} \sum_i {-s_i \log \frac{w_i}{b_i}}
By default, s.weights and b.weights are set to 1 for all units unless supplied. b.weights must be positive when norm is "entropy" or "log", and norm = "linf" cannot be used when s.weights are supplied.
When norm = "l2" and both s.weights and b.weights are NULL, weights are estimated to maximize the effective sample size. When norm = "entropy", the estimated weights are equivalent to entropy balancing weights (Källberg & Waernbaum, 2023). When norm = "log", b.weights are ignored in the optimization, as they do not affect the estimated weights.
solver
The solver argument controls which optimization solver is used. Different solvers are compatible with each norm. See the table below for allowable options, which package they require, which function does the solving, and which function controls the settings.
solver | norm | Package | Solver function | Settings function |
"osqp" | "l2" , "l1" , "linf" | osqp | osqp::solve_osqp() | osqp::osqpSettings() |
"highs" | "l2" , "l1" , "linf" | highs | highs::highs_solve() | highs::highs_control() / highs::highs_available_solver_options() |
"lpsolve" | "l1" , "linf" | lpSolve | lpSolve::lp() | . |
"scs" | "entropy" , "log" | scs | scs::scs() | scs::scs_control() |
"clarabel" | "entropy" , "log" | clarabel | clarabel::clarabel() | clarabel::clarabel_control() |
Note that "lpsolve" can only be used when min.w is nonnegative.
The default solver for each norm is as follows:
norm | Default solver |
"l2" | "osqp" |
"l1" | "highs" |
"linf" | "highs" |
"entropy" | "scs" |
"log" | "scs" |
If the package corresponding to a default solver is not installed but the package for a different eligible solver is, that solver will be used instead. Otherwise, you will be asked to install the required package. osqp is required for optweight, and so will be the default for the "l1" and "linf" norms if highs is not installed. The default solver is the one that has shown good performance for the given norm; generally, all eligible solvers perform about equally well in terms of accuracy but differ in time taken.
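The fallback behavior described above can be sketched as a small lookup. This is an illustration only, not the package's implementation: the names eligible, solver_pkg, and pick_solver() are hypothetical, the eligibility lists are transcribed from the tables above with the default solver placed first, and the order of the remaining fallbacks is an assumption (except that osqp follows highs for "l1" and "linf", as stated in the text).

```r
# Eligible solvers for each norm, default first (from the tables above).
# The ordering after the default is an assumption made for this sketch.
eligible <- list(
  l2      = c("osqp", "highs"),
  l1      = c("highs", "osqp", "lpsolve"),
  linf    = c("highs", "osqp", "lpsolve"),
  entropy = c("scs", "clarabel"),
  log     = c("scs", "clarabel")
)

# Package that provides each solver
solver_pkg <- c(osqp = "osqp", highs = "highs", lpsolve = "lpSolve",
                scs = "scs", clarabel = "clarabel")

# Hypothetical helper: return the first eligible solver whose package
# is installed. `installed` can be overridden to try out scenarios.
pick_solver <- function(norm,
                        installed = rownames(utils::installed.packages())) {
  candidates <- eligible[[norm]]
  if (is.null(candidates)) {
    stop("Unknown norm: ", norm)
  }
  ok <- candidates[solver_pkg[candidates] %in% installed]
  if (length(ok) == 0L) {
    stop("Install one of: ", paste(solver_pkg[candidates], collapse = ", "))
  }
  ok[1L]
}

pick_solver("l1", installed = c("osqp", "highs"))  # "highs"
pick_solver("l1", installed = "osqp")              # "osqp"
pick_solver("entropy", installed = "clarabel")     # "clarabel"
```

In practice the solver argument can simply be left as NULL so that optweight.fit() makes this choice itself.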
Sometimes the optimization will fail to converge at a solution. There are a variety of reasons why this might happen, which include that the constraints are nearly impossible to satisfy or that the optimization surface is relatively flat. It can be hard to know the exact cause or how to solve it, but this section offers some solutions one might try. Typically, solutions can be found most easily when using the "l2" norm; other norms, especially "linf" and "l1", are more likely to see problems.
Rarely is the problem too few iterations, though this is possible. Most problems can be solved in the default 200,000 iterations, but sometimes it can help to increase this number with the max_iter argument. Usually, though, this just ends up taking more time without a solution found.
If the problem is that the constraints are too tight, it can be helpful to loosen the constraints. Sometimes examining the dual variables of a solution that has failed to converge can reveal which constraints are causing the problem.
Sometimes a suboptimal solution is possible; such a solution does not satisfy the constraints exactly but comes close. To allow these solutions, the argument eps can be increased to larger values.
Sometimes using a different solver can improve performance. Using the default solver for each norm, as described above, can reduce the probability of convergence failures.
An optweight.fit or optweightMV.fit object with the following elements:
w |
The estimated weights, one for each unit. |
duals |
A data.frame containing the dual variables for each covariate (for |
info |
A list containing information about the performance of the optimization at termination. |
Chattopadhyay, A., Cohn, E. R., & Zubizarreta, J. R. (2024). One-Step Weighting to Generalize and Transport Treatment Effect Estimates to a Target Population. The American Statistician, 78(3), 280–289. doi:10.1080/00031305.2023.2267598
Källberg, D., & Waernbaum, I. (2023). Large Sample Properties of Entropy Balancing Estimators of Average Causal Effects. Econometrics and Statistics. doi:10.1016/j.ecosta.2023.11.004
Wang, Y., & Zubizarreta, J. R. (2020). Minimal dispersion approximately balancing weights: Asymptotic properties and practical considerations. Biometrika, 107(1), 93–105. doi:10.1093/biomet/asz050
Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805
optweight() and optweightMV(), which you should use for estimating the balancing weights unless you have a specific reason to call the fitting functions directly.
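library("cobalt")
data("lalonde", package = "cobalt")

# Treatment vector
treat <- lalonde$treat

# Covariates (columns 2 to 8), with factors split into dummy variables
covs <- splitfactor(lalonde[2:8], drop.first = "if2")

# Estimate ATE weights with a balance tolerance of .02 under the L2 norm
ow.fit <- optweight.fit(covs,
                        treat,
                        tols = .02,
                        estimand = "ATE",
                        norm = "l2")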