View source: R/energybalance.R
energybalance | R Documentation |
Estimates energy balancing weights for matching a sample distribution to a target distribution.
energybalance(sampleX, Z = NULL, targetX = NULL, sampleW = NULL, targetW = NULL, std = "studentized", improved = TRUE, lambda = 0)
sampleX | a matrix or data frame of covariates composing the sample distribution to be weighted. See Details for how this is processed. |
Z | optional; a vector denoting treatment group membership. If non- |
targetX | a matrix or data frame of covariates composing the target distribution. If |
sampleW | optional; a vector of sampling weights for the sample. The product of the estimated weights and |
targetW | optional; a vector of sampling weights for the target. |
std | |
improved | |
lambda | a penalty on the sum of the squared weights. |
energybalance is the main function of the energybalance package. It estimates energy balancing weights for a broad set of scenarios, including estimating average treatment effects and generalizing or transporting estimates. The wrappers for energybalance (eb_ate, eb_att, eb_target, and eb_mediation) simplify its use for specific purposes.
Essentially, energybalance estimates energy balancing weights to make the sample distribution (sampleX) resemble the target distribution (targetX) by minimizing the weighted energy distance between them. When a treatment vector (Z) is supplied, the weights make each treatment group resemble the target distribution. When sampling weights for the target distribution (targetW) are supplied, the weights make the sample resemble the weighted target distribution; this is especially useful when weighting a sample to resemble a representative sample that requires sampling weights.
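The quantity being minimized can be illustrated with a minimal base R sketch of a weighted energy distance. The helper weighted_energy_distance is hypothetical and is not part of the package; it assumes the standard form of the energy distance with plain Euclidean distances and weights normalized to sum to 1, which may differ in detail from the objective the package actually optimizes.

```r
# Hypothetical helper (not a package function): weighted energy distance
# between a sample and a target, using plain Euclidean distances.
weighted_energy_distance <- function(sampleX, targetX,
                                     w = rep(1, nrow(sampleX)),
                                     targetW = rep(1, nrow(targetX))) {
  sampleX <- as.matrix(sampleX)
  targetX <- as.matrix(targetX)
  w  <- w / sum(w)              # normalize each set of weights to sum to 1
  tw <- targetW / sum(targetW)

  n <- nrow(sampleX)
  d <- as.matrix(dist(rbind(sampleX, targetX)))  # all pairwise distances
  s <- seq_len(n)
  d_ss <- d[s, s, drop = FALSE]    # sample vs. sample
  d_tt <- d[-s, -s, drop = FALSE]  # target vs. target
  d_st <- d[s, -s, drop = FALSE]   # sample vs. target

  # 2 * E|X - Y| - E|X - X'| - E|Y - Y'|, with weighted expectations
  2 * drop(t(w) %*% d_st %*% tw) -
    drop(t(w) %*% d_ss %*% w) -
    drop(t(tw) %*% d_tt %*% tw)
}

set.seed(1)
sampleX <- matrix(rnorm(200, mean = 0.5), ncol = 2)  # simulated sample
targetX <- matrix(rnorm(300), ncol = 2)              # simulated target

e_unweighted <- weighted_energy_distance(sampleX, targetX)
e_self <- weighted_energy_distance(targetX, targetX)  # a set compared to itself
```

Energy balancing weights are the values of w that make this quantity as small as possible, subject to whatever constraints and penalty (lambda) the estimator imposes.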
The energy distance between two groups depends on the scale of the variables because its original formulation relies on the Euclidean distance matrix between the two groups. When std is "studentized" (the default), the scaled Euclidean distance is used instead, which eliminates the dependence on scale. When std is "mahalanobis", the Mahalanobis distance is used instead, which eliminates the dependence on scale and on the correlations between variables. Performance may vary for the different distances. Using the Mahalanobis distance eliminates the dependence on the coding scheme used for factor variables (i.e., which category is dropped), but can yield poorer balancing performance for other covariates.
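The effect of the two scalings can be checked with a small base R sketch. The helpers scale_sd, scale_mahal, and same_dist are hypothetical, and the exact scale factors the package uses (for example, whether they are computed from the sample, the target, or both) are an assumption here.

```r
set.seed(2)
X <- cbind(age = rnorm(50, 40, 10), income = rnorm(50, 50000, 15000))

# Scaled Euclidean: divide each column by its standard deviation
# (assumption: the package may compute the scale factors differently).
scale_sd <- function(X) sweep(X, 2, apply(X, 2, sd), "/")

# Mahalanobis: whiten with the inverse Cholesky factor of cov(X), so that
# Euclidean distance on the result equals Mahalanobis distance on X.
scale_mahal <- function(X) X %*% solve(chol(cov(X)))

# Compare two pairwise distance matrices, ignoring attributes
same_dist <- function(A, B) {
  isTRUE(all.equal(as.vector(dist(A)), as.vector(dist(B))))
}

# Same data with income expressed in thousands
X_rescaled <- X
X_rescaled[, "income"] <- X_rescaled[, "income"] / 1000

d_raw_same   <- same_dist(X, X_rescaled)
d_sd_same    <- same_dist(scale_sd(X), scale_sd(X_rescaled))
d_mahal_same <- same_dist(scale_mahal(X), scale_mahal(X_rescaled))
```

With raw Euclidean distances, changing the units of income changes the distances; after either scaling, the distances are unchanged by the change of units.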
When sampleW is supplied, the estimated weights should be multiplied by sampleW before they are used in an analysis.
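As a minimal illustration of that multiplication, the sketch below uses made-up numbers. The vector w stands in for the weights that energybalance would return and is not actual output of the function.

```r
# Hypothetical values for illustration only; in practice w would be the
# vector returned by energybalance(..., sampleW = sampleW).
sampleW <- c(1.2, 0.8, 2.0, 1.0)  # sampling weights for the sample
w       <- c(0.5, 1.5, 0.7, 1.3)  # stand-in for the estimated weights
y       <- c(10, 12, 9, 14)       # an outcome measured in the sample

final_w <- w * sampleW            # weights to use in the analysis
weighted_mean_y <- weighted.mean(y, final_w)
```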
When sampleX or targetX are supplied as data frames, they are converted to matrices by first running cobalt::splitfactor(., drop.first = "if2"). This means factor or character variables will first be turned into dummy (0/1) variables. With more than two categories, all levels will have a corresponding dummy. With two categories, the dummy for one category will be dropped. The encoding of factor variables does not matter with std = "mahalanobis"; otherwise, different manual coding schemes can yield different results. Letting the function split the variables on its own is therefore preferred.
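The dummy coding described above can be mimicked in base R. The helper split_if2 below is a hypothetical stand-in written for illustration; it is not cobalt's implementation, and both its column naming and its choice to drop the first level of a two-level factor are assumptions.

```r
# Hypothetical stand-in for the described behavior (not cobalt's code):
# expand factor/character columns into 0/1 dummies, keeping every level
# unless the variable has exactly two levels.
split_if2 <- function(df) {
  cols <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (is.character(x)) x <- factor(x)
    if (!is.factor(x)) {
      # Non-factor columns pass through as a single numeric column
      return(matrix(as.numeric(x), ncol = 1, dimnames = list(NULL, nm)))
    }
    lev <- levels(x)
    # One 0/1 indicator column per level
    m <- sapply(lev, function(l) as.numeric(x == l))
    m <- matrix(m, nrow = length(x),
                dimnames = list(NULL, paste(nm, lev, sep = "_")))
    # Exactly two levels: keep one dummy (assumption: drop the first level)
    if (length(lev) == 2) m <- m[, 2, drop = FALSE]
    m
  })
  do.call(cbind, cols)
}

df <- data.frame(
  age    = c(30, 45, 52, 38),
  sex    = factor(c("F", "M", "M", "F")),         # two levels: one dummy kept
  region = c("north", "south", "west", "south"),  # three levels: all kept
  stringsAsFactors = FALSE
)
X <- split_if2(df)
```

Here sex contributes a single column while region contributes one column per level, which is the pattern the paragraph above describes.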
A vector of weights, one for each row of sampleX.
Noah Greifer
The wrapper functions:
eb_ate for weights that estimate the average treatment effect (ATE).
eb_att for weights that estimate the average treatment effect on the treated (ATT).
eb_target for weights that allow generalizability or transportability of a sample to a target or adjust for censoring by loss to follow-up.
eb_mediation for weights that estimate the mean cross-world potential outcome for estimating natural direct and indirect effects in mediation analysis.