energybalance: Estimate Energy Balancing Weights


energybalance    R Documentation

Estimate Energy Balancing Weights

Description

Estimates energy balancing weights for matching a sample distribution to a target distribution.

Usage

energybalance(sampleX, Z = NULL, targetX = NULL,
              sampleW = NULL, targetW = NULL,
              std = "studentized", improved = TRUE,
              lambda = 0)

Arguments

sampleX

a matrix or data frame of covariates composing the sample distribution to be weighted. See Details for how this is processed.

Z

optional; a vector denoting treatment group membership. If non-NULL, each treatment group will be weighted to resemble the target distribution. If NULL, the entire sample distribution will be weighted to resemble the target distribution.

targetX

a matrix or data frame of covariates composing the target distribution. If targetX is NULL and Z is specified, sampleX will be used. See Details for how this is processed.

sampleW

optional; a vector of sampling weights for the sample. The product of the estimated weights and sampleW will be the distribution-matching weights.

targetW

optional; a vector of sampling weights for the target.

std

character; whether to standardize the covariates. If "studentized", the distance matrix used will be the Euclidean distance with each variable scaled using its standard deviation in the target distribution (weighted by targetW if supplied). If "mahalanobis", the distance matrix used will be the Mahalanobis distance computed using the covariance matrix of the target distribution (weighted by targetW if supplied). If "none", the distance matrix used will be the Euclidean distance matrix. Default is "studentized". Abbreviations allowed.

improved

logical; when Z is specified, whether to additionally balance the distributions between treatment groups. If TRUE, the energy distance between treatment groups and between each treatment group and the target distribution will be minimized. If FALSE, only the energy distance between each treatment group and the target distribution will be minimized. Default is TRUE as recommended by Huling and Mak (2020). Ignored when Z is NULL.

lambda

a penalty on the sum of the squared weights. lambda/nrow(sampleX)^2 times the sum of the squared weights is added to the objective function. Increasing lambda preserves the effective sample size of the weighted sample at the expense of balance.
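The trade-off described for lambda can be seen in a small base-R sketch. The helpers `penalty_term` and `ess` are hypothetical names introduced here for illustration and are not part of the package; the effective sample size is assumed to be the usual Kish formula, (sum of weights)^2 divided by the sum of squared weights.

```r
# Hypothetical helpers for illustration; not part of the energybalance package.

# Penalty added to the objective: lambda / n^2 * sum(w^2), with n = nrow(sampleX).
penalty_term <- function(w, lambda, n = length(w)) {
  lambda / n^2 * sum(w^2)
}

# Kish effective sample size (assumed definition): (sum w)^2 / sum(w^2).
ess <- function(w) sum(w)^2 / sum(w^2)

w_uniform <- rep(1, 4)               # equal weights
w_uneven  <- c(2.5, 0.5, 0.5, 0.5)   # same total weight, more variable

penalty_term(w_uniform, lambda = 1)  # 4 / 16 = 0.25
penalty_term(w_uneven,  lambda = 1)  # 7 / 16 = 0.4375
ess(w_uniform)                       # 4
ess(w_uneven)                        # 16 / 7, about 2.29
```

More variable weights incur a larger penalty and have a smaller effective sample size, so a larger lambda pushes the solution toward more uniform weights.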

Details

energybalance is the main function of the energybalance package. It estimates energy balancing weights for a broad set of scenarios, including estimating average treatment effects and generalizing or transporting estimates. The wrappers for energybalance (eb_ate, eb_att, eb_target, and eb_mediation) simplify its use for specific purposes.

Essentially, energybalance estimates energy balancing weights to make the sample distribution (sampleX) resemble the target distribution (targetX) by minimizing the weighted energy distance between them. When a treatment vector (Z) is supplied, the weights make each treatment group resemble the target distribution. When sampling weights for the target distribution (targetW) are supplied, the weights make the sample resemble the weighted target distribution; this is especially useful when weighting a sample to resemble a representative sample that requires sampling weights.
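As a rough illustration of the quantity being minimized, the weighted energy distance between a sample and a target can be computed in base R. This is a minimal sketch, not the package's internal code: `energy_distance` is a hypothetical helper, it uses the plain Euclidean distance, and it only evaluates the distance for given weights rather than optimizing them.

```r
# Hypothetical helper for illustration; not the package's internal code.
# Weighted energy distance: 2 * E|X - Y| - E|X - X'| - E|Y - Y'|,
# where expectations are taken under the normalized weights.
energy_distance <- function(X, Y, wX = rep(1, nrow(X)), wY = rep(1, nrow(Y))) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  p <- wX / sum(wX)   # normalized sample weights
  q <- wY / sum(wY)   # normalized target weights

  # Pairwise Euclidean distances among all rows of X and Y stacked together
  D  <- as.matrix(dist(rbind(X, Y)))
  iX <- seq_len(nrow(X))
  iY <- nrow(X) + seq_len(nrow(Y))

  2 * sum(outer(p, q) * D[iX, iY, drop = FALSE]) -
    sum(outer(p, p) * D[iX, iX, drop = FALSE]) -
    sum(outer(q, q) * D[iY, iY, drop = FALSE])
}

sampleX <- matrix(c(0, 1, 2, 3), ncol = 1)
targetX <- matrix(c(2, 3), ncol = 1)

energy_distance(sampleX, targetX)                      # 0.75 with uniform weights
energy_distance(sampleX, targetX, wX = c(0, 0, 1, 1))  # 0: weighted sample equals target
```

In this toy case, weights that put all mass on the sample rows matching the target drive the distance to zero; with real data the weights are found by optimization instead.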

The energy distance between two groups is dependent on the scale of the variables because its original formulation relies on the Euclidean distance matrix between the two groups. When std is "studentized" (the default), the scaled Euclidean distance is used instead, which eliminates the dependence on scale. When std is "mahalanobis", the Mahalanobis distance is used instead, which eliminates the dependence on scale and the correlations between variables. Performance may vary for the different distances. Using the Mahalanobis distance eliminates the dependence on the coding scheme used for factor variables (i.e., which category is dropped), but can yield poorer balancing performance for other covariates.
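The three distance options can be sketched in base R as follows. This is an illustrative approximation under stated assumptions: `scaled_distance` is a hypothetical helper, it ignores targetW (unweighted standard deviations and covariance are used), and it may differ in detail from what the package does internally.

```r
# Hypothetical helper for illustration; not the package's internal code.
# Unweighted version: targetW is ignored in this sketch.
scaled_distance <- function(X, target,
                            std = c("studentized", "mahalanobis", "none")) {
  std <- match.arg(std)   # permits abbreviations such as "stud"
  X <- as.matrix(X)
  target <- as.matrix(target)

  if (std == "studentized") {
    # Divide each variable by its standard deviation in the target
    X <- sweep(X, 2, apply(target, 2, sd), "/")
  } else if (std == "mahalanobis") {
    # Whiten with the Cholesky factor of the target covariance, so that
    # Euclidean distance on the result equals the Mahalanobis distance
    R <- chol(cov(target))
    X <- X %*% solve(R)
  }
  as.matrix(dist(X))
}

set.seed(1)
X  <- cbind(x1 = rnorm(20), x2 = rnorm(20))
X2 <- X
X2[, "x2"] <- X2[, "x2"] * 1000   # same variable in different units

# Scale invariance: TRUE for the studentized and Mahalanobis distances
isTRUE(all.equal(scaled_distance(X, X, "stud"), scaled_distance(X2, X2, "stud")))
isTRUE(all.equal(scaled_distance(X, X, "mahal"), scaled_distance(X2, X2, "mahal")))
# The raw Euclidean distance changes with the units
isTRUE(all.equal(scaled_distance(X, X, "none"), scaled_distance(X2, X2, "none")))
```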

When sampleW is supplied, the final weights should be multiplied by sampleW prior to using them in an analysis.
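A minimal sketch of that multiplication step, using made-up numbers: `w_est` stands in for the vector returned by energybalance, and `sampleW` and the outcome `y` are hypothetical values chosen only for illustration.

```r
# Made-up values for illustration only.
w_est   <- c(0.8, 1.1, 1.4, 0.7)   # stand-in for the estimated weights
sampleW <- c(2, 2, 1, 3)           # sampling weights supplied as sampleW
y       <- c(10, 12, 9, 11)        # hypothetical outcome

# Weights to use in the analysis: estimated weights times sampling weights
w_final <- w_est * sampleW

# Example downstream use: a weighted mean of the outcome
weighted.mean(y, w_final)
```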

When sampleX or targetX is supplied as a data frame, it is converted to a matrix after first running cobalt::splitfactor(., drop.first = "if2"). This means factor or character variables will first be turned into dummy (0/1) variables. With more than two categories, all levels will have a corresponding dummy. With two categories, the dummy for one category will be dropped. The encoding of factor variables does not matter with std = "mahalanobis"; otherwise, different manual coding schemes can yield different results. Letting the function split the variables on its own is therefore preferred.
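The encoding rule described above can be mimicked in base R to see what the resulting matrix looks like. This is only a sketch: `dummy_if2` is a hypothetical helper, not cobalt's implementation, and both the column naming and the choice of which category to drop for two-level variables (here, the first level) are assumptions.

```r
# Hypothetical base-R helper mimicking the "if2" rule; not cobalt's code.
dummy_if2 <- function(df) {
  cols <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (is.character(x)) x <- factor(x)
    if (!is.factor(x)) {
      out <- data.frame(x)            # numeric variables pass through unchanged
      names(out) <- nm
      return(out)
    }
    lev <- levels(x)
    if (length(lev) == 2) lev <- lev[2]   # two categories: keep a single dummy
    out <- lapply(lev, function(l) as.numeric(x == l))
    names(out) <- paste(nm, lev, sep = "_")
    data.frame(out, check.names = FALSE)
  })
  as.matrix(do.call(cbind, cols))
}

df <- data.frame(
  age    = c(30, 45, 52),
  sex    = c("F", "M", "F"),                     # two categories: one dummy
  region = factor(c("north", "south", "west"))   # three categories: three dummies
)

M <- dummy_if2(df)
colnames(M)
```

Here the two-level variable contributes one column and the three-level variable contributes three, giving five columns in total.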

Value

A vector of weights, one for each row of sampleX.
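A short usage sketch on simulated data, based only on the signature shown under Usage. The data and variable names are made up, and the call is guarded so the code runs whether or not the energybalance package is installed.

```r
# Simulated data for illustration; all names here are hypothetical.
set.seed(123)
n <- 100
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
z <- rbinom(n, 1, plogis(0.5 * X$x1))   # treatment depends on x1

if (requireNamespace("energybalance", quietly = TRUE)) {
  # With Z supplied and targetX left NULL, sampleX serves as the target,
  # so each treatment group is weighted to resemble the full sample.
  w <- energybalance::energybalance(sampleX = X, Z = z)

  # One weight per row of sampleX
  length(w) == nrow(X)
}
```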

Author(s)

Noah Greifer

See Also

The wrapper functions:

  • eb_ate for weights that estimate the average treatment effect (ATE).

  • eb_att for weights that estimate the average treatment effect on the treated (ATT).

  • eb_target for weights that allow generalizability or transportability of a sample to a target or adjust for censoring by loss to follow-up.

  • eb_mediation for weights that estimate the mean cross-world potential outcome for estimating natural direct and indirect effects in mediation analysis.


ngreifer/energybalance documentation built on July 27, 2022, 5:50 a.m.