energybalance: Estimate Energy Balancing Weights

In ngreifer/energybalance: Distribution Matching Using Energy Balancing

View source: R/energybalance.R

energybalance

R Documentation

Estimate Energy Balancing Weights

Description

Estimates energy balancing weights for matching a sample distribution to a target distribution.

Usage

energybalance(sampleX, Z = NULL, targetX = NULL,
              sampleW = NULL, targetW = NULL,
              std = "studentized", improved = TRUE,
              lambda = 0)

Arguments

`sampleX`	a matrix or data frame of covariates composing the sample distribution to be weighted. See Details for how this is processed.
`Z`	optional; a vector denoting treatment group membership. If non-`NULL`, each treatment group will be weighted to resemble the target distribution. If `NULL`, the entire sample distribution will be weighted to resemble the target distribution.
`targetX`	a matrix or data frame of covariates composing the target distribution. If `targetX` is `NULL` and `Z` is specified, `sampleX` will be used. See Details for how this is processed.
`sampleW`	optional; a vector of sampling weights for the sample. The product of the estimated weights and `sampleW` will be the distribution-matching weights.
`targetW`	optional; a vector of sampling weights for the target.
`std`	`character`; whether to standardize the covariates. If `"studentized"`, the distance matrix used will be the Euclidean distance with each variable scaled using its standard deviation in the target distribution (weighted by `targetW` if supplied). If `"mahalanobis"`, the distance matrix used will be the Mahalanobis distance computed using the covariance matrix of the target distribution (weighted by `targetW` if supplied). If `"none"`, the distance matrix used will be the Euclidean distance matrix. Default is `"studentized"`. Abbreviations allowed.
`improved`	`logical`; when `Z` is specified, whether to additionally balance the distributions between treatment groups. If `TRUE`, the energy distance between treatment groups and between each treatment group and the target distribution will be minimized. If `FALSE`, only the energy distance between each treatment group and the target distribution will be minimized. Default is `TRUE` as recommended by Huling and Mak (2020). Ignored when `Z` is `NULL`.
`lambda`	a penalty on the sum of the squared weights. `lambda/nrow(sampleX)^2` times the sum of the squared weights is added to the objective function. Increasing `lambda` preserves the effective sample size of the weighted sample at the expense of balance.
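
The `lambda` penalty described above can be sketched in base R. The helper names `penalty_term` and `effective_sample_size` are hypothetical and not part of the package, and the Kish formula is assumed here as the measure of effective sample size; the package's internal scaling of the weights may differ.

```r
# Penalty added to the objective, as described for `lambda`:
# lambda / n^2 times the sum of the squared weights, with n = nrow(sampleX).
# (Illustrative helper, not a function from the energybalance package.)
penalty_term <- function(w, lambda) {
  lambda / length(w)^2 * sum(w^2)
}

# Kish effective sample size (assumed here as the ESS measure).
effective_sample_size <- function(w) {
  sum(w)^2 / sum(w^2)
}

uniform_w <- rep(1, 10)
uneven_w  <- c(5, rep(5 / 9, 9))  # same total, more variable

effective_sample_size(uniform_w)  # equal weights keep the full sample size
effective_sample_size(uneven_w)   # variable weights reduce it
penalty_term(uniform_w, lambda = 2)
penalty_term(uneven_w, lambda = 2)  # larger penalty for more variable weights
```

For a fixed sum of weights, more variable weights have a larger sum of squares, so a larger `lambda` pushes the solution toward more uniform weights.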

Details

`energybalance` is the main function of the energybalance package. It estimates energy balancing weights for a broad set of scenarios, including estimating average treatment effects and generalizing or transporting estimates. The wrappers for `energybalance` (`eb_ate`, `eb_att`, `eb_target`, and `eb_mediation`) simplify its use for specific purposes.

`energybalance` estimates weights that make the sample distribution (`sampleX`) resemble the target distribution (`targetX`) by minimizing the weighted energy distance between them. When a treatment vector (`Z`) is supplied, the weights make each treatment group resemble the target distribution. When sampling weights for the target distribution (`targetW`) are supplied, the weights make the sample resemble the weighted target distribution; this is especially useful when weighting a sample to resemble a representative sample that requires sampling weights.
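
As a rough illustration of the quantity being minimized, the weighted energy distance between a sample and a target can be computed in base R. This is a minimal sketch using the standard energy distance formula with unscaled Euclidean distances; `weighted_energy_distance` is a hypothetical helper, not the package's internal implementation, and it only evaluates the distance for given weights rather than optimizing them.

```r
# Weighted energy distance between a sample and a target.
# w: weights for the sample rows; q: weights for the target rows.
# (Illustrative helper, not a function from the energybalance package.)
weighted_energy_distance <- function(sampleX, targetX,
                                     w = rep(1, nrow(sampleX)),
                                     q = rep(1, nrow(targetX))) {
  sampleX <- as.matrix(sampleX)
  targetX <- as.matrix(targetX)

  # Normalize both sets of weights to sum to 1
  w <- w / sum(w)
  q <- q / sum(q)

  # Pairwise Euclidean distances among all rows of both groups
  n <- nrow(sampleX)
  d <- as.matrix(dist(rbind(sampleX, targetX)))
  s_idx <- seq_len(n)
  t_idx <- n + seq_len(nrow(targetX))

  between  <- sum(w * drop(d[s_idx, t_idx, drop = FALSE] %*% q))
  within_s <- sum(w * drop(d[s_idx, s_idx, drop = FALSE] %*% w))
  within_t <- sum(q * drop(d[t_idx, t_idx, drop = FALSE] %*% q))

  # 2 * E|X - Y| - E|X - X'| - E|Y - Y'|
  2 * between - within_s - within_t
}

set.seed(1)
X <- matrix(rnorm(40), ncol = 2)            # sample
Y <- matrix(rnorm(60, mean = 1), ncol = 2)  # target, shifted
weighted_energy_distance(X, Y)  # positive: the distributions differ
weighted_energy_distance(X, X)  # zero: identical distributions
```

Supplying `q` plays the role that `targetW` plays in the description above: the sample is compared against the weighted target.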

The energy distance between two groups depends on the scale of the variables because its original formulation relies on the Euclidean distance matrix between the two groups. When `std` is `"studentized"` (the default), the scaled Euclidean distance is used instead, which eliminates the dependence on scale. When `std` is `"mahalanobis"`, the Mahalanobis distance is used instead, which eliminates the dependence on both scale and the correlations between variables. Performance may vary across the distances. Using the Mahalanobis distance eliminates the dependence on the coding scheme used for factor variables (i.e., which category is dropped), but can yield poorer balancing performance for other covariates.
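
The three `std` options can be viewed as transformations applied to the covariates before an ordinary Euclidean distance is taken. The sketch below is one way to express that in base R; `scale_for_distance` is a hypothetical helper, it ignores `targetW` (unweighted target moments are assumed), and it is not the package's own code.

```r
# Transform covariates so that plain Euclidean distance on the result
# corresponds to the distance selected by `std`.
# (Illustrative helper; target moments are unweighted in this sketch.)
scale_for_distance <- function(sampleX, targetX,
                               std = c("studentized", "mahalanobis", "none")) {
  std <- match.arg(std)  # allows abbreviations such as "stud"
  sampleX <- as.matrix(sampleX)
  targetX <- as.matrix(targetX)

  if (std == "studentized") {
    # Divide each variable by its standard deviation in the target
    s <- apply(targetX, 2, sd)
    sampleX <- sweep(sampleX, 2, s, "/")
    targetX <- sweep(targetX, 2, s, "/")
  } else if (std == "mahalanobis") {
    # Whiten with the Cholesky factor of the target covariance:
    # cov = t(R) %*% R, so Euclidean distance after multiplying by
    # solve(R) equals the Mahalanobis distance on the original scale
    R_inv <- solve(chol(cov(targetX)))
    sampleX <- sampleX %*% R_inv
    targetX <- targetX %*% R_inv
  }

  list(sampleX = sampleX, targetX = targetX)
}

set.seed(2)
X <- matrix(rnorm(30), ncol = 3)
Y <- matrix(rnorm(45), ncol = 3)

# Rescaling a variable (e.g., changing units) leaves the studentized
# covariates unchanged
X_big <- X; X_big[, 1] <- X_big[, 1] * 1000
Y_big <- Y; Y_big[, 1] <- Y_big[, 1] * 1000
all.equal(scale_for_distance(X, Y, "studentized")$sampleX,
          scale_for_distance(X_big, Y_big, "studentized")$sampleX)
```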

When `sampleW` is supplied, the estimated weights should be multiplied by `sampleW` before they are used in an analysis.

When `sampleX` or `targetX` is supplied as a data frame, it is converted to a matrix after first running `cobalt::splitfactor(., drop.first = "if2")`. Factor and character variables are thereby turned into dummy (0/1) variables. A variable with more than two categories gets one dummy per level; a variable with two categories gets a single dummy, with the other category dropped. The encoding of factor variables does not matter with `std = "mahalanobis"`; otherwise, different manual coding schemes can yield different results. Letting the function split the variables on its own is therefore preferred.
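
The dummy-coding rule described above can be mimicked in base R to see what the resulting columns look like. This is only a sketch of the rule as stated here, not the `cobalt::splitfactor` implementation; `dummy_code_if2` is a hypothetical helper, and dropping the first level for two-category variables is an assumption.

```r
# One 0/1 column per level; with exactly two levels, keep a single column.
# (Illustrative helper; which level is dropped is an assumption.)
dummy_code_if2 <- function(f) {
  f <- as.factor(f)
  lev <- levels(f)
  d <- outer(as.character(f), lev, "==") + 0  # logical -> numeric 0/1
  colnames(d) <- lev
  if (length(lev) == 2) {
    d <- d[, 2, drop = FALSE]
  }
  d
}

dummy_code_if2(c("a", "b", "c", "a"))  # three columns, one per level
dummy_code_if2(c("x", "y", "x"))       # one column, for level "y"
```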

Value

A vector of weights, one for each row of sampleX.

Author(s)

Noah Greifer
