Description
Assessing balance between exposure(s) and confounders is key when performing causal analysis using propensity scores. We provide a list of several models to generate weights to use in causal inference for multivariate exposures, and test the balancing property of these weights using weighted Pearson correlations. In addition, the effective sample size is returned.
Usage

bal(
  model_list,
  D,
  C,
  common = FALSE,
  trim_w = FALSE,
  trim_quantile = 0.99,
  all_uni = TRUE,
  ...
)
Arguments

model_list: character string identifying which methods to use when constructing weights. See Details for the list of available models.

D: numeric matrix of dimension n by m designating values of the exposures.

C: either a list of length m of numeric matrices, each of dimension n by p_j, designating values of the confounders for each exposure, or, if common=TRUE, a single numeric matrix of confounders shared by all exposures.

common: logical indicator for whether C is a single matrix of common confounders for all exposures. Default is FALSE, meaning C must be specified as a list of confounders of length m.

trim_w: logical indicator for whether to trim weights. Default is FALSE.

trim_quantile: numeric scalar specifying the upper quantile at which to trim weights, if applicable. Default is 0.99.

all_uni: logical indicator. If TRUE, all univariate models specified in model_list are estimated for each exposure. If FALSE, weights are estimated only for the first exposure.

...: additional arguments to pass to weightit when estimating the univariate methods.
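To make the two forms of C concrete, here is a minimal sketch of both call patterns (D_mat, C1, C2, and C_common are hypothetical objects standing in for real data):

#C as a list: one confounder matrix per exposure column of D_mat
b_sep <- bal(model_list="mvGPS", D=D_mat, C=list(C1, C2))

#C as a single matrix shared by all exposures: set common=TRUE
b_com <- bal(model_list="mvGPS", D=D_mat, C=C_common, common=TRUE)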
Details

When using propensity score methods for causal inference, it is crucial to check the balancing property of the covariates and exposure(s). To do this in the multivariate case, we first use a weight generating method from the available list shown below:
"mvGPS": Multivariate generalized propensity score using Gaussian densities
"entropy": Estimating weights using entropy loss function without specifying propensity score \insertCitetbbicke2020entropymvGPS
"CBPS": Covariate balancing propensity score for continuous treatments which adds balance penalty while solving for propensity score parameters \insertCitefong2018mvGPS
"PS": Generalized propensity score estimated using univariate Gaussian densities
"GBM": Gradient boosting to estimate the mean function of the propensity score, but still maintains Gaussian distributional assumptions \insertCitezhu_boostingmvGPS
Note that only the mvGPS method is multivariate; all others are strictly univariate. For the univariate methods, weights are estimated for each exposure separately using the weightit function, given the confounders for that exposure in C, when all_uni=TRUE. To estimate weights for only the first exposure, set all_uni=FALSE.
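For instance, a single univariate fit of this kind might look like the following sketch (dat is a hypothetical data frame; "cbps" is one of several methods weightit accepts for continuous exposures):

require(WeightIt)
#sketch: estimate weights for exposure d1 given its confounders x1 and x2
uni_fit <- weightit(d1 ~ x1 + x2, data=dat, method="cbps")
w_d1 <- uni_fit$weights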
It is also important to note that the weights for each method can be trimmed at the desired quantile by setting trim_w=TRUE and choosing trim_quantile in [0.5, 1]. Trimming is done at both the upper and lower bounds. For further details on how trimming is performed, see mvGPS.
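As a sketch, symmetric quantile trimming of a weight vector w at trim_quantile q can be written with pmin/pmax. This mirrors the description above; the package's exact implementation may differ:

#winsorize weights at the upper quantile q and the lower quantile 1-q
q <- 0.99
w_trim <- pmin(w, quantile(w, q))
w_trim <- pmax(w_trim, quantile(w, 1 - q))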
In this package we include three key balancing metrics to summarize balance across all of the exposures:
Euclidean distance
Maximum absolute correlation
Average absolute correlation
Euclidean distance is calculated using the origin as the reference point, e.g., for m=2 exposures the reference point is [0, 0]. In this way we are calculating how far the observed set of correlation points is from perfect balance.
Maximum absolute correlation reports the largest single imbalance between the exposures and the set of confounders. It is often a key diagnostic as even a single confounder that is sufficiently out of balance can reduce performance.
Average absolute correlation is the mean of the absolute exposure-confounder correlations. This metric summarizes how well, on average, the entire set of exposures is balanced.
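Putting these together, here is a sketch of how the three metrics could be computed from weighted Pearson correlations using cov.wt from the stats package (D_mat, C_mat, and w are hypothetical; the exposure-confounder block of the weighted correlation matrix supplies the correlation points):

#weighted correlation matrix of exposures and confounders
wc <- cov.wt(cbind(D_mat, C_mat), wt=w, cor=TRUE)$cor
m <- ncol(D_mat)
rho <- wc[seq_len(m), -seq_len(m), drop=FALSE] #exposure-confounder correlations

sqrt(sum(rho^2)) #Euclidean distance from the origin (perfect balance)
max(abs(rho))    #maximum absolute correlation
mean(abs(rho))   #average absolute correlation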
Effective sample size, ESS, is defined as

ESS = (Σ_i w_i)^2 / Σ_i w_i^2,

where w_i are the estimated weights for a particular method (Kish 1965). Note that when w_i = 1 for all units, the ESS is equal to the sample size n. ESS decreases when there are extreme weights or high variability in the weights.
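In code, the ESS of a weight vector w is a one-liner:

ess <- sum(w)^2/sum(w^2) #equals the sample size n when all w_i = 1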
Value

bal returns a list with the following components:

W: list of weights generated for each model

cor_list: list of weighted Pearson correlation coefficients for all confounders specified

bal_metrics: data.frame with the Euclidean distance, maximum absolute correlation, and average absolute correlation by method

ess: effective sample size for each of the methods used to generate weights

models: vector of models used
Examples

#simulating data
sim_dt <- gen_D(method="u", n=150, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2,
    k=3, C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0),
    d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C

#generating weights using mvGPS and potential univariate alternatives
require(WeightIt)
bal_sim <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D,
    C=list(C[, 1:2], C[, 2:3]))

#overall summary statistics
bal_sim$bal_metrics

#effective sample sizes
bal_sim$ess

#we can also trim weights for all methods; note that in this case we can
#also pass additional arguments used by the WeightIt package for entropy,
#CBPS, PS, and GBM, such as specifying p.mean
bal_sim_trim <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D,
    C=list(C[, 1:2], C[, 2:3]), trim_w=TRUE, trim_quantile=0.9, p.mean=0.5)

#can check to ensure all the weights have been properly trimmed at the upper
#and lower bounds (matching trim_quantile=0.9 above)
all.equal(unname(unlist(lapply(bal_sim$W, quantile, 0.9))),
    unname(unlist(lapply(bal_sim_trim$W, max))))
all.equal(unname(unlist(lapply(bal_sim$W, quantile, 1-0.9))),
    unname(unlist(lapply(bal_sim_trim$W, min))))