inspect_balance: Inspect balance of covariates.
In leoguelman/uplift2: Uplift Modeling

inspect_balance

R Documentation

Inspect balance of covariates.

Description

inspect_balance calculates standardized differences for each covariate between two treatment levels, and tests for conditional independence between the treatment and the covariates.

Usage

## S3 method for class 'formula'
inspect_balance(formula, data, method = "dev",
  nPerm = NULL, midpval = TRUE, na.rm = FALSE, treatLevel = NULL, ...)

## S3 method for class 'inspect_balance'
print(x, ...)

## S3 method for class 'inspect_balance'
summary(object, ...)

Arguments

`formula`	A formula containing an indicator of treatment assignment on the left hand side and covariates on the right. The treatment indicator should be numeric with 0/1 values.
`data`	A data frame in which to interpret the variables named in the formula.
`method`	The method used to compute a p-value associated with the balance test. See Details.
`nPerm`	The number of random permutations of the treatment assignment. Only applicable with methods `"pdev"` and `"paic"`.
`midpval`	Should the mid p-value be used?
`na.rm`	Should observations with NAs on any of the variables named on the right-hand side of the formula be removed from the covariate balance summary table?
`treatLevel`	A character string for the treatment level of interest. By default, the treatment is coerced to a factor and the last level is used as the `treatLevel`. This argument is only relevant for calculating the standardized bias of covariates.
`...`	Additional arguments passed to the various methods. Specifically, for methods `"dev"` and `"pdev"`, arguments are passed to `stats::glm`. For `"paic"`, arguments are passed to `brglm::brglm`, and for `"hansen"` they are passed to `RItools::xBalance`.
`x`	An `inspect_balance` object.
`object`	An `inspect_balance` object.

Details

In randomized experiments, the assignment of subjects to treatment and control groups is independent of their baseline covariates. As the sample size grows, random assignment tends to balance covariates, in the sense that both groups have similar distributions of covariates. Following Rosenbaum and Rubin (1985), we define the standardized bias on a covariate as

\frac{\bar{x}_t - \bar{x}_c}{\sqrt{\frac{s_t^2 + s_c^2}{2}}}

where \bar{x}_t and \bar{x}_c represent the sample means of a covariate in the treated and control groups, respectively, and s_t^2 and s_c^2 represent their sample variances.
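As an illustration of the formula above, the standardized bias of a single numeric covariate can be computed in base R as follows. This is a minimal sketch, not the uplift2 code: the helper name `std_bias` is hypothetical, it assumes there are no missing values, and it mimics the documented `treatLevel` default by taking the last level of the treatment coerced to a factor.

```r
# Hypothetical helper (not part of uplift2): standardized bias of one covariate,
# (mean_t - mean_c) / sqrt((var_t + var_c) / 2). Assumes no NAs in x or z.
std_bias <- function(x, z, treatLevel = NULL) {
  z <- factor(z)
  # Documented default: the last factor level is the treatment level of interest
  if (is.null(treatLevel)) treatLevel <- tail(levels(z), 1)
  xt <- x[z == treatLevel]  # treated group
  xc <- x[z != treatLevel]  # control group
  (mean(xt) - mean(xc)) / sqrt((var(xt) + var(xc)) / 2)
}

# Usage: a covariate whose mean is shifted upward in the treated group
set.seed(1)
z <- rep(c(0, 1), each = 50)
x <- rnorm(100, mean = 0.5 * z)
std_bias(x, z)
```

Passing `treatLevel = "0"` instead flips the sign, since the roles of the two groups are swapped.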

Another way to think about balance is that the covariates X should have no predictive power for treatment assignment Z; that is, Prob(Z|X) = Prob(Z). Logistic regression is well suited to checking this. If method = "dev" (the default), we follow the approach suggested by Imai (2005): regress treatment assignment Z on the covariates X and a constant, then on a constant alone, and compare the two fits using a standard asymptotic likelihood-ratio test. This test is likely to perform poorly in small samples, in the sense of high Type I error rates (see Hansen and Bowers, 2008).

If method = "pdev", we compute a permutation distribution of the likelihood-ratio statistic between the two models and compare it to the observed test statistic to obtain a p-value. Models are fitted using standard logistic regression.

If method = "paic", the test statistic is the difference in AIC between the two models. A permutation distribution of this statistic is computed and compared to its observed value to obtain a p-value. Models are fitted by penalized likelihood using the Jeffreys prior (Firth, 1993).

Finally, if method = "hansen", p-values are computed using RItools::xBalance (Hansen and Bowers, 2008).
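The `"dev"` and `"pdev"` procedures described above can be sketched in base R with `glm`. This is an illustrative sketch under stated assumptions, not the uplift2 implementation: the helper names `lr_test` and `perm_test` are hypothetical, the `"paic"` and `"hansen"` methods are not reproduced, and the mid p-value rule used here (ties with the observed statistic counted with weight one half) is the standard definition, assumed rather than confirmed to match `midpval = TRUE`.

```r
# Hypothetical helpers (not part of uplift2).
# z: numeric 0/1 treatment indicator; X: data frame of covariates.

# "dev": likelihood-ratio test of Z ~ X + constant against Z ~ constant
lr_test <- function(z, X) {
  d <- data.frame(treat = z, X)
  fit1 <- glm(treat ~ ., family = binomial(), data = d)  # covariates + constant
  fit0 <- glm(treat ~ 1, family = binomial(), data = d)  # constant only
  stat <- deviance(fit0) - deviance(fit1)                 # LR statistic
  df <- fit1$rank - fit0$rank                             # extra parameters
  list(stat = stat, df = df,
       p = pchisq(stat, df = df, lower.tail = FALSE))     # asymptotic chi-squared p-value
}

# "pdev": permutation distribution of the same LR statistic
perm_test <- function(z, X, nPerm = 500, midpval = TRUE) {
  obs <- lr_test(z, X)$stat
  # Re-randomize the treatment labels while keeping the covariates fixed
  perm <- replicate(nPerm, lr_test(sample(z), X)$stat)
  if (midpval) {
    mean(perm > obs) + 0.5 * mean(perm == obs)  # mid p-value
  } else {
    mean(perm >= obs)                           # conventional permutation p-value
  }
}

# Usage: randomized assignment, independent of the covariates
set.seed(343)
n <- 200
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
z <- rbinom(n, 1, 0.5)
lr_test(z, X)$p
perm_test(z, X, nPerm = 200)
```

Under random assignment both p-values should typically be unremarkable; if Z is generated from X (for example `rbinom(n, 1, plogis(2 * X$X1))`), the likelihood-ratio test rejects balance decisively.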

We note that balance tests of this kind are subject to criticism, since balance is a characteristic of the sample, not some hypothetical population (see Ho et al., 2007).

Value

An object of class inspect_balance, which is a list with the following components:

fit The fitted model object (NULL for method = "hansen").
pvalue The p-value of the test.
nObs The number of observations used by the procedure.
cbs The covariate balance summary table.
pdata The underlying data used in ggplot.inspect_balance.
treatLevel The treatment level of interest.
yLabel The name of the treatment indicator.
call The call to inspect_balance.

Author(s)

Leo Guelman leo.guelman@gmail.com

References

Firth, D. (1993). "Bias reduction of maximum likelihood estimates". Biometrika, 80, pp. 27–38.

Hansen, B.B. and Bowers, J. (2008). "Covariate Balance in Simple, Stratified and Clustered Comparative Studies". Statistical Science, 23, pp. 219–236.

Ho, D., Imai, K., King, G. and Stuart, E. (2007). "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference". Political Analysis, 15, pp. 199–236.

Imai, K. (2005). "Do Get-Out-The-Vote Calls Reduce Turnout? The Importance of Statistical Methods for Field Experiments". American Political Science Review, 99(2), pp. 283–300.

Rosenbaum, P.R. and Rubin, D.B. (1985). "Constructing a control group using multivariate matched sampling methods that incorporate the propensity score". The American Statistician, 39, pp. 33–38.

Examples

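set.seed(343)
# Simulate uplift data: 200 observations, 50 covariates, binary response
df <- sim_uplift(n = 200, p = 50, response = "binary")
# Recode the treatment indicator as numeric 0/1, as required by the formula
df$T <- ifelse(df$T == 1, 1, 0)
# Permutation likelihood-ratio balance test using 500 permutations
ib <- inspect_balance(T ~ X1 + X2 + X3, data = df, method = "pdev", nPerm = 500)
ib
summary(ib)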

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.