inspect_balance | R Documentation |
inspect_balance calculates standardized differences for each covariate between two treatment levels, and tests for conditional independence between the treatment and the covariates.
## S3 method for class 'formula'
inspect_balance(formula, data, method = "dev", nPerm = NULL, midpval = TRUE,
  na.rm = FALSE, treatLevel = NULL, ...)

## S3 method for class 'inspect_balance'
print(x, ...)

## S3 method for class 'inspect_balance'
summary(object, ...)
formula |
A formula containing an indicator of treatment assignment on the left hand side and covariates on the right. The treatment indicator should be numeric with 0/1 values. |
data |
A data frame in which to interpret the variables named in the formula. |
method |
The method used to compute a p-value associated with the balance test. See details. |
nPerm |
The number of random permutations of the treatment assignment.
Only applicable with methods "pdev" and "paic". |
midpval |
Should the mid p-value be used? |
na.rm |
Should observations with NAs on any variables named in the RHS of formula be removed from the covariate balance summary table? |
treatLevel |
A character string for the treatment level of interest. By
default, the treatment is coerced to a factor and the last level is used as
the treatLevel. |
... |
Additional arguments passed to the various methods. Specifically,
for methods |
x |
An inspect_balance object. |
object |
An inspect_balance object. |
In randomized experiments, the assignment of subjects to treatment and control groups is independent of their baseline covariates. As the sample size grows, random assignment tends to balance covariates, in the sense that both groups have similar distributions of covariates. Following Rosenbaum and Rubin (1985), we define the standardized bias on a covariate as
\frac{\bar{x}_t - \bar{x}_c}{\sqrt{\frac{s_t^2 + s_c^2}{2}}}
where \bar{x}_t and \bar{x}_c represent the sample means of a covariate in the treated and control groups, respectively, and s_t^2 and s_c^2 represent their sample variances.
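As a rough illustration of the formula above, the standardized bias for a single covariate can be computed in base R. This is a minimal sketch, not necessarily how inspect_balance builds its summary table; the function name std_bias and the arguments x and z are hypothetical.

```r
# Standardized bias for one covariate, following the formula above.
# x: numeric covariate; z: 0/1 treatment indicator (hypothetical names).
std_bias <- function(x, z) {
  x_t <- x[z == 1]  # covariate values in the treated group
  x_c <- x[z == 0]  # covariate values in the control group
  (mean(x_t) - mean(x_c)) / sqrt((var(x_t) + var(x_c)) / 2)
}

# Simulated example: the treated group is shifted upward by 0.5
set.seed(1)
z <- rep(c(0, 1), each = 50)
x <- rnorm(100) + 0.5 * z
std_bias(x, z)
```

Values near zero suggest that the covariate has similar means in the two groups relative to its pooled spread.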
Another way to think about balance is that the covariates X should have no predictive power for treatment assignment Z. That is, Prob(Z|X) = Prob(Z). Logistic regression is well suited for this task. If method = "dev" (the default), we follow the approach suggested by Imai (2005): first regress treatment assignment Z on the covariates X and a constant, then on a constant alone, and compare the two fits using a standard asymptotic likelihood-ratio test. This test is likely to perform poorly (i.e., to have high Type I error rates) in small samples (see Hansen and Bowers, 2008). If method = "pdev", we compute a permutation distribution of the likelihood-ratio statistic between the two models and compare it to the observed test statistic to obtain a p-value. Models are fitted using standard logistic regression. If method = "paic", the test statistic is given by the difference in AIC between the two models. A permutation distribution of this test statistic is computed and compared to its observed value to obtain a p-value for the test. Models are fitted by penalized likelihood using the Jeffreys prior (Firth, 1993). Finally, if method = "hansen", p-values are computed using RItools::xBalance (Hansen and Bowers, 2008).
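The ideas behind the "dev" and "pdev" methods can be sketched with base R alone. The code below is an assumption-laden illustration rather than the package's implementation: the helper lr_stat and the simulated data are hypothetical, and the permutation p-value shown is the plain one, not the mid p-value.

```r
# Sketch of an asymptotic and a permutation likelihood-ratio balance test.
set.seed(2)
n <- 100
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))  # simulated covariates
z <- rbinom(n, 1, 0.5)                         # treatment assigned independently of X

# Likelihood-ratio statistic: deviance of the constant-only model
# minus deviance of the model with covariates (hypothetical helper).
lr_stat <- function(z, X) {
  fit <- glm(z ~ ., data = data.frame(z = z, X), family = binomial)
  fit$null.deviance - fit$deviance
}

obs <- lr_stat(z, X)

# Asymptotic test: chi-squared with one degree of freedom per covariate
p_asym <- pchisq(obs, df = ncol(X), lower.tail = FALSE)

# Permutation test: shuffle the treatment labels and recompute the statistic
n_perm <- 200
perm <- replicate(n_perm, lr_stat(sample(z), X))
p_perm <- mean(perm >= obs)

c(asymptotic = p_asym, permutation = p_perm)
```

Because the treatment here is simulated independently of the covariates, neither p-value should be systematically small; in small samples the two can differ noticeably.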
We note that balance tests of this kind are subject to criticism, since balance is a characteristic of the sample, not some hypothetical population (see Ho et al., 2007).
An object of class inspect_balance, which is a list with the following components:
fit |
The fitted model object (NULL for method = "hansen"). |
pvalue |
The p-value of the test. |
nObs |
The number of observations used by the procedure. |
cbs |
The covariate balance summary table. |
pdata |
The underlying data used in ggplot.inspect_balance. |
treatLevel |
The treatment level of interest. |
yLabel |
The name of the treatment indicator. |
call |
The call to inspect_balance. |
Leo Guelman leo.guelman@gmail.com
Firth, D. (1993). "Bias reduction of maximum likelihood estimates". Biometrika, 80, pp. 27–38.
Hansen, B.B. and Bowers, J. (2008). "Covariate Balance in Simple, Stratified and Clustered Comparative Studies". Statistical Science, 23, pp. 219–236.
Ho, D., Imai, K., King, G. and Stuart, E. (2007). "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference". Political Analysis, 15, pp. 199–236.
Imai, K. (2005). "Do Get-Out-The-Vote Calls Reduce Turnout? The Importance of Statistical Methods for Field Experiments". American Political Science Review, 99(2), pp. 283–300.
Rosenbaum, P.R. and Rubin, D.B. (1985). "Constructing a control group using multivariate matched sampling methods that incorporate the propensity score". The American Statistician, 39, pp. 33–38.
ggplot.inspect_balance.
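set.seed(343)
df <- sim_uplift(n = 200, p = 50, response = "binary")
# Recode the treatment indicator to numeric 0/1 values
df$T <- ifelse(df$T == 1, 1, 0)
# Permutation likelihood-ratio balance test with 500 permutations
ib <- inspect_balance(T ~ X1 + X2 + X3, data = df, method = "pdev", nPerm = 500)
ib
summary(ib)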