balanceTest: Standardized Differences for Stratified Comparisons

Description

Covariate balance, with treatment/covariate association tests

Usage

balanceTest(fmla, data, strata = NULL, report = c("std.diffs", "z.scores",
  "adj.means", "adj.mean.diffs", "chisquare.test", "p.values", "all")[1:2],
  unit.weights, stratum.weights = harmonic_times_mean_weight, subset,
  include.NA.flags = TRUE, covariate.scaling = NULL,
  post.alignment.transform = NULL, p.adjust.method = "holm")

Arguments

fmla

A formula containing an indicator of treatment assignment on the left hand side and covariates at right.

data

A data frame in which fmla and strata are to be evaluated.

strata

A list of right-hand-side-only formulas containing the factor(s) identifying the strata, with NULL entries interpreted as no stratification; or a factor with length equal to the number of rows in data; or a data frame of such factors. See below for examples.

report

Character vector listing measures to report for each stratification; a subset of c("adj.means", "adj.mean.diffs", "chisquare.test", "std.diffs", "z.scores", "p.values", "all"). P-values reported are two-sided for the null hypothesis of no effect. The option "all" requests all measures.

unit.weights

Per-unit weight, or 0 if unit does not meet condition specified by subset argument. If there are clusters, the cluster weight is the sum of unit weights of elements within the cluster. Within each stratum, unit weights will be normalized to sum to the number of clusters in the stratum.

stratum.weights

Function returning non-negative weight for each stratum; see details.

subset

Optional: condition or vector specifying a subset of observations to be permitted to have positive unit weights.

include.NA.flags

Present item missingness comparisons as well as covariates themselves?

covariate.scaling

A scale factor to apply to covariates in calculating std.diffs (currently ignored).

post.alignment.transform

Optional transformation applied to covariates just after their stratum means are subtracted off.

p.adjust.method

Method of p-value adjustment.
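To illustrate the kind of input post.alignment.transform operates on, the following base R sketch subtracts stratum means from a covariate and then applies rank to the aligned values. The data frame toy and its columns are hypothetical, and applying the transform to the whole aligned column at once is an assumption made for illustration; the package's internals may differ.

```r
# Hypothetical toy data: one covariate observed in two strata.
toy <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b"),
  x       = c(1, 2, 6, 10, 30, 50)
)

# Alignment: subtract each stratum's mean from its observations.
# ave() returns the stratum mean repeated for every row of that stratum.
aligned <- toy$x - ave(toy$x, toy$stratum)

# Post-alignment transform: here, ranks of the aligned values.
transformed <- rank(aligned)

data.frame(toy, aligned, transformed)
```

Because the stratum means are removed first, the ranks reflect each observation's position relative to its own stratum rather than its raw magnitude.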

Details

Given a grouping variable (treatment assignment, exposure status, etc.) and variables on which to compare the groups, compare averages across groups and test the hypothesis of no selection into groups on the basis of those variables. The multivariate test is the method of combined differences discussed by Hansen and Bowers (2008, Statist. Sci.), a variant of Hotelling's T-squared test; the univariate tests are presented with multiplicity adjustments, the details of which can be controlled by the user. Clustering, weighting and/or stratification variables can be provided, and are addressed by the tests.

The function assembles various univariate descriptive statistics for the groups to be compared: (weighted) means of treatment and control groups; differences of these (adjusted differences); and adjusted differences as multiples of a pooled S.D. of the variable in the treatment and control groups (standard differences). This is done separately for each provided stratifying factor and, by default, for the unstratified comparison, in each case reflecting a standardization appropriate to the designated (post-) stratification of the sample. In the case without stratification or clustering, the only weighting used to calculate treatment and control group means is that provided by the user as unit.weights; in the absence of such an argument, these means are unweighted. When there are strata, within-stratum means of treatment or of control observations are calculated using unit.weights, if provided, and then these are combined across strata according to an ‘effect of treatment on treated’-type weighting scheme. (The function's stratum.weights argument figures in the function's inferential calculations but not these descriptive calculations.) To figure a stratum's effect of treatment on treated weight, the sum of all unit.weights associated with treatment or control group observations within the stratum is multiplied by the fraction of clusters in that stratum that are associated with the treatment rather than the control condition. If this fraction is 0 or 1, the stratum is instead downweighted to 0.
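The ‘effect of treatment on treated’-type stratum weight described above can be worked through in a few lines of base R. This is a minimal sketch of the arithmetic only, not the package's internal code; the data frame toy and its column names are hypothetical, and each row is treated as its own cluster.

```r
# Hypothetical toy data: three strata, one row per cluster.
toy <- data.frame(
  stratum = c("a", "a", "a", "a", "b", "b", "b", "c", "c"),
  treated = c(1, 0, 0, 0, 1, 1, 0, 0, 0),
  w       = c(1, 1, 2, 2, 1, 1, 1, 3, 3)   # unit weights
)

# Sum of unit weights in each stratum (treatment and control together).
w_sum <- tapply(toy$w, toy$stratum, sum)

# Fraction of clusters in each stratum assigned to treatment.
frac_treated <- tapply(toy$treated, toy$stratum, mean)

# ETT-type weight: weight total times treated fraction, set to 0 when a
# stratum contains only treated or only control clusters.
ett <- w_sum * frac_treated
ett[frac_treated == 0 | frac_treated == 1] <- 0
ett
```

Stratum "c" has no treated clusters, so it receives weight 0 and does not contribute to the pooled descriptive statistics in this sketch.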

The function also calculates univariate and multivariate inferential statistics, targeting the hypothesis that assignment was random within strata. These calculations also pool unit.weights-weighted, within-stratum group means across strata, but the default weighting of strata differs from that of the descriptive calculations. With stratum.weights=harmonic_times_mean_weight (the default), each stratum is weighted in proportion to the product of the stratum mean of unit.weights and the harmonic mean 1/[(1/a + 1/b)/2]=2*a*b/(a+b) of the number of treated units (a) and control units (b) in the stratum; this weighting is optimal under certain modeling assumptions (discussed in Kalton 1968 and Hansen and Bowers 2008, Sections 3.2 and 5). The multivariate assessment is based on a Mahalanobis-type distance that combines each of the univariate mean differences while accounting for correlations among them. It is similar to Hotelling's T-squared statistic, except that it is standardized using a permutation covariance. See Hansen and Bowers (2008).
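As a worked illustration of the default weighting formula, the following base R sketch computes stratum weights proportional to the product of the mean unit weight and the harmonic mean of the treated and control counts. It shows the arithmetic only, on hypothetical toy data; rescaling the weights to sum to 1 is an assumption made for readability, and the package's own harmonic_times_mean_weight may differ in detail.

```r
# Hypothetical toy data: two strata with different treated/control splits.
toy <- data.frame(
  stratum = c(rep("s1", 6), rep("s2", 4)),
  treated = c(1, 1, 0, 0, 0, 0,  1, 1, 0, 0),
  w       = c(1, 1, 1, 1, 1, 1,  2, 2, 2, 2)   # unit weights
)

a <- tapply(toy$treated == 1, toy$stratum, sum)   # treated count per stratum
b <- tapply(toy$treated == 0, toy$stratum, sum)   # control count per stratum
mean_w <- tapply(toy$w, toy$stratum, mean)        # mean unit weight per stratum

# Harmonic mean of a and b: 1 / [(1/a + 1/b) / 2] = 2ab / (a + b).
h <- 2 * a * b / (a + b)

# Stratum weight proportional to (mean unit weight) x (harmonic mean).
raw <- mean_w * h
strat_w <- raw / sum(raw)   # rescaling to sum to 1 is assumed for illustration
strat_w
```

Although s1 has more units, its unbalanced 2-versus-4 split and smaller unit weights give it less weight than the balanced stratum s2.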

In contrast to the earlier function xBalance that it is intended to replace, balanceTest accepts only binary assignment variables (for now).

stratum.weights must be a function of a single argument, a data frame containing the variables in data and additionally Tx.grp, stratum.code, and unit.weights, returning a named numeric vector of non-negative weights identified by stratum. (For an example, enter getFromNamespace("harmonic", "RItools").)
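A function meeting this contract might look like the sketch below. The function name total_weight_share and the mock data frame are hypothetical, the encoding of Tx.grp as 0/1 is assumed, and whether the package expects the returned weights to be rescaled is not stated here, so the normalization is also an assumption.

```r
# Hypothetical custom weighting function: each stratum is weighted by its
# share of the total unit weight. It takes one data frame argument and
# returns a named numeric vector with one non-negative entry per stratum.
total_weight_share <- function(d) {
  totals <- tapply(d$unit.weights, d$stratum.code, sum)
  out <- as.vector(totals) / sum(totals)
  names(out) <- names(totals)
  out
}

# Mock input with the columns described above (all values are made up).
mock <- data.frame(
  Tx.grp       = c(1, 0, 1, 0, 0),
  stratum.code = c("a", "a", "b", "b", "b"),
  unit.weights = c(1, 1, 2, 2, 2)
)

res <- total_weight_share(mock)
res
```

Under these assumptions, such a function would be supplied as stratum.weights = total_weight_share in the call to balanceTest.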

If the stratifying factor has NAs, these cases are dropped. If, on the other hand, NAs are found in a covariate, then those observations are dropped for descriptive calculations and "imputed" to the stratum mean of the variable for inferential calculations. When covariate values are dropped due to missingness, proportions of observations not missing on that variable are recorded and returned. The printed output presents non-missing proportions alongside the variables themselves, distinguishing the former by placing them at the bottom of the list and enclosing the variable's name in parentheses. If a variable shares a missingness pattern with another variable, its missingness information may be labeled with the name of the other variable in the output.
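The two treatments of a missing covariate value can be mimicked in base R as follows. This is a sketch of the idea on hypothetical toy data, not the package's implementation; the column names x_observed and x_imputed are made up for illustration.

```r
# Hypothetical toy data with missing covariate values.
toy <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b"),
  x       = c(1, NA, 3, 10, 20, NA)
)

# Indicator of being observed (not missing) on x.
toy$x_observed <- as.numeric(!is.na(toy$x))

# Stratum mean of the observed values, repeated for every row of the stratum.
stratum_mean <- ave(toy$x, toy$stratum,
                    FUN = function(v) mean(v, na.rm = TRUE))

# Replace each missing value with the mean of its stratum.
toy$x_imputed <- ifelse(is.na(toy$x), stratum_mean, toy$x)

toy
mean(toy$x_observed)   # proportion of observations not missing on x
```

Imputing to the stratum mean leaves each stratum's mean of x unchanged, so an imputed observation contributes nothing to a within-stratum comparison of means on x itself; the information about missingness is carried by the indicator instead.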

Value

An object of class c("xbal", "list"). There are plot, print, and xtable methods for class "xbal"; the print method is demonstrated in the examples.

Note

Evidence pertaining to the hypothesis that a treatment variable is not associated with differences in covariate values is assessed by comparing the differences of means, without standardization, to their distributions under hypothetical shuffles of the treatment variable, a permutation or randomization distribution. For the unstratified comparison, this reference distribution consists of differences as the treatment assignments of clusters are freely permuted. For stratified comparisons, the reference distribution describes re-randomizations of this type performed separately in each stratum. Significance assessments are based on the large-sample Normal approximation to these reference distributions.
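For the unstratified, unclustered case, the relationship between the permutation reference distribution and its Normal approximation can be sketched in base R. The data here are simulated and all object names are hypothetical; the sketch approximates the reference distribution by Monte Carlo shuffles, whereas the function itself relies on the Normal approximation.

```r
set.seed(1)   # for reproducibility of the simulated shuffles

# Hypothetical toy data: 20 units, 8 treated, one covariate.
n <- 20
z <- rep(c(1, 0), times = c(8, 12))
x <- rnorm(n)

# Observed difference of means (treated minus control), unstandardized.
diff_means <- function(z, x) mean(x[z == 1]) - mean(x[z == 0])
obs <- diff_means(z, x)

# Reference distribution: differences under random shuffles of z.
perm <- replicate(2000, diff_means(sample(z), x))
p_perm <- mean(abs(perm) >= abs(obs))

# Large-sample Normal approximation, using the permutation variance of the
# difference of means under complete randomization.
n1 <- sum(z == 1)
n0 <- sum(z == 0)
v <- var(x) * (1 / n1 + 1 / n0)
p_norm <- 2 * pnorm(-abs(obs) / sqrt(v))

c(permutation = p_perm, normal = p_norm)
```

With moderately sized groups the two p-values are typically close, which is the rationale for using the Normal approximation in place of explicit shuffling.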

Author(s)

Ben Hansen and Jake Bowers and Mark Fredrickson

References

Hansen, B.B. and Bowers, J. (2008), “Covariate Balance in Simple, Stratified and Clustered Comparative Studies,” Statistical Science 23.

Kalton, G. (1968), “Standardization: A technique to control for extraneous variables,” Applied Statistics 17, 118–136.

Examples

data(nuclearplants)
##No strata, default output
balanceTest(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         data=nuclearplants)

##No strata, all output
balanceTest(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         data=nuclearplants,
         report=c("all"))

##Stratified, all output
balanceTest(pr~.-cost-pt + strata(pt),
         data=nuclearplants,
         report=c("adj.means", "adj.mean.diffs",
                  "chisquare.test", "std.diffs",
                  "z.scores", "p.values"))

##Comparing unstratified to stratified, just adjusted means and
#omnibus test
balanceTest(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n + strata(pt),
         data=nuclearplants,
         report=c("adj.means", "chisquare.test"))

##Missing data handling.
testdata<-nuclearplants
testdata$date[testdata$date<68]<-NA
balanceTest(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         data=testdata)

##Comparing unstratified to stratified, just one-by-one Wilcoxon
#rank sum tests and omnibus test of multivariate differences on
#rank scale.
balanceTest(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n + strata(pt),
         data=nuclearplants,
         report=c("adj.means", "chisquare.test"),
         post.alignment.transform=rank)

markmfredrickson/RItools documentation built on Oct. 3, 2018, 1:07 p.m.