View source: R/summary.matchit.R
summary.matchit | R Documentation |
matchit object
Computes and prints balance statistics for matchit and matchit.subclass objects. Balance should be assessed to ensure the matching or subclassification was effective at eliminating treatment group imbalance and should be reported in the write-up of the results of the analysis.
## S3 method for class 'matchit'
summary(
  object,
  interactions = FALSE,
  addlvariables = NULL,
  standardize = TRUE,
  data = NULL,
  pair.dist = TRUE,
  un = TRUE,
  improvement = FALSE,
  ...
)
## S3 method for class 'matchit.subclass'
summary(
  object,
  interactions = FALSE,
  addlvariables = NULL,
  standardize = TRUE,
  data = NULL,
  pair.dist = FALSE,
  subclass = FALSE,
  un = TRUE,
  improvement = FALSE,
  ...
)
## S3 method for class 'summary.matchit'
print(x, digits = max(3, getOption("digits") - 3), ...)
object |
a |
interactions |
|
addlvariables |
additional variables for which balance statistics are to be computed along with the covariates in the |
standardize |
|
data |
an optional data frame containing variables named in
|
pair.dist |
|
un |
|
improvement |
|
... |
ignored. |
subclass |
after subclassification, whether to display balance for individual subclasses, and, if so, for which ones. Can be |
x |
a |
digits |
the number of digits to round balance statistics to. |
summary() computes a balance summary of a matchit object. This includes balance before and after matching or subclassification, as well as the percent improvement in balance. The variables for which balance statistics are computed are those included in the formula, exact, and mahvars arguments to matchit(), as well as the distance measure if distance was supplied as a numeric vector or a method of estimating propensity scores. The X component of the matchit object is used to supply the covariates.
The standardized mean differences are computed both before and after matching or subclassification as the difference in treatment group means divided by a standardization factor computed in the unmatched (original) sample. The standardization factor depends on the argument supplied to estimand in matchit(): for "ATT", it is the standard deviation in the treated group; for "ATC", it is the standard deviation in the control group; for "ATE", it is the square root of the average of the variances within each treatment group. The post-matching mean difference is computed with weighted means in the treatment groups using the matching or subclassification weights.
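The calculation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code MatchIt runs: smd() is a hypothetical helper, the data are made up, unmatched units are assumed to carry a matching weight of 0, and sampling weights are ignored.

```r
# Hypothetical helper, not a MatchIt function.
# x: covariate; treat: 1 = treated, 0 = control; w: matching weights
# (all 1 before matching, 0 for units dropped by matching).
smd <- function(x, treat, w = rep(1, length(x)), estimand = "ATT") {
  # The standardization factor always uses the unmatched (unweighted) sample
  v1 <- var(x[treat == 1])
  v0 <- var(x[treat == 0])
  s <- switch(estimand,
              ATT = sqrt(v1),
              ATC = sqrt(v0),
              ATE = sqrt((v1 + v0) / 2),
              stop("estimand must be \"ATT\", \"ATC\", or \"ATE\""))
  # Group means use the weights, so they change after matching
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  (m1 - m0) / s
}

# Made-up data: three treated units, four controls, last control unmatched
x     <- c(2, 4, 6, 1, 3, 5, 9)
treat <- c(1, 1, 1, 0, 0, 0, 0)
w     <- c(1, 1, 1, 1, 1, 1, 0)

smd_before <- smd(x, treat)     # unmatched sample
smd_after  <- smd(x, treat, w)  # matched sample, same denominator
```

Because the denominator is fixed at its unmatched value, any change between smd_before and smd_after reflects a change in the group means only.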
The variance ratio is computed as the ratio of the treatment group variances. Variance ratios are not computed for binary variables because their variance is a function solely of their mean. After matching, weighted variances are computed using the formula used in cov.wt(). The percent reduction in bias is computed using the log of the variance ratios.
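As a rough sketch of the variance ratio described above, the following base R code calls cov.wt() for the weighted variances. wvar() and var_ratio() are hypothetical helpers and the data are made up. The improvement formula on the last line is an assumed form (relative change in the absolute log ratio); the text above says only that the log of the variance ratios is used.

```r
# Hypothetical helpers, not MatchIt functions.
# Weighted variance via stats::cov.wt(), which normalizes the weights
wvar <- function(x, w) {
  as.numeric(cov.wt(matrix(x, ncol = 1), wt = w)$cov)
}

# Ratio of treated to control (weighted) variances
var_ratio <- function(x, treat, w = rep(1, length(x))) {
  wvar(x[treat == 1], w[treat == 1]) / wvar(x[treat == 0], w[treat == 0])
}

# Made-up data: the last control is unmatched (weight 0)
x     <- c(2, 4, 6, 1, 3, 5, 9)
treat <- c(1, 1, 1, 0, 0, 0, 0)
w     <- c(1, 1, 1, 1, 1, 1, 0)

vr_before <- var_ratio(x, treat)
vr_after  <- var_ratio(x, treat, w)

# Assumed form of the percent improvement on the log scale
vr_improvement <- 100 * (abs(log(vr_before)) - abs(log(vr_after))) /
  abs(log(vr_before))
```

Working on the log scale treats a ratio of 2 and a ratio of 1/2 as equally imbalanced, which is the usual motivation for taking logs of ratios.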
The eCDF difference statistics are computed by creating a (weighted) eCDF for each group and taking the difference between them for each covariate value. The eCDF is a function that outputs the (weighted) proportion of units with covariate values at or lower than the input value. The maximum eCDF difference is the same thing as the Kolmogorov-Smirnov statistic. The values are bounded at zero and one, with values closer to zero indicating good overlap between the covariate distributions in the treated and control groups. For binary variables, all eCDF differences are equal to the (weighted) difference in proportion and are computed that way.
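The eCDF statistics described above can be illustrated with a short base R sketch. ecdf_diffs() is a hypothetical helper and the data are made up. Evaluating the eCDFs at the distinct observed covariate values and averaging over them is a choice made for this sketch.

```r
# Hypothetical helper, not a MatchIt function.
ecdf_diffs <- function(x, treat, w = rep(1, length(x))) {
  # Normalize the weights to sum to 1 within each treatment group
  w1 <- w * (treat == 1) / sum(w[treat == 1])
  w0 <- w * (treat == 0) / sum(w[treat == 0])
  vals <- sort(unique(x))
  # Weighted proportion of each group at or below each covariate value
  F1 <- vapply(vals, function(v) sum(w1[x <= v]), numeric(1))
  F0 <- vapply(vals, function(v) sum(w0[x <= v]), numeric(1))
  d <- abs(F1 - F0)
  c(mean = mean(d), max = max(d))
}

# Made-up data: three treated units and four controls
x     <- c(2, 4, 6, 1, 3, 5, 9)
treat <- c(1, 1, 1, 0, 0, 0, 0)

ecdf_before <- ecdf_diffs(x, treat)

# With uniform weights, the maximum matches the Kolmogorov-Smirnov statistic
ks_stat <- unname(ks.test(x[treat == 1], x[treat == 0])$statistic)
```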
The QQ difference statistics are computed by creating two samples of the same size by interpolating the values of the larger one. The values are arranged in order for each sample. The QQ difference for each quantile is the difference between the observed covariate values at that quantile between the two groups. The difference is on the scale of the original covariate. Values close to zero indicate good overlap between the covariate distributions in the treated and control groups. A weighted interpolation is used for post-matching QQ differences. For binary variables, all QQ differences are equal to the (weighted) difference in proportion and are computed that way.
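For the unweighted case, the QQ differences described above might be sketched as follows. qq_diffs() is a hypothetical helper, the data are made up, and linear interpolation with approx() is an assumption about how the larger sample is shrunk. The weighted interpolation used after matching is not shown.

```r
# Hypothetical helper, not a MatchIt function (unweighted case only).
qq_diffs <- function(x, treat) {
  x1 <- sort(x[treat == 1])
  x0 <- sort(x[treat == 0])
  n <- min(length(x1), length(x0))
  # Shrink the larger sample to n points by linear interpolation
  shrink <- function(v) {
    if (length(v) > n) approx(seq_along(v), v, n = n)$y else v
  }
  # Differences between matching quantiles, on the covariate's own scale
  d <- abs(shrink(x1) - shrink(x0))
  c(mean = mean(d), max = max(d))
}

# Made-up data: three treated units and four controls
x     <- c(2, 4, 6, 1, 3, 5, 9)
treat <- c(1, 1, 1, 0, 0, 0, 0)

qq_before <- qq_diffs(x, treat)
```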
The pair distance is the average of the absolute differences of a variable between pairs. For example, if a treated unit was paired with four control units, that set of units would contribute four absolute differences to the average. Within a subclass, each combination of treated and control unit forms a pair that contributes once to the average. The pair distance is described in Stuart and Green (2008) and is the value that is minimized when using optimal (full) matching. When standardize = TRUE, the standardized versions of the variables are used, where the standardization factor is as described above for the standardized mean differences. Pair distances are not computed in the unmatched sample (because there are no pairs). Because pair distance can take a while to compute, especially with large datasets or for many covariates, setting pair.dist = FALSE is one way to speed up summary().
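A small base R sketch of the pair distance described above, for the unstandardized case. pair_dist() is a hypothetical helper and the data are made up. Matched sets are assumed to be identified by a subclass vector with NA for unmatched units.

```r
# Hypothetical helper, not a MatchIt function.
# subclass: matched-set membership; NA marks unmatched units
pair_dist <- function(x, treat, subclass) {
  # split() drops units whose subclass is NA
  diffs <- lapply(split(seq_along(x), subclass), function(idx) {
    xt <- x[idx][treat[idx] == 1]
    xc <- x[idx][treat[idx] == 0]
    # Every treated-control combination in the set contributes once
    as.vector(abs(outer(xt, xc, "-")))
  })
  mean(unlist(diffs))
}

# Made-up data: set "a" is one pair, set "b" has two treated and two controls
x        <- c(2, 4, 6, 1, 3, 5, 9)
treat    <- c(1, 1, 1, 0, 0, 0, 0)
subclass <- c("a", "b", "b", "a", "b", "b", NA)

pd <- pair_dist(x, treat, subclass)
```

Set "a" contributes one difference and set "b" contributes four, so the average is taken over five treated-control pairs.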
The effective sample size (ESS) is a measure of the size of a hypothetical unweighted sample with roughly the same precision as a weighted sample. When non-uniform matching weights are computed (e.g., as a result of full matching, matching with replacement, or subclassification), the ESS can be used to quantify the potential precision remaining in the matched sample. The ESS will always be less than or equal to the matched sample size, reflecting the loss in precision due to using the weights. With non-uniform weights, it is printed in the sample size table; otherwise, it is removed because it does not contain additional information above the matched sample size.
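To make the ESS concrete, the sketch below uses the common Kish approximation, the squared sum of the weights divided by the sum of the squared weights. The text above does not state the formula, so treat it as an assumption. ess() is a hypothetical helper.

```r
# Hypothetical helper, not a MatchIt function.
# Kish approximation, assumed here: (sum of weights)^2 / sum of squared weights
ess <- function(w) sum(w)^2 / sum(w^2)

# Uniform weights: the ESS equals the number of units
ess_uniform <- ess(c(1, 1, 1, 1))

# Non-uniform weights on three retained units: the ESS falls below 3
ess_weighted <- ess(c(2, 1, 1, 0))
```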
After subclassification, the aggregate balance statistics are computed using the subclassification weights rather than averaging across subclasses.
All balance statistics (except pair distances) are computed incorporating the sampling weights supplied to matchit(), if any. The unadjusted balance statistics include the sampling weights and the adjusted balance statistics use the matching weights multiplied by the sampling weights.
When printing, NA values are replaced with periods (.), and the pair distance column is omitted from the unmatched and percent balance improvement components of the output.
For matchit objects, a summary.matchit object, which is a list with the following components:
call |
the original call to |
nn |
a matrix of the sample sizes in the original (unmatched) and matched samples |
sum.all |
if |
sum.matched |
a matrix of balance statistics for each covariate in the matched sample |
reduction |
if |
For matchit.subclass objects, a summary.matchit.subclass object, which is a list as above containing the following components:
call |
the original call to |
sum.all |
if |
sum.subclass |
if |
sum.across |
a matrix of balance statistics for each covariate computed using the subclassification weights |
reduction |
if |
qn |
a matrix of sample sizes within each subclass |
nn |
a matrix of the sample sizes in the original (unmatched) and matched samples |
summary() for the generic method; plot.summary.matchit() for making a Love plot from summary() output.
cobalt::bal.tab.matchit(), which also displays balance for matchit objects.
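# Load MatchIt, which provides matchit() and the lalonde data
library("MatchIt")
data("lalonde")
# Nearest neighbor matching with replacement, exact matching on married
m.out <- matchit(treat ~ age + educ + married +
                   race + re74, data = lalonde,
                 method = "nearest", exact = ~ married,
                 replace = TRUE)
# Balance summary that also covers covariate interactions
summary(m.out, interactions = TRUE)
# Subclassification
s.out <- matchit(treat ~ age + educ + married +
                   race + nodegree + re74 + re75,
                 data = lalonde, method = "subclass")
# Add balance statistics for variables not in the matching formula
summary(s.out, addlvariables = ~log(age) + I(re74==0))
# Display balance within individual subclasses
summary(s.out, subclass = TRUE)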