match_gps | R Documentation |
The match_gps() function performs sample matching based on
generalized propensity scores (GPS). It uses the k-means clustering
algorithm to partition the data into clusters and subsequently matches all
treatment groups within these clusters. This approach ensures efficient and
structured comparisons across treatment levels while accounting for the
propensity score distribution.
match_gps(
csmatrix = NULL,
method = "nnm",
caliper = 0.2,
reference = NULL,
ratio = NULL,
replace = NULL,
order = NULL,
ties = NULL,
min_controls = NULL,
max_controls = NULL,
kmeans_args = NULL,
kmeans_cluster = 5,
verbose_output = FALSE,
...
)
csmatrix |
An object of class |
method |
A single string specifying the matching method to use. The
default is |
caliper |
A numeric value specifying the caliper width, which defines
the allowable range within which observations can be matched. It is
expressed as a fraction of the standard deviation of the
logit-transformed generalized propensity scores (the default of 0.2
corresponds to 0.2 standard deviations). To perform matching
without a caliper, set this parameter to a very large value. For exact
matching, set |
reference |
A single string specifying the exact level of the treatment variable to be used as the reference in the matching process. All other treatment levels will be matched to this reference level. Ideally, this should be the control level. If no natural control is present, avoid selecting a level with extremely low or high covariate or propensity score values. Instead, choose a level with covariate or propensity score distributions that are centrally positioned among all treatment groups to maximize the number of matches. |
ratio |
A scalar for the number of matches which should be found for
each control observation. The default is one-to-one matching. Only
available for the methods |
replace |
Logical value indicating whether matching should be done with
replacement. If |
order |
A string specifying the order in which logit-transformed GPS values are sorted before matching. The available options are:
|
ties |
A logical flag indicating how tied matches should be handled.
Available only for the |
min_controls |
The minimum number of treatment observations that should
be matched to each control observation. Available only for the |
max_controls |
The maximum number of treatment observations that can be
matched to each control observation. Available only for the |
kmeans_args |
A list of arguments to pass to stats::kmeans. These
arguments must be provided inside a |
kmeans_cluster |
An integer specifying the number of clusters to pass to stats::kmeans. |
verbose_output |
A logical flag. If |
... |
Additional arguments to be passed to the matching function. |
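The caliper description above can be illustrated with a short base R sketch. It assumes the caliper is applied as a multiple of the standard deviation of the logit-transformed scores, as the argument text states; the GPS values are made up for illustration, and the internal computation in match_gps() may differ in detail.

```r
# Hypothetical GPS values for one treatment level (illustrative only)
gps <- c(0.10, 0.25, 0.40, 0.55, 0.70, 0.85)

# Logit transformation: log(p / (1 - p))
logit_gps <- qlogis(gps)

# Caliper given as a fraction of the standard deviation of the logit GPS
caliper <- 0.2
caliper_width <- caliper * sd(logit_gps)

# Two observations are eligible to be matched only if their logit GPS
# values differ by no more than the caliper width
eligible <- abs(logit_gps[1] - logit_gps[2]) <= caliper_width
```

A larger caliper value widens the allowable distance and admits more candidate matches, at the cost of less similar pairs.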
Propensity score matching can be performed using various matching
algorithms. Lopez and Gutman (2017) do not explicitly specify the matching
algorithm used, but it is assumed they applied the commonly used k-nearest
neighbors matching algorithm, implemented as method = "nnm". However,
this algorithm can sometimes be challenging to use, especially when
treatment and control groups have unequal sizes. When replace = FALSE,
the number of matches is strictly limited by the smaller group, and even
with replace = TRUE, the results may not always be satisfactory. To
address these limitations, we have implemented an additional matching
algorithm to maximize the number of matched observations within a dataset.
The available matching methods are:
"nnm": classic k-nearest neighbors matching, implemented using
Matching::Matchby(). The tunable parameters in match_gps() are
caliper, ratio, replace, order, and ties. Additional arguments
can be passed to Matching::Matchby() via the ... argument.
"fullopt": optimal full matching algorithm, implemented with
optmatch::fullmatch(). This method calculates a discrepancy matrix to
identify all possible matches, often optimizing the percentage of matched
observations. The available tuning parameters are caliper,
min_controls, and max_controls.
"pairmatch": optimal 1:1 and 1:k matching algorithm, implemented using
optmatch::pairmatch(), which is a wrapper around
optmatch::fullmatch(). Like "fullopt", this method calculates a
discrepancy matrix and finds matches that minimize its sum. The available
tuning parameters are caliper and ratio.
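The general idea of clustering first and matching within clusters can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: it uses made-up data with two groups and a single logit-scale score, a hypothetical helper match_within(), and a greedy 1:1 nearest-neighbor search without replacement in place of Matching::Matchby() or the optmatch routines.

```r
set.seed(1)

# Hypothetical data: group label and a logit-scale score (illustrative only)
dat <- data.frame(
  treat = rep(c("control", "treated"), each = 10),
  score = c(rnorm(10, mean = 0), rnorm(10, mean = 0.5))
)

# Step 1: partition the observations into clusters with k-means
dat$cluster <- kmeans(dat$score, centers = 2)$cluster

# Caliper width as a fraction of the standard deviation of the score
caliper_width <- 0.2 * sd(dat$score)

# Step 2: greedy 1:1 nearest-neighbor matching without replacement,
# applied separately within one cluster (hypothetical helper)
match_within <- function(d, width) {
  ctrl <- which(d$treat == "control")
  trt <- which(d$treat == "treated")
  pairs <- list()
  for (i in ctrl) {
    if (length(trt) == 0) break
    gap <- abs(d$score[trt] - d$score[i])
    j <- which.min(gap)
    if (gap[j] <= width) {
      pairs[[length(pairs) + 1]] <- data.frame(
        control = rownames(d)[i],
        treated = rownames(d)[trt[j]]
      )
      trt <- trt[-j] # each treated unit is used at most once
    }
  }
  do.call(rbind, pairs) # NULL if no pair fell within the caliper
}

# Apply the matching inside every cluster and stack the results
matches <- do.call(
  rbind,
  lapply(split(dat, dat$cluster), match_within, width = caliper_width)
)
```

Because the search is greedy, the pairs found depend on the order in which control units are visited; the optimal methods described above avoid this by minimizing the total discrepancy instead.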
A data.frame similar to the one provided as the data argument in
the estimate_gps() function, containing the same columns but only the
observations for which a match was found. The returned object includes two
attributes, accessible with the attr() function:
original_data: A data.frame with the original data returned by the
csregion() or estimate_gps() function, after estimation of the common
support region (CSR) and removal of observations outside the CSR.
matching_filter: A logical vector indicating which rows from
original_data were included in the final matched dataset.
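The two attributes can be read back with attr(), as the following base R sketch shows. The object here is a hand-built stand-in that only mimics the structure described above; a real match_gps() result may carry additional classes or attributes.

```r
# Hypothetical stand-in for a match_gps() result (built by hand)
original <- data.frame(
  id = 1:5,
  status = c("control", "a", "control", "b", "a")
)
keep <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

matched <- original[keep, ]
attr(matched, "original_data") <- original
attr(matched, "matching_filter") <- keep

# Retrieve the attributes from the matched object
orig <- attr(matched, "original_data")
flt <- attr(matched, "matching_filter")

# Rows of the original data that did not receive a match
unmatched <- orig[!flt, ]
```

Subsetting original_data with matching_filter reproduces the matched rows, which makes it easy to compare matched and unmatched observations.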
Lopez, M. J., and Gutman, R. (2017). Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas. Statistical Science, 32(3), 432-454.
estimate_gps() for the calculation of generalized propensity
scores; MatchIt::matchit(), optmatch::fullmatch() and
optmatch::pairmatch() for the documentation of the matching functions;
stats::kmeans() for the documentation of the k-means algorithm.
# Define the formula used for GPS estimation
formula_cancer <- formula(status ~ age + sex)

# Step 1: estimate the generalized propensity scores
gp_scores <- estimate_gps(formula_cancer,
  data = cancer,
  method = "multinom",
  reference = "control",
  verbose_output = TRUE
)

# Step 2: define the common support region
gps_csr <- csregion(gp_scores)

# Step 3: match on the GPS within k-means clusters
matched_cancer <- match_gps(gps_csr,
  caliper = 0.25,
  reference = "control",
  method = "fullopt",
  kmeans_cluster = 2,
  kmeans_args = list(
    iter.max = 200,
    algorithm = "Forgy"
  ),
  verbose_output = TRUE
)