estimate_gps | R Documentation |
estimate_gps()
computes generalized propensity scores for
treatment groups by applying a user-defined formula and method. It returns
a matrix of GPS probabilities for each subject and treatment group
estimate_gps(
formula,
data = NULL,
method = "multinom",
link = NULL,
reference = NULL,
by = NULL,
subset = NULL,
ordinal_treat = NULL,
fit_object = FALSE,
verbose_output = FALSE,
...
)
formula |
a valid R formula, which describes the model used to
calculating the probabilities of receiving a treatment. The variable to be
balanced is on the left side, while the covariates used to predict the
treatment variable are on the right side. To define the interactions
between covariates, use |
data |
a data frame with columns specified in the |
method |
a single string describing the model used for the calculation
of generalized propensity scores. The default value is set to |
link |
a single string; determines an alternative model for a method used for estimation. For available links, see Details. |
reference |
a single string describing one class from the treatment variable, referred to as the baseline category in the calculation of generalized propensity scores. |
by |
a single string with the name of a column, contained in the |
subset |
a logical atomic vector of length equal to the number of rows
in the |
ordinal_treat |
an atomic vector of the length equal to the length of
unique levels of the treatment variable. Confirms, that the treatment
variable is an ordinal variable and adjusts its levels, to the order of
levels specified in the argument. Is a call to the function |
fit_object |
a logical flag. If |
verbose_output |
a logical flag. If |
... |
additional arguments, that can be passed to the fitting function and are not controlled by the above arguments. For more details and examples refer to the Details section and documentations of corresponding functions. |
The main goal of the estimate_gps()
function is to calculate the
generalized propensity scores aka. treatment allocation probabilities. It
is the first step in the workflow of the vector matching algorithm and is
essential for the further analysis. The returned matrix of class gps
can
then be passed to the csregion()
function to calculate the rectangular
common support region boundaries and drop samples not eligible for the
further analysis. The list of available methods operated by the
estimate_gps()
is provided below with a short description and function
used for the calculations:
multinom
- multinomial logistic regression model nnet::multinom()
vglm
- vector generalized linear model for multinomial data
VGAM::vglm()
,
brglm2
- bias reduction model for multinomial responses using the
Poisson trick brglm2::brmultinom()
,
mblogit
- baseline-category logit models mclogit::mblogit()
.
polr
- ordered logistic or probit regression only for ordered factor
variables from MASS::polr()
. The method
argument of the underlying
MASS::polr()
package function can be controlled with the link
argument.
Available options: link = c("logistic", "probit", "loglog", "cloglog", "cauchit")
A numeric matrix of class gps
with the number of columns equal to
the number of unique treatment variable levels plus one (for the treatment
variable itself) and the number of row equal to the number of subjects in
the initial dataset. The original dataset used for estimation can be
accessed as original_data
attribute.
csregion()
for the calculation of common support region,
match_gps()
for the matching of generalized propensity scores
library("brglm2")
# Conducting covariate balancing on the `airquality` dataset. Our goal was to
# compare ozone levels by month, but we discovered that ozone levels are
# strongly correlated with wind intensity (measured in mph), and the average
# wind intensity varies across months. Therefore, we need to balance the
# months by wind values to ensure a valid comparison of ozone levels.
# Initial imbalance of means
tapply(airquality$Wind, airquality$Month, mean)
# Formula definition
formula_air <- formula(Month ~ Wind)
# Estimating the generalized propensity scores using brglm2 method using
# maximum penalized likelihood estimators with powers of the Jeffreys
gp_scores <- estimate_gps(formula_air,
data = airquality, method = "brglm2",
reference = "5", verbose_output = TRUE,
control = brglmControl(type = "MPL_Jeffreys")
)
# Filtering the observations outside the csr region
gps_csr <- csregion(gp_scores)
# Calculating imbalance after csr
filter_which <- attr(gps_csr, "filter_vector")
filtered_air <- airquality[filter_which, ]
tapply(filtered_air$Wind, filtered_air$Month, mean)
# We can also investigate the imbalance using the raincloud function
raincloud(filtered_air,
y = Wind,
group = Month,
significance = "t_test"
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.