View source: R/U02-propensity-score.R
| estimate_ps | R Documentation |
Functions for estimating propensity scores for binary and multiple treatment groups.
Estimate Propensity Scores
Fits a propensity score model and extracts propensity scores for binary or multiple treatment groups. For binary treatments, uses binomial logistic regression. For multiple treatments (>2 levels), uses multinomial logistic regression to estimate generalized propensity scores.
estimate_ps(data, treatment_var, ps_formula, ps_control = list())
data |
A data.frame containing the analysis data (typically the cleaned data with complete cases). |
treatment_var |
A character string specifying the name of the treatment
variable in |
ps_formula |
A formula object for the propensity score model, of the
form |
ps_control |
An optional list of control parameters to pass to the
model fitting function ( |
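Propensity Score Definition: Returns P(Z = observed | X), the probability of the treatment each individual actually received, rather than P(Z = 1 | X) for every individual (the classical definition in Rosenbaum & Rubin 1983). This definition can be used directly in inverse probability weighting (IPW) and extends naturally to multiple treatments.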
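The difference between the two definitions may be easier to see in code. The following is a minimal sketch on simulated data using only base R; the variable names (x, z, p1, ps_obs) are illustrative assumptions and this is not the package's internal implementation.

```r
# Simulated data; all names here are illustrative, not part of the package.
set.seed(1)
n <- 200
x <- rnorm(n)
z <- rbinom(n, 1, plogis(0.5 * x))

# Ordinary logistic regression for a binary treatment.
fit <- glm(z ~ x, family = binomial())

# Classical propensity score: P(Z = 1 | X) for every individual.
p1 <- unname(fitted(fit))

# "Observed treatment" propensity score: P(Z = observed | X).
# Treated units keep p1; control units get 1 - p1.
ps_obs <- ifelse(z == 1, p1, 1 - p1)
```

For treated individuals the two definitions coincide; they differ only for individuals with Z = 0.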
Binary Treatments (2 levels):
Fits binomial logistic regression via glm(). The treatment is converted
to a factor whose levels are ordered by sort(): numerically for numeric
values, alphabetically for character values, and by existing level order for factors. Returns P(Z = observed | X).
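The level ordering matters because glm() treats the first level of a factor response as "failure" and models the probability of the remaining level. The sketch below shows how sort() orders the three input types and which level a binomial glm() then models. It uses made-up values and is only an illustration of the documented behavior, not the package source.

```r
# How sort() orders candidate treatment levels for each input type.
lv_num <- sort(unique(c(10, 2, 1)))        # numeric order: 1, 2, 10
lv_chr <- sort(unique(c("10", "2", "1")))  # character order: "1", "10", "2"
f <- factor(c("hi", "lo", "hi"), levels = c("lo", "hi"))
lv_fct <- as.character(sort(unique(f)))    # factor level order: "lo", "hi"

# With a factor response, glm() models the probability of the second level.
set.seed(2)
x <- rnorm(100)
z <- factor(ifelse(runif(100) < plogis(x), "treated", "control"))
fit <- glm(z ~ x, family = binomial())
p_second <- unname(fitted(fit))            # P(Z = "treated" | X)
```

Note that numeric codes stored as character strings sort differently from the same codes stored as numbers, so the reference level can change with the column type.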
Multiple Treatments (>2 levels):
Fits multinomial logistic regression via nnet::multinom(). Returns
P(Z = observed | X) for each individual from the generalized PS matrix.
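As a rough illustration of the multinomial case, the sketch below fits nnet::multinom() on simulated data and picks, for each row, the column of the probability matrix that matches the observed treatment. The data and the names gps and ps_obs are assumptions made for this example; the package's own extraction code may differ.

```r
library(nnet)

# Simulated three-arm treatment; names are illustrative only.
set.seed(3)
n <- 300
x <- rnorm(n)
z <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

fit <- multinom(z ~ x, trace = FALSE)

# Generalized propensity score matrix: one column per treatment level.
gps <- predict(fit, type = "probs")

# For each row, select the probability of the treatment actually received
# by indexing the matrix with (row, column) pairs.
col_idx <- match(as.character(z), colnames(gps))
ps_obs <- gps[cbind(seq_len(n), col_idx)]
```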
Control Parameters (ps_control):
Binary: glm.control() parameters (default: epsilon=1e-08, maxit=25)
Multiple: multinom() parameters (default: MaxNWts=10000, maxit=100, trace=FALSE)
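One plausible way to combine user-supplied ps_control entries with the defaults listed above is modifyList(). This is a sketch under that assumption; the variable names are hypothetical and the package may merge its defaults differently.

```r
# User overrides only the iteration limit (hypothetical input).
ps_control <- list(maxit = 50)

# Binary case: defaults as documented above, then build a glm.control object.
binary_defaults <- list(epsilon = 1e-08, maxit = 25)
binary_args <- modifyList(binary_defaults, ps_control)
ctrl <- do.call(glm.control, binary_args)

# Multiple-treatment case: the same override applied to the multinom() defaults.
multi_defaults <- list(MaxNWts = 10000, maxit = 100, trace = FALSE)
multi_args <- modifyList(multi_defaults, ps_control)

# In practice the override would be passed through the documented argument:
# estimate_ps(data, treatment_var, ps_formula, ps_control = list(maxit = 50))
```

Entries that the user does not supply keep their default values, so a partial list is enough.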
A list with the following components:
ps_model |
The fitted propensity score model object (class |
ps |
A numeric vector of propensity scores representing the probability
of receiving the actual treatment each individual received. Length equals
the number of rows in |
ps_matrix |
A numeric matrix of dimension n × K where n is the number of observations and K is the number of treatment levels. Each row contains the predicted probabilities for all treatment levels. Column names correspond to treatment levels. |
n_levels |
An integer indicating the number of treatment levels. |
treatment_levels |
A vector of unique treatment values sorted by
|
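To show how the returned components relate to each other and to IPW, the sketch below builds a small mock object shaped like the documented return value. The numbers are invented for illustration and do not come from estimate_ps().

```r
# Mock result with the documented components (values are made up).
z <- c(0, 1, 1, 0)
ps_matrix <- matrix(
  c(0.80, 0.20,
    0.60, 0.40,
    0.50, 0.50,
    0.25, 0.75),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("0", "1"))
)
res <- list(
  ps = c(0.80, 0.40, 0.50, 0.25),
  ps_matrix = ps_matrix,
  n_levels = 2L,
  treatment_levels = c(0, 1)
)

# ps is the entry of ps_matrix in the column of the observed treatment.
col_idx <- match(z, res$treatment_levels)
ps_check <- res$ps_matrix[cbind(seq_along(z), col_idx)]

# Inverse probability weights are the reciprocal of ps.
w <- 1 / res$ps
w_by_arm <- tapply(w, z, sum)  # total weight per treatment level
```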
# Example 1: Binary treatment
data(simdata_bin)
ps_bin <- estimate_ps(
  data = simdata_bin,
  treatment_var = "Z",
  ps_formula = Z ~ X1 + X2 + X3 + B1 + B2
)
summary(ps_bin$ps)
table(simdata_bin$Z)
# Example 2: Multiple treatments
data(simdata_multi)
ps_multi <- estimate_ps(
  data = simdata_multi,
  treatment_var = "Z",
  ps_formula = Z ~ X1 + X2 + X3 + B1 + B2
)
head(ps_multi$ps_matrix)