ps_trim: Trim Propensity Scores

View source: R/ps_trim.R

ps_trim R Documentation

Description

Trim observations with extreme propensity scores by replacing them with NA, which removes those units from downstream analyses. The returned object has the same length (or dimensions) as the input. After trimming, refit the propensity score model on the retained observations with ps_refit().

Usage

ps_trim(
  ps,
  method = c("ps", "adaptive", "pctl", "pref", "cr", "optimal"),
  lower = NULL,
  upper = NULL,
  .exposure = NULL,
  .focal_level = NULL,
  .reference_level = NULL,
  ...,
  .treated = NULL,
  .untreated = NULL
)

Arguments

ps

A numeric vector of propensity scores in (0, 1) for binary exposures, or a matrix / data frame where each column gives the propensity score for one level of a categorical exposure.

method

Trimming method. One of:

  • "ps" (default): Fixed threshold. Observations with propensity scores outside [lower, upper] are trimmed. For categorical exposures, observations where any column falls below lower (the symmetric threshold delta) are trimmed.

  • "adaptive": Data-driven threshold that minimizes the asymptotic variance of the IPW estimator (Crump et al., 2009). The lower and upper arguments are ignored.

  • "pctl": Quantile-based. Observations outside the [lower, upper] quantiles of the propensity score distribution are trimmed. Defaults: lower = 0.05, upper = 0.95.

  • "pref": Preference score trimming. Transforms propensity scores to the preference scale (Walker et al., 2013) and trims outside [lower, upper]. Requires .exposure. Binary exposures only. Defaults: lower = 0.3, upper = 0.7.

  • "cr": Common range (common support). Trims to the overlap region of the propensity score distributions across exposure groups. Requires .exposure. Binary exposures only. The lower and upper arguments are ignored.

  • "optimal": Multi-category optimal trimming (Yang et al., 2016). Categorical exposures only. Requires .exposure.

For categorical exposures, only "ps" and "optimal" are supported.
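
To make the "adaptive" rule more concrete, the following base-R sketch computes a Crump et al. (2009) style cutoff on simulated scores. It is an illustration under stated assumptions, not the code that ps_trim() runs: the helper name crump_cutoff() is hypothetical, and the root-finding approach is one possible way to solve the cutoff equation.

```r
# Hypothetical helper (not part of the package): Crump-style cutoff.
crump_cutoff <- function(ps) {
  g <- 1 / (ps * (1 - ps))            # grows as ps approaches 0 or 1
  # If the largest g is within twice the mean, nothing needs trimming.
  if (max(g) <= 2 * mean(g)) {
    return(0)
  }
  # Otherwise solve gamma = 2 * mean(g[g <= gamma]) for gamma.
  f <- function(gamma) gamma - 2 * mean(g[g <= gamma])
  gamma <- uniroot(f, lower = min(g), upper = max(g))$root
  # Convert gamma back to a symmetric cutoff on the score scale.
  0.5 - sqrt(0.25 - 1 / gamma)
}

set.seed(1)
x <- rnorm(500)
ps <- plogis(2 * x)                   # simulated scores with heavy tails
alpha <- crump_cutoff(ps)

# Scores outside [alpha, 1 - alpha] become NA; length is unchanged.
trimmed <- ps
trimmed[ps < alpha | ps > 1 - alpha] <- NA
c(cutoff = alpha, n_trimmed = sum(is.na(trimmed)))
```

The cutoff is symmetric around 0.5, which is why a single value alpha determines both tails in this sketch.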

lower, upper

Numeric thresholds whose interpretation depends on method:

  • "ps": absolute propensity score bounds (defaults: 0.1, 0.9). For categorical exposures, only lower is used as the symmetric threshold.

  • "pctl": quantile probabilities (defaults: 0.05, 0.95).

  • "pref": preference score bounds (defaults: 0.3, 0.7).

  • "adaptive", "cr", "optimal": ignored (thresholds are data-driven).
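
The different meanings of lower and upper can be seen in a short base-R sketch. It uses simulated scores and plain vectors as a stand-in for ps_trim(); whether the package treats the boundary values as inclusive is an assumption here.

```r
set.seed(42)
ps <- runif(200, 0.01, 0.99)          # simulated binary-exposure scores

# "ps"-style rule: the bounds are absolute propensity score values.
fixed <- ps
fixed[ps < 0.1 | ps > 0.9] <- NA

# "pctl"-style rule: the bounds are quantile probabilities, so they are
# converted to score cutoffs before trimming.
q <- quantile(ps, probs = c(0.05, 0.95))
pctl <- ps
pctl[ps < q[1] | ps > q[2]] <- NA

# Categorical "ps"-style rule: one column per exposure level; a row is
# trimmed when any column falls below the symmetric threshold delta.
raw <- matrix(rexp(300), ncol = 3)
ps_mat <- raw / rowSums(raw)          # each row sums to 1
delta <- 0.1
drop_row <- rowSums(ps_mat < delta) > 0
ps_mat[drop_row, ] <- NA

c(fixed = sum(is.na(fixed)), pctl = sum(is.na(pctl)), categorical = sum(drop_row))
```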

.exposure

An exposure variable. Required for "pref", "cr" (binary vector), and "optimal" (factor or character). Not required for other methods.

.focal_level

The value of .exposure representing the focal (treated) group. For binary exposures, defaults to the higher value. Required for wt_att() and wt_atu() with categorical exposures.

.reference_level

The value of .exposure representing the reference (control) group. Automatically detected if not supplied.

...

Additional arguments passed to methods.

.treated

[Deprecated] Use .focal_level instead.

.untreated

[Deprecated] Use .reference_level instead.

Details

How trimming works

Trimming identifies observations with extreme (near 0 or 1) propensity scores and sets them to NA. These observations are excluded from subsequent weight calculations and effect estimation. The goal is to remove units that lack sufficient overlap between exposure groups, which would otherwise receive extreme weights and destabilize estimates.
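
As a rough illustration of why extreme scores matter, the base-R sketch below compares the largest inverse probability weight before and after a fixed-threshold trim. The data are simulated, the helper ipw() is hypothetical, and the NA assignment is a stand-in for what ps_trim() does rather than its implementation.

```r
set.seed(7)
n <- 1000
x <- rnorm(n)
z <- rbinom(n, 1, plogis(2 * x))
ps <- fitted(glm(z ~ x, family = binomial))

# Hypothetical helper: standard ATE-style inverse probability weights.
ipw <- function(ps, z) z / ps + (1 - z) / (1 - ps)

w_all <- ipw(ps, z)

# Stand-in for trimming: extreme scores become NA, length is unchanged.
ps_trimmed <- ps
ps_trimmed[ps < 0.1 | ps > 0.9] <- NA
w_trimmed <- ipw(ps_trimmed, z)

# Largest weight before and after trimming.
c(before = max(w_all), after = max(w_trimmed, na.rm = TRUE))
```

With bounds of 0.1 and 0.9, no retained unit can receive a weight above 10, whatever the untrimmed maximum was.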

Choosing a method

  • Use "ps" when you have a specific threshold in mind or want a simple default.

  • Use "adaptive" for a principled, data-driven cutoff that targets variance reduction.

  • Use "pctl" to trim a fixed percentage of extreme values from each tail.

  • Use "pref" when you want to restrict to the region of clinical equipoise based on the preference score.

  • Use "cr" to restrict to the common support region where both exposure groups have observed propensity scores.

  • Use "optimal" for multi-category (3+) exposures; this is the only data-driven method available for categorical treatments.
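
The two exposure-dependent rules, "pref" and "cr", can be sketched in base R as follows. The preference score transformation follows Walker et al. (2013). The common range definition used here (smallest score among the exposed to largest score among the unexposed) is one common convention and is an assumption about what the package computes.

```r
set.seed(3)
n <- 400
x <- rnorm(n)
z <- rbinom(n, 1, plogis(-1 + 1.5 * x))
ps <- fitted(glm(z ~ x, family = binomial))

# Preference score: remove the marginal exposure prevalence from the
# propensity score on the logit scale.
prevalence <- mean(z)
pref <- plogis(qlogis(ps) - qlogis(prevalence))
keep_pref <- pref >= 0.3 & pref <= 0.7

# Common range (assumed definition): from the smallest score among the
# exposed to the largest score among the unexposed.
cr_lower <- min(ps[z == 1])
cr_upper <- max(ps[z == 0])
keep_cr <- ps >= cr_lower & ps <= cr_upper

c(kept_pref = sum(keep_pref), kept_cr = sum(keep_cr))
```

A unit whose propensity score equals the exposure prevalence has a preference score of exactly 0.5, which is why the default bounds of 0.3 and 0.7 are centered there.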

Typical workflow

  1. Fit a propensity score model

  2. Apply ps_trim() to flag extreme values

  3. Call ps_refit() to re-estimate propensity scores on the retained sample

  4. Compute weights with wt_ate() or another weight function
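
The four steps can also be traced by hand in base R, which may help clarify what each one contributes. This is a sketch on simulated data and does not use ps_trim(), ps_refit(), or wt_ate(); the fixed bounds of 0.1 and 0.9 and the weight formula are choices made for illustration.

```r
set.seed(2)
n <- 300
dat <- data.frame(x = rnorm(n))
dat$z <- rbinom(n, 1, plogis(1.3 * dat$x))

# 1. Fit a propensity score model.
fit <- glm(z ~ x, family = binomial, data = dat)
ps <- fitted(fit)

# 2. Flag extreme values (stand-in for the trimming step).
keep <- unname(ps >= 0.1 & ps <= 0.9)

# 3. Re-estimate the model on the retained sample only.
refit <- glm(z ~ x, family = binomial, data = dat[keep, ])
ps_new <- rep(NA_real_, n)
ps_new[keep] <- fitted(refit)

# 4. ATE-style inverse probability weights; trimmed units stay NA.
w <- dat$z / ps_new + (1 - dat$z) / (1 - ps_new)
summary(w)
```

Refitting matters because the original model was estimated on units that are no longer part of the analysis sample.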

Object behavior

Arithmetic operations on ps_trim objects return plain numeric vectors, since transformed propensity scores (e.g., 1/ps) are no longer propensity scores. Trimmed values propagate as NA in calculations; use na.rm = TRUE where appropriate.

When combining ps_trim objects with c(), the trimming parameters must match. If they do not, c() issues a warning and returns a plain numeric vector.

Use ps_trim_meta() to inspect the trimming metadata, including the method, cutoffs, and which observations were retained or trimmed.
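
The NA propagation described above is ordinary R behavior and can be checked with a plain numeric vector standing in for a trimmed object (the values are made up for illustration):

```r
# Plain numeric stand-in for trimmed scores (NA marks a trimmed unit).
ps_trimmed <- c(0.25, NA, 0.60, 0.80, NA)

inv <- 1 / ps_trimmed             # NA entries stay NA
mean(inv)                         # NA, because missing values propagate
mean(inv, na.rm = TRUE)           # average over retained units only
```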

Value

A ps_trim object (a numeric vector with class "ps_trim", or a matrix with class "ps_trim_matrix"). Trimmed observations are NA. Metadata is stored in the "ps_trim_meta" attribute and can be accessed with ps_trim_meta(). Key fields include:

  • method: the trimming method used

  • keep_idx: integer indices of retained observations

  • trimmed_idx: integer indices of trimmed (NA) observations

  • Method-specific fields such as cutoff (adaptive), q_lower/q_upper (pctl), cr_lower/cr_upper (cr), delta (categorical ps), or lambda (optimal)
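
Because trimmed observations are NA, index sets like keep_idx and trimmed_idx correspond to the NA pattern of the result. The base-R sketch below derives such indices from a plain numeric stand-in and uses them to subset a data frame; the values and the data frame dat are made up for illustration, and the metadata itself is better read with ps_trim_meta().

```r
# Plain numeric stand-in for trimmed scores (NA marks a trimmed unit).
ps_trimmed <- c(0.25, NA, 0.60, 0.80, NA)
dat <- data.frame(id = 1:5, z = c(0, 1, 1, 0, 1))

keep_idx <- which(!is.na(ps_trimmed))     # positions of retained units
trimmed_idx <- which(is.na(ps_trimmed))   # positions of trimmed units

# Restrict the analysis data to retained units.
dat_kept <- dat[keep_idx, ]
dat_kept
```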

References

Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187–199.

Walker, A. M., Patrick, A. R., Lauer, M. S., et al. (2013). A tool for assessing the feasibility of comparative effectiveness research. Comparative Effectiveness Research, 3, 11–20.

Yang, S., Imbens, G. W., Cui, Z., Faries, D. E., & Kadziola, Z. (2016). Propensity score matching and subclassification in observational studies with multi-level treatments. Biometrics, 72(4), 1055–1065.

See Also

ps_trunc() for bounding (winsorizing) instead of discarding, ps_refit() to re-estimate propensity scores after trimming, ps_calibrate() for calibration-based adjustment, ps_trim_meta() to inspect trimming metadata, is_ps_trimmed() and is_unit_trimmed() for logical queries.

Examples

library(propensity)

set.seed(2)
n <- 300
x <- rnorm(n)
z <- rbinom(n, 1, plogis(1.3 * x))
fit <- glm(z ~ x, family = binomial)
ps <- predict(fit, type = "response")

# Fixed threshold trimming (default)
trimmed <- ps_trim(ps, method = "ps", lower = 0.1, upper = 0.9)
trimmed

# How many observations were trimmed?
sum(is_unit_trimmed(trimmed))

# Data-driven adaptive trimming
ps_trim(ps, method = "adaptive")

# Quantile-based trimming at 5th and 95th percentiles
ps_trim(ps, method = "pctl")

# Refit after trimming, then compute weights
trimmed <- ps_trim(ps, method = "adaptive")
refitted <- ps_refit(trimmed, fit)
wt_ate(refitted, .exposure = z)


propensity documentation built on March 3, 2026, 1:06 a.m.