PSPI_generalizability: Propensity Scores Predictive Inference for Generalizability

View source: R/MCMC_PSPI_R.R

PSPI_generalizability    R Documentation

Description

This is the main function of the PSPI package. It fits Bayesian models that generalize findings from a clinical trial to a target population, estimating average treatment effects and potential outcomes. Propensity scores of trial participation play a central role in the generalizability analysis. When covariate shift is a concern, we recommend PSPI-SplineBART and PSPI-DSplineBART, which leverage Bayesian Additive Regression Trees (BART) to model high-dimensional covariates and propensity-score-based splines to extrapolate smoothly.

Users provide trial data (covariates, outcomes, treatment, and propensity scores) along with population-level covariates and propensity scores. Propensity scores can be the true values or estimates from a fitted model. The function then runs Markov chain Monte Carlo (MCMC) sampling for posterior inference.
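When the true propensity scores are unknown, one possible way to estimate them is a logistic regression of trial participation on the covariates. The sketch below is illustrative only and uses base R: the data frame, the column names (X1, X2, selected), and the nested design (trial participants are a subset of the population sample) are assumptions made for this example, not requirements of the package.

```r
# Hypothetical nested design: the trial is a subset of a population sample.
set.seed(1)
n_pop <- 500
population <- data.frame(X1 = rnorm(n_pop), X2 = rnorm(n_pop))

# Simulated participation probabilities and the resulting 0/1 selection indicator
true_ps <- plogis(-1 + 0.8 * population$X1 - 0.5 * population$X2)
population$selected <- rbinom(n_pop, size = 1, prob = true_ps)

# Logistic regression for trial participation (one possible estimator)
ps_fit <- glm(selected ~ X1 + X2, family = binomial(), data = population)

# Estimated propensity scores for the population and for the trial subset
pi_pop <- unname(fitted(ps_fit))
pi_trial <- pi_pop[population$selected == 1]

summary(pi_pop)
```

In this sketch, pi_pop and pi_trial would play the roles of the pi_pop and pi arguments. Any other well-calibrated model for trial participation could be substituted for the logistic regression.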

Usage

PSPI_generalizability(
  X,
  Y,
  A,
  pi,
  X_pop,
  pi_pop,
  model,
  transformation = "InvGumbel",
  nburn = 4000,
  npost = 4000,
  n_knots_main = NULL,
  n_knots_inter = NULL,
  order_main = 3,
  order_inter = 3,
  ntrees_s = 200,
  verbose = FALSE,
  seed = NULL
)

Arguments

X

Matrix of covariates for the trial data.

Y

Numeric vector of observed outcomes in the trial.

A

Binary vector of treatment assignments (0 = control, 1 = intervention).

pi

Numeric vector of trial propensity scores (probability of trial participation).

X_pop

Matrix of covariates for the target population data.

pi_pop

Numeric vector of the target population propensity scores.

model

Character string specifying which PSPI model to use (see Details).

transformation

Character string indicating the transformation applied to the propensity scores. Options are "Identity", "Logit", "Cloglog", or "InvGumbel" (default).

nburn

Number of burn-in iterations (default = 4000).

npost

Number of posterior iterations saved after burn-in (default = 4000).

n_knots_main, n_knots_inter

Number of spline knots for main and interaction effects. If NULL, defaults are chosen automatically. n_knots_inter is available for SplineBART and DSplineBART; n_knots_main is available only for DSplineBART.

order_main, order_inter

Order of spline basis functions (default = 3). order_inter applies to both SplineBART and DSplineBART; order_main applies only to DSplineBART.

ntrees_s

Number of trees used for the BART component (default = 200).

verbose

Logical; if TRUE, prints progress messages.

seed

Optional random seed for reproducibility.

Details

Model choices

The model argument selects the type of PSPI model to be fitted:

  • "BCF" – Bayesian Causal Forests (Hahn et al., 2020).

  • "BCF_P" – BCF with the propensity score as an additional predictor.

  • "FullBART" – Uses three BART models to estimate treatment effects.

  • "SplineBART" – Incorporates a natural cubic spline for heterogeneous treatment effects.

  • "DSplineBART" – Adds another natural cubic spline for the prognostic score.

Propensity score transformations

Because splines are sensitive to the scale of the predictor, a robust transformation is needed. The propensity scores (pi for the trial, pi_pop for the population) can optionally be transformed before modeling using one of the following:

  • "Identity" – uses the raw propensity scores directly (no transformation).

  • "Logit" – applies the logit transform: g(p) = \log(p / (1 - p)).

  • "Cloglog" – complementary log–log transform: g(p) = \log(-\log(1 - p)).

  • "InvGumbel" – inverse Gumbel transform: g(p) = -\log(-\log(p)). Default choice.

Users can experiment with different transformations to assess model sensitivity.
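To make the four formulas above concrete, here is a minimal base R sketch. The helper transform_ps is hypothetical and written only for illustration; it is not part of the PSPI package, and the package's internal implementation may differ.

```r
# Hypothetical helper mirroring the four documented transformations.
transform_ps <- function(p, transformation = c("InvGumbel", "Identity",
                                               "Logit", "Cloglog")) {
  transformation <- match.arg(transformation)
  # The log-based transforms are only defined for scores strictly inside (0, 1)
  stopifnot(is.numeric(p), all(p > 0 & p < 1))
  switch(transformation,
    Identity  = p,
    Logit     = log(p / (1 - p)),
    Cloglog   = log(-log(1 - p)),
    InvGumbel = -log(-log(p))
  )
}

# Compare the transformations on a few example propensity scores
p <- c(0.01, 0.1, 0.5, 0.9)
sapply(c("Identity", "Logit", "Cloglog", "InvGumbel"),
       function(tr) transform_ps(p, tr))
```

Printing the transformed values side by side like this is a quick way to see how each choice stretches scores near 0 or 1 before they enter the spline basis.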

Spline settings

Spline-based models ("SplineBART" and "DSplineBART") allow flexible extrapolation to address covariate shift. The number and order of spline basis functions can be customized through the following parameters:

  • n_knots_inter, order_inter: number of knots and order of the spline basis for treatment-interaction effects. Available for both SplineBART and DSplineBART.

  • n_knots_main, order_main: number of knots and order of the spline basis for main effects. Available only for DSplineBART.

If n_knots_main or n_knots_inter is left as NULL, a default is chosen automatically based on the cube root of the sample size (ensuring a reasonable level of smoothness).
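The following sketch shows what a cube-root knot rule and a natural cubic spline basis on transformed propensity scores might look like. It uses the splines package that ships with R. The exact default rule is not stated here, so ceiling(n^(1/3)) and the quantile-based knot placement are assumptions for illustration, and the package may construct its basis differently.

```r
library(splines)  # distributed with R

set.seed(2)
n <- 60
# Simulated propensity scores, InvGumbel-transformed as in the default setting
ps_trans <- -log(-log(runif(n, 0.05, 0.95)))

# Assumed rule for illustration only: knot count from the cube root of n
n_knots <- ceiling(n^(1 / 3))

# Interior knots at equally spaced quantiles of the transformed scores
probs <- seq(0, 1, length.out = n_knots + 2)[-c(1, n_knots + 2)]
knots <- quantile(ps_trans, probs = probs, names = FALSE)

# Natural cubic spline basis (no intercept): n_knots + 1 columns
basis <- ns(ps_trans, knots = knots)
dim(basis)
```

A natural cubic spline is linear beyond its boundary knots, which is one reason this kind of basis is a common choice when predictions must be extended to propensity scores outside the range seen in the trial.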

Value

A list containing posterior samples and model summaries produced by the C++ sampler. Typical elements include:

post_outcome1

Each row is a posterior draw of the individual potential outcomes under treatment.

post_outcome0

Each row is a posterior draw of the individual potential outcomes under control.

post_te

Each row is a posterior draw of the individual treatment effects.
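As a sketch of how these draws could be summarized, the code below averages each posterior draw across individuals to obtain draws of an average treatment effect, then reports a posterior mean and a 95% credible interval. The matrix post_te is simulated here as a stand-in for fit$post_te; the assumption that columns index individuals in the target population is ours and should be checked against the dimensions of the returned object.

```r
# Stand-in for fit$post_te: npost posterior draws (rows) by n_pop individuals (columns)
set.seed(3)
npost <- 200
n_pop <- 50
post_te <- matrix(rnorm(npost * n_pop, mean = 1, sd = 0.5), nrow = npost)

# Averaging each draw across individuals gives one average-effect draw per iteration
pate_draws <- rowMeans(post_te)

# Posterior mean and 95% equal-tailed credible interval
pate_summary <- c(
  mean  = mean(pate_draws),
  lower = unname(quantile(pate_draws, 0.025)),
  upper = unname(quantile(pate_draws, 0.975))
)
pate_summary
```

The same pattern applies to post_outcome1 and post_outcome0 for the average potential outcomes under treatment and control.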

Note

This function utilizes modified C++ code originally derived from the BART3 package (Bayesian Additive Regression Trees). The original package was developed by Rodney Sparapani and is licensed under GPL-2. Modifications were made by Jungang Zou, 2024. For more information about the original BART3 package, see: https://github.com/rsparapa/bnptools/tree/master/BART3

Examples

# Example with simulated data
sim <- sim_data(scenario = "linear", n_trial = 60)

fit <- PSPI_generalizability(
  X = as.matrix(sim$trials[, paste0("X", 1:10)]),
  Y = sim$trials$Y,
  A = sim$trials$A,
  pi = sim$population$ps[sim$population$selected],
  X_pop = as.matrix(sim$population[, paste0("X", 1:10)]),
  pi_pop = sim$population$ps,
  model = "SplineBART",
  transformation = "InvGumbel",
  verbose = FALSE,
  nburn = 1, npost = 1  # tiny chain so the example runs quickly; increase for real analyses (defaults are 4000 each)
)

str(fit)



PSPI documentation built on Dec. 2, 2025, 9:08 a.m.