knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5 ) library(PSpower)
PSpower is a design-stage tool for planning observational and randomized studies that estimate treatment effects using propensity score (PS) weighting. It answers: how large does my study need to be?
Two functions cover the most common outcome types:
| Function | Outcome | Default test |
|---|---|---|
| power_ps() | Continuous or binary | Two-sided |
| power_cox() | Time-to-event (survival) | One-sided |
This is a prospective planning tool. Computing power after data are collected using the observed sample size ("post-hoc power") is uninformative and should not be done.
For all outcome types, the required sample size takes the form
$$N = \left\lceil V \cdot \frac{(z_{1-\alpha/k} + z_\beta)^2}{\delta^2} \right\rceil,$$
where $\delta$ is the standardized effect size, $\alpha$ is the significance level, $\beta$ is the target power, and $k = 2$ for a two-sided test or $k = 1$ for a one-sided test. All reported sample sizes are ceiling integers (rounded up to the nearest whole number).
The asymptotic variance $V$ is what distinguishes the two functions:
power_ps): $V$ is the variance of the Hájek estimator, determined by $r$, $\phi$, $\rho^2$, and the estimand.power_cox): $V$ is the variance of the PS-weighted partial likelihood estimator, determined by $r$, $d_1$, $d_0$, $\tau_0 = \log(\text{HR})$, and (for observational studies) $\phi$ and the estimand.Common inputs
Effect size — the standardized treatment effect $\delta = \tau / \text{SD}(Y)$ for continuous/binary outcomes, or the log hazard ratio $\tau_0 = \log(\text{HR})$ for survival outcomes. For continuous outcomes, $|\delta| = 0.2$ is considered small; 0.5 moderate; 0.8 large. For survival, $\text{HR} = 0.75$ gives $\tau_0 \approx -0.29$.
Treatment proportion r — the fraction of participants who receive treatment, $r = \Pr(Z = 1)$. A balanced design uses $r = 0.5$; in observational data, $r$ is estimated from the study population.
Estimand — the scientific quantity of interest:
power_ps(); for power_cox(), re-label groups and use estimand = "ATT".In a randomized trial all four estimands coincide.
Continuous and binary outcomes (power_ps)
Overlap coefficient φ — a number in $(0, 1)$ measuring how similar the PS distributions of the two groups are. $\phi = 1$ in a randomized trial; $\phi < 1$ reflects confounding, and sample size increases as $\phi$ decreases. Rule of thumb: $\phi \geq 0.95$ good; $[0.90, 0.95)$ moderate; $[0.85, 0.90)$ poor; $< 0.85$ very poor.
$\phi$ can be estimated from a pilot study or any dataset with fitted propensity scores using overlap_coef(). When it is not yet known, supplying a vector of plausible values (e.g., $\phi \in {0.85, 0.90, 0.95}$) allows $N$ to be computed across the plausible range.
Confounder coefficient ρ² — the squared correlation between the potential outcome $Y(0)$ and the PS linear predictor. When $\rho^2 = 0$ (the default), the outcome is unrelated to the confounders beyond the treatment indicator. A sensitivity analysis over $\rho^2 \in [0, 0.05]$ is recommended; values above 0.05 are uncommon in practice.
Survival outcomes (power_cox)
Event rates d₁ and d₀ — the probability of observing an event during follow-up in the treated group ($d_1$) and the control group ($d_0$). These can be estimated from registries, pilot data, or published survival curves. If $d_0$ is omitted it is set equal to $d_1$.
Study type — "rct" for a randomized trial or "obs" for an observational study.
Overlap coefficient φ — required when study_type = "obs"; the same quantity as in power_ps(). See the rule of thumb and estimation guidance under the continuous/binary section above.
Method — "robust" (default) uses the sandwich variance, which accounts for the actual hazard ratio; "schoenfeld" is the classical formula derived under a null effect and may overestimate or underestimate the required $N$ relative to the robust method. Available only for randomized trials.
power_ps()Suppose we plan a study with:
res <- power_ps( effect_size = 0.2, r = 0.5, phi = 0.9, estimand = "ATE", power = 0.80 ) res
Vectors can be supplied to any input; all combinations are computed automatically.
Effect of treatment allocation
The required sample size depends on how evenly participants are split between treatment and control. The following evaluates five values of $r$ across three estimands ($5 \times 3 = 15$ scenarios):
res_r <- power_ps( effect_size = 0.2, r = c(0.30, 0.40, 0.50, 0.60, 0.70), phi = 0.90, estimand = c("ATE", "ATT", "ATO"), power = 0.80 ) plot(res_r)
$N$ is minimized at $r = 0.5$ for ATE; the ATT and ATC estimands are not symmetric in $r$ — ATT favors a larger treated fraction, and vice versa for ATC.
Effect of covariate overlap
res_phi <- power_ps( effect_size = 0.2, r = 0.5, phi = c(0.85, 0.90, 0.95), estimand = c("ATE", "ATT", "ATO"), power = 0.80 ) plot(res_phi)
summary(res_phi)
The ATO estimand consistently requires fewer participants than ATE, especially under poor overlap.
The table below shows how $N$ changes across $\rho^2 \in [0, 0.05]$ for ATE with $\phi = 0.90$. A sensitivity analysis over this range is recommended; values above 0.05 are uncommon in practice.
power_ps( effect_size = 0.2, r = 0.5, phi = 0.90, rho2 = c(0, 0.01, 0.02, 0.05), estimand = "ATE", power = 0.80 )
An upper bound on $\rho^2$ can be obtained from the $R^2$ of a regression of the outcome on the study covariates using historical data.
power_cox()res_rct <- power_cox( effect_size = log(0.75), # HR = 0.75 r = 0.5, d1 = 0.40, study_type = "rct", power = 0.80 ) res_rct
The default method uses the robust sandwich variance, which accounts for the actual hazard ratio. The Schoenfeld formula (derived under a null effect) does not correct for the effect size and may give a larger or smaller $N$:
power_cox( effect_size = log(0.75), r = 0.5, d1 = 0.40, study_type = "rct", method = "schoenfeld", power = 0.80 )
The following evaluates three hazard ratios and three treatment proportions ($3 \times 3 = 9$ scenarios):
res_rct_grid <- power_cox( effect_size = c(log(0.65), log(0.75), log(0.85)), r = c(0.40, 0.50, 0.60), d1 = 0.40, study_type = "rct", power = 0.80 ) plot(res_rct_grid)
summary(res_rct_grid)
For observational studies, the overlap coefficient $\phi$ must also be provided. The ATE estimand uses inverse probability weights; ATO uses overlap weights and is more efficient when overlap is limited.
res_obs <- power_cox( effect_size = log(0.75), r = 0.5, d1 = 0.40, phi = 0.90, study_type = "obs", estimand = "ATE", power = 0.80 ) res_obs
res_obs_grid <- power_cox( effect_size = log(0.75), r = 0.5, d1 = 0.40, phi = c(0.85, 0.90, 0.95), study_type = "obs", estimand = c("ATE", "ATO"), power = 0.80, n_mc = 5e4 # use the default 1e6 in practice ) plot(res_obs_grid)
summary(res_obs_grid)
If a pilot dataset or a similar published cohort is available, $\phi$ can be estimated directly from fitted propensity scores using overlap_coef().
set.seed(2026) n <- 500 X <- rnorm(n) ps <- plogis(0.5 * X) # fitted propensity scores from a PS model Z <- rbinom(n, 1, ps) overlap_coef(ps = ps, Z = Z)
When no data are available, use the rule of thumb above and compute $N$ across the plausible range (e.g., $\phi \in {0.85, 0.90, 0.95}$), as shown in the examples above.
overlap_coef() also accepts Beta distribution parameters directly, which is useful when the PS distribution has been summarized in a published paper:
overlap_coef(a = 3, b = 2) # r = a/(a+b) = 0.60
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.