```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```
This vignette provides comprehensive guidance on power analysis and sample size determination for method comparison and agreement studies using the SimplyAgree package.
SimplyAgree implements four approaches to power/sample size calculations:
- `power_agreement_exact()` - Exact agreement test [@shieh2019]
- `blandPowerCurve()` - Bland-Altman power curves [@lu2016]
- `agree_expected_half()` - Expected half-width criterion [@JanShieh2018]
- `agree_assurance()` - Assurance probability criterion [@JanShieh2018]

```r
library(SimplyAgree)
```
The methods divide into two categories:
Hypothesis Testing (binary decision):
- `power_agreement_exact()` - Tests whether the central proportion of differences (essentially a tolerance interval) lies within the maximal allowable difference
- `blandPowerCurve()` - Tests whether the confidence intervals of the limits of agreement fall within the maximal allowable difference

Estimation (quantifying precision):

- `agree_expected_half()` - Controls the average CI half-width of the limits of agreement
- `agree_assurance()` - Controls the probability of achieving a target CI half-width of the limits of agreement

`power_agreement_exact()` tests whether the central $p_0^*$ proportion of paired differences falls within the maximal allowable difference $[-\delta, \delta]$.
Hypotheses:
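Let $\pi = P(-\delta \le D \le \delta)$ denote the probability that a paired difference $D$ lies within the maximal allowable difference. Following the formulation in @shieh2019, the test can be stated as:

$$H_0: \pi \le p_0^* \quad \text{versus} \quad H_1: \pi > p_0^*$$

Agreement is concluded only when $H_0$ is rejected, i.e., when the data support that at least the central $p_0^*$ proportion of differences falls within $[-\delta, \delta]$.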
```r
power_agreement_exact(
  n = NULL,        # Sample size
  delta = NULL,    # Tolerance bound
  mu = 0,          # Mean of differences
  sigma = NULL,    # SD of differences
  p0_star = 0.95,  # Central proportion (tolerance coverage)
  power = NULL,    # Target power
  alpha = 0.05     # Significance level
)
```
Specify exactly three of: n, delta, power, sigma.
```r
# Blood pressure device comparison
result <- power_agreement_exact(
  delta = 7,       # +/-7 mmHg tolerance
  mu = 0.5,        # Expected bias
  sigma = 2.5,     # Expected SD
  p0_star = 0.95,  # 95% must be within bounds
  power = 0.80,    # 80% power
  alpha = 0.05
)
print(result)
```
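Because any three of n, delta, power, and sigma determine the fourth, the same function can instead solve for the achieved power at a fixed sample size; for example (illustrative values):

```r
# Solve for power given n, delta, and sigma (power left unspecified)
result_power <- power_agreement_exact(
  n = 40,          # fixed sample size
  delta = 7,
  mu = 0.5,
  sigma = 2.5,
  p0_star = 0.95,
  alpha = 0.05
)
print(result_power)
```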
Calculates power curves for the Bland-Altman limits of agreement using the approximate confidence interval method of @lu2016. Useful for exploring power across a range of sample sizes.
```r
blandPowerCurve(
  samplesizes = seq(10, 100, 1),  # Range of sample sizes
  mu = 0,                         # Mean difference
  SD,                             # SD of differences
  delta,                          # Tolerance bound(s)
  conf.level = 0.95,              # CI confidence level
  agree.level = 0.95              # LOA agreement level
)
```
```r
# Generate power curve
pc <- blandPowerCurve(
  samplesizes = seq(10, 200, 1),
  mu = 0,
  SD = 3.3,
  delta = 8,
  conf.level = 0.95,
  agree.level = 0.95
)

# Plot
plot(pc, type = 1)

# Find n for 80% power
find_n(pc, power = 0.8)
```
Determines the sample size needed so that the average CI half-width of the limits of agreement is at most delta across hypothetical repeated studies.
```r
agree_expected_half(
  conf.level = 0.95,  # CI confidence level
  delta = NULL,       # Target expected half-width
  pstar = 0.95,       # Central proportion
  sigma = 1,          # SD of differences
  n = NULL            # Sample size
)
```
Specify either n OR delta.
```r
# Want E[H] <= 2.5*sigma
result <- agree_expected_half(
  conf.level = 0.95,
  delta = 2.5,  # As multiple of sigma
  pstar = 0.95,
  sigma = 1     # Standardized
)
print(result)
```
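The criterion can also be expressed in raw measurement units rather than multiples of sigma; a hypothetical example with an assumed SD of 2.5 mmHg and a target expected half-width of 2 mmHg:

```r
# Hypothetical blood-pressure example in raw units (mmHg)
result_raw <- agree_expected_half(
  conf.level = 0.95,
  delta = 2.0,  # target expected CI half-width in mmHg
  pstar = 0.95,
  sigma = 2.5   # assumed SD of differences in mmHg
)
print(result_raw)
```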
Determines the sample size needed so that the probability that the CI half-width is at most omega is at least $(1 - \gamma)$.

This is a stronger guarantee than the expected half-width criterion: it ensures a specified probability of achieving the target precision rather than controlling only the average.
```r
agree_assurance(
  conf.level = 0.95,  # CI confidence level
  assurance = 0.90,   # Target assurance probability
  omega = NULL,       # Target half-width bound
  pstar = 0.95,       # Central proportion
  sigma = 1,          # SD of differences
  n = NULL            # Sample size
)
```
Specify either n OR omega.
```r
# Want 90% probability that H <= 2.5*sigma
result <- agree_assurance(
  conf.level = 0.95,
  assurance = 0.90,  # 90% probability
  omega = 2.5,       # Target bound
  pstar = 0.95,
  sigma = 1
)
print(result)
```
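Since either n or omega may be supplied (see above), the function can also be run in reverse: fixing a feasible sample size and asking what half-width bound can be guaranteed; a hypothetical example with an assumed n of 50:

```r
# Given a planned n, what half-width bound omega is achievable
# with 90% assurance? (n = 50 is an illustrative value)
result_n <- agree_assurance(
  conf.level = 0.95,
  assurance = 0.90,
  pstar = 0.95,
  sigma = 1,
  n = 50
)
print(result_n)
```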
```
Research Goal?
|
|- Hypothesis Testing ->
|   |- Need exact Type I error control -> Power for Agreement
|   \- Approximate LOA confidence intervals -> Bland-Altman Power Curve
|
\- Precision Estimation ->
    |- Average precision sufficient -> Expected Half-Width
    \- Need probabilistic guarantee -> Assurance Probability
```
Many studies have clustered data, with multiple measurements per subject or natural groupings (e.g., repeated measures, multi-center studies). Note that the advice here applies only to clustering, not to situations where replicate measures are taken within a single measurement occasion (e.g., multiple measurements at the same time point, where any variation represents only measurement error).
Standard formulas assume independence^[The implications are discussed by @bland2003cluster, among many others]. Ignoring clustering can lead to studies that lack precision. To my knowledge, there are no well-developed methods for accounting for clustering in sample size calculations for agreement studies, so we use a common approximation from survey sampling and multilevel modeling: the design effect.
The design effect (DEFF) quantifies loss of efficiency due to clustering:
$$\text{DEFF} = 1 + (m - 1) \times \text{ICC}$$
where:

- $m$ = the number of measurements per cluster (e.g., per participant)
- $\text{ICC}$ = the intraclass correlation coefficient (defined below)
Effect on required sample size: $$n_{\text{total}} = n_{\text{independent}} \times \text{DEFF}$$

where $n_{\text{total}}$ is the total number of clustered observations needed to match the precision of $n_{\text{independent}}$ independent pairs.
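For example, with $m = 3$ and $\text{ICC} = 0.15$ (the values used in the worked example below):

$$\text{DEFF} = 1 + (3 - 1) \times 0.15 = 1.3,$$

so 34 independent pairs become $34 \times 1.3 \approx 45$ total observations.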
ICC = proportion of variance between clusters:
$$\text{ICC} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}$$
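If pilot data are available, the ICC of the paired differences can be estimated from the variance components of a random-intercept model. A minimal sketch using lme4, assuming a hypothetical data frame `pilot_df` with a difference score `diff` and a subject identifier `id`:

```r
library(lme4)

# Random-intercept model: differences clustered within subjects
fit <- lmer(diff ~ 1 + (1 | id), data = pilot_df)

# Extract variance components: between-subject and residual (within)
vc <- as.data.frame(VarCorr(fit))
sigma2_between <- vc$vcov[vc$grp == "id"]
sigma2_within  <- vc$vcov[vc$grp == "Residual"]

# ICC = proportion of total variance that is between clusters
ICC <- sigma2_between / (sigma2_between + sigma2_within)
ICC
```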
```r
# Step 1: Independent sample size
result <- power_agreement_exact(
  delta = 7,
  mu = 0.5,
  sigma = 2.5,
  p0_star = 0.95,
  power = 0.80,
  alpha = 0.05
)
n_indep <- result$n
cat("Independent pairs needed:", n_indep, "\n")

# Step 2: Apply design effect
m <- 3       # 3 measurements per participant
ICC <- 0.15  # from pilot or literature
DEFF <- 1 + (m - 1) * ICC
cat("Design effect:", round(DEFF, 3), "\n")

# Step 3: Calculate participants needed
n_total <- ceiling(n_indep * DEFF)
K <- ceiling(n_total / m)
cat("Total observations:", n_total, "\n")
cat("Participants needed:", K, "\n")
```
Result: Instead of 34 independent pairs, need ~15 participants (45 total observations).
```r
# Compare different ICC values
n_indep <- 50
m <- 4
ICC_values <- c(0, 0.05, 0.10, 0.15, 0.20)

for (ICC in ICC_values) {
  DEFF <- 1 + (m - 1) * ICC
  K <- ceiling(ceiling(n_indep * DEFF) / m)
  cat(sprintf("ICC = %.2f: Need %d participants\n", ICC, K))
}
```
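The same sensitivity logic extends to the number of measurements per participant; a small illustrative grid (values are assumptions, not recommendations):

```r
# Participants needed across combinations of cluster size and ICC
n_indep <- 50
grid <- expand.grid(m = c(2, 3, 4, 6), ICC = c(0.05, 0.15, 0.25))
grid$DEFF <- 1 + (grid$m - 1) * grid$ICC
grid$K <- ceiling(ceiling(n_indep * grid$DEFF) / grid$m)
grid
```

Note the diminishing returns: when the ICC is large, adding measurements per participant reduces the number of participants needed far less than when the ICC is small.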
Good situations:

- Roughly equal numbers of measurements per participant
- A credible ICC estimate from pilot data or the literature
- A simple two-level structure (measurements within participants)

Problematic:

- Highly unequal cluster sizes
- No reliable estimate of the ICC
- More than two levels of clustering (e.g., measurements within patients within centers)
For complex designs, consider simulation-based power analysis and consult a statistician.
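As a starting point, here is a minimal simulation sketch for the clustered case, assuming normally distributed differences with a random subject intercept. All parameter values are illustrative, and the CI formula is a normal-theory approximation in the spirit of @lu2016, not the package's internal method:

```r
set.seed(42)

# Illustrative parameters (assumptions, not package defaults)
K <- 20          # participants
m <- 3           # measurements per participant
mu <- 0.5        # assumed bias
delta <- 7       # maximal allowable difference
ICC <- 0.15      # intraclass correlation of differences
sigma <- 2.5     # total SD of differences
sigma_b <- sqrt(ICC) * sigma      # between-subject SD
sigma_w <- sqrt(1 - ICC) * sigma  # within-subject SD
z <- qnorm(0.975)                 # multiplier for 95% LOA

n_sims <- 2000
ok <- logical(n_sims)
for (i in seq_len(n_sims)) {
  # Simulate clustered differences (random intercept + noise)
  d <- rep(rnorm(K, mu, sigma_b), each = m) + rnorm(K * m, 0, sigma_w)
  # Effective sample size via the design effect
  n_eff <- (K * m) / (1 + (m - 1) * ICC)
  s <- sd(d)
  # Approximate SE of a limit of agreement (normal theory)
  se_loa <- sqrt(s^2 / n_eff + z^2 * s^2 / (2 * (n_eff - 1)))
  # One-sided 95% CI bounds for the upper and lower LOA
  upper <- mean(d) + z * s + qnorm(0.95) * se_loa
  lower <- mean(d) - z * s - qnorm(0.95) * se_loa
  # "Success" = both CI bounds fall inside [-delta, delta]
  ok[i] <- (upper < delta) && (lower > -delta)
}
mean(ok)  # simulated power
```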
```r
# Study parameters
sigma <- 3.3
delta <- 7
m <- 4  # measurements per participant
ICC <- 0.15
dropout <- 0.20

# Step 1: Independent sample size
result <- power_agreement_exact(
  delta = delta,
  mu = 0,
  sigma = sigma,
  p0_star = 0.95,
  power = 0.80,
  alpha = 0.05
)

# Step 2: Account for clustering
DEFF <- 1 + (m - 1) * ICC
n_total <- ceiling(result$n * DEFF)
K_pre <- ceiling(n_total / m)

# Step 3: Account for dropout
K_final <- ceiling(K_pre / (1 - dropout))

# Summary
cat("Independent pairs:", result$n, "\n")
cat("Design effect:", round(DEFF, 3), "\n")
cat("Participants (no dropout):", K_pre, "\n")
cat("Participants to recruit:", K_final, "\n")
cat("Total measurements:", K_final * m, "\n")
```
When uncertain:

- Compute sample sizes over a range of plausible ICC values (as in the sensitivity analysis above) and plan for the most conservative result
- Favor recruiting more participants over adding measurements per participant, since extra within-cluster measurements yield diminishing returns
- Check the design-effect approximation against a simulation like the sketch above