pmse_samplesize: Compute efficient sample size under user-defined PMSE targets


pmse_samplesize    R Documentation

Compute efficient sample size under user-defined PMSE targets

Description

pmse_samplesize computes the sample size a prediction model needs to meet user-defined PMSE targets and a target efficiency level. The function implements the formulas in the thesis "Predictive Power and Efficient Sample Size in Linear Regression Models" by Yifan Ma (2023).

Usage

pmse_samplesize(
  k,
  p,
  PMSE_val_k = 1,
  PMSE_val_p = 1,
  efficiency_level = 0.9,
  sigma_k2 = NULL,
  sigma_p2 = NULL,
  cov = NULL,
  corr = NULL,
  SD = 1,
  f2 = NULL,
  f2_2 = NULL,
  R2_full = NULL,
  R2_basic = NULL
)

Arguments

k

Integer. Total number of predictors in the full model.

p

Integer. Number of basic predictors in the reduced model.

PMSE_val_k

Numeric. Target PMSE value for the full model.

PMSE_val_p

Numeric. Target PMSE value for the reduced model.

efficiency_level

Numeric. Target efficiency level. Default is 0.9, meaning 90% of the asymptotic pPMSEr.

sigma_k2

Numeric. Predictor error variance for the full model. If NULL, it is derived from cov, corr, or the f2 and R2 arguments.

sigma_p2

Numeric. Predictor error variance for the basic model. If NULL, it is derived from cov, corr, or the f2 and R2 arguments.

cov

Optional covariance matrix. Must be (k+1) x (k+1), with the response in the first row and column.

corr

Optional correlation matrix (same layout as cov).

SD

Optional numeric vector of standard deviations for the predictors, used when a correlation matrix is supplied. Default is 1.

f2

Numeric. Cohen’s f2 for the effects of all predictors in the full model.

f2_2

Numeric. Cohen’s f2 for the effects of new predictors given the basic model.

R2_full

Numeric. Coefficient of determination (R2) for the full model.

R2_basic

Numeric. Coefficient of determination (R2) for the basic model.
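
The effect-size arguments are related by standard identities, so a value given on one scale can be checked against another. The sketch below uses the textbook definitions of Cohen's f2 and R2; the helper names are hypothetical and not part of the package, and the assumption that the response has unit variance is ours, not something this page states.

```r
# Hypothetical helpers (not exported by the package), based on the
# standard definitions of Cohen's f2 and R2.
r2_to_f2 <- function(r2) r2 / (1 - r2)   # f2 for all predictors in a model
f2_to_r2 <- function(f2) f2 / (1 + f2)   # inverse of the line above

# Cohen's f2 for predictors added on top of a basic model
f2_added <- function(r2_full, r2_basic) (r2_full - r2_basic) / (1 - r2_full)

r2_full  <- 0.50
r2_basic <- 0.40

f2   <- r2_to_f2(r2_full)            # plays the role of the f2 argument
f2_2 <- f2_added(r2_full, r2_basic)  # plays the role of the f2_2 argument

# Assumption: the response has variance 1, so the residual variance
# of each model is simply 1 - R2.
var_y    <- 1
sigma_k2 <- var_y * (1 - r2_full)
sigma_p2 <- var_y * (1 - r2_basic)
```

Under these assumptions, R2 values of 0.50 and 0.40 correspond to error variances of 0.50 and 0.60 for the full and basic models.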

Details

Sample Size Calculation for Prediction Models

The function calculates the predictor error variance for the full model (all k predictors) and for the reduced model (the p basic predictors) from a supplied covariance or correlation matrix. It can also derive these variances from Cohen's f2 and R2 values. From the predictor error variances it determines two kinds of sample size: the efficient sample size at the target efficiency level, and the sample sizes needed to reach the target PMSE values of the full and reduced models. The returned sample size is the largest of these.
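
As an illustration of the first step, the error variance of a linear model can be read off a covariance matrix with the usual conditional-variance formula, S_yy - S_yx S_xx^{-1} S_xy. The sketch below is not the package's internal code: resid_var is a hypothetical helper, the matrix is made up, and it assumes the p basic predictors occupy the rows and columns immediately after the response.

```r
# Hypothetical helper: residual variance of the response (row/column 1)
# given the first n_pred predictors (rows/columns 2 .. n_pred + 1).
resid_var <- function(S, n_pred) {
  idx  <- seq_len(n_pred) + 1L
  S_yy <- S[1, 1]
  S_xy <- S[idx, 1, drop = FALSE]
  S_xx <- S[idx, idx, drop = FALSE]
  # S_yy - S_yx %*% solve(S_xx) %*% S_xy, without forming the inverse
  as.numeric(S_yy - t(S_xy) %*% solve(S_xx, S_xy))
}

# Illustrative 4 x 4 covariance matrix: response plus k = 3 predictors.
# The predictors are uncorrelated with unit variance.
S <- diag(4)
S[1, 2:4] <- c(0.5, 0.3, 0.2)
S[2:4, 1] <- c(0.5, 0.3, 0.2)

sigma_k2 <- resid_var(S, 3)  # full model, all 3 predictors
sigma_p2 <- resid_var(S, 1)  # reduced model, first predictor only
```

Values obtained this way could then be passed as sigma_k2 and sigma_p2, as in the example at the end of this page.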

Value

Numeric representing the required sample size.

References

Ma, Y. (2023). Predictive Power and Efficient Sample Size in Linear Regression Models. Master’s Thesis, Worcester Polytechnic Institute.

Examples

## Example with a 5-predictor model (k = 5) and 2 basic predictors (p = 2)
pmse_samplesize(
  k = 5, p = 2,
  PMSE_val_k    = 1,
  PMSE_val_p    = 1,
  efficiency_level = 0.9,
  sigma_k2 = 0.50,
  sigma_p2 = 0.60
)


pmsesampling documentation built on Sept. 9, 2025, 5:47 p.m.