predict_guard: Apply a fitted GuardFit transformer to new data
In bioLeak: Leakage-Safe Modeling and Auditing for Genomic and Clinical Data

predict_guard

R Documentation

Description

Applies the preprocessing steps stored in a GuardFit object to new data without refitting any statistics. This is designed to prevent the validation leakage that would occur if imputation, scaling, filtering, or feature selection were recomputed on evaluation data. The function enforces the training schema by aligning columns and factor levels, and it raises an error when a fit trained on numeric-only predictors receives non-numeric predictors. It does not detect label leakage, duplicate samples, or train/test contamination.
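
The difference between reusing training statistics and recomputing them can be sketched in base R. This is a minimal illustration of the idea only, not the bioLeak implementation: the helpers `fit_median_imputer()` and `apply_median_imputer()` are hypothetical names introduced here.

```r
# Hypothetical helpers for illustration; these are not bioLeak functions.

# Learn per-column medians from the training data only.
fit_median_imputer <- function(x_train) {
  vapply(x_train, median, numeric(1), na.rm = TRUE)
}

# Fill missing values in new data with the stored training medians.
apply_median_imputer <- function(medians, newdata) {
  for (col in names(medians)) {
    missing <- is.na(newdata[[col]])
    newdata[[col]][missing] <- medians[[col]]
  }
  newdata
}

x_train <- data.frame(a = c(1, 2, NA, 4), b = c(10, 11, 12, 13))
x_new   <- data.frame(a = c(NA, 5), b = c(9, 14))

medians <- fit_median_imputer(x_train)
out <- apply_median_imputer(medians, x_new)

# Leaky alternative: the fill value now depends on the evaluation data.
leaky_fill <- median(x_new$a, na.rm = TRUE)
```

Here the missing value in `a` is filled from the training median, whereas `leaky_fill` shows the value that would be used if the median were recomputed on `x_new`.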

Usage

predict_guard(fit, newdata)

Arguments

`fit`	A `GuardFit` object created by `.guard_fit()`. This required argument (no default) contains the training-time preprocessing settings and statistics. A `fit` built with different steps (for example, a different imputation method or feature selection step) produces different output columns and values.
`newdata`	A matrix or data.frame of predictors with one row per sample. This required argument (no default) is transformed using only the training-time parameters stored in `fit`. Missing columns are added and filled, extra columns are dropped, and factor levels are aligned to the training levels. If the training fit was numeric-only, non-numeric columns in `newdata` trigger an error.
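
The schema alignment described for `newdata` can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `align_to_training()` is a hypothetical helper, and filling added columns with `NA` is an assumption, since the fill value used by `predict_guard()` is not specified here.

```r
# Hypothetical helper for illustration; not the bioLeak implementation.
align_to_training <- function(newdata, train_cols, train_levels) {
  # Add columns seen in training but absent from newdata.
  # Assumption: they are filled with NA in this sketch.
  for (col in setdiff(train_cols, names(newdata))) {
    newdata[[col]] <- NA
  }
  # Drop extra columns and restore the training column order.
  newdata <- newdata[, train_cols, drop = FALSE]
  # Re-code factors against the training levels; unseen levels become NA.
  for (col in names(train_levels)) {
    newdata[[col]] <- factor(as.character(newdata[[col]]),
                             levels = train_levels[[col]])
  }
  newdata
}

train_cols   <- c("age", "site", "dose")
train_levels <- list(site = c("A", "B"))

# New data with a reordered column, an unseen level, an extra column,
# and a missing column.
x_new <- data.frame(site = c("B", "C"), age = c(40, 52), batch = c(1, 2))
aligned <- align_to_training(x_new, train_cols, train_levels)
```

In this sketch `batch` is dropped, `dose` is added, and the level `"C"`, which was never seen in training, is mapped to `NA` instead of creating a new level.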

Value

A data.frame of transformed predictors with the same number of rows as newdata. Column order and content match the training pipeline and may include derived features (one-hot encodings, missingness indicators, or PCA components). This output is not a prediction; it is intended as input to a downstream model and assumes the training-time preprocessing is valid for the new data.
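
How the returned data.frame might feed a downstream model can be sketched with base R's `lm()`. The data frames below are hand-written stand-ins for transformed predictors; obtaining them from `predict_guard()` calls on the training and new data is an assumption about a typical workflow, and the choice of `lm()` is illustrative only.

```r
# Stand-ins for transformed predictors (assumed to come from the
# preprocessing step); the values are made up for illustration.
train_x <- data.frame(a = c(1, 2, 2, 4), b = c(10, 11, 12, 13))
new_x   <- data.frame(a = c(2, 5), b = c(9, 14))
y_train <- c(0.1, 0.2, 0.3, 0.4)

# Fit the downstream model on training rows only.
model <- lm(y ~ ., data = cbind(train_x, y = y_train))

# Score the new rows; this step, not the transformer, produces predictions.
pred <- predict(model, newdata = new_x)
```

The transformer output and the model predictions are separate objects: `new_x` has one row per sample and one column per feature, while `pred` has one value per sample.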

Examples

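library(bioLeak)

# Training predictors; column `a` has one missing value.
x_train <- data.frame(a = c(1, 2, NA, 4), b = c(10, 11, 12, 13))

# Fit the preprocessing on training data only (median imputation).
fit <- .guard_fit(
  x_train,
  y = c(0.1, 0.2, 0.3, 0.4),
  steps = list(impute = list(method = "median")),
  task = "gaussian"
)

# New data is transformed with the stored training statistics;
# nothing is re-estimated from x_new.
x_new <- data.frame(a = c(NA, 5), b = c(9, 14))
out <- predict_guard(fit, x_new)
out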

bioLeak documentation built on March 26, 2026, 5:09 p.m.