dot-guard_fit: Fit leakage-safe preprocessing pipeline
In bioLeak: Leakage-Safe Modeling and Auditing for Genomic and Clinical Data

.guard_fit

R Documentation

Fit leakage-safe preprocessing pipeline

Description

Builds and fits a guarded preprocessing pipeline on training data, then constructs a transformer for consistent application to new data.

Usage

.guard_fit(
  X,
  y = NULL,
  steps = list(),
  task = c("binomial", "multiclass", "gaussian", "survival")
)

Arguments

`X`	Matrix or data frame of training predictors.
`y`	Optional outcome vector, used only for supervised feature selection.
`steps`	List of configuration options, one entry per pipeline step (see Details).
`task`	Task type: one of "binomial", "multiclass", "gaussian", or "survival".

Details

The pipeline applies, in order:

1. Winsorization (optional) to limit outliers.
2. Imputation learned on training data only.
3. Normalization (z-score or robust).
4. Variance/IQR filtering.
5. Feature selection (optional; t-test, lasso, PCA).

All statistics are estimated on the training data and re-used for new data.
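
The fit-then-reuse principle described above can be illustrated with a minimal base R sketch. It does not show bioLeak's internals: `fit_guarded_sketch`, its arguments `winsor_probs` and `min_var`, and the layout of the returned `state` list are hypothetical names chosen for illustration. The sketch covers winsorization, median imputation, z-score normalization, and variance filtering, and omits feature selection.

```r
# Hypothetical helper (not part of bioLeak): learn preprocessing statistics
# on training data and return a closure that re-applies them unchanged.
fit_guarded_sketch <- function(X_train, winsor_probs = c(0.05, 0.95),
                               min_var = 1e-8) {
  X_train <- as.data.frame(X_train)

  # 1. Winsorization bounds: per-column quantiles of the training data.
  bounds <- lapply(X_train, quantile, probs = winsor_probs,
                   na.rm = TRUE, names = FALSE)
  clip <- function(df) {
    for (nm in names(bounds)) {
      # pmin/pmax keep NA values as NA, so imputation still sees them.
      df[[nm]] <- pmin(pmax(df[[nm]], bounds[[nm]][1]), bounds[[nm]][2])
    }
    df
  }
  X <- clip(X_train)

  # 2. Imputation values: medians of the (winsorized) training columns.
  medians <- vapply(X, median, numeric(1), na.rm = TRUE)
  impute <- function(df) {
    for (nm in names(medians)) {
      df[[nm]][is.na(df[[nm]])] <- medians[[nm]]
    }
    df
  }
  X <- impute(X)

  # 3. Normalization parameters: training means and standard deviations.
  centers <- vapply(X, mean, numeric(1))
  scales <- vapply(X, sd, numeric(1))

  # 4. Variance filter: keep columns whose training variance exceeds min_var.
  keep <- names(scales)[which(scales^2 > min_var)]

  # The transformer uses only the stored training statistics, so nothing
  # is re-estimated from the data it is applied to.
  transform <- function(X_new) {
    X_new <- impute(clip(as.data.frame(X_new)))
    m <- as.matrix(X_new[, keep, drop = FALSE])
    m <- sweep(m, 2, centers[keep], "-")
    sweep(m, 2, scales[keep], "/")
  }

  list(transform = transform,
       state = list(bounds = bounds, medians = medians,
                    centers = centers, scales = scales, keep = keep))
}

# Toy data: column b is constant in training and is filtered out.
train <- data.frame(a = c(1, 2, 3, NA, 100),
                    b = c(5, 5, 5, 5, 5),
                    c = c(10, 20, 30, 40, 50))
new <- data.frame(a = c(NA, 2), b = c(5, 6), c = c(0, 30))

fit <- fit_guarded_sketch(train)
train_out <- fit$transform(train)
new_out <- fit$transform(new)
```

In this sketch the missing value in `new$a` is filled with the training median and scaled with the training mean and standard deviation, so the held-out rows never influence the statistics applied to them.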

Value

An object of class "GuardFit" with elements 'transform', 'state', 'p_out', and 'steps'.

Examples

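# Training predictors with one missing value in column `a`
x <- data.frame(a = c(1, 2, NA), b = c(3, 4, 5))
# Fit the pipeline with median imputation for a continuous outcome
fit <- .guard_fit(x, y = c(1, 2, 3),
                  steps = list(impute = list(method = "median")),
                  task = "gaussian")
# Apply the fitted transformer (here to the training data itself)
fit$transform(x)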

bioLeak documentation built on March 26, 2026, 5:09 p.m.