dot-guard_fit: Fit leakage-safe preprocessing pipeline

.guard_fit    R Documentation

Fit leakage-safe preprocessing pipeline

Description

Builds and fits a guarded preprocessing pipeline on training data, then constructs a transformer for consistent application to new data.

Usage

.guard_fit(
  X,
  y = NULL,
  steps = list(),
  task = c("binomial", "multiclass", "gaussian", "survival")
)

Arguments

X

Matrix or data frame of training predictors.

y

Optional outcome for supervised feature selection.

steps

List of configuration options (see Details).

task

Task type: one of "binomial", "multiclass", "gaussian", or "survival".
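
A vector default such as the one shown for task in Usage is commonly resolved with match.arg() in R, in which case omitting the argument selects the first choice. Whether .guard_fit() resolves task this way is not stated here, so the following is only a sketch of the idiom; resolve_task() is a hypothetical function, not part of bioLeak.

```r
# Hypothetical function illustrating the match.arg() idiom; not bioLeak code.
resolve_task <- function(task = c("binomial", "multiclass",
                                  "gaussian", "survival")) {
  # match.arg() checks `task` against the choices in the default vector
  # and returns the first choice when the argument is omitted.
  match.arg(task)
}

default_task  <- resolve_task()            # argument omitted
explicit_task <- resolve_task("gaussian")  # argument supplied

# A value outside the choices raises an error instead of passing through.
bad_task <- try(resolve_task("poisson"), silent = TRUE)
```

Under this idiom, passing the task explicitly (as the example at the end of this page does) avoids relying on the default.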

Details

The pipeline applies, in order:

  • Winsorization (optional) to limit outliers.

  • Imputation learned on training data only.

  • Normalization (z-score or robust).

  • Variance/IQR filtering.

  • Feature selection (optional; t-test, lasso, PCA).

All statistics are estimated on the training data only and reused unchanged when the pipeline is applied to new data.
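
The principle behind the steps above can be sketched in base R. This is a minimal illustration, not the bioLeak implementation: fit_simple_guard() and the names inside it are hypothetical, and only two of the steps (median imputation and z-score normalization) are covered. The point it shows is that the medians, means, and standard deviations come from the training rows and are never re-estimated on new data.

```r
# Hypothetical helper (not part of bioLeak): learn statistics on training
# data only and return a closure that reuses them on any new data.
fit_simple_guard <- function(X_train) {
  X_train <- as.data.frame(X_train)

  # Imputation values: per-column medians from the training rows.
  med <- vapply(X_train, median, numeric(1), na.rm = TRUE)

  # Impute the training data before estimating the scaling statistics.
  X_imp <- X_train
  for (j in names(X_imp)) X_imp[[j]][is.na(X_imp[[j]])] <- med[[j]]
  mu  <- vapply(X_imp, mean, numeric(1))
  sdv <- vapply(X_imp, sd, numeric(1))
  sdv[sdv == 0] <- 1  # avoid division by zero for constant columns

  transform <- function(X_new) {
    X_new <- as.data.frame(X_new)[names(med)]
    for (j in names(X_new)) {
      X_new[[j]][is.na(X_new[[j]])] <- med[[j]]        # training median
      X_new[[j]] <- (X_new[[j]] - mu[[j]]) / sdv[[j]]  # training mean and sd
    }
    X_new
  }

  list(transform = transform,
       state = list(median = med, mean = mu, sd = sdv))
}

train <- data.frame(a = c(1, 2, NA), b = c(3, 4, 5))
new   <- data.frame(a = c(NA, 10), b = c(4, 100))

g   <- fit_simple_guard(train)
out <- g$transform(new)
out
```

The missing value in new is filled with the training median of a (1.5), and the extreme value 100 in b is scaled with the training mean and standard deviation rather than with statistics computed from new itself.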

Value

An object of class "GuardFit" with elements 'transform', 'state', 'p_out', and 'steps'.

See Also

predict_guard()

Examples

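# Toy training data with one missing value in column a
x <- data.frame(a = c(1, 2, NA), b = c(3, 4, 5))

# Fit the pipeline with median imputation for a continuous outcome
fit <- .guard_fit(x, y = c(1, 2, 3),
                  steps = list(impute = list(method = "median")),
                  task = "gaussian")

# Apply the fitted transformer to the training data
fit$transform(x)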

bioLeak documentation built on March 6, 2026, 1:06 a.m.