summary.LeakAudit: Summarize a leakage audit
In bioLeak: Leakage-Safe Modeling and Auditing for Genomic and Clinical Data

summary.LeakAudit

R Documentation

Summarize a leakage audit

Description

Prints a concise, human-readable report for a 'LeakAudit' object produced by [audit_leakage()]. The summary surfaces four diagnostics when available: label-permutation gap (prediction-label association by default), batch/study association tests (metadata aligned with fold splits), target leakage scan (features strongly associated with the outcome), and near-duplicate detection (high similarity in 'X_ref'). The output reflects the stored audit results only; it does not recompute any tests.

Usage

## S3 method for class 'LeakAudit'
summary(object, digits = 3, ...)

Arguments

`object`	A 'LeakAudit' object from [audit_leakage()]. The summary reads stored results from 'object' and prints them to the console.
`digits`	Integer number of digits to show when formatting numeric statistics in the console output. Defaults to '3'. Increasing 'digits' shows more precision; decreasing it shortens the printout without changing the underlying values.
`...`	Unused. Included for S3 method compatibility; additional arguments are ignored.
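As a rough illustration of what 'digits' controls, the sketch below formats one number at two precisions with base R's formatC(). The value of 'stat' is made up, and whether the method formats with formatC(), format(), or round() internally is an assumption. The point is only that formatting changes the printed string, not the stored number.

```r
# Hypothetical stored statistic; not taken from a real audit.
stat <- 0.123456789

# Format the same value for display at two precisions.
# formatC() is an assumed stand-in for whatever the method uses internally.
short <- formatC(stat, digits = 3, format = "f")
long  <- formatC(stat, digits = 6, format = "f")

cat("digits = 3:", short, "\n")
cat("digits = 6:", long, "\n")

# Formatting produces new character strings; the numeric value is untouched.
unchanged <- identical(stat, 0.123456789)
```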

Details

With fixed predictions, the permutation test quantifies prediction-label association; refit-based permutations require 'perm_refit = TRUE' (or '"auto"' with refit data). The test does not by itself prove or rule out leakage.

Batch association flags metadata that aligns with fold assignment; this may reflect study design rather than leakage.

The target leakage scan uses univariate feature-outcome associations and can miss multivariate proxies, interaction leakage, or features not included in 'X_ref'. The multivariate scan (enabled by default for supported tasks) reports an additional model-based score.

Duplicate detection considers only the provided 'X_ref' features and the similarity threshold used during [audit_leakage()]. By default, 'duplicate_scope = "train_test"' restricts the report to pairs that cross train/test; set 'duplicate_scope = "all"' to include within-fold duplicates.

Sections are reported as "not available" when the corresponding audit component was not computed.
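The label-permutation idea with fixed predictions can be sketched in base R as follows. This is a minimal illustration, not bioLeak's implementation: the helper 'auc()', the simulated data, the definition of the gap as observed metric minus mean permuted metric, and the one-sided p-value with a +1 correction are all assumptions made for the example.

```r
# Rank-based AUC (Mann-Whitney form). Hypothetical helper;
# assumes y is coded 0/1 and both classes are present.
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
y <- rbinom(200, 1, 0.5)               # simulated binary outcome
p <- plogis(2 * y - 1 + rnorm(200))    # simulated predictions associated with y

# Metric on the real labels.
observed <- auc(y, p)

# Shuffle the labels B times while keeping the predictions fixed.
B <- 200
permuted <- replicate(B, auc(sample(y), p))

# Assumed definition of the gap: observed minus the permutation mean.
gap <- observed - mean(permuted)

# One-sided permutation p-value with the +1 correction.
p_value <- (1 + sum(permuted >= observed)) / (B + 1)

cat("observed AUC:", round(observed, 3), "\n")
cat("gap:", round(gap, 3), " p-value:", round(p_value, 3), "\n")
```

A large gap here only says that the predictions track the labels more closely than chance. As noted above, that alone neither proves nor rules out leakage, since a genuinely good model produces the same pattern.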

Value

Invisibly returns 'object' after printing the summary.
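The print-then-return-invisibly pattern, together with the "not available" fallback for missing components, can be sketched with a toy S3 class. Everything here ('ToyAudit', its 'perm_gap' and 'duplicates' fields, and the printed wording) is hypothetical and says nothing about the internal structure of a real 'LeakAudit' object.

```r
# Hypothetical toy class for illustration; this is not bioLeak's code.
summary.ToyAudit <- function(object, digits = 3, ...) {
  cat("Toy audit summary\n")
  if (is.null(object$perm_gap)) {
    cat("Permutation gap: not available\n")
  } else {
    cat("Permutation gap:",
        formatC(object$perm_gap, digits = digits, format = "f"), "\n")
  }
  if (is.null(object$duplicates)) {
    cat("Duplicates: not available\n")
  } else {
    cat("Duplicates:", nrow(object$duplicates), "pair(s)\n")
  }
  # Return the input invisibly: nothing auto-prints at the console,
  # but the value can still be assigned or piped onward.
  invisible(object)
}

# One component is present, the other was never computed.
toy <- structure(list(perm_gap = 0.41237, duplicates = NULL),
                 class = "ToyAudit")

res <- summary(toy)                    # prints the report
same_object <- identical(res, toy)     # the returned value is the input
```

Returning the object invisibly keeps the console output limited to the report while still allowing the result to be captured, for example when summary() is called in the middle of a longer script.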

Examples

set.seed(1)
df <- data.frame(
  subject = rep(1:6, each = 2),
  outcome = rbinom(12, 1, 0.5),
  x1 = rnorm(12),
  x2 = rnorm(12)
)
splits <- make_split_plan(df, outcome = "outcome",
                      mode = "subject_grouped", group = "subject", v = 3)
custom <- list(
  glm = list(
    fit = function(x, y, task, weights, ...) {
      stats::glm(y ~ ., data = as.data.frame(x),
                 family = stats::binomial(), weights = weights)
    },
    predict = function(object, newdata, task, ...) {
      as.numeric(stats::predict(object, newdata = as.data.frame(newdata),
                                type = "response"))
    }
  )
)
fit <- fit_resample(df, outcome = "outcome", splits = splits,
                    learner = "glm", custom_learners = custom,
                    metrics = "auc", refit = FALSE, seed = 1)
audit <- audit_leakage(fit, metric = "auc", B = 5,
                       X_ref = df[, c("x1", "x2")], seed = 1)
summary(audit) # prints the audit report and returns `audit` invisibly

bioLeak documentation built on March 26, 2026, 5:09 p.m.