data.leak: Data leakage detection

View source: R/data_leak.R

Description

Fits a decision tree model to identify which features exhibit data leakage between the training and testing data.

Usage

data.leak(train, test, id.feats = NULL, sample.size = 0.3,
  seed = 1234, progress = TRUE)

Arguments

train

[required | data.frame] Training data

test

[required | data.frame] Testing data

id.feats

[optional | character | default=NULL] Names of ID features

sample.size

[optional | numeric | default=0.3] Fraction of the data to sample (0.3 = 30%), used to down-sample the data and reduce computation time

seed

[optional | integer | default=1234] Random number seed for reproducible results

progress

[optional | logical | default=TRUE] Display a progress bar
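
The sample.size and seed arguments describe a reproducible down-sampling step. As a rough illustration only, such a step might look like the base R sketch below. The helper name down_sample is hypothetical and is not part of the package, and the package's own sampling logic may differ.

```r
# Hypothetical helper (not part of the package): keep a fixed fraction of rows.
down_sample <- function(df, sample.size = 0.3, seed = 1234) {
  set.seed(seed)  # fixing the seed makes the row selection repeatable
  n_keep <- max(1, round(nrow(df) * sample.size))
  df[sample(nrow(df), n_keep), , drop = FALSE]
}

a <- down_sample(iris)
b <- down_sample(iris)
identical(a, b)  # the same seed selects the same rows
```

A smaller sample.size speeds things up at the cost of a noisier estimate, so results on very small samples should be read with caution.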

Value

A data frame containing the AUC for each feature, which serves as the indicator of data leakage
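
To make the idea of a per-feature AUC concrete, here is a minimal sketch of one common way to run this kind of check: label each row by whether it came from train or test, fit a small decision tree on a single feature, and measure how well that feature separates the two sets. This is an assumption about the general technique, not the package's actual implementation. The helpers auc_rank and leak_auc_sketch and the output column names are hypothetical, and the AUC here is computed on the same rows used for fitting, so it is optimistic.

```r
library(rpart)  # recommended package shipped with standard R installations

# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formulation.
auc_rank <- function(score, label) {
  r <- rank(score)  # average ranks, so tied scores count as half
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: one tree per feature, predicting data set membership.
leak_auc_sketch <- function(train, test) {
  feats <- intersect(names(train), names(test))
  combined <- rbind(train[, feats, drop = FALSE], test[, feats, drop = FALSE])
  is_test <- c(rep(0, nrow(train)), rep(1, nrow(test)))
  auc <- vapply(feats, function(f) {
    d <- data.frame(y = factor(is_test), x = combined[[f]])
    fit <- rpart(y ~ x, data = d, method = "class")
    p <- predict(fit, d, type = "prob")[, "1"]  # predicted P(row is from test)
    auc_rank(p, is_test)
  }, numeric(1))
  data.frame(feature = feats, auc = unname(auc), row.names = NULL)
}

train <- iris[1:65, ]
test <- iris[66:nrow(iris), ]
res_sketch <- leak_auc_sketch(train, test)
res_sketch[order(-res_sketch$auc), ]  # features sorted by separability
```

Under this reading, an AUC near 0.5 means the feature cannot tell training rows from testing rows, while an AUC near 1 means the feature's distribution differs sharply between the two sets and deserves a closer look. Because iris is ordered by species, the split used here differs systematically between train and test.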

Author(s)

Xander Horn

Examples

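library(lazy)
# Split iris into training and testing sets by row position
train <- iris[1:65,]
test <- iris[66:nrow(iris),]
# Compute the AUC per feature to flag possible data leakage
res <- data.leak(train = train, test = test)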

XanderHorn/lazy documentation built on Jan. 16, 2021, 6:15 p.m.