cv_split_group_kfold: Resamples for group K-fold cross-validation with...


View source: R/k_fold.R

Description

Creates resamples for group K-fold cross-validation, stratified by the mean value of the target variable.

Usage

cv_split_group_kfold(data, y, id, nfolds = 5L, probs = seq(0, 1,
  length.out = 11))

Arguments

data

data.table containing the columns named by y and id.

y

Name of the target variable column (character).

id

Name of the column that identifies each group of observations (character).

nfolds

Number of folds (min 2, max 20).

probs

Numeric vector of probabilities, each in the range [0, 1], used for quantile binning.

Details

Numeric target: quantile binning is used for stratification.

Character/categorical target: resampling is performed within categories.

For a target with a very skewed distribution (e.g. financial data in which 99% of values are 0), probs can be a vector such as c(0, seq(0.99, 1, length.out = 10)).
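The effect of such a probs vector can be checked with base R alone. The sketch below uses a made-up target (990 zeros and 10 positive values) and only illustrates the quantile break points; how the function itself handles duplicated break points is not stated here, so that part is an assumption.

```r
# Hypothetical skewed target: 99% zeros, 1% positive values.
set.seed(1)
x <- c(rep(0, 990), rexp(10))

# Evenly spaced deciles: almost all break points coincide at 0.
default_breaks <- quantile(x, probs = seq(0, 1, length.out = 11))

# Probabilities concentrated in the top 1% of the distribution.
skewed_probs <- c(0, seq(0.99, 1, length.out = 10))
skewed_breaks <- quantile(x, probs = skewed_probs)

# cut() requires unique break points, so count the distinct ones.
n_default <- length(unique(default_breaks))
n_skewed <- length(unique(skewed_breaks))
```

With evenly spaced probabilities the bins collapse because nearly every quantile equals 0, whereas the skewed vector spreads its break points across the non-zero tail.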

All observations that share the same id are placed in the same fold.
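The behaviour described above can be approximated in a few lines of base R. This is a minimal sketch of one possible approach (average the target per group, bin the group means by quantiles, then deal groups into folds within each bin); it is not taken from the package source, and the variable names are illustrative.

```r
set.seed(42)
df <- data.frame(
  user = rep(1:100, each = 5),
  target = rnorm(500)
)
nfolds <- 5L
probs <- seq(0, 1, length.out = 11)

# 1. One row per group: the mean target value for each id.
grp <- aggregate(target ~ user, data = df, FUN = mean)

# 2. Bin the group means by quantiles (duplicate breaks dropped for safety).
breaks <- unique(quantile(grp$target, probs = probs))
grp$bin <- cut(grp$target, breaks = breaks, include.lowest = TRUE)

# 3. Within each bin, assign groups to folds in random order.
grp$fold <- NA_integer_
for (b in levels(grp$bin)) {
  idx <- which(grp$bin == b)
  grp$fold[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
}

# 4. Map the group-level fold back to every observation.
df$fold <- grp$fold[match(df$user, grp$user)]

# Count how many distinct folds each user's rows fall into.
folds_per_user <- tapply(df$fold, df$user, function(f) length(unique(f)))
```

Because folds are assigned at the group level and only then mapped back to rows, no user can be split between the training and validation sides of a fold.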

Value

data.table with nfolds columns. Each column is an indicator variable in which 1 marks the observations assigned to the validation set for that fold (stratified by target).
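A table of this shape can drive a cross-validation loop directly. The sketch below builds a small stand-in by hand rather than calling the function; the column names fold_1 to fold_3 are made up (the real names are not documented above), and treating the rows not marked 1 as the training set is an assumption.

```r
# Hypothetical stand-in for the returned table: 3 folds, 6 observations.
folds <- data.frame(
  fold_1 = c(1, 1, 0, 0, 0, 0),
  fold_2 = c(0, 0, 1, 1, 0, 0),
  fold_3 = c(0, 0, 0, 0, 1, 1)
)
dat <- data.frame(x = 1:6, target = c(0.2, 0.4, 0.1, 0.9, 0.5, 0.7))

# For each fold, rows marked 1 form the validation set; the rest are
# used for training (assumed here).
val_errors <- vapply(seq_along(folds), function(k) {
  is_val <- folds[[k]] == 1
  train <- dat[!is_val, ]
  valid <- dat[is_val, ]
  # Trivial "model": predict the training mean, score by mean absolute error.
  mean(abs(valid$target - mean(train$target)))
}, numeric(1))
```

Indexing the columns by position keeps the loop independent of whatever names the returned columns actually carry.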

Examples

library(data.table)
library(resampleR)

# 100 users with 5 observations each and a numeric target
dt <- data.table(
    user = rep(1:100, each = 5),
    target = rnorm(5e2)
)
cv_split_group_kfold(dt, "target", "user")

statist-bhfz/resampleR documentation built on Sept. 2, 2019, 8:14 p.m.