MLCrossValidation: R6 Class to perform cross-validation experiments


R6 Class to perform cross-validation experiments

Description

The MLCrossValidation class is used to construct a cross-validation object and to perform k-fold cross-validation for a specified machine learning algorithm using one distinct hyperparameter setting.

Details

The MLCrossValidation class requires a named list of predefined row indices for the cross-validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform repeated k-fold cross-validation, provide a list with all repeated fold definitions, e.g., by specifying the argument m_rep of splitTools::create_folds().
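
For illustration, a minimal sketch of building a repeated fold list (here, three repetitions of a 3-fold split); the response vector y below is a hypothetical placeholder and not part of this class:

# assumes splitTools is installed; y is a hypothetical response vector
y <- sample(0:1, 100, replace = TRUE)
repeated_fold_list <- splitTools::create_folds(
  y = y,
  k = 3,            # 3-fold cross-validation
  m_rep = 3,        # repeat the 3-fold split three times
  type = "stratified",
  seed = 123
)
# the resulting named list holds 3 x 3 = 9 fold definitions and can be
# passed as fold_list to MLCrossValidation$new()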

Super classes

mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> MLCrossValidation

Public fields

fold_list

A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().

return_models

A logical. Whether the fitted models should be returned with the results (default: FALSE).

performance_metric

Either a named list with metric functions, a single metric function, or a character vector with metric names from the measures package. The provided functions must take two named arguments: ground_truth and predictions. For metrics from the measures package, the wrapper function metric() prepares them for use with the mlexperiments package.
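
A minimal sketch of a custom metric that satisfies this interface; the function name my_accuracy is hypothetical and not part of the package:

# hypothetical custom metric: proportion of correct predictions
my_accuracy <- function(ground_truth, predictions) {
  mean(ground_truth == predictions)
}
# assign the custom function directly ...
# cv$performance_metric <- my_accuracy
# ... or wrap a metric from the 'measures' package via metric(), as in the
# examples below:
# cv$performance_metric <- metric("MMCE")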

performance_metric_args

A list. Further arguments required to compute the performance metric.

predict_args

A list. Further arguments required to compute the predictions.

Methods

Public methods

Method new()

Create a new MLCrossValidation object.

Usage
MLCrossValidation$new(
  learner,
  fold_list,
  seed,
  ncores = -1L,
  return_models = FALSE
)
Arguments
learner

An initialized learner object that inherits from class "MLLearnerBase".

fold_list

A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().

seed

An integer. Needs to be set for reproducibility purposes.

ncores

An integer to specify the number of cores used for parallelization (default: -1L).

return_models

A logical. Whether the fitted models should be returned with the results (default: FALSE).

Details

The MLCrossValidation class requires a named list of predefined row indices for the cross-validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform repeated k-fold cross-validation, provide a list with all repeated fold definitions, e.g., by specifying the argument m_rep of splitTools::create_folds().

Examples
if (requireNamespace("measures", quietly = TRUE)  &&
requireNamespace("class", quietly = TRUE)) {
  dataset <- do.call(
    cbind,
    c(
      sapply(
        paste0("col", 1:6),
        function(x) rnorm(n = 500),
        USE.NAMES = TRUE,
        simplify = FALSE
      ),
      list(target = sample(0:1, 500, TRUE))
    )
  )
  fold_list <- splitTools::create_folds(
    y = dataset[, 7],
    k = 3,
    type = "stratified",
    seed = 123
  )
  cv <- MLCrossValidation$new(
    learner = LearnerKnn$new(),
    fold_list = fold_list,
    seed = 123,
    ncores = 2
  )
}


Method execute()

Execute the cross validation.

Usage
MLCrossValidation$execute()
Details

All results of the cross-validation are saved in the field $results of the MLCrossValidation class. After successful execution of the cross-validation, $results contains a list with the items:

  • "fold" A list of folds containing the following items for each cross validation fold:

    • "fold_ids" A vector with the utilized in-sample row indices.

    • "ground_truth" A vector with the ground truth.

    • "predictions" A vector with the predictions.

    • "learner.args" A list with the arguments provided to the learner.

    • "model" If return_models = TRUE, the fitted model.

  • "summary" A data.table with the summarized results (same as the returned value of the execute method).

  • "performance" A list with the value of the performance metric calculated for each of the cross validation folds.

Returns

The function returns a data.table with the results of the cross-validation. More results are accessible from the field $results of the MLCrossValidation class.
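
A hedged sketch of inspecting the results after execution, using the item names listed above (cv is an executed MLCrossValidation object as in the examples below; the exact structure may differ between package versions):

res <- cv$execute()              # data.table, one row per cross-validation fold
print(res)

str(cv$results, max.level = 2)   # overview of the stored results
cv$results$summary               # same data.table as returned by execute()
cv$results$performance           # per-fold values of the performance metric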

Examples
if (requireNamespace("measures", quietly = TRUE)  &&
requireNamespace("class", quietly = TRUE)) {
  dataset <- do.call(
    cbind,
    c(
      sapply(
        paste0("col", 1:6),
        function(x) rnorm(n = 500),
        USE.NAMES = TRUE,
        simplify = FALSE
      ),
      list(target = sample(0:1, 500, TRUE))
    )
  )
  fold_list <- splitTools::create_folds(
    y = dataset[, 7],
    k = 3,
    type = "stratified",
    seed = 123
  )
  cv <- MLCrossValidation$new(
    learner = LearnerKnn$new(),
    fold_list = fold_list,
    seed = 123,
    ncores = 2
  )
  cv$learner_args <- list(
    k = 20,
    l = 0,
    test = parse(text = "fold_test$x")
  )
  cv$predict_args <- list(type = "response")
  cv$performance_metric_args <- list(
    positive = "1",
    negative = "0"
  )
  cv$performance_metric <- metric("MMCE")

  # set data
  cv$set_data(
    x = data.matrix(dataset[, -7]),
    y = dataset[, 7]
  )

  cv$execute()
}


Method clone()

The objects of this class are cloneable with this method.

Usage
MLCrossValidation$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

splitTools::create_folds(), metric()

Examples

if (requireNamespace("measures", quietly = TRUE)  &&
requireNamespace("class", quietly = TRUE)) {

  dataset <- do.call(
    cbind,
    c(
      sapply(
        paste0("col", 1:6),
        function(x) rnorm(n = 500),
        USE.NAMES = TRUE,
        simplify = FALSE
      ),
      list(target = sample(0:1, 500, TRUE))
    )
  )

  fold_list <- splitTools::create_folds(
    y = dataset[, 7],
    k = 3,
    type = "stratified",
    seed = 123
  )

  cv <- MLCrossValidation$new(
    learner = LearnerKnn$new(),
    fold_list = fold_list,
    seed = 123,
    ncores = 2
  )

  # learner parameters
  cv$learner_args <- list(
    k = 20,
    l = 0,
    test = parse(text = "fold_test$x")
  )

  # performance parameters
  cv$predict_args <- list(type = "response")
  cv$performance_metric_args <- list(
    positive = "1",
    negative = "0"
  )
  cv$performance_metric <- metric("MMCE")

  # set data
  cv$set_data(
    x = data.matrix(dataset[, -7]),
    y = dataset[, 7]
  )

  cv$execute()
}



