MLCrossValidation | R Documentation |
The MLCrossValidation class is used to construct a cross validation object and to perform a k-fold cross validation for a specified machine learning algorithm using one distinct hyperparameter setting.
The MLCrossValidation class requires a named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform a repeated k-fold cross validation, provide a list containing all repeated fold definitions, e.g., by specifying the argument m_rep of splitTools::create_folds().
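The repeated fold list described above can be sketched as follows. This is a minimal illustration, assuming the splitTools package is installed. The variable names y and fold_list_rep are chosen here for illustration and are not part of the original documentation.

```r
library(splitTools)

# Simulated binary target, mirroring the examples on this page
set.seed(123)
y <- sample(0:1, 500, TRUE)

# 3 folds repeated 2 times gives 3 * 2 = 6 fold definitions
fold_list_rep <- create_folds(
  y = y,
  k = 3,
  type = "stratified",
  m_rep = 2,
  seed = 123
)

length(fold_list_rep)   # one list element per fold and repetition
names(fold_list_rep)    # each element is named
lengths(fold_list_rep)  # number of row indices stored per element
```

Such a list can then be passed as the fold_list argument of MLCrossValidation$new() in the same way as a non-repeated fold list.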
mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> MLCrossValidation
fold_list
A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().
return_models
A logical. Whether the fitted models should be returned with the results (default: FALSE).
performance_metric
Either a named list of metric functions, a single metric function, or a character vector of metric names from the mlr3measures package. The provided functions must take two named arguments: ground_truth and predictions. For metrics from the mlr3measures package, the wrapper function metric() prepares them for use with the mlexperiments package.
performance_metric_args
A list. Further arguments required to compute the performance metric.
predict_args
A list. Further arguments required to compute the predictions.
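As a sketch of the performance_metric field described above, custom metric functions only need to accept the two named arguments ground_truth and predictions. The function names acc and mce below are hypothetical, and the sketch assumes that the predictions arrive as class labels, which in practice depends on the learner and on predict_args.

```r
# Hypothetical custom metrics; both take the two required named arguments.
# Assumption: predictions are class labels, not probabilities.
metrics <- list(
  acc = function(ground_truth, predictions) {
    # share of correctly classified observations
    mean(ground_truth == predictions)
  },
  mce = function(ground_truth, predictions) {
    # share of misclassified observations
    mean(ground_truth != predictions)
  }
)

# Quick check on invented toy vectors
truth <- c(0, 1, 1, 0)
preds <- c(0, 1, 0, 0)
metrics$acc(ground_truth = truth, predictions = preds)
metrics$mce(ground_truth = truth, predictions = preds)
```

A list like this, or one of its functions alone, would then be assigned to the performance_metric field of an MLCrossValidation object.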
new()
Create a new MLCrossValidation object.
MLCrossValidation$new(
  learner,
  fold_list,
  seed,
  ncores = -1L,
  return_models = FALSE
)
learner
An initialized learner object that inherits from class "MLLearnerBase".
fold_list
A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().
seed
An integer. Needs to be set for reproducibility purposes.
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
return_models
A logical. Whether the fitted models should be returned with the results (default: FALSE).
The MLCrossValidation class requires a named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform a repeated k-fold cross validation, provide a list containing all repeated fold definitions, e.g., by specifying the argument m_rep of splitTools::create_folds().
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)
fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)
cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)
execute()
Execute the cross validation.
MLCrossValidation$execute()
All results of the cross validation are saved in the field $results of the MLCrossValidation class. After successful execution of the cross validation, $results contains a list with the items:
"fold" A list of folds containing the following items for each cross validation fold:
  "fold_ids" A vector with the utilized in-sample row indices.
  "ground_truth" A vector with the ground truth.
  "predictions" A vector with the predictions.
  "learner.args" A list with the arguments provided to the learner.
  "model" If return_models = TRUE, the fitted model.
"summary" A data.table with the summarized results (same as the returned value of the execute method).
"performance" A list with the value of the performance metric calculated for each of the cross validation folds.
The function returns a data.table with the results of the cross validation. More results are accessible from the field $results of the MLCrossValidation class.
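To make the per-fold structure described above more concrete, the following sketch recomputes a metric from the "ground_truth" and "predictions" items of each fold. The list mock_folds is a hypothetical stand-in for cv$results$fold with invented values. The element names Fold1 to Fold3 are an assumption as well, not output from the package.

```r
# Hypothetical stand-in for cv$results$fold; all values are invented
# purely to illustrate the documented item names.
mock_folds <- list(
  Fold1 = list(
    ground_truth = c(0, 1, 1, 0),
    predictions  = c(0, 1, 0, 0)
  ),
  Fold2 = list(
    ground_truth = c(1, 1, 0, 0),
    predictions  = c(1, 1, 0, 1)
  ),
  Fold3 = list(
    ground_truth = c(0, 0, 1, 1),
    predictions  = c(0, 0, 1, 1)
  )
)

# Share of correct predictions, computed separately for each fold
per_fold <- vapply(
  mock_folds,
  function(fold) mean(fold$ground_truth == fold$predictions),
  numeric(1)
)

per_fold        # one value per fold
mean(per_fold)  # average over all folds
```

With a real, executed object, the same pattern would presumably be applied to cv$results$fold instead of mock_folds.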
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)
fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)
cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)
cv$learner_args <- list(
  k = 20,
  l = 0,
  test = parse(text = "fold_test$x")
)
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")
# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)
cv$execute()
clone()
The objects of this class are cloneable with this method.
MLCrossValidation$clone(deep = FALSE)
deep
Whether to make a deep clone.
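The difference between a shallow and a deep clone can be sketched with two toy R6 classes. This assumes that MLCrossValidation follows the standard R6 clone() semantics, which its $new() and $clone(deep = ) interface suggests. The classes Inner and Outer are hypothetical and exist only for this illustration.

```r
library(R6)

# Hypothetical toy classes, used only to illustrate R6 cloning semantics
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class(
  "Outer",
  public = list(
    inner = NULL,
    initialize = function() {
      self$inner <- Inner$new()
    }
  )
)

original <- Outer$new()
shallow  <- original$clone()             # deep = FALSE (the default)
deep     <- original$clone(deep = TRUE)

# Modify the nested object of the original after cloning
original$inner$value <- 2

shallow$inner$value  # 2: the nested object is shared with `original`
deep$inner$value     # 1: the nested object was copied
```

Under that assumption, a deep clone is the safer choice when a copied cross validation object should not share nested objects, such as its learner, with the original.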
splitTools::create_folds(), mlr3measures::measures, metric()
library(mlexperiments)

# simulate a matrix with six numeric features and a binary target
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)
# predefined row indices for a stratified 3-fold cross validation
fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)
cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)
# learner parameters
cv$learner_args <- list(
  k = 20,
  l = 0,
  test = parse(text = "fold_test$x")
)
# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")
# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)
cv$execute()