crossval_ml: Generic cross-validation function


View source: R/crossval_ml.R

Description

Generic repeated k-fold cross-validation for a user-supplied model fitting function and prediction function.

Usage

crossval_ml(
  x,
  y,
  fit_func = crossval::fit_lm,
  predict_func = crossval::predict_lm,
  fit_params = NULL,
  k = 5,
  repeats = 3,
  p = 1,
  seed = 123,
  eval_metric = NULL,
  cl = NULL,
  errorhandling = c("stop", "remove", "pass"),
  packages = c("stats", "Rcpp"),
  verbose = FALSE,
  show_progress = TRUE,
  ...
)

Arguments

x

a matrix of input covariates

y

response variable; a vector

fit_func

a function for fitting the model

predict_func

a function for predicting values from the model

fit_params

a list; additional (model-specific) parameters to be passed to fit_func

k

an integer; number of folds in k-fold cross validation

repeats

an integer; number of repeats for the k-fold cross validation

p

a double; proportion of the data used for cross-validation (the training/testing set). Must be greater than 0.5 and at most 1 (the default). If p < 1, a validation set error is also calculated on the remaining 1 - p fraction of the data

seed

random seed for reproducibility of results

eval_metric

a function measuring the test error; if not provided, RMSE is used for regression and accuracy for classification

cl

an integer; the number of clusters for parallel execution

errorhandling

specifies how a task evaluation error should be handled. If the value is "stop", execution stops as soon as an error occurs. If the value is "remove", the result for that task is not returned. If the value is "pass", the error object generated by the task evaluation is included with the rest of the results. The default value is "stop".

packages

character vector of packages that the tasks depend on

verbose

logical flag enabling verbose messages. This can be very useful for troubleshooting.

show_progress

logical flag; whether to display the progress of the algorithm

...

additional parameters
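
As a rough illustration of how k, repeats, p and seed interact, the base-R sketch below partitions row indices in the way the argument descriptions suggest. It is not the package's internal code: the helper name make_folds and the exact sampling scheme are assumptions made for this example only.

```r
# Hypothetical helper (not part of crossval): split row indices according to
# the descriptions of k, repeats, p and seed given above.
make_folds <- function(n, k = 5, repeats = 3, p = 1, seed = 123) {
  stopifnot(p > 0.5, p <= 1, k >= 2, repeats >= 1)
  set.seed(seed)
  # Keep a fraction p of the rows for cross-validation; the remaining
  # 1 - p fraction forms the validation set (empty when p = 1).
  n_cv <- floor(p * n)
  cv_idx <- sort(sample.int(n, n_cv))
  validation_idx <- setdiff(seq_len(n), cv_idx)
  # For each repeat, randomly assign the cross-validation rows to k folds.
  folds <- lapply(seq_len(repeats), function(r) {
    fold_id <- sample(rep_len(seq_len(k), n_cv))
    split(cv_idx, fold_id)
  })
  list(folds = folds, validation_idx = validation_idx)
}

parts <- make_folds(n = 1000, k = 5, repeats = 2, p = 0.8)
length(parts$folds)           # one set of folds per repeat
lengths(parts$folds[[1]])     # sizes of the k test folds in the first repeat
length(parts$validation_idx)  # rows set aside for the validation error
```

Within each repeat, every fold serves once as the test set while the other k - 1 folds are used for fitting; the validation rows are never used during cross-validation.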

Examples

# dataset

set.seed(123)
n <- 1000 ; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# linear model example -----

crossval::crossval_ml(x = X, y = y, k = 5, repeats = 3)


# randomForest example -----

library(randomForest)

# fit randomForest with mtry = 2

crossval::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                  fit_func = randomForest::randomForest, predict_func = predict,
                  packages = "randomForest", fit_params = list(mtry = 2))

# fit randomForest with mtry = 4

crossval::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                  fit_func = randomForest::randomForest, predict_func = predict,
                  packages = "randomForest", fit_params = list(mtry = 4))

# fit randomForest with mtry = 4, with a validation set

crossval::crossval_ml(x = X, y = y, k = 5, repeats = 2, p = 0.8,
                  fit_func = randomForest::randomForest, predict_func = predict,
                  packages = "randomForest", fit_params = list(mtry = 4))
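
A custom eval_metric might look like the sketch below, which computes the mean absolute error. The argument names and order (preds, actual) are an assumption about how crossval_ml calls the metric and are not stated on this page; the name mae_metric is hypothetical.

```r
# Assumption: crossval_ml calls the metric as eval_metric(preds, actual).
# mae_metric is a hypothetical name used for illustration only.
mae_metric <- function(preds, actual) {
  res <- mean(abs(preds - actual))
  names(res) <- "mae"
  res
}

# Stand-alone check on toy vectors
mae_metric(preds = c(1, 2, 3), actual = c(1, 2, 5))

# If the assumed signature holds, the metric could be passed as:
# crossval::crossval_ml(x = X, y = y, k = 5, repeats = 3,
#                       eval_metric = mae_metric)
```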

thierrymoudiki/crossval documentation built on Aug. 17, 2020, 5:51 a.m.