grid.search.cross.validation: Grid search K-fold cross-validation


View source: R/grid.search.cross.validation.R

Description

Implementation of the grid search approach using K-fold cross-validation for hyperparameter tuning of a given estimator.
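The procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the names ridge.fit, rmse, grid and cv.metric are hypothetical, lower metric values are assumed to be better, and the final refit on all data is an assumption.

```r
# Minimal base-R sketch of grid search with K-fold cross-validation.
# All names (ridge.fit, rmse, grid, cv.metric) are illustrative assumptions.
set.seed(1)
n <- 100
X <- cbind(1, matrix(rnorm(n * 2), n, 2))  # intercept plus two regressors
y <- drop(X %*% c(1, 2, -1)) + rnorm(n)

# Hypothetical estimator: ridge regression with penalty lambda
# (for simplicity the intercept is penalized as well)
ridge.fit <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

# Hypothetical fold metric: root mean squared error (lower is better)
rmse <- function(y.hat, y) sqrt(mean((y - y.hat)^2))

K <- 5
fold.id <- sample(rep(seq_len(K), length.out = n))  # balanced random folds
grid <- expand.grid(lambda = c(0, 0.1, 1, 10, 100))

# For every grid point, fit on K - 1 folds, score on the held-out fold,
# and combine the K fold metrics with the mean
cv.metric <- sapply(seq_len(nrow(grid)), function(i) {
  fold.metric <- sapply(seq_len(K), function(k) {
    test <- fold.id == k
    beta <- ridge.fit(X[!test, , drop = FALSE], y[!test], grid$lambda[i])
    rmse(drop(X[test, , drop = FALSE] %*% beta), y[test])
  })
  mean(fold.metric)
})

# Select the grid point with the smallest combined metric
best <- which.min(cv.metric)
best.lambda <- grid$lambda[best]

# Refit on all data with the selected hyperparameter (an assumption)
best.coef <- ridge.fit(X, y, best.lambda)
```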

Usage

grid.search.cross.validation(
  formula,
  data,
  estimator,
  params.list,
  n.folds = 5,
  ind.metric,
  comb.metric = mean,
  fold.id = NULL,
  force = F,
  verbose = F,
  plot = F,
  contour.scale = NULL,
  coef.lims = NULL,
  coef.names = NULL,
  seed = NULL,
  use.formula = T,
  ...
)

Arguments

formula

an object of class formula: a symbolic description of the model to be fitted, following the conventions of lm.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which this function is called.

estimator

estimator function that has arguments formula and data and returns a list containing the parameter estimates in the component coefficients.
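A function matching this contract might look like the sketch below. The name ridge.estimator and its lambda argument are hypothetical, and passing hyperparameters as named arguments is an assumption about how the grid search calls the estimator.

```r
# Hypothetical estimator: takes formula and data, returns a list whose
# component `coefficients` holds the parameter estimates.
ridge.estimator <- function(formula, data, lambda = 0, ...) {
  mf <- model.frame(formula, data)
  X <- model.matrix(formula, mf)   # design matrix, including the intercept
  y <- model.response(mf)
  penalty <- lambda * diag(ncol(X))
  beta <- drop(solve(crossprod(X) + penalty, crossprod(X, y)))
  names(beta) <- colnames(X)
  list(coefficients = beta)
}

# With lambda = 0 this reduces to ordinary least squares
fit <- ridge.estimator(mpg ~ wt + hp, data = mtcars, lambda = 0)
fit$coefficients
```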

params.list

list or vector containing hyperparameters and their respective values to consider
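As an illustration, a named list of candidate values can be expanded into the full grid of combinations with base R's expand.grid. The hyperparameter names lambda and alpha are hypothetical, and whether the function builds its grid this way internally is an assumption.

```r
# Hypothetical hyperparameters and their candidate values
params.list <- list(lambda = c(0.01, 0.1, 1), alpha = c(0, 0.5, 1))

# One row per hyperparameter combination (3 x 3 = 9 rows)
grid <- expand.grid(params.list, stringsAsFactors = FALSE)
nrow(grid)
```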

n.folds

optional number of folds (K) in cross-validation. Default is 5.

ind.metric

metric function that takes a numerical vector of predictions of the dependent variable and a numerical vector of its actual outcomes, and returns a performance metric for an individual fold.
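Two candidate metric functions are sketched below. The names rmse and mae are hypothetical. The argument order (predictions first, actual outcomes second) follows the order of the description above but is an assumption, as is the convention that lower values indicate better performance.

```r
# Root mean squared error: predictions first, actual outcomes second
rmse <- function(y.hat, y) sqrt(mean((y - y.hat)^2))

# Mean absolute error, same argument order
mae <- function(y.hat, y) mean(abs(y - y.hat))

rmse(c(1, 2, 3), c(1, 2, 5))
mae(c(1, 2, 3), c(1, 2, 5))
```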

comb.metric

optional function used to combine individual fold metrics. Default is mean.

fold.id

optional vector containing numerical fold identifiers for each row in the data. If NULL, n.folds is used and random fold identifiers are constructed that divide the observations as equally as possible over the K folds. If provided, n.folds is ignored. Default is NULL.
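A custom fold.id vector could be built as in the sketch below, which assigns balanced random folds. This shows one common construction and is not necessarily how the function generates identifiers internally.

```r
set.seed(42)
n <- 23  # number of rows in the data
K <- 5   # number of folds

# Repeat 1..K until n identifiers exist, then shuffle them, so that
# fold sizes differ by at most one observation
fold.id <- sample(rep(seq_len(K), length.out = n))
table(fold.id)
```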

force

optional boolean indicating whether to tolerate errors due to singularity when applying the estimator. If TRUE, the individual metric is set to Inf for every fold and hyperparameter combination in which the estimator cannot estimate coefficients due to singularity, so these errors are ignored. Default is FALSE.
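The behaviour described for force can be sketched with tryCatch. The helper safe.fold.metric is hypothetical, and unlike the description above, this simplified version maps any error to Inf, not only singularity errors.

```r
# Hypothetical helper: evaluate a fold and, if force is TRUE, turn an
# estimation error into an infinitely bad metric instead of stopping.
safe.fold.metric <- function(fit.and.score, force = FALSE) {
  if (!force) return(fit.and.score())
  tryCatch(fit.and.score(), error = function(e) Inf)
}

# A fold whose estimation step fails: solve() errors on a singular matrix
singular <- function() {
  solve(matrix(0, 2, 2))
  0
}

safe.fold.metric(singular, force = TRUE)
```

Because Inf can never be the minimum of the combined metric, such a combination is never selected when lower metric values are better.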

verbose

optional boolean indicating whether to show a progress bar. Default is FALSE.

plot

optional boolean indicating whether to generate heatmaps of performance on the grid and display the estimated coefficients. Default is FALSE.

contour.scale

optional vector containing the hyperparameters and their scales for the axes in the contour plots. Default is NULL.

coef.lims

optional limits of the coefficients plot. Default is NULL.

coef.names

optional names of coefficients, used in the estimates barplot. Default is NULL, in which case the column names of the explanatory variables are used.

seed

optional seed for the random number generator. Default is NULL.

use.formula

optional boolean indicating whether to use the formula and data combination, rather than X and y, as inputs to the model. Default is TRUE.

...

additional arguments to be passed to the estimator function.

Value

grid.search.cross.validation returns an object of class "gscv". An object of class "gscv" is a list containing at least the following components:

coefficients

a named vector of optimal coefficients.

metric

metric of the optimal hyperparameters.

params

a named vector of optimal hyperparameters.


Accelerytics/mlkit documentation built on Dec. 31, 2020, 9:46 a.m.