tune: Parameter Tuning of Functions Using Grid Search


View source: R/tune.R

Description

This generic function tunes hyperparameters of statistical methods using a grid search over supplied parameter ranges.

Usage

tune(method, train.x, train.y = NULL, data = list(), validation.x =
     NULL, validation.y = NULL, ranges = NULL, predict.func = predict,
     tunecontrol = tune.control(), ...)
best.tune(...)

Arguments

method

either the function to be tuned, or a character string naming such a function.

train.x

either a formula or a matrix of predictors.

train.y

the response variable if train.x is a predictor matrix. Ignored if train.x is a formula.

data

data, if a formula interface is used. Ignored, if predictor matrix and response are supplied directly.

validation.x

an optional validation set. Depending on whether a formula interface is used, the response can be included in validation.x or specified separately via validation.y. Only used for bootstrap and fixed validation set (see tune.control).

validation.y

if no formula interface is used, the response of the (optional) validation set. Only used for bootstrap and fixed validation set (see tune.control).

ranges

a named list of parameter vectors spanning the sampling space. The vectors will usually be created by seq.

predict.func

optional predict function, if the standard predict behavior is inadequate.

tunecontrol

object of class "tune.control", as created by the function tune.control(). If omitted, tune.control() gives the defaults.

...

Further parameters passed to the training functions.
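The ranges argument described above is typically built from seq() or from powers of 2, which give a log-spaced grid. A minimal base-R sketch (the parameter names gamma and cost match those of svm):

```r
## named list of candidate values; each vector spans one parameter axis
## 2^seq(-1, 1) gives 0.5, 1, 2; 2^seq(2, 4) gives 4, 8, 16
ranges <- list(gamma = 2^seq(-1, 1),
               cost  = 2^seq(2, 4))
ranges$gamma  # 0.5 1.0 2.0
```

tune() evaluates every combination of these vectors, so the grid above has 3 x 3 = 9 candidate parameter settings.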

Details

The performance measure is the misclassification error for classification and the mean squared error for regression. It is possible to specify only one parameter combination (i.e., vectors of length 1) to obtain an error estimate of the specified type (bootstrap, cross-validation, etc.) on the given data set. For convenience, several tune.foo() wrappers are defined, e.g., for nnet(), randomForest(), rpart(), svm(), and knn().

Cross-validation randomizes the data set before building the splits which—once created—remain constant during the training process. The splits can be recovered through the train.ind component of the returned object.
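The split recovery described above can be sketched as follows (a small example assuming e1071 is installed; the choice of svm, the iris data, and 5-fold cross-validation are illustrative):

```r
library(e1071)
data(iris)

## 5-fold cross-validation over a one-dimensional cost grid
obj <- tune(svm, Species ~ ., data = iris,
            ranges = list(cost = 2^(0:2)),
            tunecontrol = tune.control(sampling = "cross", cross = 5))

## obj$train.ind holds one index vector per fold; each vector gives
## the rows of iris used for training in that split
length(obj$train.ind)
sapply(obj$train.ind, length)
```

Because the splits are fixed once created, the same folds are reused for every parameter combination, making the performance results directly comparable across the grid.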

Value

For tune, an object of class tune, including the components:

best.parameters

a 1 x k data frame, where k is the number of parameters.

best.performance

best achieved performance.

performances

if requested, a data frame of all parameter combinations along with the corresponding performance results.

train.ind

list of index vectors used for splits into training and validation sets.

best.model

if requested, the model trained on the complete training data using the best parameter combination.

best.tune() returns the best model detected by tune.
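Accessing these components might look as follows (a sketch assuming e1071 is installed; the svm/iris setup mirrors the Examples below):

```r
library(e1071)
data(iris)

grid <- list(gamma = 2^(-1:1), cost = 2^(2:4))
obj <- tune(svm, Species ~ ., data = iris, ranges = grid,
            tunecontrol = tune.control(sampling = "fix"))

obj$best.parameters   # 1 x k data frame (here k = 2: gamma and cost)
obj$best.performance  # lowest achieved error

## best.tune() skips the tune object and returns the winning model directly
mod <- best.tune(svm, Species ~ ., data = iris, ranges = grid,
                 tunecontrol = tune.control(sampling = "fix"))
```

The model returned by best.tune() is equivalent to the best.model component of the tune object when that component is requested via tune.control(best.model = TRUE).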

Author(s)

David Meyer
[email protected]

See Also

tune.control, plot.tune, tune.svm, tune.wrapper

Examples

  data(iris)
  ## tune `svm' for classification with RBF-kernel (default in svm),
  ## using one split for training/validation set
  
  obj <- tune(svm, Species~., data = iris, 
              ranges = list(gamma = 2^(-1:1), cost = 2^(2:4)),
              tunecontrol = tune.control(sampling = "fix")
             )

  ## alternatively:
  ## obj <- tune.svm(Species~., data = iris, gamma = 2^(-1:1), cost = 2^(2:4))

  summary(obj)
  plot(obj)

  ## tune `knn' using a convenience function; this time with the
  ## conventional interface and bootstrap sampling:
  x <- iris[,-5]
  y <- iris[,5]
  obj2 <- tune.knn(x, y, k = 1:5, tunecontrol = tune.control(sampling = "boot"))
  summary(obj2)
  plot(obj2)

  ## tune `rpart' for regression, using 10-fold cross validation (default)
  data(mtcars)
  obj3 <- tune.rpart(mpg~., data = mtcars, minsplit = c(5,10,15))
  summary(obj3)
  plot(obj3)

  ## simple error estimation for lm using 10-fold cross validation
  tune(lm, mpg~., data = mtcars)

Example output

Parameter tuning of 'svm':

- sampling method: fixed training/validation set 

- best parameters:
 gamma cost
   0.5    4

- best performance: 0.02 

- Detailed performance results:
  gamma cost error dispersion
1   0.5    4  0.02         NA
2   1.0    4  0.02         NA
3   2.0    4  0.02         NA
4   0.5    8  0.02         NA
5   1.0    8  0.02         NA
6   2.0    8  0.02         NA
7   0.5   16  0.02         NA
8   1.0   16  0.02         NA
9   2.0   16  0.02         NA


Parameter tuning of 'knn.wrapper':

- sampling method: bootstrapping 

- best parameters:
 k
 4

- best performance: 0.04245643 

- Detailed performance results:
  k      error dispersion
1 1 0.04702013 0.02753035
2 2 0.04961913 0.03264160
3 3 0.04790305 0.02947715
4 4 0.04245643 0.02736114
5 5 0.04724714 0.02834721


Parameter tuning of 'rpart.wrapper':

- sampling method: 10-fold cross validation 

- best parameters:
 minsplit
        5

- best performance: 13.49869 

- Detailed performance results:
  minsplit    error dispersion
1        5 13.49869   10.09678
2       10 14.84880   10.96759
3       15 21.77051   13.50412


Error estimation of 'lm' using 10-fold cross validation: 15.94817

e1071 documentation built on July 28, 2018, 5:02 p.m.