Description Usage Arguments Details Value Note Author(s) References See Also Examples
Select tuning parameters of a model by estimating the respective prediction errors via (repeated) K-fold cross-validation, (repeated) random splitting (also known as random subsampling or Monte Carlo cross-validation), or the bootstrap. Either a model fitting function or an unevaluated call to a model fitting function can be supplied.
perryTuning(object, ...)
## S3 method for class 'function'
perryTuning(object, formula,
data = NULL, x = NULL, y, tuning = list(),
args = list(), splits = foldControl(),
predictFun = predict, predictArgs = list(),
cost = rmspe, costArgs = list(),
selectBest = c("min", "hastie"), seFactor = 1,
final = FALSE, names = NULL, envir = parent.frame(),
ncores = 1, cl = NULL, seed = NULL, ...)
## S3 method for class 'call'
perryTuning(object, data = NULL,
x = NULL, y, tuning = list(), splits = foldControl(),
predictFun = predict, predictArgs = list(),
cost = rmspe, costArgs = list(),
selectBest = c("min", "hastie"), seFactor = 1,
final = FALSE, names = NULL, envir = parent.frame(),
ncores = 1, cl = NULL, seed = NULL, ...)

object 
a function or an unevaluated function call for fitting a model (see “Details”). 
formula 
a formula describing the model. 
data 
a data frame containing the variables required for fitting the models. This is typically used if the model in the function call is described by a formula. 
x 
a numeric matrix containing the predictor variables. This is typically used if the function call for fitting the models requires the predictor matrix and the response to be supplied as separate arguments. 
y 
a numeric vector or matrix containing the response. 
tuning 
a list of arguments giving the tuning parameter values to be evaluated. The names of the list components should thereby correspond to the argument names of the tuning parameters. For each tuning parameter, a vector of values can be supplied. The prediction error is then estimated for all possible combinations of tuning parameter values. 
args 
a list of additional arguments to be passed to the model fitting function. 
splits 
an object of class "cvFolds" (as returned by cvFolds) or a control object of class "foldControl" defining folds for (repeated) K-fold cross-validation, an object of class "randomSplits" or a control object of class "splitControl" defining random data splits, or an object of class "bootSamples" or a control object of class "bootControl" defining bootstrap samples. 
predictFun 
a function to compute predictions for the test data. It should expect the fitted model to be passed as the first argument and the test data as the second argument, and must return either a vector or a matrix containing the predicted values. The default is to use the generic function predict. 
predictArgs 
a list of additional arguments to be passed to predictFun. 
cost 
a cost function measuring prediction loss. It should expect the observed values of the response to be passed as the first argument and the predicted values as the second argument, and must return either a non-negative scalar value, or a list with the first component containing the prediction error and the second component containing the standard error. The default is to use the root mean squared prediction error (see cost). 
costArgs 
a list of additional arguments to be passed to the prediction loss function cost. 
selectBest 
a character string specifying a criterion for selecting the best model. Possible values are "min" (the default) or "hastie". The former selects the model with the smallest prediction error. The latter is useful for models with a tuning parameter controlling the complexity of the model: it selects the most parsimonious model whose prediction error is no larger than seFactor standard errors above the prediction error of the best overall model (see also “References”). 
seFactor 
a numeric value giving a multiplication factor of the standard error for the selection of the best model. This is ignored if selectBest is "min". 
final 
a logical indicating whether to fit the final model with the optimal combination of tuning parameters. 
names 
an optional character vector giving names for the arguments containing the data to be used in the function call (see “Details”). 
envir 
the environment in which to evaluate the function call for fitting the models (see eval). 
ncores 
a positive integer giving the number of processor cores to be used for parallel computing (the default is 1 for no parallelization). If this is set to NA, all available processor cores are used. 
cl 
a parallel cluster for parallel computing as generated by makeCluster. 
seed 
optional initial seed for the random number generator (see .Random.seed). 
... 
additional arguments to be passed down. 
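As a standalone base-R illustration (independent of the package), the combinations evaluated for a tuning list correspond to what expand.grid() produces; the parameter names below are taken from the example and a hypothetical second parameter:

```r
# Hypothetical tuning list with two parameters; perryTuning estimates the
# prediction error for every combination of the supplied values.
tuning <- list(tuning.psi = c(3.443689, 4.685061),
               max.it = c(50, 100))

# the evaluated grid contains all combinations, as produced by expand.grid()
grid <- expand.grid(tuning)
nrow(grid)  # 2 x 2 = 4 combinations
```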
(Repeated) K-fold cross-validation is performed in the following way. The data are first split into K previously obtained blocks of approximately equal size (given by folds). Each of the K data blocks is left out once to fit the model, and predictions are computed for the observations in the left-out block with predictFun. Thus a prediction is obtained for each observation. The response and the obtained predictions for all observations are then passed to the prediction loss function cost to estimate the prediction error. For repeated K-fold cross-validation (as indicated by splits), this process is replicated and the estimated prediction errors from all replications are returned.
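The mechanics of one K-fold replication can be sketched in base R (a standalone illustration, not package code; n and K are arbitrary choices here):

```r
# Sketch of one replication of 5-fold cross-validation for n observations:
# each block is left out exactly once, so every observation ends up with
# exactly one prediction.
n <- 20
K <- 5
folds <- sample(rep(seq_len(K), length.out = n))  # random block assignment
predicted <- integer(n)                           # predictions per observation
for (k in seq_len(K)) {
  test <- which(folds == k)  # the left-out block
  # ... fit the model on the remaining blocks, predict for 'test' ...
  predicted[test] <- predicted[test] + 1L
}
all(predicted == 1L)  # TRUE: one prediction per observation
```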
(Repeated) random splitting is performed similarly. In each replication, the data are split into a training set and a test set at random. Then the training data are used to fit the model, and predictions are computed for the test data. Hence only the response values from the test data and the corresponding predictions are passed to the prediction loss function cost.
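For comparison, one random-split replication can be sketched as follows (base R only; the test-set size m is an arbitrary choice for illustration):

```r
# One random-split replication: m observations form the test set,
# the remaining n - m observations form the training set.
n <- 20
m <- 5
test  <- sample(n, m)                # random test set
train <- setdiff(seq_len(n), test)  # disjoint training set
# only the test responses and their predictions enter the cost function
length(intersect(train, test))  # 0: the sets are disjoint
```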
For the bootstrap estimator, each bootstrap sample is used as training data to fit the model. The out-of-bag estimator uses the observations that do not enter the bootstrap sample as test data and computes the prediction loss function cost for those out-of-bag observations. The 0.632 estimator is computed as a linear combination of the out-of-bag estimator and the prediction loss of the fitted values of the model computed from the full sample.
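The linear combination used by the 0.632 estimator applies the fixed weights 0.368 and 0.632; the error values below are made up purely for illustration:

```r
# 0.632 estimator as a weighted combination (hypothetical error values)
apparent <- 0.8  # prediction loss of the fitted values on the full sample
oob      <- 1.4  # out-of-bag prediction error
err632 <- 0.368 * apparent + 0.632 * oob
err632  # 1.1792
```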
In any case, if the response is a vector but predictFun returns a matrix, the prediction error is computed for each column. A typical use case for this behavior would be if predictFun returns predictions from an initial model fit and stepwise improvements thereof.
If formula or data are supplied, all variables required for fitting the models are added as one argument to the function call, which is the typical behavior of model fitting functions with a formula interface. In this case, the accepted values for names depend on the method. For the function method, a character vector of length two should be supplied, with the first element specifying the argument name for the formula and the second element specifying the argument name for the data (the default is to use c("formula", "data")). Note that names for both arguments should be supplied even if only one is actually used. For the call method, which does not have a formula argument, a character string specifying the argument name for the data should be supplied (the default is to use "data").
If x is supplied, on the other hand, the predictor matrix and the response are added as separate arguments to the function call. In this case, names should be a character vector of length two, with the first element specifying the argument name for the predictor matrix and the second element specifying the argument name for the response (the default is to use c("x", "y")). It should be noted that the formula or data arguments take precedence over x.
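How a named data argument can be injected into an unevaluated call is easy to sketch with base R's call manipulation (lm and the name trainingData are stand-ins for illustration, mirroring the default argument name "data" of the call method):

```r
# Sketch: adding a named data argument to an unevaluated model-fitting call
cl <- call("lm", formula = Y ~ .)
cl[["data"]] <- quote(trainingData)  # inject the (placeholder) data argument
cl
# lm(formula = Y ~ ., data = trainingData)
```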
If tuning is an empty list, perryFit is called to return an object of class "perry". Otherwise an object of class "perryTuning" (which inherits from class "perrySelect") with the following components is returned:
pe 
a data frame containing the estimated prediction errors for all combinations of tuning parameter values. In case of more than one replication, those are average values over all replications. 
se 
a data frame containing the estimated standard errors of the prediction loss for all combinations of tuning parameter values. 
reps 
a data frame containing the estimated prediction errors from all replications for all combinations of tuning parameter values. This is only returned in case of more than one replication. 
splits 
an object giving the data splits used to estimate the prediction error. 
y 
the response. 
yHat 
a list containing the predicted values for all combinations of tuning parameter values. Each list component is again a list containing the corresponding predicted values from all replications. 
best 
an integer vector giving the indices of the optimal combinations of tuning parameters. 
selectBest 
a character string specifying the criterion used for selecting the best model. 
seFactor 
a numeric value giving the multiplication factor of the standard error used for the selection of the best model. 
tuning 
a data frame containing the grid of tuning parameter values for which the prediction error was estimated. 
finalModel 
the final model fit with the optimal combination of tuning parameters. This is only returned if argument final is TRUE. 
call 
the matched function call. 
The same data splits are used for all combinations of tuning parameter values for maximum comparability.
If a final model with the optimal combination of tuning parameters is computed, class "perryTuning" inherits the coef(), fitted(), predict() and residuals() methods from its component finalModel.
Andreas Alfons
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.
perryFit, perrySelect, cvFolds, randomSplits, bootSamples, cost
library("perryExamples")
data("coleman")
## evaluate MM regression models tuned for 85% and 95% efficiency
tuning <- list(tuning.psi = c(3.443689, 4.685061))
## via model fitting function
# perform cross-validation
# note that the response is extracted from 'data' in
# this example and does not have to be supplied
perryTuning(lmrob, formula = Y ~ ., data = coleman,
tuning = tuning, splits = foldControl(K = 5, R = 10),
cost = rtmspe, costArgs = list(trim = 0.1), seed = 1234)
## via function call
# set up function call
call <- call("lmrob", formula = Y ~ .)
# perform cross-validation
perryTuning(call, data = coleman, y = coleman$Y,
tuning = tuning, splits = foldControl(K = 5, R = 10),
cost = rtmspe, costArgs = list(trim = 0.1), seed = 1234)
