cvTool | R Documentation |
Basic function to estimate the prediction error of a model via (repeated) K-fold cross-validation. The model is thereby specified by an unevaluated function call to a model fitting function.
cvTool(call, data = NULL, x = NULL, y, cost = rmspe,
folds, names = NULL, predictArgs = list(),
costArgs = list(), envir = parent.frame())
call |
an unevaluated function call for fitting a model (see call for example). |
data |
a data frame containing the variables required for fitting the models. This is typically used if the model in the function call is described by a formula. |
x |
a numeric matrix containing the predictor variables. This is typically used if the function call for fitting the models requires the predictor matrix and the response to be supplied as separate arguments. |
y |
a numeric vector or matrix containing the response. |
cost |
a cost function measuring prediction loss. It should expect the observed values of the response to be passed as the first argument and the predicted values as the second argument, and must return either a non-negative scalar value, or a list with the first component containing the prediction error and the second component containing the standard error. The default is to use the root mean squared prediction error (see rmspe). |
folds |
an object of class "cvFolds" (as returned by cvFolds) giving the folds of the data for cross-validation. |
names |
an optional character vector giving names for the arguments containing the data to be used in the function call (see “Details”). |
predictArgs |
a list of additional arguments to be passed to the predict method of the fitted models. |
costArgs |
a list of additional arguments to be passed to the prediction loss function cost. |
envir |
the environment in which to evaluate the function call for fitting the models (see eval). |
(Repeated) K-fold cross-validation is performed in the following way. The data are first split into K previously obtained blocks of approximately equal size (given by folds). Each of the K data blocks is left out once while the model is fitted to the remaining data, and predictions are computed for the observations in the left-out block with the predict method of the fitted model. Thus a prediction is obtained for each observation.
The response variable and the obtained predictions for all observations are then passed to the prediction loss function cost to estimate the prediction error.
For repeated cross-validation (as indicated by folds), this process is replicated and the estimated prediction errors from all replications are returned.
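The procedure described above can be sketched by hand in base R. This is only an illustration of the idea and not the cvTools implementation: the built-in mtcars data, the lm model, and the names blocks, yHat and rmspeManual are assumptions made for the example, and the blocks are drawn here with sample rather than supplied through folds.

```r
# Hand-rolled K-fold cross-validation (illustrative sketch only).
set.seed(1234)  # set seed for reproducibility
data <- mtcars  # built-in example data, assumed for illustration
n <- nrow(data)
K <- 5

# Randomly assign each observation to one of K blocks of
# approximately equal size.
blocks <- sample(rep(seq_len(K), length.out = n))

# Leave each block out once, fit on the remaining data, and predict
# the left-out observations.
yHat <- numeric(n)
for (k in seq_len(K)) {
  test <- blocks == k
  fit <- lm(mpg ~ wt + hp, data = data[!test, , drop = FALSE])
  yHat[test] <- predict(fit, newdata = data[test, , drop = FALSE])
}

# Pass the response and all predictions to a loss function; here the
# root mean squared prediction error is computed directly.
rmspeManual <- sqrt(mean((data$mpg - yHat)^2))
```

Repeating the loop with a fresh random assignment to blocks would give one estimate per replication, which corresponds to the repeated case.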
Furthermore, if the response is a vector but the predict method of the fitted models returns a matrix, the prediction error is computed for each column. A typical use case for this behavior would be if the predict method returns predictions from an initial model fit and stepwise improvements thereof.
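As a small illustration of the column-wise case, the following base R sketch computes one root mean squared prediction error per column of a prediction matrix. The toy values and the column names initial and improved are made up for the example; this is not the internal code of cvTool.

```r
# Vector response and a matrix of predictions with one column per
# model stage (toy values, assumed for illustration).
y <- c(1, 2, 3, 4)
yHat <- cbind(initial = c(1, 2, 3, 6), improved = c(1, 2, 3, 4))

# Compute the prediction error separately for each column.
rmspeByCol <- apply(yHat, 2, function(p) sqrt(mean((y - p)^2)))
```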
If data is supplied, all variables required for fitting the models are added as one argument to the function call, which is the typical behavior of model fitting functions with a formula interface. In this case, a character string specifying the argument name can be passed via names (the default is to use "data").
If x is supplied, on the other hand, the predictor matrix and the response are added as separate arguments to the function call. In this case, names should be a character vector of length two, with the first element specifying the argument name for the predictor matrix and the second element specifying the argument name for the response (the default is to use c("x", "y")). It should be noted that data takes precedence over x if both are supplied.
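To make the two ways of supplying data more concrete, the sketch below adds arguments to unevaluated calls in base R and then evaluates them. It is an assumption about the general mechanism rather than a copy of the cvTool internals; lm and lm.fit merely stand in for a formula interface and an x/y interface, and the names fitCall, fitCall2 and argNames are hypothetical.

```r
train <- mtcars[1:20, ]  # pretend training subset, assumed for illustration

# Formula interface: the data frame is added as a single argument,
# here under the default argument name "data".
fitCall <- call("lm", formula = mpg ~ wt)
fitCall[["data"]] <- train
fit <- eval(fitCall)

# Separate predictor matrix and response: two arguments are added,
# named according to a character vector of length two.
argNames <- c("x", "y")
fitCall2 <- call("lm.fit")
fitCall2[[argNames[1]]] <- cbind(1, train$wt)  # intercept column plus predictor
fitCall2[[argNames[2]]] <- train$mpg
fit2 <- eval(fitCall2)
```

Both calls fit the same regression, so the estimated coefficients agree up to numerical tolerance.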
If only one replication is requested and the prediction loss function cost also returns the standard error, a list is returned, with the first component containing the estimated prediction errors and the second component the corresponding estimated standard errors.
Otherwise the return value is a numeric matrix in which each column contains the respective estimated prediction errors from all replications.
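A cost function that also returns a standard error could look like the following sketch. The function name mspeSE and its component names are hypothetical; it simply follows the contract described for the cost argument, with observed values first, predicted values second, and a list holding the prediction error followed by the standard error.

```r
# Hypothetical cost function: mean squared prediction error together
# with its standard error, returned as a two-component list.
mspeSE <- function(y, yHat) {
  loss <- (y - yHat)^2
  list(mspe = mean(loss), se = sd(loss) / sqrt(length(loss)))
}

# Toy check with made-up values.
res <- mspeSE(c(1, 2, 3, 4), c(1, 2, 3, 6))
```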
Andreas Alfons
cvFit, cvTuning, cvFolds, cost
library("cvTools")     # provides cvTool, cvFolds and rtmspe
library("robustbase")  # provides lmrob and the coleman data
data("coleman")
set.seed(1234) # set seed for reproducibility
# set up function call for an MM regression model
call <- call("lmrob", formula = Y ~ .)
# set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
# perform cross-validation
cvTool(call, data = coleman, y = coleman$Y, cost = rtmspe,
folds = folds, costArgs = list(trim = 0.1))