repCV | R Documentation
Estimate the prediction error of a linear model via (repeated) K-fold cross-validation.
Cross-validation functions are available for least squares fits computed with lm as well as for the following robust alternatives: MM-type models computed with lmrob and least trimmed squares fits computed with ltsReg.
repCV(object, ...)
## S3 method for class 'lm'
repCV(object, cost = rmspe, K = 5, R = 1,
      foldType = c("random", "consecutive", "interleaved"),
      grouping = NULL, folds = NULL, seed = NULL, ...)
## S3 method for class 'lmrob'
repCV(object, cost = rtmspe, K = 5, R = 1,
      foldType = c("random", "consecutive", "interleaved"),
      grouping = NULL, folds = NULL, seed = NULL, ...)
## S3 method for class 'lts'
repCV(object, cost = rtmspe, K = 5, R = 1,
      foldType = c("random", "consecutive", "interleaved"),
      grouping = NULL, folds = NULL,
      fit = c("reweighted", "raw", "both"), seed = NULL, ...)
cvLm(object, cost = rmspe, K = 5, R = 1,
     foldType = c("random", "consecutive", "interleaved"),
     grouping = NULL, folds = NULL, seed = NULL, ...)
cvLmrob(object, cost = rtmspe, K = 5, R = 1,
        foldType = c("random", "consecutive", "interleaved"),
        grouping = NULL, folds = NULL, seed = NULL, ...)
cvLts(object, cost = rtmspe, K = 5, R = 1,
      foldType = c("random", "consecutive", "interleaved"),
      grouping = NULL, folds = NULL,
      fit = c("reweighted", "raw", "both"), seed = NULL, ...)
object |
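an object returned from a model fitting function. Methods are implemented for objects of class "lm" computed with lm, objects of class "lmrob" computed with lmrob, and objects of class "lts" computed with ltsReg. |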
cost |
a cost function measuring prediction loss. It should expect the observed values of the response to be passed as the first argument and the predicted values as the second argument, and must return either a non-negative scalar value, or a list with the first component containing the prediction error and the second component containing the standard error. The default is to use the root mean squared prediction error (rmspe) for the "lm" method and the root trimmed mean squared prediction error (rtmspe) for the "lmrob" and "lts" methods (see cost). |
K |
an integer giving the number of folds into which the data should be split (the default is five). Keep in mind that this should be chosen such that all folds are of approximately equal size. Setting K equal to the number of observations or groups yields leave-one-out cross-validation. |
R |
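an integer giving the number of replications for repeated K-fold cross-validation. This is ignored for leave-one-out cross-validation and other non-random splits of the data. |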
foldType |
a character string specifying the type of folds to be generated. Possible values are "random" (the default), "consecutive" or "interleaved". |
grouping |
a factor specifying groups of observations. If supplied, the data are split according to the groups rather than individual observations such that all observations within a group belong to the same fold. |
folds |
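an object of class "cvFolds" giving the folds of the data for cross-validation (as returned by cvFolds). If supplied, this is preferred over the arguments for generating cross-validation folds. |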
fit |
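a character string specifying for which fit to estimate the prediction error. Possible values are "reweighted" (the default) for the prediction error of the reweighted fit, "raw" for the prediction error of the raw fit, or "both" for the prediction errors of both fits. |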
seed |
optional initial seed for the random number generator (see .Random.seed). |
... |
additional arguments to be passed to the prediction loss function cost (for example trim for rtmspe, as in the examples below). |
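To make the interplay of K, R, foldType and grouping concrete, here is a minimal base-R sketch of fold assignment. The function name assignFolds is hypothetical and not part of the package; it is an illustration under stated assumptions, not the cvFolds implementation, which may differ in details (for instance, how random permutations are drawn). It assumes that non-random fold types are never replicated and that, with grouping, whole groups are assigned to folds.

```r
# Hypothetical helper (not a package function): illustrates how the
# K, R, foldType and grouping arguments could translate into fold labels.
assignFolds <- function(n, K = 5, R = 1,
                        foldType = c("random", "consecutive", "interleaved"),
                        grouping = NULL) {
  foldType <- match.arg(foldType)
  if (!is.null(grouping)) {
    grouping <- as.factor(grouping)
    stopifnot(length(grouping) == n)
    n <- nlevels(grouping)                   # split groups, not observations
  }
  base <- rep(seq_len(K), length.out = n)    # labels 1, 2, ..., K, 1, 2, ...
  if (foldType != "random") R <- 1           # assumption: fixed splits are not repeated
  folds <- sapply(seq_len(R), function(r) {
    switch(foldType,
           random      = base[sample.int(n)],  # random permutation of the labels
           consecutive = sort(base),           # contiguous blocks of observations
           interleaved = base)                 # every K-th observation per fold
  })
  folds <- matrix(folds, ncol = R)           # one column per replication
  if (!is.null(grouping)) {
    # every observation inherits the fold of its group
    folds <- folds[as.integer(grouping), , drop = FALSE]
  }
  folds
}

set.seed(1)
assignFolds(10, K = 5, R = 2)                          # 10 x 2 matrix of fold labels
assignFolds(7, K = 3, foldType = "interleaved")[, 1]   # 1 2 3 1 2 3 1
```

With random folds, each column is a fresh permutation of the same balanced label vector, so every replication keeps the folds of approximately equal size.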
(Repeated) K-fold cross-validation is performed in the following way. The data are first split into K previously obtained blocks of approximately equal size (the cross-validation folds). Each of the K data blocks is left out once: the model is fit to the remaining K - 1 blocks, and predictions are computed for the observations in the left-out block with the predict method of the fitted model. Thus a prediction is obtained for each observation.
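One pass of this procedure can be sketched in base R for a least squares fit. This is not the cvFit implementation. The function cvPredictions and the simulated data are hypothetical, and the sketch assumes the data contain no missing values, so that rows of data and predictions line up one to one.

```r
# Hypothetical sketch of one K-fold pass for an lm fit: each block is
# left out once, the model is refit on the rest, and the left-out
# observations are predicted with predict().
cvPredictions <- function(formula, data, fold) {
  yHat <- numeric(nrow(data))
  for (k in sort(unique(fold))) {
    test <- fold == k
    fit <- lm(formula, data = data[!test, , drop = FALSE])            # fit without block k
    yHat[test] <- predict(fit, newdata = data[test, , drop = FALSE])  # predict block k
  }
  yHat  # one out-of-fold prediction per observation
}

set.seed(1234)
d <- data.frame(x = runif(30))
d$y <- 1 + 3 * d$x + rnorm(30, sd = 0.5)
K <- 5
fold <- rep(seq_len(K), length.out = nrow(d))[sample.int(nrow(d))]  # random folds
yHat <- cvPredictions(y ~ x, d, fold)
sqrt(mean((d$y - yHat)^2))   # root mean squared prediction error
```

Because every observation falls in exactly one left-out block, the loop fills yHat completely before any loss is computed.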
The response variable and the obtained predictions for all observations are then passed to the prediction loss function cost to estimate the prediction error. For repeated cross-validation, this process is replicated, and the estimated prediction errors from all replications as well as their average are included in the returned object.
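The calling convention for cost (observed response first, predictions second, returning either a non-negative scalar or a two-component list of prediction error and standard error) can be illustrated with a short base-R sketch. The names maeCost and maeCostSE are hypothetical, not package functions. Computing the standard error as sd / sqrt(n) of the absolute errors is an assumption made for illustration and is not necessarily what the built-in cost functions do.

```r
# Hypothetical cost functions (illustrative names only).
# Scalar form: mean absolute prediction error.
maeCost <- function(y, yHat) {
  mean(abs(y - yHat))
}

# List form: first component is the prediction error, second its
# standard error (here sd / sqrt(n) of the absolute errors, an assumption).
maeCostSE <- function(y, yHat) {
  absErr <- abs(y - yHat)
  list(mae = mean(absErr), se = sd(absErr) / sqrt(length(absErr)))
}

y <- c(1, 2, 3, 4)
yHat <- c(1.5, 2, 2, 4)
maeCost(y, yHat)          # 0.375
maeCostSE(y, yHat)$mae    # 0.375
```

Under these assumptions, such a function could be supplied as, for example, repCV(fitLm, cost = maeCostSE, folds = folds). The built-in alternatives are documented on the cost help page.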
An object of class "cv" with the following components:
n |
an integer giving the number of observations or groups. |
K |
an integer giving the number of folds. |
R |
an integer giving the number of replications. |
cv |
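a numeric vector containing the estimated prediction errors. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits. In the case of repeated cross-validation, those are average values over all replications. |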
se |
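a numeric vector containing the estimated standard errors of the prediction loss. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits. |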
reps |
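a numeric matrix containing the estimated prediction errors from all replications. For the "lm" and "lmrob" methods, this is a matrix with one column. For the "lts" method, this contains one column for each of the requested fits. This component is only returned for repeated cross-validation. |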
seed |
the seed of the random number generator before cross-validation was performed. |
call |
the matched function call. |
The repCV methods are simple wrapper functions that extract the data from the fitted model and call cvFit to perform cross-validation. In addition, cvLm, cvLmrob and cvLts are aliases for the respective methods.
Author: Andreas Alfons
See also: cvFit, cvFolds, cost, lm, lmrob, ltsReg
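library("cvTools")     # provides repCV, cvFolds and rtmspe
library("robustbase")  # provides lmrob, ltsReg and the coleman data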
data("coleman")
set.seed(1234) # set seed for reproducibility
# set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
# perform cross-validation for an LS regression model
fitLm <- lm(Y ~ ., data = coleman)
repCV(fitLm, cost = rtmspe, folds = folds, trim = 0.1)
# perform cross-validation for an MM regression model
fitLmrob <- lmrob(Y ~ ., data = coleman)
repCV(fitLmrob, cost = rtmspe, folds = folds, trim = 0.1)
# perform cross-validation for an LTS regression model
fitLts <- ltsReg(Y ~ ., data = coleman)
repCV(fitLts, cost = rtmspe, folds = folds, trim = 0.1)
repCV(fitLts, cost = rtmspe, folds = folds,
fit = "both", trim = 0.1)