Estimate the prediction error of a model via (repeated) K-fold cross-validation, (repeated) random splitting (also known as random subsampling or Monte Carlo cross-validation), or the bootstrap. The model can be supplied as an object returned by a model fitting function, as a model fitting function itself, or as an unevaluated function call to a model fitting function.
perryFit(object, ...)
## Default S3 method:
perryFit(
object,
data = NULL,
x = NULL,
y,
splits = foldControl(),
predictFun = predict,
predictArgs = list(),
cost = rmspe,
costArgs = list(),
names = NULL,
envir = parent.frame(),
ncores = 1,
cl = NULL,
seed = NULL,
...
)
## S3 method for class 'function'
perryFit(
object,
formula,
data = NULL,
x = NULL,
y,
args = list(),
splits = foldControl(),
predictFun = predict,
predictArgs = list(),
cost = rmspe,
costArgs = list(),
names = NULL,
envir = parent.frame(),
ncores = 1,
cl = NULL,
seed = NULL,
...
)
## S3 method for class 'call'
perryFit(
object,
data = NULL,
x = NULL,
y,
splits = foldControl(),
predictFun = predict,
predictArgs = list(),
cost = rmspe,
costArgs = list(),
names = NULL,
envir = parent.frame(),
ncores = 1,
cl = NULL,
seed = NULL,
...
)

object 
the fitted model for which to estimate the prediction error,
a function for fitting a model, or an unevaluated function call for fitting
a model (see 
... 
additional arguments to be passed down. 
data 
a data frame containing the variables required for fitting the
models. This is typically used if the model in the function call is
described by a 
x 
a numeric matrix containing the predictor variables. This is typically used if the function call for fitting the models requires the predictor matrix and the response to be supplied as separate arguments. 
y 
a numeric vector or matrix containing the response. 
splits 
an object of class 
predictFun 
a function to compute predictions for the test data. It
should expect the fitted model to be passed as the first argument and the test
data as the second argument, and must return either a vector or a matrix
containing the predicted values. The default is to use the

predictArgs 
a list of additional arguments to be passed to

cost 
a cost function measuring prediction loss. It should expect
the observed values of the response to be passed as the first argument and
the predicted values as the second argument, and must return either a
nonnegative scalar value, or a list with the first component containing
the prediction error and the second component containing the standard
error. The default is to use the root mean squared prediction error
(see 
costArgs 
a list of additional arguments to be passed to the
prediction loss function 
names 
an optional character vector giving names for the arguments containing the data to be used in the function call (see “Details”). 
envir 
the 
ncores 
a positive integer giving the number of processor cores to be
used for parallel computing (the default is 1 for no parallelization). If
this is set to 
cl 
a parallel cluster for parallel computing as generated by

seed 
optional initial seed for the random number generator (see

formula 
a 
args 
a list of additional arguments to be passed to the model fitting function. 
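The contract described for cost (observed values first, predicted values second, returning either a nonnegative scalar or a list holding the prediction error and its standard error) can be illustrated with a hand-written loss function. This is only a sketch: mape_sketch, its includeSE switch, and the component names pe and se are hypothetical choices made for illustration, not names confirmed by this page.

```r
# Hypothetical cost function: mean absolute prediction error.
# Observed response comes first, predictions second.
mape_sketch <- function(y, yHat, includeSE = FALSE) {
  absErr <- abs(y - yHat)
  pe <- mean(absErr)
  if (!includeSE) return(pe)  # nonnegative scalar
  # List form: prediction error first, standard error second
  list(pe = pe, se = sd(absErr) / sqrt(length(absErr)))
}

y    <- c(1, 2, 3, 4)
yHat <- c(1.5, 2, 2, 4)

scalar_result <- mape_sketch(y, yHat)                    # 0.375
list_result   <- mape_sketch(y, yHat, includeSE = TRUE)  # list(pe, se)
```

An extra argument such as includeSE would presumably be supplied through costArgs, for example costArgs = list(includeSE = TRUE).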
(Repeated) K-fold cross-validation is performed in the following
way. The data are first split into K previously obtained blocks of
approximately equal size (given by folds). Each of the K data
blocks is left out once to fit the model, and predictions are computed for
the observations in the left-out block with predictFun. Thus a
prediction is obtained for each observation. The response and the obtained
predictions for all observations are then passed to the prediction loss
function cost to estimate the prediction error. For repeated
K-fold cross-validation (as indicated by splits), this process
is replicated and the estimated prediction errors from all replications are
returned.
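The K-fold procedure just described can be sketched in base R. This is an illustration of the idea on simulated data, not the code perryFit runs internally; the data, the lm model, and the helper rmspe_sketch are assumptions made for the example.

```r
# Base-R sketch of a single round of K-fold cross-validation.
set.seed(1234)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

K <- 5
# Randomly assign each observation to one of K blocks of equal size
folds <- sample(rep(seq_len(K), length.out = n))

# Root mean squared prediction error, written out for illustration
rmspe_sketch <- function(y, yHat) sqrt(mean((y - yHat)^2))

yHat <- numeric(n)
for (k in seq_len(K)) {
  test <- folds == k
  fit <- lm(y ~ x, data = dat[!test, ])                  # fit without block k
  yHat[test] <- predict(fit, dat[test, , drop = FALSE])  # predict block k
}

# Every observation now has one prediction; the loss is computed once
# over the full response and the full vector of predictions.
pe <- rmspe_sketch(dat$y, yHat)
```

Repeating the block with a fresh random assignment of folds gives one prediction error per replication.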
(Repeated) random splitting is performed similarly. In each replication,
the data are split into a training set and a test set at random. Then the
training data are used to fit the model, and predictions are computed for
the test data. Hence only the response values from the test data and the
corresponding predictions are passed to the prediction loss function
cost.
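A base-R sketch of repeated random splitting follows. The test-set size m, the number of replications R, the simulated data, and rmspe_sketch are all hypothetical choices for illustration and do not reproduce the package internals.

```r
# Base-R sketch of repeated random splitting (Monte Carlo cross-validation).
set.seed(1234)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

m <- 10  # hypothetical number of test observations per split
R <- 20  # hypothetical number of replications
rmspe_sketch <- function(y, yHat) sqrt(mean((y - yHat)^2))

pe_reps <- vapply(seq_len(R), function(r) {
  test <- sample(n, m)                             # random test set
  fit <- lm(y ~ x, data = dat[-test, ])            # fit on the training set
  yHat <- predict(fit, dat[test, , drop = FALSE])  # predict the test set
  # Only the test responses and their predictions enter the loss
  rmspe_sketch(dat$y[test], yHat)
}, numeric(1))

pe <- mean(pe_reps)  # average over replications
```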
For the bootstrap estimator, each bootstrap sample is used as training data
to fit the model. The out-of-bag estimator uses the observations that do
not enter the bootstrap sample as test data and computes the prediction loss
function cost for those out-of-bag observations. The 0.632 estimator
is computed as a linear combination of the out-of-bag estimator and the
prediction loss of the fitted values of the model computed from the full
sample.
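The two bootstrap estimators can be sketched in base R as below. The weights 0.632 and 0.368 are the conventional ones for the 0.632 estimator; that the package uses exactly these weights, and that it combines averaged losses on this scale, are assumptions of this sketch rather than statements taken from this page.

```r
# Base-R sketch of the out-of-bag and 0.632 bootstrap estimators.
set.seed(1234)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)
rmspe_sketch <- function(y, yHat) sqrt(mean((y - yHat)^2))

R <- 50  # hypothetical number of bootstrap samples
oob_reps <- vapply(seq_len(R), function(r) {
  inbag <- sample(n, n, replace = TRUE)      # bootstrap sample (training data)
  outbag <- setdiff(seq_len(n), inbag)       # observations not drawn
  fit <- lm(y ~ x, data = dat[inbag, ])
  rmspe_sketch(dat$y[outbag], predict(fit, dat[outbag, , drop = FALSE]))
}, numeric(1))
pe_oob <- mean(oob_reps)                     # out-of-bag estimator

# Loss of the fitted values from the model fit to the full sample
full <- lm(y ~ x, data = dat)
apparent <- rmspe_sketch(dat$y, fitted(full))

# Assumed conventional weights for the linear combination
pe_632 <- 0.632 * pe_oob + 0.368 * apparent
```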
In any case, if the response is a vector but predictFun returns a
matrix, the prediction error is computed for each column. A typical use
case for this behavior would be if predictFun returns predictions
from an initial model fit and stepwise improvements thereof.
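A small base-R sketch of the matrix case follows. The helpers fit_steps and predict_steps are hypothetical: they stand in for a fitting function that keeps a sequence of models and a predictFun that returns one column of predictions per model.

```r
# Sketch: a prediction function returning a matrix, one column per model.
set.seed(1234)
n <- 40
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + dat$x1 + 0.5 * dat$x2 + rnorm(n)

# Hypothetical fitting function: an initial model and one refinement
fit_steps <- function(data) {
  list(step1 = lm(y ~ x1, data = data),
       step2 = lm(y ~ x1 + x2, data = data))
}

# Hypothetical predictFun: fitted object first, test data second
predict_steps <- function(object, newdata) {
  vapply(object, predict, numeric(nrow(newdata)), newdata = newdata)
}

test <- sample(n, 10)
fits <- fit_steps(dat[-test, ])
yHat <- predict_steps(fits, dat[test, ])  # 10 x 2 matrix

# The response is a vector, so the loss is computed column by column
pe <- apply(yHat, 2, function(col) sqrt(mean((dat$y[test] - col)^2)))
```

The result pe holds one prediction error per column, which makes it easy to compare the initial fit with its refinement.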
If formula or data are supplied, all variables required for
fitting the models are added as one argument to the function call, which is
the typical behavior of model fitting functions with a
formula interface. In this case, the accepted values
for names depend on the method. For the function method, a
character vector of length two should be supplied, with the first element
specifying the argument name for the formula and the second element
specifying the argument name for the data (the default is to use
c("formula", "data")). Note that names for both arguments should be
supplied even if only one is actually used. For the other methods, which do
not have a formula argument, a character string specifying the
argument name for the data should be supplied (the default is to use
"data").
If x is supplied, on the other hand, the predictor matrix and the
response are added as separate arguments to the function call. In this
case, names should be a character vector of length two, with the
first element specifying the argument name for the predictor matrix and the
second element specifying the argument name for the response (the default is
to use c("x", "y")). It should be noted that the formula or
data arguments take precedence over x.
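How argument names determine where the data end up in a function call can be illustrated with plain call objects in base R. This sketch only mirrors the idea described above; the variables dataName and xyNames play the role of names, and nothing here is taken from the package source.

```r
# Sketch: adding data to an unevaluated call under configurable argument names.
set.seed(1234)
train <- data.frame(x = rnorm(20))
train$y <- 1 + 2 * train$x + rnorm(20)

# Formula interface: all variables are added as one argument
fitCall <- call("lm", formula = y ~ x)
dataName <- "data"                     # plays the role of names = "data"
fitCall[[dataName]] <- quote(train)
fit1 <- eval(fitCall)                  # evaluates lm(formula = y ~ x, data = train)

# Matrix interface: predictors and response are added separately
xyNames <- c("x", "y")                 # plays the role of names = c("x", "y")
X <- cbind(intercept = 1, x = train$x)
yTrain <- train$y
xyCall <- call("lm.fit")
xyCall[[xyNames[1]]] <- quote(X)
xyCall[[xyNames[2]]] <- quote(yTrain)
fit2 <- eval(xyCall)                   # evaluates lm.fit(x = X, y = yTrain)
```

Both calls fit the same least-squares model, so the coefficients agree; only the way the data are handed to the fitting function differs.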
An object of class "perry" with the following components:
pe
a numeric vector containing the respective estimated prediction errors. In case of more than one replication, those are average values over all replications.
se
a numeric vector containing the respective estimated standard errors of the prediction loss.
reps
a numeric matrix in which each column contains the respective estimated prediction errors from all replications. This is only returned in case of more than one replication.
splits
an object giving the data splits used to estimate the prediction error.
y
the response.
yHat
a list containing the predicted values from all replications.
call
the matched function call.
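A hedged usage sketch that inspects some of these components is given below. It assumes the perry package is installed and that foldControl accepts arguments K and R for the number of folds and replications; the lm fit on the built-in mtcars data is chosen purely for illustration. The code is skipped if the package is unavailable.

```r
# Usage sketch; runs only if the perry package is installed.
has_perry <- requireNamespace("perry", quietly = TRUE)

if (has_perry) {
  fit <- lm(mpg ~ wt + hp, data = mtcars)
  # Assumption: foldControl(K, R) sets the number of folds and replications
  cv <- perry::perryFit(fit, data = mtcars, y = mtcars$mpg,
                        splits = perry::foldControl(K = 5, R = 10),
                        seed = 1234)
  print(cv$pe)         # estimated prediction error, averaged over replications
  print(cv$se)         # estimated standard error
  print(dim(cv$reps))  # one row per replication
}
```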
Andreas Alfons
perrySelect, perryTuning, cvFolds, randomSplits, bootSamples, cost