perry-methods: Resampling-based prediction error for fitted models

Description Usage Arguments Value Note Author(s) See Also Examples

Description

Estimate the prediction error of a fitted model via (repeated) K-fold cross-validation, (repeated) random splitting (also known as random subsampling or Monte Carlo cross-validation), or the bootstrap. Methods are available for least squares fits computed with lm as well as for the following robust alternatives: MM-type models computed with lmrob and least trimmed squares fits computed with ltsReg.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
## S3 method for class 'lm'
perry(
  object,
  splits = foldControl(),
  cost = rmspe,
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)

## S3 method for class 'lmrob'
perry(
  object,
  splits = foldControl(),
  cost = rtmspe,
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)

## S3 method for class 'lts'
perry(
  object,
  splits = foldControl(),
  fit = c("reweighted", "raw", "both"),
  cost = rtmspe,
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)

Arguments

object

the fitted model for which to estimate the prediction error.

splits

an object of class "cvFolds" (as returned by cvFolds) or a control object of class "foldControl" (see foldControl) defining the folds of the data for (repeated) K-fold cross-validation, an object of class "randomSplits" (as returned by randomSplits) or a control object of class "splitControl" (see splitControl) defining random data splits, or an object of class "bootSamples" (as returned by bootSamples) or a control object of class "bootControl" (see bootControl) defining bootstrap samples.

cost

a cost function measuring prediction loss. It should expect the observed values of the response to be passed as the first argument and the predicted values as the second argument, and must return either a non-negative scalar value, or a list with the first component containing the prediction error and the second component containing the standard error. The default is to use the root mean squared prediction error for the "lm" method and the root trimmed mean squared prediction error for the "lmrob" and "lts" methods (see cost).

ncores

a positive integer giving the number of processor cores to be used for parallel computing (the default is 1 for no parallelization). If this is set to NA, all available processor cores are used.

cl

a parallel cluster for parallel computing as generated by makeCluster. If supplied, this is preferred over ncores.

seed

optional initial seed for the random number generator (see .Random.seed). Note that also in case of parallel computing, resampling is performed on the manager process rather than the worker processes. On the parallel worker processes, random number streams are used and the seed is set via clusterSetRNGStream.

...

additional arguments to be passed to the prediction loss function cost.

fit

a character string specifying for which fit to estimate the prediction error. Possible values are "reweighted" (the default) for the prediction error of the reweighted fit, "raw" for the prediction error of the raw fit, or "both" for the prediction error of both fits.

Value

An object of class "perry" with the following components:

pe

a numeric vector containing the estimated prediction errors. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits. In case of more than one replication, those are average values over all replications.

se

a numeric vector containing the estimated standard errors of the prediction loss. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits.

reps

a numeric matrix containing the estimated prediction errors from all replications. For the "lm" and "lmrob" methods, this is a matrix with one column. For the "lts" method, this contains one column for each of the requested fits. However, this is only returned in case of more than one replication.

splits

an object giving the data splits used to estimate the prediction error.

y

the response.

yHat

a list containing the predicted values from all replications.

call

the matched function call.

Note

The perry methods extract the data from the fitted model and call perryFit to perform resampling-based prediction error estimation.

Author(s)

Andreas Alfons

See Also

perryFit

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
## load data
data("Bundesliga")
n <- nrow(Bundesliga)

## fit linear model
Bundesliga$logMarketValue <- log(Bundesliga$MarketValue)
fit <- lm(logMarketValue ~ Contract + Matches + Goals + Assists, 
          data=Bundesliga)

## perform K-fold cross-validation
perry(fit, foldControl(K = 5, R = 10), seed = 1234)

## perform random splitting
perry(fit, splitControl(m = n/3, R = 10), seed = 1234)

## perform bootstrap prediction error estimation
# 0.632 estimator
perry(fit, bootControl(R = 10, type = "0.632"), seed = 1234)
# out-of-bag estimator
perry(fit, bootControl(R = 10, type = "out-of-bag"), seed = 1234)

perryExamples documentation built on Nov. 3, 2021, 5:07 p.m.