errorest_632plus: Calculates the .632+ Error Rate for a specified classifier...

View source: R/errorest-632plus.r

Description

Given a data matrix and its corresponding vector of class labels, we calculate the .632+ error rate of Efron and Tibshirani (1997) for a specified classifier.

Usage

  errorest_632plus(x, y, train, classify,
    num_bootstraps = 50, apparent = NULL, loo_boot = NULL,
    ...)

Arguments

x

a matrix of n observations (rows) and p features (columns)

y

a vector of n class labels

train

a function that builds the classifier. (See details.)

classify

a function that classifies observations from the constructed classifier from train. (See details.)

num_bootstraps

the number of bootstrap replications

apparent

the apparent error rate for the given classifier. If NULL (the default), it is computed internally with errorest_apparent. See Details.

loo_boot

the leave-one-out bootstrap error rate for the given classifier. If NULL (the default), it is computed internally with errorest_loo_boot. See Details.

...

additional arguments passed to the function specified in train.

Details

To calculate the .632+ error rate, we compute the leave-one-out (LOO) bootstrap error rate and the apparent error rate. Then, we compute the 'relative overfitting rate' based on these values. Next, we compute the 'no-information error rate'. Finally, we compute the .632+ error rate estimator from these values.
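As a rough illustration of the final step, the sketch below combines the three ingredients into a .632+ estimate, following the general form given in Efron and Tibshirani (1997). The function name err632plus_sketch and its arguments are hypothetical, and the handling of edge cases (capping the LOO bootstrap error rate at the no-information error rate and setting the relative overfitting rate to 0 when there is no sign of overfitting) is an assumption; the package's own implementation may differ.

```r
# Hypothetical helper (not part of sortinghat): combine the apparent error
# rate, the LOO bootstrap error rate, and the no-information error rate
# into a .632+ estimate.
err632plus_sketch <- function(apparent, loo_boot, gamma) {
  # Assumption: cap the LOO bootstrap error rate at the no-information rate.
  loo_capped <- min(loo_boot, gamma)

  # Relative overfitting rate; assumed to be 0 when neither the LOO bootstrap
  # rate nor the no-information rate exceeds the apparent error rate.
  if (loo_boot > apparent && gamma > apparent) {
    R <- (loo_capped - apparent) / (gamma - apparent)
  } else {
    R <- 0
  }

  # Ordinary .632 estimate plus a correction term that grows with R.
  err632 <- 0.368 * apparent + 0.632 * loo_boot
  err632 + (loo_capped - apparent) * (0.368 * 0.632 * R) / (1 - 0.368 * R)
}

# Made-up inputs, purely for illustration.
err632plus_sketch(apparent = 0.10, loo_boot = 0.20, gamma = 0.50)
```

When the relative overfitting rate is 0, the correction term vanishes and the sketch returns the ordinary .632 estimate, 0.368 times the apparent error rate plus 0.632 times the LOO bootstrap error rate.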

The 'no-information error rate', γ, is the error rate the classifier would have if the feature vectors and the class labels were independent. For K classes, we can estimate γ by

\hat{\gamma} = \sum_{k=1}^{K} p_k (1 - q_k),

where p_k is the observed proportion of responses for class k and q_k is the proportion of observations classified as class k.
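The estimate of γ can be sketched in a few lines of base R. The function name no_info_rate_sketch and the toy labels are hypothetical and serve only to make the formula concrete; this is not the package's internal code.

```r
# Hypothetical helper: estimate the no-information error rate from the
# observed labels `y` and the labels `y_hat` predicted by a classifier.
no_info_rate_sketch <- function(y, y_hat) {
  # Use a common set of classes so that p and q line up element by element.
  classes <- union(as.character(y), as.character(y_hat))
  p <- as.numeric(table(factor(as.character(y), levels = classes))) / length(y)
  q <- as.numeric(table(factor(as.character(y_hat), levels = classes))) / length(y_hat)
  sum(p * (1 - q))
}

# Toy labels: p = (0.5, 0.5) and q = (0.25, 0.75).
y     <- c("a", "a", "b", "b")
y_hat <- c("a", "b", "b", "b")
no_info_rate_sketch(y, y_hat)  # 0.5 * 0.75 + 0.5 * 0.25 = 0.5
```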

To calculate the apparent error rate, we use the errorest_apparent function. Similarly, to calculate the LOO bootstrap (LOO-Boot) error rate, we use the errorest_loo_boot function. In some cases (e.g., a simulation study), one or both of these error rates may already have been computed. Hence, we allow the user to supply these values; by default, both arguments are NULL to indicate that the values are unavailable and must be computed.
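To make the two ingredients concrete, here is a minimal base R sketch of both error rates. It is not the code behind errorest_apparent or errorest_loo_boot: the function names, the majority-class classifier, and the toy data are all hypothetical, and the per-observation averaging follows the usual textbook definition of the LOO bootstrap, which the package may implement differently.

```r
# Hypothetical sketch: apparent error rate (train and test on the same data).
apparent_sketch <- function(x, y, train, classify, ...) {
  fit <- train(x, y, ...)
  mean(as.character(classify(fit, x)) != as.character(y))
}

# Hypothetical sketch: LOO bootstrap error rate. Each observation is scored
# only by classifiers trained on bootstrap samples that do not contain it.
loo_boot_sketch <- function(x, y, train, classify, num_bootstraps = 50, ...) {
  n <- nrow(x)
  err_sum <- numeric(n)  # misclassification count per observation
  err_cnt <- numeric(n)  # number of samples that left the observation out
  for (b in seq_len(num_bootstraps)) {
    idx <- sample.int(n, replace = TRUE)
    out <- setdiff(seq_len(n), idx)
    if (length(out) == 0) next
    fit <- train(x[idx, , drop = FALSE], y[idx], ...)
    pred <- classify(fit, x[out, , drop = FALSE])
    wrong <- as.character(pred) != as.character(y[out])
    err_sum[out] <- err_sum[out] + wrong
    err_cnt[out] <- err_cnt[out] + 1
  }
  seen <- err_cnt > 0
  mean(err_sum[seen] / err_cnt[seen])
}

# Toy classifier that always predicts the majority class of its training set.
majority_train <- function(x, y, ...) names(which.max(table(y)))
majority_classify <- function(object, newdata) rep(object, nrow(newdata))

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
y <- rep(c("a", "b"), times = c(15, 5))
apparent_sketch(x, y, majority_train, majority_classify)  # 5 of 20 wrong: 0.25
loo_boot_sketch(x, y, majority_train, majority_classify)
```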

We expect that the first two arguments of the classifier function given in train are x and y, corresponding to the data matrix and the vector of their labels. Additional arguments can be passed to the train function. The returned object should be a classifier that will be passed to the function given in the classify argument.

We stay with the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument which contains the trained classifier returned from the function specified in train; and 2. a newdata argument which contains a matrix of observations to be classified – the matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates).
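The train and classify conventions described above can be illustrated with a small nearest-centroid classifier written in base R. The names centroid_train and centroid_classify are hypothetical and exist only for this sketch; any pair of functions with the same argument layout should fit the described interface.

```r
# Hypothetical train function: first two arguments are x and y, extra
# arguments are accepted through `...`, and the fitted object is returned.
centroid_train <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- factor(y)
  # One row of feature means per class.
  centroids <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  list(centroids = centroids, classes = levels(y))
}

# Hypothetical classify function: takes the fitted object and a matrix of
# new observations (rows) by features (columns); returns predicted labels.
centroid_classify <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance from every observation to every centroid.
  d <- sapply(seq_len(nrow(object$centroids)), function(k) {
    rowSums(sweep(newdata, 2, object$centroids[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(newdata))
  nearest <- max.col(-d, ties.method = "first")
  factor(object$classes[nearest], levels = object$classes)
}

# Toy data with two well-separated classes.
x <- matrix(c(0, 0,  0, 1,  10, 10,  10, 11), ncol = 2, byrow = TRUE)
y <- c("a", "a", "b", "b")
fit <- centroid_train(x, y)
centroid_classify(fit, rbind(c(0, 0.5), c(10, 10.5)))
```

Under the interface described above, such a pair would be supplied as train = centroid_train and classify = centroid_classify.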

Value

the .632+ error rate estimate

References

Efron, Bradley and Tibshirani, Robert (1997), "Improvements on Cross-Validation: The .632+ Bootstrap Method," Journal of the American Statistical Association, 92, 438, 548-560.

Examples

require('MASS')
iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# Because predict() on an lda fit returns multiple objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }

# We compute the apparent and LOO-Boot error rates up front to demonstrate
# that they can be computed before the errorest_632plus function is called.

set.seed(42)
apparent <- errorest_apparent(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)
set.seed(42)
loo_boot <- errorest_loo_boot(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)

# Each of the following 3 calls should result in the same error rate.
# 1. The apparent error rate is provided, while the LOO-Boot must be computed.
set.seed(42)
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, apparent = apparent)
# 2. The LOO-Boot error rate is provided, while the apparent must be computed.
set.seed(42)
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, loo_boot = loo_boot)
# 3. Both error rates are provided, so the calculation is quick and no
#    random number generation (hence no seed) is needed.
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, apparent = apparent,
                 loo_boot = loo_boot)

# In each case the output is: 0.02194472

Example output

Loading required package: MASS
[1] 0.02194472
[1] 0.02194472
[1] 0.02194472

sortinghat documentation built on May 30, 2017, 4:52 a.m.