For a given data matrix and its corresponding vector of labels, we calculate the .632+ error rate from Efron and Tibshirani (1997) for a given classifier.
errorest_632plus(x, y, train, classify,
                 num_bootstraps = 50, apparent = NULL, loo_boot = NULL,
                 ...)
x | a matrix of n observations (rows) and p features (columns)
y | a vector of n class labels
train | a function that builds the classifier. (See details.)
classify | a function that classifies observations from the constructed classifier from train
num_bootstraps | the number of bootstrap replications
apparent | the apparent error rate for the given classifier. If NULL, it is computed from the data.
loo_boot | the leave-one-out bootstrap error rate for the given classifier. If NULL, it is computed from the data.
... | additional arguments passed to the function specified in train
To calculate the .632+ error rate, we first compute the leave-one-out (LOO) bootstrap error rate and the apparent error rate. Next, we estimate the 'no-information error rate'. From these three values we compute the 'relative overfitting rate', and finally the .632+ error rate estimator.
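The final combination step can be sketched as follows. This is a minimal illustration of the formulas in Efron and Tibshirani (1997), not the package's source: `combine_632plus` is a hypothetical helper, and the capping of the LOO-Boot rate and the truncation of the relative overfitting rate reflect our reading of the paper rather than confirmed package behavior.

```r
# Hypothetical helper (not part of the package): combine the three
# ingredients into a .632+ estimate, following Efron and Tibshirani (1997).
combine_632plus <- function(apparent, loo_boot, gamma) {
  # Assumption: cap the LOO-Boot error rate at the no-information rate.
  loo_boot_capped <- min(loo_boot, gamma)

  # Relative overfitting rate R; set to 0 when it is not well defined.
  if (loo_boot_capped > apparent && gamma > apparent) {
    R <- (loo_boot_capped - apparent) / (gamma - apparent)
  } else {
    R <- 0
  }

  # Weight on the LOO-Boot rate: 0.632 when R = 0, rising to 1 when R = 1.
  w <- 0.632 / (1 - 0.368 * R)

  (1 - w) * apparent + w * loo_boot_capped
}

# Illustrative inputs only; these are not results from any data set.
combine_632plus(apparent = 0.1, loo_boot = 0.2, gamma = 0.6)
```

With no overfitting (R = 0) the weight reduces to 0.632, which gives the ordinary .632 estimator; with maximal overfitting (R = 1) the estimate equals the LOO-Boot rate.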
The 'no-information error rate', γ, is the error rate the classifier would have if the feature vectors and the class labels were independent. For K classes, we can estimate γ by
\hat{γ} = ∑_{k=1}^K p_k (1 - q_k),
where p_k is the observed proportion of responses in class k and q_k is the proportion of observations classified as class k.
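As a small worked example, the estimate of γ can be computed directly from the observed labels and the labels predicted for the same observations. The helper name `no_info_rate` and the toy vectors below are hypothetical and serve only to illustrate the formula.

```r
# Hypothetical helper: estimate the no-information error rate from the
# observed labels `y` and the predicted labels `y_hat`.
no_info_rate <- function(y, y_hat) {
  # Use a common set of classes so that p_k and q_k line up class by class.
  classes <- union(as.character(y), as.character(y_hat))
  p <- table(factor(as.character(y), levels = classes)) / length(y)
  q <- table(factor(as.character(y_hat), levels = classes)) / length(y_hat)
  sum(as.numeric(p) * (1 - as.numeric(q)))
}

y     <- c("a", "a", "b", "b")  # observed: p_a = 0.5,  p_b = 0.5
y_hat <- c("a", "b", "b", "b")  # predicted: q_a = 0.25, q_b = 0.75
no_info_rate(y, y_hat)          # 0.5 * 0.75 + 0.5 * 0.25 = 0.5
```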
To calculate the apparent error rate, we use the errorest_apparent function. Similarly, to calculate the LOO bootstrap (LOO-Boot) error rate, we use the errorest_loo_boot function. In some cases (e.g., in a simulation study) one or both of these error rate estimates might already have been computed. Hence, we allow the user to provide these values; by default, both arguments are NULL to indicate that they are unavailable.
We expect that the first two arguments of the classifier function given in train are x and y, corresponding to the data matrix and the vector of their labels. Additional arguments can be passed to the train function. The returned object should be a classifier that will be passed to the function given in the classify argument.
We stay with the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument, which contains the trained classifier returned from the function specified in train; and 2. a newdata argument, which contains a matrix of observations to be classified. The matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates).
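To make the two conventions concrete, here is a minimal sketch of a matching train/classify pair. The nearest-centroid classifier and the names `centroid_train` and `centroid_classify` are hypothetical; they are not part of the package and are only meant to show the expected signatures.

```r
# Hypothetical train function: the first two arguments are x and y.
centroid_train <- function(x, y, ...) {
  y <- factor(y)
  # One row of feature means per class.
  centroids <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  list(centroids = centroids, classes = levels(y))
}

# Hypothetical classify function: takes `object` and `newdata`.
centroid_classify <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance from each observation to each centroid.
  dists <- apply(object$centroids, 1, function(m) {
    rowSums(sweep(newdata, 2, m)^2)
  })
  dists <- matrix(dists, nrow = nrow(newdata))
  # Assign each observation to the class of its nearest centroid.
  nearest <- max.col(-dists, ties.method = "first")
  factor(object$classes[nearest], levels = object$classes)
}

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
fit  <- centroid_train(iris_x, iris_y)
pred <- centroid_classify(fit, iris_x)
mean(pred != iris_y)  # training-set error of this toy classifier
```

A pair written this way could then be supplied as train = centroid_train and classify = centroid_classify in a call to errorest_632plus.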
The .632+ error rate estimate.
Efron, Bradley and Tibshirani, Robert (1997), "Improvements on Cross-Validation: The .632+ Bootstrap Method," Journal of the American Statistical Association, 92, 438, 548-560.
require('MASS')
iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# Because predict() for an lda fit returns multiple objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }

# We compute the apparent and LOO-Boot error rates up front to demonstrate
# that they can be computed before the errorest_632plus function is called.
set.seed(42)
apparent <- errorest_apparent(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)
set.seed(42)
loo_boot <- errorest_loo_boot(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)

# Each of the following 3 calls should result in the same error rate.
# 1. The apparent error rate is provided, while the LOO-Boot must be computed.
set.seed(42)
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, apparent = apparent)

# 2. The LOO-Boot error rate is provided, while the apparent must be computed.
set.seed(42)
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, loo_boot = loo_boot)

# 3. Both error rates are provided, so the calculation is quick and no
#    seed is needed.
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, apparent = apparent,
                 loo_boot = loo_boot)

# In each case the output is: 0.02194472
Loading required package: MASS
[1] 0.02194472
[1] 0.02194472
[1] 0.02194472