For a given data matrix and its corresponding vector of labels, we calculate the .632 error rate from Efron (1983) for a given classifier.
errorest_632(x, y, train, classify, num_bootstraps = 50,
  apparent = NULL, loo_boot = NULL, ...)
x |
a matrix of n observations (rows) and p features (columns) |
y |
a vector of n class labels |
train |
a function that builds the classifier. (See details.) |
classify |
a function that classifies observations using the classifier constructed by train |
num_bootstraps |
the number of bootstrap replications |
apparent |
the apparent error rate for the given classifier. If NULL (the default), this argument is ignored and the rate is computed. (See details.) |
loo_boot |
the leave-one-out bootstrap error rate for the given classifier. If NULL (the default), this argument is ignored and the rate is computed. (See details.) |
... |
additional arguments passed to the function specified in train |
To calculate the .632 error rate, we compute the leave-one-out bootstrap (LOO-Boot) error rate and the apparent error rate (AER), and then take a convex combination of these two error rate estimators.
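As a rough illustration of the convex combination, the sketch below uses the weights 0.368 and 0.632 from Efron (1983). The helper name combine_632 is hypothetical and is not part of the package; the error rates in the usage line are made up.

```r
# Hypothetical helper (not part of the package): combine the two estimates.
# The weight 0.632 approximates 1 - exp(-1), the expected fraction of distinct
# observations that appear in a bootstrap sample.
combine_632 <- function(apparent, loo_boot) {
  0.368 * apparent + 0.632 * loo_boot
}

# Example with made-up error rates.
combine_632(apparent = 0.02, loo_boot = 0.03)
# 0.02632
```

Because the AER tends to be optimistic and the LOO-Boot error rate tends to be pessimistic, the weighted average is intended to land between the two.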
To calculate the AER, we use the errorest_apparent function. Similarly, we use the errorest_loo_boot function to calculate the LOO-Boot error rate. In some cases (e.g., a simulation study) one or both of these error rate estimates might already have been computed. Hence, we allow the user to provide these values; by default, both arguments are NULL, which indicates that they are ignored and the corresponding error rates are computed.
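For intuition about the two ingredients, here is a minimal sketch of how an apparent error rate and a leave-one-out bootstrap error rate could be computed from a train/classify pair. The names apparent_sketch, loo_boot_sketch, majority_train, and majority_classify are hypothetical, and this is not necessarily how errorest_apparent or errorest_loo_boot are implemented.

```r
# Apparent error rate: train on all of the data, then classify the same data.
apparent_sketch <- function(x, y, train, classify, ...) {
  fit <- train(x, y, ...)
  mean(as.character(classify(fit, x)) != as.character(y))
}

# Leave-one-out bootstrap: for each observation, average its error over the
# bootstrap samples that omit it, then average over observations.
loo_boot_sketch <- function(x, y, train, classify, num_bootstraps = 50, ...) {
  n <- length(y)
  err_sum <- numeric(n)  # misclassification count per observation
  out_cnt <- numeric(n)  # number of bootstrap samples omitting each observation
  for (b in seq_len(num_bootstraps)) {
    idx <- sample.int(n, replace = TRUE)
    out <- setdiff(seq_len(n), idx)
    if (length(out) == 0) next
    fit <- train(x[idx, , drop = FALSE], y[idx], ...)
    pred <- classify(fit, x[out, , drop = FALSE])
    err_sum[out] <- err_sum[out] + (as.character(pred) != as.character(y[out]))
    out_cnt[out] <- out_cnt[out] + 1
  }
  keep <- out_cnt > 0
  mean(err_sum[keep] / out_cnt[keep])
}

# Toy classifier (an assumption for illustration): always predict the most
# frequent class seen during training.
majority_train <- function(x, y, ...) names(which.max(table(y)))
majority_classify <- function(object, newdata) rep(object, nrow(newdata))

x <- data.matrix(iris[, -5])
y <- iris[, 5]
set.seed(42)
aer <- apparent_sketch(x, y, majority_train, majority_classify)
loo <- loo_boot_sketch(x, y, majority_train, majority_classify)
```

Values computed this way, or by any other means, are the kind of precomputed estimates that the apparent and loo_boot arguments are meant to accept.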
For the given classifier, two functions must be provided: 1. to train the classifier and 2. to classify unlabeled observations. The training function is provided as train and the classification function as classify.
We expect that the first two arguments of the train function are x and y, corresponding to the data matrix and the vector of their labels, respectively. Additional arguments can be passed to the train function.
We stay with the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument which contains the trained classifier returned from the function specified in train; and 2. a newdata argument which contains a matrix of observations to be classified. The matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates). For an example, see lda.
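To make the expected interface concrete, the following sketch defines a simple nearest-centroid classifier whose train and classify functions follow the conventions above. The names centroid_train and centroid_classify are hypothetical and chosen only for illustration; any pair of functions with the same argument layout should work.

```r
# Training function: the first two arguments are x (matrix) and y (labels).
centroid_train <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- factor(y)
  # One row of feature means per class.
  centroids <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  rownames(centroids) <- levels(y)
  list(centroids = centroids)
}

# Classification function: takes the trained object and a newdata matrix,
# and returns one predicted label per row of newdata.
centroid_classify <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance from every observation to every centroid.
  d <- apply(object$centroids, 1, function(m) {
    rowSums(sweep(newdata, 2, m)^2)
  })
  d <- matrix(d, nrow = nrow(newdata))  # stay a matrix for one-row newdata
  labels <- rownames(object$centroids)
  factor(labels[max.col(-d, ties.method = "first")], levels = labels)
}

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
fit <- centroid_train(iris_x, iris_y)
pred <- centroid_classify(fit, iris_x)
mean(pred != iris_y)  # apparent error rate of this toy classifier
```

These two functions could then be supplied as train = centroid_train and classify = centroid_classify, with no wrapper needed because centroid_classify already returns only the class labels.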
the .632 error rate estimate
Efron, Bradley (1983), "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation," Journal of the American Statistical Association, 78(382), 316-331.
library(MASS)
iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# Because predict() on an lda fit returns multiple objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }

# We compute the apparent and LOO-Boot error rates up front to demonstrate
# that they can be computed before the errorest_632 function is called.
set.seed(42)
apparent <- errorest_apparent(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)
set.seed(42)
loo_boot <- errorest_loo_boot(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)

# Each of the following 3 calls should result in the same error rate.
# 1. The apparent error rate is provided, while the LOO-Boot must be computed.
set.seed(42)
errorest_632(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper,
             apparent = apparent)

# 2. The LOO-Boot error rate is provided, while the apparent must be computed.
set.seed(42)
errorest_632(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper,
             loo_boot = loo_boot)

# 3. Both error rates are provided, so the calculation is quick.
errorest_632(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper,
             apparent = apparent, loo_boot = loo_boot)

# In each case the output is: 0.02194132