errorest_632plus: Calculates the .632+ Error Rate for a specified classifier...
In sortinghat: sortinghat

Description Usage Arguments Details Value References Examples

For a given data matrix and its corresponding vector of labels, we calculate the .632+ error rate from Efron and Tibshirani (1997) for a given classifier.

1
2
3

  errorest_632plus(x, y, train, classify,
    num_bootstraps = 50, apparent = NULL, loo_boot = NULL,
    ...)

`x`	a matrix of n observations (rows) and p features (columns)
`y`	a vector of n class labels
`train`	a function that builds the classifier. (See details.)
`classify`	a function that classifies observations from the constructed classifier from `train`. (See details.)
`num_bootstraps`	the number of bootstrap replications
`apparent`	the apparent error rate for the given classifier. If `NULL`, this argument is ignored. See Details.
`loo_boot`	the leave-one-out bootstrap error rate for the given classifier. If `NULL`, this argument is ignored. See Details.
`...`	additional arguments passed to the function specified in `train`.

To calculate the .632+ error rate, we compute the leave-one-out (LOO) bootstrap error rate and the apparent error rate. Then, we compute the 'relative overfitting rate' based on these values. Next, we compute the 'no-information error rate'. Finally, we compute the .632+ error rate estimator from these values.

The 'no-information error rate', γ, is the error rate of the classifier if the error rate if the feature vectors and the class labels were independent. For K classes, we can estimate γ by

\hat{γ} = ∑_{k=1}^K p_k * (1 - q_k)

, where p_k is the observed proportion of responses for class k and q_k is the proportion of observations classified as class k.

To calculate the apparent error rate, we use the errorest_apparent function. Similarly, to calculate the LOO bootstrap (LOO-Boot) error rate, we use the errorest_loo_boot function. In some cases (e.g. simulation study) one, if not both, of these error rate estimators might already be computed. Hence, we allow the user to provide these values if they are already computed; by default, the arguments are NULL to indicate that they are unavailable.

We expect that the first two arguments of the classifier function given in train are x and y, corresponding to the data matrix and the vector of their labels. Additional arguments can be passed to the train function. The returned object should be a classifier that will be passed to the function given in the classify argument.

We stay with the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument which contains the trained classifier returned from the function specified in train; and 2. a newdata argument which contains a matrix of observations to be classified – the matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates).

the 632+ error rate estimate

Efron, Bradley and Tibshirani, Robert (1997), "Improvements on Cross-Validation: The .632+ Bootstrap Method," Journal of American Statistical Association, 92, 438, 548-560.

require('MASS')
iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# Because the \code{classify} function returns multiples objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }
set.seed(42)

# We compute the apparent and LOO-Boot error rates up front to demonstrate
# that they can be computed before the \code{errorest_632plus} function is called.

set.seed(42)
apparent <- errorest_apparent(x = iris_x, y = iris_y, train = MASS:::lda,
                              classify = lda_wrapper)
set.seed(42)
loo_boot <- errorest_loo_boot(x = iris_x, y = iris_y, train = MASS:::lda,
                              classify = lda_wrapper)

# Each of the following 3 calls should result in the same error rate.
# 1. The apparent error rate is provided, while the LOO-Boot must be computed.
set.seed(42)
errorest_632plus(x = iris_x, y = iris_y, train = MASS:::lda,
                 classify = lda_wrapper, apparent = apparent)
# 2. The LOO-Boot error rate is provided, while the apparent must be computed.
set.seed(42)
errorest_632plus(x = iris_x, y = iris_y, train = MASS:::lda,
                 classify = lda_wrapper, loo_boot = loo_boot)
# 3. Both error rates are provided, so the calculation is quick.
errorest_632plus(x = iris_x, y = iris_y, train = MASS:::lda,
                 classify = lda_wrapper, apparent = apparent,
                 loo_boot = loo_boot)

# In each case the output is: 0.02194472

Loading required package: MASS
[1] 0.02194472
[1] 0.02194472
[1] 0.02194472

sortinghat documentation built on May 30, 2017, 4:52 a.m.

sortinghat index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sortinghat
sortinghat

errorest_632plus: Calculates the .632+ Error Rate for a specified classifier...
In sortinghat: sortinghat

Description

Usage

Arguments

Details

Value

References

Examples

Example output

Related to errorest_632plus in sortinghat...

R Package Documentation

Browse R Packages

We want your feedback!

sortinghat sortinghat

errorest_632plus: Calculates the .632+ Error Rate for a specified classifier... In sortinghat: sortinghat

Description

Usage

Arguments

Details

Value

References

Examples

Example output

Related to errorest_632plus in sortinghat...

R Package Documentation

Browse R Packages

We want your feedback!

sortinghat
sortinghat

errorest_632plus: Calculates the .632+ Error Rate for a specified classifier...
In sortinghat: sortinghat