View source: R/errorest-loo-boot.r
For a given data matrix and its corresponding vector of labels, we calculate the leave-one-out bootstrap (LOO-Boot) error rate for a given classifier.
errorest_loo_boot(x, y, train, classify,
                  num_bootstraps = 50, ...)
x | a matrix of n observations (rows) and p features (columns)
y | a vector of n class labels
train | a function that builds the classifier. (See details.)
classify | a function that classifies observations from the constructed classifier from train
num_bootstraps | the number of bootstrap replications
... | additional arguments passed to the function specified in train
To calculate the LOO-Boot error rate, we sample from the data with replacement to obtain a bootstrapped training data set. We then train the classifier (specified by train) on the bootstrapped training data set and classify the observations from the original data set, given in the matrix x, that are not contained in the current bootstrapped training data set. We repeat this process num_bootstraps times. Then, for each observation in the original data set, we compute the proportion of times the observation was misclassified, based on the true labels given in y. We report the average of these proportions as the LOO-Boot error rate.
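The procedure above can be sketched in a few lines of base R. This is only an illustrative re-implementation under stated assumptions, not the package's source: the names loo_boot_sketch, majority_train, and majority_classify are hypothetical, and the sketch assumes that each observation's proportion is taken over the replications in which that observation was held out.

```r
# Illustrative re-implementation of the procedure described above.
# loo_boot_sketch() is a hypothetical name, not the package's own code.
loo_boot_sketch <- function(x, y, train, classify, num_bootstraps = 50, ...) {
  n <- nrow(x)
  # misclassified[b, i] is TRUE/FALSE if observation i was held out in
  # replication b, and NA if it was part of the bootstrap training set.
  misclassified <- matrix(NA, nrow = num_bootstraps, ncol = n)
  for (b in seq_len(num_bootstraps)) {
    boot_idx <- sample(n, replace = TRUE)       # bootstrap training indices
    held_out <- setdiff(seq_len(n), boot_idx)   # observations not sampled
    if (length(held_out) == 0) next
    fit <- train(x[boot_idx, , drop = FALSE], y[boot_idx], ...)
    pred <- classify(object = fit, newdata = x[held_out, , drop = FALSE])
    misclassified[b, held_out] <- as.character(pred) != as.character(y[held_out])
  }
  # Per-observation misclassification proportion (assumption: taken over the
  # replications in which the observation was held out), then the average.
  per_obs <- colMeans(misclassified, na.rm = TRUE)
  mean(per_obs, na.rm = TRUE)
}

# Toy classifier for demonstration: always predict the most frequent label.
majority_train <- function(x, y) names(which.max(table(y)))
majority_classify <- function(object, newdata) rep(object, nrow(newdata))

set.seed(1)
err <- loo_boot_sketch(data.matrix(iris[, -5]), iris[, 5],
                       majority_train, majority_classify, num_bootstraps = 20)
```

The toy majority-class classifier is there only to keep the sketch self-contained; any train/classify pair that follows the conventions described in this section could be substituted.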
For the given classifier, two functions must be provided: one to train the classifier and one to classify unlabeled observations. The training function is provided as train and the classification function as classify.
We expect that the first two arguments of the train function are x and y, corresponding to the data matrix and the vector of their labels, respectively. Additional arguments can be passed to the train function.
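As a minimal sketch of a training function that follows this convention, the nearest-centroid trainer below takes x and y as its first two arguments and accepts one extra argument. Both the function name centroid_train and the center_fun argument are hypothetical and chosen only for illustration.

```r
# Hypothetical nearest-centroid trainer. The first two arguments are x and y,
# as the convention requires; center_fun is an illustrative extra argument.
centroid_train <- function(x, y, center_fun = mean) {
  y <- factor(y)            # drops class levels absent from this sample
  classes <- levels(y)
  # One row per class, one column per feature.
  centroids <- do.call(rbind, lapply(classes, function(cl) {
    apply(x[y == cl, , drop = FALSE], 2, center_fun)
  }))
  rownames(centroids) <- classes
  list(centroids = centroids)
}

# Direct call with the extra argument supplied by name.
fit <- centroid_train(data.matrix(iris[, -5]), iris[, 5], center_fun = median)
```

Since additional arguments are passed on to the training function, supplying something like center_fun = median in the call to errorest_loo_boot would be expected to reach a trainer written this way.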
We follow the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument, which contains the trained classifier returned from the function specified in train; and 2. a newdata argument, which contains a matrix of observations to be classified. The matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates). For an example, see lda.
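A classification function matching this signature might look like the following sketch. The name centroid_classify and the structure of the trained object (a list holding a matrix of class centroids with class labels as row names) are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical trained object: class centroids stored as a K x p matrix
# whose row names are the class labels.
fit <- list(centroids = rbind(a = c(0, 0), b = c(3, 3)))

# Classifier following the (object, newdata) convention: assign each row of
# newdata to the class whose centroid is nearest in Euclidean distance.
centroid_classify <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  classes <- rownames(object$centroids)
  # Squared distance from every row of newdata to every centroid.
  d <- sapply(seq_along(classes), function(k) {
    rowSums(sweep(newdata, 2, object$centroids[k, ], "-")^2)
  })
  d <- matrix(d, nrow = nrow(newdata))  # keep n x K shape even when n is 1
  factor(classes[max.col(-d, ties.method = "first")], levels = classes)
}

# Rows are observations, columns are features.
newdata <- rbind(c(0.2, -0.1), c(2.5, 3.1), c(1, 1))
pred <- centroid_classify(object = fit, newdata = newdata)
```

Returning only the predicted labels, rather than a list of several results, keeps the output directly comparable with the true labels in y.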
the LOO-Boot error rate estimate
library(MASS)
iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
# Because predict() on an lda fit returns multiple objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }
set.seed(42)
errorest_loo_boot(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper)
# Output: 0.02307171