View source: R/errorest-boot.r
For a given data matrix and its corresponding vector of labels, we calculate the bootstrap error rate for a given classifier.
errorest_boot(x, y, train, classify, num_bootstraps = 50, ...)
x: a matrix of n observations (rows) and p features (columns)
y: a vector of n class labels
train: a function that builds the classifier. (See details.)
classify: a function that classifies observations using the classifier constructed by train. (See details.)
num_bootstraps: the number of bootstrap replications
...: additional arguments passed to the function specified in train
To calculate the bootstrap error rate, we sample from the data with replacement to obtain a bootstrapped training data set. We then train the classifier (specified by train) on the bootstrapped training data set and classify the original data set given in the matrix x. Next, we calculate the proportion of misclassified observations, based on the true labels given in y, to obtain a single bootstrap error rate. We repeat this process num_bootstraps times and report the average of the bootstrap error rates.
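The procedure above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual source: boot_error_sketch, majority_train, and majority_classify are hypothetical names, and the toy classifier simply predicts the most frequent label so that the example stays short.

```r
# Hypothetical sketch of the bootstrap error rate described above.
# None of these names come from the package; they are for illustration only.
boot_error_sketch <- function(x, y, train, classify, num_bootstraps = 50, ...) {
  n <- nrow(x)
  boot_errors <- vapply(seq_len(num_bootstraps), function(b) {
    # Sample row indices with replacement to form the bootstrapped training set.
    idx <- sample(n, n, replace = TRUE)
    fit <- train(x[idx, , drop = FALSE], y[idx], ...)
    # Classify the ORIGINAL data and compare against the true labels.
    pred <- classify(object = fit, newdata = x)
    mean(as.character(pred) != as.character(y))
  }, numeric(1))
  # Average the per-replicate error rates.
  mean(boot_errors)
}

# Toy classifier (assumption): always predicts the most frequent training label.
majority_train <- function(x, y) {
  list(label = names(which.max(table(y))))
}
majority_classify <- function(object, newdata) {
  rep(object$label, nrow(newdata))
}

set.seed(1)
x <- data.matrix(iris[, -5])
y <- iris[, 5]
err <- boot_error_sketch(x, y, majority_train, majority_classify,
                         num_bootstraps = 20)
# Each iris class holds 50 of the 150 rows, so predicting any single class
# misclassifies 100 rows in every replicate: err is about 0.667.
print(err)
```

A real classifier would give a replicate-dependent error rate. The toy one is used here only because its result can be checked by hand.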
For the given classifier, two functions must be provided: one to train the classifier and one to classify unlabeled observations. The training function is provided as train and the classification function as classify.
We expect that the first two arguments of the train function are x and y, corresponding to the data matrix and the vector of its labels, respectively. Additional arguments can be passed to the train function.
We stay with the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument, which contains the trained classifier returned from the function specified in train; and 2. a newdata argument, which contains a matrix of observations to be classified. The matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates). For an example, see lda.
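As a hedged illustration of these conventions, the following base R sketch defines a train/classify pair for a simple nearest-centroid classifier. The names centroid_train and centroid_classify and the center option are hypothetical and do not come from the package. The train function takes x and y first, followed by an optional extra argument; the classify function takes object and newdata.

```r
# Hypothetical train function: x and y come first, extra options follow.
centroid_train <- function(x, y, center = mean) {
  y <- factor(y)
  # One row per class, holding the per-feature centre of that class.
  centers <- do.call(rbind, lapply(levels(y), function(lv) {
    apply(x[y == lv, , drop = FALSE], 2, center)
  }))
  rownames(centers) <- levels(y)
  list(centers = centers)
}

# Hypothetical classify function following the object/newdata convention.
centroid_classify <- function(object, newdata) {
  centers <- object$centers
  # Squared Euclidean distance from each observation to each class centre.
  dists <- vapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(newdata, 2, centers[k, ])^2)
  }, numeric(nrow(newdata)))
  dists <- matrix(dists, nrow = nrow(newdata))
  # Assign each observation to the class with the nearest centre.
  rownames(centers)[max.col(-dists, ties.method = "first")]
}

x <- data.matrix(iris[, -5])
y <- iris[, 5]
fit <- centroid_train(x, y, center = median)
pred <- centroid_classify(object = fit, newdata = x)
train_error <- mean(pred != as.character(y))
```

Assuming additional arguments are forwarded to train as the arguments section describes, a call such as errorest_boot(x, y, train = centroid_train, classify = centroid_classify, center = median) should use the same pair with the median as the centre.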
The function returns the bootstrapped error rate estimate.
library(MASS)
iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
# Because predict() on an lda fit returns multiple objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }
set.seed(42)
errorest_boot(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper)
# Output: 0.0228