errorest_boot: Calculates the Bootstrap Error Rate for a specified...


View source: R/errorest-boot.r

Description

For a given data matrix and its corresponding vector of labels, we calculate the bootstrap error rate for a given classifier.

Usage

  errorest_boot(x, y, train, classify, num_bootstraps = 50,
    ...)

Arguments

x

a matrix of n observations (rows) and p features (columns)

y

a vector of n class labels

train

a function that builds the classifier. (See details.)

classify

a function that classifies observations using the classifier constructed by train. (See details.)

num_bootstraps

the number of bootstrap replications

...

additional arguments passed to the function specified in train.

Details

To calculate the bootstrap error rate, we sample n observations from the data with replacement to obtain a bootstrapped training data set. We then train the classifier specified in train on the bootstrapped training data set and use it to classify the original data set in the matrix x. Next, we calculate the proportion of misclassified observations with respect to the true labels in y, which gives a single bootstrap error rate. We repeat this process num_bootstraps times and return the average of the bootstrap error rates.
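The procedure above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the sortinghat source: bootstrap_error_sketch, train_majority, and classify_majority are hypothetical names. The majority-class baseline is used only because its bootstrap error on iris is easy to check by hand: every replicate predicts a single species, so it mislabels the other 100 of the 150 flowers, giving 2/3.

```r
# Hypothetical sketch of the bootstrap error rate; NOT the sortinghat source.
bootstrap_error_sketch <- function(x, y, train, classify,
                                   num_bootstraps = 50, ...) {
  x <- as.matrix(x)
  y <- as.factor(y)
  n <- nrow(x)
  error_rates <- numeric(num_bootstraps)
  for (b in seq_len(num_bootstraps)) {
    # Draw n row indices with replacement: one bootstrapped training set
    idx <- sample.int(n, size = n, replace = TRUE)
    # Train on the bootstrap sample; extra arguments are forwarded via ...
    fit <- train(x[idx, , drop = FALSE], y[idx], ...)
    # Classify the ORIGINAL data set and score against the true labels
    predicted <- classify(object = fit, newdata = x)
    error_rates[b] <- mean(as.character(predicted) != as.character(y))
  }
  # Return the average of the num_bootstraps bootstrap error rates
  mean(error_rates)
}

# Hypothetical baseline: always predict the most frequent training label
train_majority <- function(x, y) names(which.max(table(y)))
classify_majority <- function(object, newdata) rep(object, nrow(newdata))

set.seed(42)
err <- bootstrap_error_sketch(data.matrix(iris[, -5]), iris[, 5],
                              train = train_majority,
                              classify = classify_majority)
err  # 2/3: each replicate mislabels the 100 flowers of the other two species
```

With a real classifier, such as the lda wrapper in the Examples, the same loop yields a much smaller error rate.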

For the given classifier, two functions must be provided: 1. one that trains the classifier and 2. one that classifies unlabeled observations. The training function is provided as train and the classification function as classify.

We expect that the first two arguments of the train function are x and y, corresponding to the data matrix and the vector of class labels, respectively. Additional arguments are passed to the train function via ....

We follow the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument, which contains the trained classifier returned by the function specified in train; and 2. a newdata argument, which contains a matrix of observations to be classified. The rows of this matrix correspond to the individual observations and the columns to the features (covariates). The function should return the predicted class labels for the rows of newdata. For an example, see predict.lda in the MASS package.
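As a hedged illustration of these two conventions, the sketch below defines a hypothetical nearest-centroid classifier; train_centroid, classify_centroid, and center_fun are illustrative names, not sortinghat functions. train_centroid takes x and y first plus an optional center_fun argument, the kind of extra argument that errorest_boot would forward through .... classify_centroid takes object and newdata and returns one label per row of newdata.

```r
# Hypothetical nearest-centroid classifier following the train/classify
# conventions described above; not part of sortinghat.

# train: first two arguments are x (n x p matrix) and y (class labels).
# center_fun is an optional extra argument of the kind forwarded via `...`.
train_centroid <- function(x, y, center_fun = colMeans) {
  x <- as.matrix(x)
  y <- as.factor(y)
  # Only classes present in this (possibly bootstrapped) sample get a centroid
  classes <- levels(droplevels(y))
  centroids <- do.call(rbind, lapply(classes, function(k) {
    center_fun(x[y == k, , drop = FALSE])
  }))
  list(centroids = centroids, classes = classes)
}

# classify: takes the trained `object` and a `newdata` matrix whose rows are
# observations; returns one predicted label per row.
classify_centroid <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance from every row of newdata to every centroid
  dists <- vapply(seq_along(object$classes), function(k) {
    rowSums(sweep(newdata, 2, object$centroids[k, ])^2)
  }, numeric(nrow(newdata)))
  dists <- matrix(dists, nrow = nrow(newdata))  # keep n x K even when n == 1
  # Label of the nearest centroid for each row
  object$classes[max.col(-dists, ties.method = "first")]
}

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
fit <- train_centroid(iris_x, iris_y)
pred <- classify_centroid(object = fit, newdata = iris_x)
mean(pred != as.character(iris_y))  # resubstitution error on iris

# The extra argument can swap the class center, e.g. to column medians
fit_med <- train_centroid(iris_x, iris_y,
                          center_fun = function(m) apply(m, 2, median))
```

Assuming sortinghat is installed, this pair could be supplied as train = train_centroid and classify = classify_centroid, with center_fun given as an additional argument to errorest_boot.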

Value

the bootstrap error rate estimate, i.e., the average of the num_bootstraps bootstrap error rates
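Written as a formula, and using notation assumed here rather than taken from the package (B = num_bootstraps, \hat{f}^{*b} the classifier trained on the b-th bootstrapped training set, I\{\cdot\} the indicator function, and (x_i, y_i) the i-th original observation and its true label), the returned estimate is:

```latex
\widehat{\mathrm{Err}}_{\mathrm{boot}}
  = \frac{1}{B} \sum_{b=1}^{B} \frac{1}{n} \sum_{i=1}^{n}
    I\left\{ \hat{f}^{*b}(x_i) \neq y_i \right\}
```

The inner average is a single bootstrap error rate computed on the original data set; the outer average is taken over the B replications.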

Examples

library(MASS)
library(sortinghat)

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# predict() for an lda fit returns a list with several components, so we
# provide a wrapper classify function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }
set.seed(42)
errorest_boot(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper)
# Output: 0.0228

sortinghat documentation built on May 30, 2017, 4:52 a.m.