
For a given data matrix and its corresponding vector of labels, we calculate the cross-validation (CV) error rate for a given classifier.

```
errorest_cv(x, y, train, classify, num_folds = 10,
            hold_out = NULL, ...)
```

| Argument | Description |
|---|---|
| `x` | a matrix of n observations (rows) and p features (columns) |
| `y` | a vector of n class labels |
| `train` | a function that builds the classifier. (See Details.) |
| `classify` | a function that classifies observations using the classifier constructed by `train` |
| `num_folds` | the number of cross-validation folds. Ignored if `hold_out` is not `NULL`. |
| `hold_out` | the hold-out size for cross-validation. See Details. |
| `...` | additional arguments passed to the function specified in `train` |

To calculate the CV error rate, we partition the data set into 'folds'. For each fold, we consider the observations within the fold as a test data set, while the remaining observations are considered as a training data set. We then calculate the number of misclassified observations within the fold. The CV error rate is the proportion of misclassified observations across all folds.
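The procedure above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `cv_error_sketch`, `train_majority`, and `classify_majority` are hypothetical names, and the random fold assignment shown here is an assumption about how folds might be formed.

```r
# Hypothetical classifier: always predicts the most frequent training label.
train_majority <- function(x, y, ...) {
  list(label = names(which.max(table(y))))
}
classify_majority <- function(object, newdata) {
  rep(object$label, nrow(newdata))
}

# Sketch of k-fold CV: each fold serves exactly once as the test set.
cv_error_sketch <- function(x, y, train, classify, num_folds = 10, ...) {
  n <- length(y)
  # Assumption: observations are assigned to folds at random.
  fold_id <- sample(rep(seq_len(num_folds), length.out = n))
  misclassified <- 0
  for (k in seq_len(num_folds)) {
    test <- which(fold_id == k)
    fit  <- train(x[-test, , drop = FALSE], y[-test], ...)
    pred <- classify(fit, x[test, , drop = FALSE])
    misclassified <- misclassified +
      sum(as.character(pred) != as.character(y[test]))
  }
  # Proportion of misclassified observations across all folds.
  misclassified / n
}

set.seed(1)
x <- data.matrix(iris[, -5])
y <- iris[, 5]
err <- cv_error_sketch(x, y, train_majority, classify_majority, num_folds = 5)
```

The majority-label classifier is deliberately trivial so that the fold loop is the focus; any `train`/`classify` pair with the same argument pattern could be substituted.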

Rather than partitioning the observations into folds, an alternative convention is to specify the 'hold-out' size for each test data set. Note that this convention is equivalent to the notion of folds. We allow the user to specify either option with the `hold_out` and `num_folds` arguments. The `num_folds` argument is the default option but is ignored if the `hold_out` argument is specified (i.e. is not `NULL`).
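To make the equivalence between the two conventions concrete, here is a small base-R sketch. The helper `make_folds_sketch` is hypothetical, and the rule that a hold-out size `h` corresponds to `ceiling(n / h)` folds is an assumption made for illustration; the package may derive its partitions differently.

```r
# Hypothetical helper: build test-set index lists from either option.
# Assumption: a hold-out size h is treated as ceiling(n / h) folds.
make_folds_sketch <- function(n, num_folds = 10, hold_out = NULL) {
  if (!is.null(hold_out)) {
    num_folds <- ceiling(n / hold_out)  # hold_out overrides num_folds
  }
  # Shuffle the indices, then deal them out to the folds in turn.
  unname(split(sample(n), rep(seq_len(num_folds), length.out = n)))
}

set.seed(1)
by_folds    <- make_folds_sketch(150, num_folds = 10)
by_hold_out <- make_folds_sketch(150, hold_out = 15)
lengths(by_folds)     # sizes of the ten test sets
lengths(by_hold_out)  # the same sizes, reached via the hold-out size

# A hold-out size of 1 corresponds to leave-one-out CV.
loo <- make_folds_sketch(150, hold_out = 1)
```

Under this assumption, 150 observations with `num_folds = 10` and with `hold_out = 15` both give ten test sets of 15 observations.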

For the given classifier, two functions must be provided: (1) to train the classifier and (2) to classify unlabeled observations. The training function is provided as `train` and the classification function as `classify`.

We expect that the first two arguments of the `train` function are `x` and `y`, corresponding to the data matrix and the vector of their labels, respectively. Additional arguments can be passed to the `train` function.

We stay with the usual R convention for the `classify` function. We expect that this function takes two arguments: (1) an `object` argument which contains the trained classifier returned from the function specified in `train`; and (2) a `newdata` argument which contains a matrix of observations to be classified. The matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates). For an example, see `lda`.
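As an illustration of the interface described above, the following base-R sketch defines a nearest-centroid classifier as a `train`/`classify` pair. The names `train_centroid`, `classify_centroid`, and the extra `center` argument are hypothetical and chosen only for this example.

```r
# Hypothetical training function: its first two arguments are x and y.
# 'center' is an illustrative extra argument that could arrive via `...`.
train_centroid <- function(x, y, center = mean) {
  y <- factor(y)
  # One row of per-feature centers for each class.
  centroids <- t(sapply(levels(y), function(cl) {
    apply(x[y == cl, , drop = FALSE], 2, center)
  }))
  structure(list(centroids = centroids), class = "centroid_sketch")
}

# Hypothetical classification function: takes `object` and `newdata`
# and returns one predicted label per row of newdata.
classify_centroid <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance from every observation to every centroid.
  dists <- apply(object$centroids, 1, function(ctr) {
    rowSums(sweep(newdata, 2, ctr)^2)
  })
  dists <- matrix(dists, nrow = nrow(newdata))  # guard the single-row case
  rownames(object$centroids)[apply(dists, 1, which.min)]
}

x <- data.matrix(iris[, -5])
y <- iris[, 5]
fit  <- train_centroid(x, y, center = median)
pred <- classify_centroid(fit, x)
resub_error <- mean(pred != y)  # resubstitution error, for illustration only
```

A pair like this could presumably be supplied as `train = train_centroid, classify = classify_centroid`, with `center = median` passed along as an additional argument for the training function.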

The calculated CV error-rate estimate.

```
library(sortinghat)
library(MASS)

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# Because predict() for an lda fit returns multiple objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }

set.seed(42)
errorest_cv(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper)
# Output: 0.02666667
```

sortinghat documentation built on May 30, 2017, 4:52 a.m.
