# cv.nfeaturesLDA: Cross-validation to find the optimum number of features... In animation: A Gallery of Animations in Statistics and Utilities to Create Animations

## Description

This function provides an illustration of the process of finding the optimum number of variables using k-fold cross-validation in a linear discriminant analysis (LDA).

## Usage

```
cv.nfeaturesLDA(
  data = matrix(rnorm(600), 60),
  cl = gl(3, 20),
  k = 5,
  cex.rg = c(0.5, 3),
  col.av = c("blue", "red"),
  ...
)
```

## Arguments

| Argument | Description |
|----------|-------------|
| `data` | a data matrix containing the predictors in columns |
| `cl` | a factor indicating the classification of the rows of `data` |
| `k` | the number of folds |
| `cex.rg` | the range of the magnification applied to the points in the plot |
| `col.av` | the two colors used to denote, respectively, the rates of correct predictions in the i-th fold and the average rates over all k folds |
| `...` | arguments passed to `points` to draw the points which denote the correct rate |
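As a minimal sketch of how these arguments fit together, the snippet below builds a predictor matrix and a matching class factor and then calls the function. The simulated data, the shift applied to the first column, and the choices `k = 3` and `pch = 19` are illustrative assumptions, not values from the package; the call is guarded so that it only runs when the animation package is installed.

```r
set.seed(42)

# 90 observations (rows) and 6 candidate predictors (columns); simulated data
x <- matrix(rnorm(90 * 6), nrow = 90)
# shift the first predictor by class so that it carries some signal (assumption)
x[, 1] <- x[, 1] + rep(c(0, 2, 4), each = 30)

# one class label per row of x: 3 classes of 30 rows each
cl <- gl(3, 30)
stopifnot(nrow(x) == length(cl))

if (requireNamespace("animation", quietly = TRUE)) {
  # pch is forwarded through `...` to points()
  res <- animation::cv.nfeaturesLDA(data = x, cl = cl, k = 3, pch = 19)
  str(res)
}
```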

## Details

For a classification problem, we usually wish to use as few variables as possible, because high dimensionality makes the problem harder.

The selection procedure is like this:

• Split the whole data randomly into k folds;

• For each number of features g = 1, 2, ..., g_{max}, choose the g features that have the largest discriminatory power (measured by the F-statistic in ANOVA), then:

• For each fold i (i = 1, 2, ..., k), train an LDA model without the i-th fold data, and predict on the i-th fold to obtain the proportion of correct predictions p_{gi};

• Average the k proportions to get the correct rate p_g;

• Determine the optimum number of features as the g with the largest p_g.
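The procedure can be sketched without any plotting as follows. This is not the package's implementation, only an illustration under stated assumptions: it uses `lda` from MASS, computes the F-statistic with `aov`, and ranks the features on the training rows of each fold (the description above leaves open where the ranking is computed). The names `f_stat`, `fold`, `acc`, and `gmax` are hypothetical.

```r
library(MASS)  # provides lda()

set.seed(1)
x    <- matrix(rnorm(600), 60)  # 60 observations, 10 candidate features
cl   <- gl(3, 20)               # 3 classes of 20 rows each
k    <- 5                       # number of folds
gmax <- ncol(x)                 # largest number of features to try (assumption)

# one-way ANOVA F-statistic of a single feature against the class factor
f_stat <- function(v, cl) summary(aov(v ~ cl))[[1]][1, "F value"]

# random assignment of each row to one of k folds
fold <- sample(rep(seq_len(k), length.out = nrow(x)))

# acc[i, g]: proportion of correct predictions in fold i using g features
acc <- matrix(NA_real_, nrow = k, ncol = gmax)

for (i in seq_len(k)) {
  test <- fold == i
  # rank features by F-statistic using the training rows only
  f   <- apply(x[!test, , drop = FALSE], 2, f_stat, cl = cl[!test])
  ord <- order(f, decreasing = TRUE)
  for (g in seq_len(gmax)) {
    vars <- ord[seq_len(g)]  # the g most discriminating features
    fit  <- lda(x[!test, vars, drop = FALSE], cl[!test])
    pred <- predict(fit, x[test, vars, drop = FALSE])$class
    acc[i, g] <- mean(pred == cl[test])
  }
}

p       <- colMeans(acc)  # average correct rate for each g
optimum <- which.max(p)   # number of features with the largest average rate
```

Because the simulated predictors here are pure noise, the correct rates should hover around chance level; with real data the curve of `p` against g is what the animation displays.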

Note that g_{max} is set by `ani.options('nmax')` (i.e. the maximum number of features we want to choose).

## Value

A list containing

| Component | Description |
|-----------|-------------|
| `accuracy` | a matrix in which the element in the i-th row and j-th column is the rate of correct predictions based on LDA, i.e. build an LDA model with j variables and predict with data in the i-th fold (the test set) |
| `optimum` | the optimum number of features based on the cross-validation |
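To make the layout of `accuracy` concrete, here is a small hand-made matrix (hypothetical numbers, not output from the function) with 3 folds in rows and 3 feature counts in columns. Following the Details section, the optimum is taken as the column with the largest average rate.

```r
# rows = folds (test sets), columns = number of features used; made-up values
acc <- matrix(c(0.50, 0.75, 0.75,
                0.25, 0.75, 0.50,
                0.50, 1.00, 0.75),
              nrow = 3, byrow = TRUE)

p <- colMeans(acc)       # average correct rate for 1, 2 and 3 features
best <- which.max(p)     # 2: the second column has the largest average
```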

## Author(s)

Yihui Xie <https://yihui.org/>

## References

Maindonald J, Braun J (2007). Data Analysis and Graphics Using R - An Example-Based Approach. Cambridge University Press, 2nd edition. p. 400.

## See Also

`kfcv`, `cv.ani`, `lda`