RDA Cross Validation Function

Description

Performs cross-validation of a regularized discriminant analysis (RDA) fit on the training data set.

Usage

rda.cv(fit, x, y, prior, alpha, delta, nfold=min(table(y), 10),
       folds=balanced.folds(y), trace=FALSE)

Arguments

fit

An rda fit object obtained from the rda function.

x

The training data set as used in the rda function.

y

The class labels of the training samples (columns) in "x", as used in the rda function.

prior

A numeric vector giving the prior proportion of each class; its length must equal the number of classes. By default, the prior stored in the fit object is used; supply this argument only to cross-validate with a different prior.

alpha

A numeric vector of regularization values for alpha. By default, the alpha values stored in the fit object are used; supply this argument to cross-validate over a different set of alpha values.

delta

A numeric vector of threshold values for delta. By default, the delta values stored in the fit object are used; supply this argument to cross-validate over a different set of delta values.

nfold

An integer giving the number of folds in the cross-validation. The default, min(table(y), 10), is the size of the smallest class, capped at 10. This argument is overridden when folds is also specified.

folds

A list defining the folds used in the cross-validation. Each component is an integer vector of the indices of the samples in that fold. See the Examples section for how to construct custom folds.

trace

A logical flag indicating whether the intermediate steps should be printed.
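To illustrate the list-of-index-vectors format that folds expects, the base-R sketch below builds class-stratified folds by shuffling each class's samples and dealing them round-robin across nfold folds. The helper make_stratified_folds is hypothetical: it is not part of the rda package and is not claimed to reproduce what balanced.folds does internally.

```r
# Hypothetical helper, NOT part of the rda package and not a copy of
# balanced.folds(): builds class-stratified folds as a list of integer
# vectors of sample indices, the format documented for `folds`.
make_stratified_folds <- function(y, nfold = min(table(y), 10)) {
  folds <- vector("list", nfold)
  # split() groups the sample indices 1..length(y) by class label
  for (idx in split(seq_along(y), y)) {
    # Shuffle this class's indices (indexing via sample.int avoids
    # sample()'s special case for length-one vectors)
    idx <- idx[sample.int(length(idx))]
    # Deal the shuffled indices round-robin across the folds
    fold_id <- rep_len(seq_len(nfold), length(idx))
    for (k in seq_len(nfold)) {
      folds[[k]] <- c(folds[[k]], idx[fold_id == k])
    }
  }
  lapply(folds, sort)
}
```

With the colon data from the Examples section, such folds could be passed as rda.cv(fit, colon.x, colon.y, folds = make_stratified_folds(colon.y)). Stratifying keeps the class proportions of each fold close to those of the full training set, and with the default nfold every fold receives at least one sample of each class.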

Details

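rda.cv estimates the prediction error of RDA by cross-validation on the training data set. For each fold, the samples in that fold are held out, an RDA model is trained on the remaining samples, and the held-out samples are classified, for every combination of the alpha and delta values. The resulting errors are summarized in the cv.err matrix, which can be used to choose alpha and delta.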

Value

rda.cv returns an object of class rdacv, a list with the following components:

alpha

The vector of the regularization values for alpha used in the cross-validation.

delta

The vector of the threshold values for delta used in the cross-validation.

prior

The vector of the prior proportion of each class used in the cross-validation.

nfold

The number of folds used in the cross-validation.

folds

The folds used in the cross-validation.

yhat.new

A three-dimensional array of the cross-validated predicted class labels of the training samples for each (alpha, delta) combination. The first index corresponds to the alpha values, the second to the delta values, and the third to the samples.

err

The matrix of training errors recorded during cross-validation. The rows correspond to the alpha values and the columns to the delta values. This component is computed automatically by the function.

cv.err

The test error (or cross-validation error) matrix. The rows correspond to the alpha values while the columns correspond to the delta values.

ngene

The matrix of the number of shrunken genes. The rows correspond to the alpha values and the columns to the delta values. The counts are averaged over the cross-validation folds.

reg

The type of regularization used in cross-validation.

n

The sample size of the training data set.
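Because cv.err and ngene share the same layout (rows indexed by alpha, columns by delta), the cells attaining the minimum cross-validation error can be located with which(..., arr.ind = TRUE). The sketch below relies only on the components documented above; the helper name cv_minima and the toy numbers are illustrative assumptions, not output of rda.cv. It returns every tied (alpha, delta) pair together with its ngene value, leaving the final choice to the user.

```r
# Hypothetical helper, not part of the rda package. It relies only on the
# documented rdacv components alpha, delta, cv.err and ngene, where the
# matrices have rows indexed by alpha and columns indexed by delta.
cv_minima <- function(cv) {
  # Two-column (row, col) matrix of every cell attaining the minimum error
  best <- which(cv$cv.err == min(cv$cv.err), arr.ind = TRUE)
  data.frame(alpha  = cv$alpha[best[, 1]],
             delta  = cv$delta[best[, 2]],
             cv.err = cv$cv.err[best],   # matrix indexing by (row, col) pairs
             ngene  = cv$ngene[best])
}

# Toy input with made-up numbers, shaped like an rdacv object
toy <- list(alpha = c(0, 0.5, 0.99), delta = c(0, 1, 2),
            cv.err = matrix(c(5, 4, 6,
                              3, 3, 7,
                              8, 3, 9), nrow = 3, byrow = TRUE),
            ngene  = matrix(c(100, 80, 50,
                              100, 60, 20,
                              100, 40, 10), nrow = 3, byrow = TRUE))
cv_minima(toy)  # three tied cells: (0.5, 0), (0.5, 1), (0.99, 1)
```

With a real fit, the same call would be cv_minima(fit.cv).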

Author(s)

Yaqian Guo, Trevor Hastie and Robert Tibshirani

References

Guo, Y. et al. (2004) Regularized Discriminant Analysis and Its Application in Microarrays, Technical Report, Department of Statistics, Stanford University.

See Also

rda, predict.rda.

Examples

library(rda)
data(colon)
colon.x <- t(colon.x)  # transpose so that samples are columns, as rda expects
fit <- rda(colon.x, colon.y)
fit.cv <- rda.cv(fit, x=colon.x, y=colon.y)

## To use customized folds in cross-validation, for example
## 6 folds with 11, 11, 10, 10, 10 and 10 samples respectively,
## you can do the following:
set.seed(1)                # make the random fold assignment reproducible
index <- sample(1:62, 62)  # random permutation of the 62 sample indices
folds <- list()
folds[[1]] <- index[1:11]
folds[[2]] <- index[12:22]
folds[[3]] <- index[23:32]
folds[[4]] <- index[33:42]
folds[[5]] <- index[43:52]
folds[[6]] <- index[53:62]
fit.cv <- rda.cv(fit, colon.x, colon.y, folds=folds)
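The six index assignments in the custom-folds example can also be written with base R's split(), which cuts a permutation into consecutive blocks of the requested sizes. This is a sketch of an equivalent way to build the same kind of 6-fold list, not a feature of rda.cv; unname() simply drops the group labels that split() attaches.

```r
set.seed(1)                          # reproducible permutation
index <- sample(1:62, 62)            # random permutation of the 62 samples
sizes <- c(11, 11, 10, 10, 10, 10)   # fold sizes used in the example
# Label consecutive blocks of the permutation 1..6, then split on the labels;
# split() keeps the original order within each block
folds <- unname(split(index, rep(seq_along(sizes), times = sizes)))
lengths(folds)
```

The resulting list can be passed as rda.cv(fit, colon.x, colon.y, folds = folds).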