cv.vda.le: Choose the optimal pair of lambdas, lambda_1 and lambda_2
In VDA: VDA

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/cv.vda.le.R

Use k-fold validation to choose the optmial values for the tuning parameters λ_1 and λ_2 to be used in Multicategory Vertex Discriminant Analysis (vda.le).

1	cv.vda.le(x, y, kfold, lam.vec.1, lam.vec.2)

`x`	n x p matrix or data frame containing the cases for each feature. The rows correspond to cases and the columns to the features. Intercept column is not included in this.
`y`	n x 1 vector representing the outcome variable. Each element denotes which one of the k classes that case belongs to.
`kfold`	The number of folds to use for the k-fold validation for each set of λ_1 and λ_2
`lam.vec.1`	A vector containing the set of all values of λ_1, from which VDA will be conducted. To use only Euclidean penalization, set `lam.vec.2`=0.
`lam.vec.2`	A vector containing the set of all values of λ_2, from which VDA will be conducted. vda.le is relatively insensitive to lambda values, so it is recommended that a vector of few values is used. The default value is 0.01. To use only Lasso penalization, set `lam.vec.1`=0.

For each pair of (λ_1,λ_2), k-fold cross-validation will be conducted and the corresponding average testing error over the k folds will be recorded. λ_1 represents the parameter for the lasso penalization, while λ_2 represents the parameter for the group euclidean penalization. To use only Lasso penalization, set lam.vec.2=0. To use only Euclidean penalization, set lam.vec.1=0. The optimal pair is considered the pair of values that give the smallest testing error over the cross validation.

To view a plot of the cross validation errors across lambda values, see plot.cv.vda.le.

`kfold`	The number of folds used in k-fold cross validation
`lam.vec.1`	The user supplied vector of λ_1 values
`lam.vec.2`	The user supplied vector of λ_2 values
`error.cv`	A matrix of average testing errors. The rows correspond to λ_1 values and the columns correspond to λ_2 values.
`lam.opt`	The pair of λ_1 and λ_2 values that return the lowest testing error across k-fold cross validation.

Edward Grant, Xia Li, Kenneth Lange, Tong Tong Wu

Maintainer: Edward Grant edward.m.grant@gmail.com

Wu, T.T. and Lange, K. (2010) Multicategory Vertex Discriminant Analysis for High-Dimensional Data. Annals of Applied Statistics, Volume 4, No 4, 1698-1721.

Lange, K. and Wu, T.T. (2008) An MM Algorithm for Multicategory Vertex Discriminant Analysis. Journal of Computational and Graphical Statistics, Volume 17, No 3, 527-544.

vda.le.

plot.cv.vda.le.

### load zoo data
### column 1 is name, columns 2:17 are features, column 18 is class
data(zoo)

### feature matrix 
x <- zoo[,2:17]

### class vector
y <- zoo[,18]

### lambda vector
lam1 <- (1:5)/100
lam2 <- (1:5)/100

### Searching for the best pair, using both lasso and euclidean penalizations
cv <- cv.vda.le(x, y, kfold = 3, lam.vec.1 = exp(1:5)/10000, lam.vec.2 = (1:5)/100)
plot(cv)
outLE <- vda.le(x,y,cv$lam.opt[1],cv$lam.opt[2])

### To search for the best pair, using ONLY lasso penalization, set lambda2=0 (remove comments)
#cvlasso <- cv.vda.le(x, y, kfold = 3, lam.vec.1 = exp(1:10)/1000, lam.vec.2 = 0)
#plot(cvlasso)
#cvlasso$lam.opt

### To search for the best pair, using ONLY euclidean penalization, set lambda1=0 (remove comments)
#cveuclidian <- cv.vda.le(x, y, kfold = 3, lam.vec.1 = 0, lam.vec.2 = exp(1:10)/1000)
#plot(cveuclidian)
#cveuclidian$lam.opt

### Predict five cases based on vda.le (Lasso and Euclidean penalties)
fivecases <- matrix(0,5,16)
fivecases[1,] <- c(1,0,0,1,0,0,0,1,1,1,0,0,4,0,1,0)
fivecases[2,] <- c(1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1)
fivecases[3,] <- c(0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0)
fivecases[4,] <- c(0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0)
fivecases[5,] <- c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0)
predict(outLE, fivecases)