ridgereg.cv: Cross validation for the ridge regression

Description

K-fold cross-validation for ridge regression is performed, and the estimated performance is corrected using the TT estimate of bias (Tibshirani and Tibshirani, 2009). Alternatively, the generalised cross-validation (GCV) criterion can be used to choose λ automatically.

Usage

ridgereg.cv( target, dataset, K = 10, lambda = seq(0, 2, by = 0.1), auto = FALSE, 
seed = FALSE, ncores = 1, mat = NULL )

Arguments

target

A numeric vector containing the values of the target variable. If the values are proportions or percentages, i.e. strictly between 0 and 1, they are mapped onto the real line using the logit transformation log( target/(1 - target) ).

dataset

A numeric matrix containing the variables. Rows are samples and columns are features.

K

The number of folds. Set to 10 by default.

lambda

A vector with a grid of values of λ to be used.

auto

A boolean variable. If it is TRUE, the GCV criterion will provide an automatic answer for the best λ. Otherwise, K-fold cross-validation is performed.

seed

A boolean variable. If it is TRUE, the random assignment of observations to folds is fixed, so the results will always be the same.

ncores

The number of cores to use. If it is more than 1, parallel computing is performed.

mat

A user-supplied matrix with the folds, if available. It must be a matrix with K columns, one per fold, where each column contains the positions (row indices) of the observations in that fold, not the data themselves. For example, the first column could be c(1, 10, 4, 25, 30), the second c(21, 23, 2, 19, 9), and so on.
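A fold matrix of the shape described for mat can be built in base R. The following is a minimal sketch, assuming the number of samples is an exact multiple of K; the values of n and K are made up for illustration.

```r
set.seed(1)   # fixed seed so the same folds are produced on every run
n <- 200      # number of samples, i.e. rows of the dataset (assumed)
K <- 10       # number of folds

# Shuffle the row positions 1..n and lay them out in K columns.
# Each column then holds the positions of the observations in one fold.
mat <- matrix( sample(n), ncol = K )

dim(mat)      # n / K rows and K columns
mat[, 1]      # positions of the observations in the first fold
```

Such a matrix could then be supplied through the mat argument, e.g. ridgereg.cv(target, dataset, K = 10, mat = mat), so that different runs or different models are compared on identical folds.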

Details

The lm.ridge command in the MASS library is a wrapper for this function. For a fast choice of λ, specify auto = TRUE and the λ which minimizes the generalised cross-validation criterion will be returned. Otherwise, K-fold cross-validation is performed and the estimated performance is bias corrected as suggested by Tibshirani and Tibshirani (2009).
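The K-fold procedure with the bias correction can be illustrated in base R. This is only a sketch of the idea in Tibshirani and Tibshirani (2009), not the package's internal code: the simulated data, the helper ridge_predict and the choice to centre (but not scale) the predictors are all assumptions made for the example.

```r
set.seed(1)
n <- 100; p <- 5; K <- 5
x <- matrix( rnorm(n * p), nrow = n )                  # simulated predictors
y <- drop( x %*% c(2, -1, 0.5, 0, 0) ) + rnorm(n)      # simulated target
lambda <- seq(0, 2, by = 0.1)                          # grid of lambda values

# Hypothetical helper: closed-form ridge fit on centred training data,
# followed by prediction for the test rows.
ridge_predict <- function(xtrain, ytrain, xtest, lam) {
  xm <- colMeans(xtrain)
  ym <- mean(ytrain)
  xc <- sweep(xtrain, 2, xm)
  beta <- solve( crossprod(xc) + lam * diag(ncol(xc)),
                 crossprod(xc, ytrain - ym) )
  drop( ym + sweep(xtest, 2, xm) %*% beta )
}

folds <- matrix( sample(n), ncol = K )    # each column is one fold
msp <- matrix( NA_real_, nrow = K, ncol = length(lambda) )
for (k in 1:K) {
  test <- folds[, k]
  for (j in seq_along(lambda)) {
    pred <- ridge_predict( x[-test, , drop = FALSE], y[-test],
                           x[test, , drop = FALSE], lambda[j] )
    msp[k, j] <- mean( (y[test] - pred)^2 )   # fold-wise squared prediction error
  }
}

mspe <- colMeans(msp)          # cross-validated error for each lambda
best <- which.min(mspe)        # index of the lambda minimising the averaged error

# TT estimate of bias: average gap between each fold's error at the overall
# best lambda and that fold's own minimum error.
bias <- mean( msp[, best] - apply(msp, 1, min) )
performance <- c( corrected = mspe[best] + bias, bias = bias )

lambda[best]
performance
```

The bias estimate is non-negative by construction, so the corrected error is never smaller than the minimum of the cross-validated error curve.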

Value

A list including:

mspe

If auto is FALSE, the values of the mean squared prediction error (MSPE) for each value of λ.

lambda

If auto is FALSE, the λ which minimizes the MSPE.

performance

If auto is FALSE, the minimum bias corrected MSPE along with the estimate of bias.

runtime

The run time of the algorithm. A numeric vector. The first element is the user time, the second element is the system time and the third element is the elapsed time.

Note

The values can be extracted with the $ symbol, i.e. this is an S3 class output.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr

References

Hoerl A.E. and Kennard R.W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1): 55-67.

Brown P. J. (1994). Measurement, Regression and Calibration. Oxford Science Publications.

Tibshirani R.J. and Tibshirani R. (2009). A bias correction for the minimum error rate in cross-validation. The Annals of Applied Statistics, 3(2): 822-829.

See Also

ridge.reg

Examples

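library(MXM)
# simulate a dataset with continuous data
dataset <- matrix( runif(200 * 40, 1, 100), nrow = 200 )
# the target feature is the last column of the dataset, as a vector
target <- dataset[, 40]
# drop the target column so it is not also used as a predictor
dataset <- dataset[, -40]
# automatic choice of lambda via the GCV criterion
a1 <- ridgereg.cv(target, dataset, auto = TRUE)
# 10-fold cross-validation over a grid of lambda values
a2 <- ridgereg.cv( target, dataset, K = 10, lambda = seq(0, 1, by = 0.1) )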
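
The mapping of proportions described under the target argument can be sketched in base R as follows. The values in p are made up for illustration, and the code only reproduces the documented formula; it is not taken from the package source.

```r
# Hypothetical proportions, all strictly between 0 and 1
p <- c(0.10, 0.25, 0.50, 0.90)

# Logit transformation, as in the documentation: log( target/(1 - target) )
y <- log( p / (1 - p) )

# Base R provides the same transformation as qlogis()
all.equal( y, qlogis(p) )

# Predictions on the logit scale can be mapped back to proportions
p_back <- 1 / ( 1 + exp(-y) )
```

Note that a value of 0 or 1 would give an infinite result, which is why the transformation applies only to values strictly inside the interval.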

MXM documentation built on Aug. 25, 2022, 9:05 a.m.