Engines for cross-validation of many types of regression and class prediction models are provided. These engines include built-in support for 'glmnet', 'lars', 'plus', 'MASS', 'rpart', 'C50' and 'randomForest'. It is easy for the user to add other regression or classification algorithms. The 'parallel' package is used to improve speed. Several data generation algorithms for problems in regression and classification are provided.
Engines for cross-validation of many types of regression and class prediction models are provided. These engines include built-in support for CRAN packages including glmnet, lars, plus, MASS, rpart, C50 and randomForest. The cross-validation engines are the functions gcv() and cgcv(). It is easy for the user to add other regression or classification algorithms for use with these engines. The default cost function for regression is squared error, but support is also provided for mean absolute error and mean absolute percentage error. For classification, the default cost function is 0/1 loss with the associated misclassification rate, but logloss is also provided. The user may also specify their own cost function. Both gcv() and cgcv() make use of R's parallel package. Several illustrative datasets are included, as well as data generation algorithms for problems in regression and classification.
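The cost functions named above can be written out in a few lines of base R. This is a minimal sketch for illustration only: the helper names (squared_error, mean_abs_error, mean_abs_pct_error, misclass_rate, log_loss) are hypothetical and are not the package's own functions or arguments, and the log loss shown assumes a binary 0/1 response with predicted probabilities.

```r
# Illustrative cost functions; the names are hypothetical, not package API.

# Regression costs: y is the observed response, yhat the prediction.
squared_error <- function(y, yhat) mean((y - yhat)^2)
mean_abs_error <- function(y, yhat) mean(abs(y - yhat))
# Percentage error is undefined when an observed value is zero.
mean_abs_pct_error <- function(y, yhat) {
  stopifnot(all(y != 0))
  100 * mean(abs((y - yhat) / y))
}

# Classification costs.
# 0/1 loss averaged over cases gives the misclassification rate.
misclass_rate <- function(y, yhat) mean(y != yhat)
# Binary log loss: y is 0/1, p is the predicted probability that y == 1.
# Probabilities are clipped away from 0 and 1 to keep log() finite.
log_loss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

y <- c(1, 2, 4)
yhat <- c(1, 3, 2)
squared_error(y, yhat)
mean_abs_error(y, yhat)
misclass_rate(c(0, 1, 1, 0), c(0, 1, 0, 0))
```

Any function of the same shape, taking observed and predicted values and returning a single number, could serve as a user-defined cost in this sketch.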
The delete-d cross-validation method of Shao (1993) is used. Shao recommends at least 1000 iterations, so this method requires significantly more computation than the k-fold cross-validation recommended by Hastie, Tibshirani and Friedman (2009), in conjunction with regularization using the one-standard-error rule, for selecting a tuning parameter in penalized regression. However, many researchers have noticed that even regularized k-fold cross-validation is quite variable (Kim, 2009). A future version of this package will include k-fold cross-validation and iterated k-fold cross-validation. Iterated k-fold cross-validation usually produces results very similar to those of the delete-d method (Kim, 2009).
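The delete-d idea can be sketched in base R as follows: at each iteration, d observations are drawn at random and held out, the model is fitted to the remaining n - d, and the cost is computed on the held-out cases. This is only an illustration under stated assumptions, not the package's implementation: the function delete_d_cv, its arguments, the use of lm() as the learner, and the simulated data are all hypothetical.

```r
# Delete-d cross-validation sketch using a linear model (base R only).
# delete_d_cv and its arguments are hypothetical, not the package's API.
delete_d_cv <- function(X, y, d, max_iter = 1000) {
  n <- length(y)
  stopifnot(nrow(X) == n, d >= 1, d < n)
  dat <- data.frame(y = y, X)
  costs <- numeric(max_iter)
  for (i in seq_len(max_iter)) {
    test_idx <- sample.int(n, d)                      # hold out d random cases
    fit <- lm(y ~ ., data = dat[-test_idx, , drop = FALSE])
    pred <- predict(fit, newdata = dat[test_idx, , drop = FALSE])
    costs[i] <- mean((dat$y[test_idx] - pred)^2)      # squared-error cost
  }
  # Average cost over iterations and its standard error.
  c(cv = mean(costs), se = sd(costs) / sqrt(max_iter))
}

# Simulated example: 50 cases, 4 inputs, two of which are inactive.
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
y <- drop(X %*% c(2, 0, -1, 0)) + rnorm(50)
delete_d_cv(X, y, d = 10, max_iter = 200)
```

The iteration count is kept small here so the sketch runs quickly; the text above notes that Shao recommends at least 1000 iterations.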
Other CRAN packages that provide general frameworks with resampling strategies include boot, mlr and caret.
A. I. McLeod
Maintainer: A. I. McLeod
Trevor Hastie, Robert Tibshirani, Jerome H. Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed. Springer.
Jun Shao (1993), Linear Model Selection by Cross-Validation. Journal of the American Statistical Association, Vol. 88, Iss. 422.
J. H. Kim (2009), Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out and Bootstrap. Computational Statistics and Data Analysis, 53, 3735-3745.
# Regression with simulated model
Xy <- ShaoReg()
gcv(Xy[,1:8], Xy[,9], MaxIter=25, d=5)
#
# SVM with simulated mixture data
Xy <- rmix(100)
cgcv(X=Xy[,1:2], y=Xy[,3], yh=yh_svm, MaxIter=25)
#
# Data has already been divided into training and test sets,
# so just do simple cross-validation
yh_CART(SinghTrain, SinghTest)