Description
This function takes a linear regression model from lm, a logistic regression model from glm, a partition model from rpart, or a random forest from randomForest, and calculates its generalization error on a data frame.
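The idea of generalization error can be illustrated without the package: a model is fit on one part of the data and its error is then measured on rows it never saw. The sketch below assumes only base R (stats). The simulated data and the helper rmse() are illustrative assumptions, not part of the package.

```r
# Minimal sketch (base R only): compare the RMSE of a linear regression on the
# data used to fit it with its RMSE on a holdout sample.
# The simulated data and the helper rmse() are hypothetical illustrations.
set.seed(1)
n <- 200
DATA <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
DATA$y <- 1 + 2 * DATA$x1 - DATA$x2 + rnorm(n)

# 70/30 split into training and holdout rows
train.rows <- sample(seq_len(n), floor(0.7 * n))
TRAIN <- DATA[train.rows, ]
HOLDOUT <- DATA[-train.rows, ]

M <- lm(y ~ ., data = TRAIN)

# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

train.rmse <- rmse(TRAIN$y, fitted(M))                             # error on the fitting data
holdout.rmse <- rmse(HOLDOUT$y, predict(M, newdata = HOLDOUT))     # generalization error
c(training = train.rmse, holdout = holdout.rmse)
```

The training RMSE tends to be optimistic because the model was tuned to those same rows; the holdout RMSE is the honest estimate.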
Usage
generalization.error(MODEL,HOLDOUT,Kfold=FALSE,K=5,R=10,seed=NA)
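Arguments
MODEL: A linear regression model created using lm, a logistic regression model created using glm, a partition model created using rpart, or a random forest created using randomForest.
HOLDOUT: A dataset for which the generalization error will be calculated. If not given, the error on the data used to build the model is reported.
Kfold: If TRUE, the generalization error of MODEL is also estimated using repeated K-fold cross-validation (regression models only).
K: The number of folds used in repeated K-fold cross-validation to estimate the generalization error of MODEL.
R: The number of repeats used in repeated K-fold cross-validation.
seed: An optional seed for the random number generator, used when estimating the generalization error.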
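To make the roles of K, R and seed concrete, here is a minimal sketch of repeated K-fold cross-validation for a linear regression in base R. The function name repeated.kfold.rmse() is hypothetical and this is not the package's implementation; in particular, averaging the per-repeat RMSEs (rather than pooling all squared errors) is an assumed design choice.

```r
# Hypothetical sketch of repeated K-fold cross-validation (base R only).
# Assumes the response in `formula` is a plain variable name (no transformation).
repeated.kfold.rmse <- function(formula, data, K = 5, R = 10, seed = NA) {
  if (!is.na(seed)) set.seed(seed)       # optional seed makes the folds reproducible
  response <- all.vars(formula)[1]
  n <- nrow(data)
  rep.rmse <- numeric(R)
  for (r in seq_len(R)) {
    # Randomly assign each row to one of K folds of (nearly) equal size
    folds <- sample(rep(seq_len(K), length.out = n))
    sq.err <- numeric(n)
    for (k in seq_len(K)) {
      test <- folds == k
      fit <- lm(formula, data = data[!test, ])          # fit on the other K-1 folds
      pred <- predict(fit, newdata = data[test, ])      # predict the held-out fold
      sq.err[test] <- (data[[response]][test] - pred)^2
    }
    rep.rmse[r] <- sqrt(mean(sq.err))    # RMSE over all out-of-fold predictions
  }
  mean(rep.rmse)                          # average over the R repeats
}

set.seed(1)
DATA <- data.frame(x = rnorm(100))
DATA$y <- 3 - 2 * DATA$x + rnorm(100)
est1 <- repeated.kfold.rmse(y ~ x, DATA, K = 5, R = 10, seed = 5020)
est2 <- repeated.kfold.rmse(y ~ x, DATA, K = 5, R = 10, seed = 5020)
```

Running it twice with the same seed returns the same estimate, which is the point of supplying seed.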
Details
This function calculates the error of MODEL on the data used to build it, its estimated generalization error from repeated K-fold cross-validation (regression models only, when Kfold=TRUE), and the actual generalization error on HOLDOUT. If the response is quantitative, the root mean squared error (RMSE) is reported. If the response is categorical, the confusion matrices and misclassification rates are returned.
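For a categorical response, a confusion matrix cross-tabulates actual against predicted classes, and the misclassification rate is the fraction of rows off its diagonal. A minimal base R sketch for a logistic regression on a holdout sample follows; the simulated data and the 0.5 probability cutoff are illustrative assumptions, and the package's own output format may differ.

```r
# Minimal sketch (base R only): confusion matrix and misclassification rate
# for a logistic regression evaluated on a holdout sample.
set.seed(2)
n <- 300
DATA <- data.frame(x = rnorm(n))
p <- plogis(-0.5 + 1.5 * DATA$x)
DATA$Class <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

train.rows <- sample(seq_len(n), floor(0.7 * n))
TRAIN <- DATA[train.rows, ]
HOLDOUT <- DATA[-train.rows, ]

M <- glm(Class ~ x, data = TRAIN, family = binomial)

# type = "response" gives P(Class = "yes"), the second factor level
prob <- predict(M, newdata = HOLDOUT, type = "response")
predicted <- factor(ifelse(prob > 0.5, "yes", "no"), levels = levels(DATA$Class))

confusion <- table(Actual = HOLDOUT$Class, Predicted = predicted)
misclass.rate <- mean(predicted != HOLDOUT$Class)   # share of wrong predictions
confusion
misclass.rate
```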
Author(s)
Adam Petrie
References
Introduction to Regression and Modeling
Examples
library(regclass)       # provides generalization.error(), STUDENT and WINE
library(rpart)
library(randomForest)
#Education analytics
data(STUDENT)
set.seed(1010)
train.rows <- sample(1:nrow(STUDENT),floor(0.7*nrow(STUDENT)))
TRAIN <- STUDENT[train.rows,]
HOLDOUT <- STUDENT[-train.rows,]
M <- lm(CollegeGPA~.,data=TRAIN)
#Also estimate the generalization error of the model with repeated K-fold cross-validation
generalization.error(M,HOLDOUT,Kfold=TRUE,seed=5020)
#Try a partition model and a random forest, though they do not perform as well as regression here
TREE <- rpart(CollegeGPA~.,data=TRAIN)
FOREST <- randomForest(CollegeGPA~.,data=TRAIN)
generalization.error(TREE,HOLDOUT)
generalization.error(FOREST,HOLDOUT)
#Wine
data(WINE)
set.seed(2020)
train.rows <- sample(1:nrow(WINE),floor(0.7*nrow(WINE)))
TRAIN <- WINE[train.rows,]
HOLDOUT <- WINE[-train.rows,]
#Logistic regression with all two-way interactions
M <- glm(Quality~.^2,data=TRAIN,family=binomial)
generalization.error(M,HOLDOUT)
#Random forest predicts best on the holdout sample
TREE <- rpart(Quality~.,data=TRAIN)
FOREST <- randomForest(Quality~.,data=TRAIN)
generalization.error(TREE,HOLDOUT)
generalization.error(FOREST,HOLDOUT)