withCrossval              R Documentation
Description

In each set of replicate weights there will be some clusters that have essentially zero weight. These are used as the test set, with the other clusters used as the training set. Jackknife weights ("JK1", "JKn") are very similar to cross-validation at the cluster level; bootstrap weights are similar to bootstrapping for cross-validation.
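The train/test split can be inspected directly from the weights. A minimal sketch (not part of the function's interface), assuming the survey package's weights() method for replicate-weight designs, with type = "analysis" giving the full replicate weights and type = "sampling" the base sampling weights:

    library(survey)
    data(api)
    rclus1 <- as.svrepdesign(svydesign(id = ~dnum, weights = ~pw,
                                       data = apiclus1, fpc = ~fpc))
    ## ratio of replicate weight to sampling weight; observations where this
    ## falls below nearly_zero form the test set for that replicate
    rel <- weights(rclus1, type = "analysis") / weights(rclus1, type = "sampling")
    colSums(rel < 1e-4)  # test-set size per replicate (one cluster under JK1)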
Usage

withCrossval(design, formula, trainfun, testfun,
    loss = c("MSE", "entropy", "AbsError"),
    intercept, tuning, nearly_zero = 1e-4, ...)
Arguments

design
    A survey design object (currently only replicate-weight designs, as produced by as.svrepdesign)

formula
    Model formula where the left-hand side specifies the outcome variable and the right-hand side specifies the variables that will be used for prediction

trainfun
    Function taking a predictor matrix X, a response vector y, a weight vector w, and one element of tuning, and returning a fitted-model object (a sketch of this contract follows the argument list)

testfun
    Function taking a predictor matrix X, the fitted-model object returned by trainfun, and one element of tuning, and returning predicted values

loss
    Loss function for assessing prediction

intercept
    Should the predictor matrix have an intercept added?

tuning
    Vector of tuning parameters, such as the regularisation parameter in information criteria or the number of predictors

nearly_zero
    Test-set threshold on the scale of replicate weight divided by sampling weight

...
    For future expansion
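As promised above, a minimal sketch of the trainfun/testfun contract. The function names and the use of tuning as a ridge penalty are hypothetical, chosen only to illustrate the interface: trainfun sees only the training rows, and testfun must produce predictions for the test rows from whatever object trainfun returned.

    ## hypothetical weighted ridge-regression pair
    train_ridge <- function(X, y, w, tuning) {
      ## weighted ridge solution: (X'WX + lambda I)^{-1} X'Wy
      solve(crossprod(X, w * X) + tuning * diag(ncol(X)),
            crossprod(X, w * y))
    }
    test_ridge <- function(X, trainfit, tuning) drop(X %*% trainfit)
    ## withCrossval(rclus1, api00 ~ api99 + ell, trainfun = train_ridge,
    ##              testfun = test_ridge, intercept = TRUE,
    ##              loss = "MSE", tuning = c(0, 0.1, 1))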
Value

A number, or a vector with one element per value of tuning: the cross-validation estimate of the loss.
References

Iparragirre, A., Lumley, T., Barrio, I., & Arostegui, I. (2023). Variable selection with LASSO regression for complex survey data. Stat, 12(1), e578.
See Also

as.svrepdesign
Examples

library(survey)
data(api)
rclus1 <- as.svrepdesign(svydesign(id = ~dnum, weights = ~pw, data = apiclus1,
    fpc = ~fpc))

## Simple example: weighted least squares, single (ignored) tuning value
withCrossval(rclus1, api00 ~ api99 + ell + stype,
    trainfun = function(X, y, w, tuning) lm.wfit(X, y, w),
    testfun = function(X, trainfit, tuning) X %*% coef(trainfit),
    intercept = TRUE, loss = "MSE", tuning = 1)
## More realistic example using the lasso (not run; requires glmnet)
## the tuning parameter is the number of variables in the model
##
## library(glmnet)
## ftrain=function(X,y,w,tuning) {
## m<-glmnet(X,y,weights=w)
## lambda<-m$lambda[min(which(m$df>=tuning))]
## list(m,lambda)
## }
## ftest=function(X, trainfit, tuning){
## predict(trainfit[[1]], newx=X, s=trainfit[[2]])
## }
##
## withCrossval(rclus1, api00~api99+ell+stype+mobility+enroll,
## trainfun=ftrain,
## testfun=ftest,
## intercept=FALSE,loss="MSE",
## tuning=0:3)
##
## [1] 11445.2379 9649.1150 800.0742 787.4171
##
## Models with two or three predictors are about equally good
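##
## For a binary outcome, loss = "entropy" can be used instead of "MSE".
## A sketch only, commented out like the lasso example above: it assumes
## that testfun should return predicted probabilities, and uses the
## sch.wide variable from the api data. With intercept = TRUE the
## intercept column is already in X, as glm.fit does not add one;
## quasibinomial() avoids warnings about non-integer sampling weights.
##
## withCrossval(rclus1, I(as.numeric(sch.wide == "Yes")) ~ api99 + ell,
##     trainfun = function(X, y, w, tuning)
##         glm.fit(X, y, weights = w, family = quasibinomial()),
##     testfun = function(X, trainfit, tuning)
##         drop(plogis(X %*% coef(trainfit))),
##     intercept = TRUE, loss = "entropy", tuning = 1)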