withCrossval: Crossvalidation using replicate weights


withCrossval R Documentation

Crossvalidation using replicate weights

Description

In each set of replicate weights, some clusters have essentially zero weight. These clusters form the test set, and the remaining clusters form the training set. Jackknife weights ("JK1", "JKn") closely resemble cross-validation at the cluster level; bootstrap weights resemble bootstrapping for cross-validation.
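As a rough illustration of how a test set can be read off the replicate weights, the base-R sketch below builds JK1-style weights for a made-up three-cluster sample and flags the observations whose replicate weight is essentially zero. The data, the variable names, and the exact comparison (`<=`) are assumptions for illustration, not the package's internal code.

```r
# Toy data (hypothetical): 6 observations in 3 clusters, sampling weight 10 each.
cluster <- c(1, 1, 2, 2, 3, 3)
pweights <- rep(10, 6)

# JK1-style replicate weights: replicate k gives cluster k zero weight and
# scales the other clusters up by n / (n - 1), where n is the number of clusters.
n_clusters <- length(unique(cluster))
repweights <- sapply(seq_len(n_clusters), function(k) {
  ifelse(cluster == k, 0, pweights * n_clusters / (n_clusters - 1))
})

# An observation is treated as test data in replicate k when its replicate
# weight is essentially zero relative to its sampling weight.
nearly_zero <- 1e-4
is_test <- (repweights / pweights) <= nearly_zero

# Cluster held out in each replicate (one per column of repweights).
held_out <- apply(is_test, 2, function(col) unique(cluster[col]))
print(held_out)
```

Each column of `is_test` marks one held-out cluster, which is why jackknife weights behave like leave-one-cluster-out cross-validation.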

Usage

withCrossval(design, formula, trainfun, testfun,
             loss = c("MSE", "entropy", "AbsError"),
             intercept, tuning, nearly_zero = 1e-4, ...)

Arguments

design

A survey design object (currently only svyrep.design)

formula

Model formula where the left-hand side specifies the outcome variable and the right-hand side specifies the variables that will be used for prediction

trainfun

Function taking a predictor matrix X, an outcome vector y, a weights vector w, and an element of tuning, and training a model that is returned as some R object.

testfun

Function taking a predictor matrix X, the output from trainfun, and an element of tuning, and returning fitted values for the outcome variable.
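A minimal sketch of a matching trainfun/testfun pair, assuming weighted least squares via `stats::lm.wfit`. The function names `train_wls` and `test_wls` and the toy data are hypothetical; the pair is exercised directly here rather than through withCrossval.

```r
# Hypothetical training function: weighted least squares.
# The tuning argument is accepted to match the expected signature but is unused.
train_wls <- function(X, y, w, tuning) {
  lm.wfit(X, y, w)
}

# Hypothetical test function: linear predictor from the fitted coefficients.
test_wls <- function(X, trainfit, tuning) {
  drop(X %*% coef(trainfit))
}

# Made-up data with an explicit intercept column in the predictor matrix.
set.seed(1)
X <- cbind(intercept = 1, x = rnorm(20))
y <- 2 + 3 * X[, "x"] + rnorm(20, sd = 0.1)
w <- rep(1, 20)

# Train on the first 15 rows, predict for the remaining 5.
fit <- train_wls(X[1:15, ], y[1:15], w[1:15], tuning = 1)
pred <- test_wls(X[16:20, , drop = FALSE], fit, tuning = 1)
```

The only contract between the two functions is the object returned by the training function, so it can be any R object the test function knows how to use.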

loss

Loss function for assessing prediction
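For orientation, the three named losses can be sketched as weighted averages over the test set. The definitions below are the usual textbook forms and are an assumption; the package's internal formulas may differ in detail. The list name `loss_funs` is hypothetical.

```r
# Assumed weighted definitions of the three named losses.
loss_funs <- list(
  MSE = function(y, hat, w) sum(w * (y - hat)^2) / sum(w),
  AbsError = function(y, hat, w) sum(w * abs(y - hat)) / sum(w),
  # entropy assumes a 0/1 outcome and predicted probabilities strictly in (0, 1)
  entropy = function(y, hat, w) {
    sum(w * -(y * log(hat) + (1 - y) * log(1 - hat))) / sum(w)
  }
)

# Made-up test-set values: outcomes, predictions, and weights.
y <- c(1, 0, 1)
hat <- c(0.5, 0.5, 0.5)
w <- c(1, 2, 1)

mse <- loss_funs$MSE(y, hat, w)
abs_err <- loss_funs$AbsError(y, hat, w)
ent <- loss_funs$entropy(y, hat, w)
```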

intercept

Should the predictor matrix have an intercept added?

tuning

Vector of tuning parameters, such as the regularisation parameter in information criteria or the number of predictors. trainfun and testfun will be called with each element of this vector in turn. Use any single-element vector if no tuning parameter is needed.

nearly_zero

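Threshold for treating a replicate weight as zero, on the scale of replicate weight divided by sampling weight. Observations whose ratio falls under this threshold form the test set for that replicate.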

...

future expansion

Value

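A numeric vector of estimated prediction losses, one for each element of tuning (a single number when tuning has length one)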

References

Iparragirre, A., Lumley, T., Barrio, I., & Arostegui, I. (2023). Variable selection with LASSO regression for complex survey data. Stat, 12(1), e578.

See Also

as.svrepdesign

Examples

data(api)
rclus1<-as.svrepdesign(svydesign(id=~dnum, weights=~pw, data=apiclus1,
fpc=~fpc))


withCrossval(rclus1, api00~api99+ell+stype,
  trainfun=function(X,y,w,tuning) lm.wfit(X,y,w),
  testfun=function(X, trainfit,tuning) X%*%coef(trainfit),
  intercept=TRUE,loss="MSE",tuning=1)


## More realistic example using lasso
## tuning parameter is number of variables in model
##
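##  library(glmnet)
##  ftrain=function(X,y,w,tuning) {
##    m<-glmnet(X,y,weights=w)
##    ## first lambda on the path with at least `tuning` nonzero coefficients
##    lambda<-m$lambda[min(which(m$df>=tuning))]
##    list(m,lambda)
##  }
##  ftest=function(X, trainfit, tuning){
##    ## trainfit is the list(m,lambda) returned by ftrain
##    predict(trainfit[[1]], newx=X, s=trainfit[[2]])
##  }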
##
##  withCrossval(rclus1, api00~api99+ell+stype+mobility+enroll,
##    trainfun=ftrain,
##    testfun=ftest,
##    intercept=FALSE,loss="MSE",
##    tuning=0:3)
##
## [1] 11445.2379  9649.1150   800.0742   787.4171
##
## Models with two or three predictors are about equally good
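The comparison in the last comment can be made explicit with a short base-R sketch that works on the four loss values printed above. The 5% tolerance used to call two models "about equally good" is an arbitrary assumption for illustration, not something the package defines.

```r
# Loss estimates copied from the commented lasso output above,
# one per tuning value (number of variables in the model).
tuning <- 0:3
cv_loss <- c(11445.2379, 9649.1150, 800.0742, 787.4171)

# Tuning value with the smallest estimated loss.
best <- tuning[which.min(cv_loss)]

# Relative excess loss over the best model; the 5% cutoff is an assumption.
excess <- cv_loss / min(cv_loss) - 1
close_to_best <- tuning[excess < 0.05]
```

Under these assumptions `close_to_best` contains the two- and three-predictor models, matching the remark that they perform about equally well.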



survey documentation built on Aug. 28, 2024, 3 a.m.