cvSLAb: Estimate cross-validated risk for the super learner fit to...


Description

A convenience wrapper for CV.SuperLearner for antibody measurements.

Usage

cvSLAb(Y, X, id = 1:length(Y), V = 10, SL.library = c("SL.mean", "SL.glm",
  "SL.bayesglm", "SL.loess", "SL.gam", "SL.randomForest"), RFnodesize = NULL,
  gamdf = NULL)

Arguments

Y

Antibody measurement. Must be a numeric vector.

X

A vector, matrix, or data.frame of covariates for each individual, used to predict antibody levels.

id

An optional cluster or repeated measures id variable. For cross-validation splits, id forces observations in the same cluster or for the same individual to be in the same validation fold.

V

Number of folds to use in the cross-validation (default is 10).

SL.library

Library of algorithms to include in the ensemble (see the SuperLearner package for details).

RFnodesize

Optional argument to specify a range of minimum node sizes for the random forest algorithm. If SL.library includes SL.randomForest, the default is to search over node sizes of 15, 20, ..., 40. Specifying this option overrides the default.

gamdf

Optional argument to specify a range of degrees of freedom for natural smoothing splines in a generalized additive model. If SL.library includes SL.gam, the default is to search over degrees of freedom 2, 3, ..., 10. Specifying this option overrides the default.
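The fold constraint described for the id argument can be illustrated in base R. The following is a minimal sketch, not the code used by cvSLAb or CV.SuperLearner: the helper name make_cluster_folds and the simulated id vector are assumptions made for illustration. It assigns whole clusters to folds, so that all observations sharing an id land in the same validation fold.

```r
# Hypothetical helper (not part of tmleAb or SuperLearner): assign each
# cluster id to one of V folds, then map that fold back to observations.
make_cluster_folds <- function(id, V = 10) {
  ids <- unique(id)
  stopifnot(V >= 2, V <= length(ids))
  # shuffle the clusters, then deal them into V folds of near-equal size
  shuffled <- sample(ids)
  fold_of_cluster <- rep_len(seq_len(V), length(shuffled))
  # look up each observation's cluster position to get its fold
  fold_of_cluster[match(id, shuffled)]
}

set.seed(1)
id <- rep(1:20, each = 3)   # simulated: 20 individuals, 3 measurements each
folds <- make_cluster_folds(id, V = 5)

# number of distinct folds each individual appears in (should be 1 for all)
folds_per_id <- tapply(folds, id, function(f) length(unique(f)))
all(folds_per_id == 1)
table(folds)
```

With the default id = 1:length(Y), every observation is its own cluster, so this reduces to ordinary V-fold splitting.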

Details

The SuperLearner function builds an estimator, but does not provide an estimate of that estimator's performance. Various methods exist for evaluating estimator performance. If you are familiar with the super learner algorithm, it should be no surprise that we recommend using cross-validation to obtain an honest evaluation of the super learner estimator.

The function cvSLAb is a convenience wrapper for the CV.SuperLearner routine. It computes the V-fold cross-validated risk estimate for the super learner (and for all algorithms in SL.library, for comparison). The wrapper restricts the dataset to complete cases, transforms the covariate matrix (X) into a data.frame, and allows the user to tune parameters in the Random Forest and GAM libraries if they are included in SL.library. It assumes a continuous outcome (family=gaussian()), although it can also be run on binary outcomes.
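To make the idea of a V-fold cross-validated risk concrete, here is a minimal base R sketch under stated assumptions. It is not the cvSLAb or CV.SuperLearner implementation: the data are simulated, the two learners are simple stand-ins for SL.mean and SL.glm, and the ensemble weighting step of the super learner is omitted. It shows only the complete-case restriction, the conversion of covariates to a data.frame, and the comparison of candidate learners by cross-validated mean squared error.

```r
# Simulated data standing in for antibody measurements (assumption).
set.seed(62522)
n   <- 200
age <- runif(n, 0, 20)
Y   <- 1 + 0.15 * age + rnorm(n, sd = 0.5)
Y[c(3, 50)]    <- NA        # introduce some missing outcomes
age[c(7, 120)] <- NA        # and some missing covariates

# 1. restrict to complete cases and store the covariates as a data.frame
keep <- complete.cases(Y, age)
Y <- Y[keep]
X <- data.frame(Age = age[keep])

# 2. assign each observation to one of V folds at random
V <- 10
fold <- sample(rep_len(seq_len(V), length(Y)))

# 3. candidate learners: fit on training data, predict for new data
learners <- list(
  mean = function(Ytr, Xtr, Xnew) rep(mean(Ytr), nrow(Xnew)),
  glm  = function(Ytr, Xtr, Xnew) {
    fit <- glm(Y ~ ., data = data.frame(Y = Ytr, Xtr), family = gaussian())
    predict(fit, newdata = Xnew)
  }
)

# 4. cross-validated risk: mean squared error in each held-out fold,
#    averaged over the V folds
cv_risk <- sapply(learners, function(learn) {
  fold_mse <- sapply(seq_len(V), function(v) {
    test <- fold == v
    pred <- learn(Y[!test], X[!test, , drop = FALSE], X[test, , drop = FALSE])
    mean((Y[test] - pred)^2)
  })
  mean(fold_mse)
})
cv_risk
```

Because each prediction is made for observations that were held out when the learner was fit, the resulting risk is not flattered by overfitting, which is the sense in which the evaluation is "honest".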

Value

This function returns an object of class CV.SuperLearner (see the SuperLearner package for details).

See Also

tmleAb, SuperLearner

Examples

## Not run: 
# load the Garki project serology data, subset to round 5 intervention
data("garki_sero")
garki_sero$village <- factor(garki_sero$village)
garki_sero$sex <- factor(garki_sero$sex)
garki_sero$tr01 <- ifelse(garki_sero$tr=="Intervention",1,0)
d <- subset(garki_sero,serosvy==5 & tr=="Intervention")

# fit the cross-validated super learner
# with just Age as the predictor
set.seed(62522)
CVfit <- cvSLAb(Y=log10(d$ifatpftitre+1),X=data.frame(Age=d$ageyrs),id=d$id)

# plot cross-validated MSE ("Risk")
plot(CVfit)

## End(Not run)
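The RFnodesize and gamdf arguments can be used to replace the default tuning grids. The sketch below uses only the arguments listed under Usage; the narrower grids and the object name CVfit2 are illustrative assumptions, not recommendations. The call is skipped unless the tmleAb package is installed.

```r
# default search grids, as described under Arguments
rf_default  <- seq(15, 40, by = 5)   # 15, 20, ..., 40
gam_default <- 2:10                  # 2, 3, ..., 10

# hypothetical narrower grids, chosen only for illustration
rf_grid  <- c(20, 30, 40)
gam_grid <- 2:5

# run only if tmleAb is installed
if (requireNamespace("tmleAb", quietly = TRUE)) {
  library(tmleAb)
  data("garki_sero")
  d <- subset(garki_sero, serosvy == 5 & tr == "Intervention")

  set.seed(62522)
  CVfit2 <- cvSLAb(Y = log10(d$ifatpftitre + 1),
                   X = data.frame(Age = d$ageyrs),
                   id = d$id,
                   SL.library = c("SL.mean", "SL.glm", "SL.gam", "SL.randomForest"),
                   RFnodesize = rf_grid,
                   gamdf = gam_grid)
  plot(CVfit2)
}
```

Smaller grids reduce run time, since fewer tuning values have to be evaluated for each algorithm.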

ben-arnold/tmleAb documentation built on May 12, 2019, 10:55 a.m.