View source: R/CV.SuperLearner.R
CV.SuperLearner | R Documentation |
Function to get the V-fold cross-validated risk estimate for the super learner. This function simply splits the data into V folds and then calls SuperLearner. Most of the arguments are passed directly to SuperLearner.
CV.SuperLearner(Y, X, V = NULL, family = gaussian(), SL.library,
  method = "method.NNLS", id = NULL, verbose = FALSE,
  control = list(saveFitLibrary = FALSE), cvControl = list(),
  innerCvControl = list(),
  obsWeights = NULL, saveAll = TRUE, parallel = "seq", env = parent.frame())
Y |
The outcome. |
X |
The covariates. |
V |
The number of folds for |
family |
Currently allows |
SL.library |
Either a character vector of prediction algorithms or a list containing character vectors. See details below for examples on the structure. A list of functions included in the SuperLearner package can be found with |
method |
A list (or a function to create a list) containing details on estimating the coefficients for the super learner and the model to combine the individual algorithms in the library. See |
id |
Optional cluster identification variable. For the cross-validation splits, |
verbose |
Logical; TRUE for printing progress during the computation (helpful for debugging). |
control |
A list of parameters to control the estimation process. Parameters include |
cvControl |
A list of parameters to control the outer cross-validation process. The outer cross-validation is the sample splitting for evaluating the SuperLearner. Parameters include |
innerCvControl |
A list of lists of parameters to control the inner cross-validation process. It should have |
obsWeights |
Optional observation weights variable. As with |
saveAll |
Logical; Should the entire |
parallel |
Options for parallel computation of the V-fold step. Use "seq" (the default) for sequential computation. |
env |
Environment containing the learner functions. Defaults to the calling environment. |
The SuperLearner function builds an estimator, but does not contain an estimate of the performance of that estimator. Various methods exist for evaluating estimator performance. If you are familiar with the super learner algorithm, it should be no surprise that we recommend using cross-validation to evaluate the honest performance of the super learner estimator. The function CV.SuperLearner computes the usual V-fold cross-validated risk estimate for the super learner (and all algorithms in SL.library for comparison).
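The idea of a V-fold cross-validated risk estimate can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper cv_risk() and its fit_fun and pred_fun arguments are hypothetical names introduced here, the estimator is a plain linear model rather than a super learner, and squared error loss is assumed.

```r
## Minimal base-R sketch of a V-fold cross-validated risk estimate.
## cv_risk(), fit_fun and pred_fun are hypothetical names, not package functions.
cv_risk <- function(Y, X, V = 10, fit_fun, pred_fun) {
  n <- length(Y)
  # Randomly assign each row to one of V validation folds of near-equal size
  fold_id <- sample(rep(seq_len(V), length.out = n))
  folds <- split(seq_len(n), fold_id)
  pred <- rep(NA_real_, n)
  for (v in seq_len(V)) {
    valid <- folds[[v]]
    # Fit on the training rows only, then predict the held-out rows
    fit <- fit_fun(Y[-valid], X[-valid, , drop = FALSE])
    pred[valid] <- pred_fun(fit, X[valid, , drop = FALSE])
  }
  # Squared error risk within each validation fold (an assumed loss)
  fold_risk <- vapply(folds, function(idx) mean((Y[idx] - pred[idx])^2),
                      numeric(1))
  list(pred = pred, folds = folds, fold_risk = fold_risk,
       risk = mean(fold_risk))
}

set.seed(1)
n <- 200
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
Y <- X$X1 - X$X2 + rnorm(n)

res <- cv_risk(
  Y, X, V = 5,
  fit_fun  = function(y, x) lm(y ~ ., data = data.frame(y = y, x)),
  pred_fun = function(fit, newx) predict(fit, newdata = newx)
)
res$fold_risk  # risk in each validation fold
res$risk       # average over the V folds
```

Every prediction in res$pred comes from a model that never saw that row, which is what makes the risk estimate honest. In CV.SuperLearner the role of fit_fun is played by a full SuperLearner fit on the training rows of each fold.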
An object of class CV.SuperLearner (a list) with components:
call |
The matched call. |
AllSL |
If |
SL.predict |
The predicted values from the super learner when each particular row was part of the validation fold. |
discreteSL.predict |
The traditional cross-validated selector. Picks the algorithm with the smallest cross-validated risk (in super learner terms, gives that algorithm coefficient 1 and all others 0). |
whichDiscreteSL |
A list of length |
library.predict |
A matrix with the predicted values from each algorithm in |
coef |
A matrix with the coefficients for the super learner on each fold. The columns are the algorithms in |
folds |
A list containing the row numbers for each validation fold. |
V |
Number of folds for |
libraryNames |
A character vector with the names of the algorithms in the library. The format is 'predictionAlgorithm_screeningAlgorithm' with '_All' used to denote the prediction algorithm run on all variables in X. |
SL.library |
Returns |
method |
A list with the method functions. |
Y |
The outcome |
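As a rough illustration of how the Y, SL.predict and folds components fit together, the sketch below computes a squared-error risk for each validation fold. The helper fold_mse() is a hypothetical name, and the object toy is built by hand as a stand-in, not output from CV.SuperLearner. The package's own summary method may use a different loss or weighting.

```r
## Hypothetical helper: fold-wise squared-error risk from a list that has
## the components Y, SL.predict and folds described above.
fold_mse <- function(obj) {
  vapply(obj$folds, function(idx) {
    mean((obj$Y[idx] - obj$SL.predict[idx])^2)
  }, numeric(1))
}

## Stand-in object built by hand (not output from CV.SuperLearner)
toy <- list(
  Y          = c(1, 2, 3, 4, 5, 6),
  SL.predict = c(1, 2, 4, 4, 5, 8),
  folds      = list(c(1, 4), c(2, 5), c(3, 6))
)

risk_by_fold <- fold_mse(toy)
risk_by_fold        # 0.0 0.0 2.5
mean(risk_by_fold)  # average risk across the three folds
```

The same helper could in principle be applied to a fitted CV.SuperLearner object, since it reads only those three components, and replacing SL.predict with discreteSL.predict would give the corresponding risk for the discrete selector.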
Eric C Polley polley.eric@mayo.edu
SuperLearner
## Not run:
set.seed(23432)
## training set
n <- 500
p <- 50
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)
## build Library and run Super Learner
SL.library <- c("SL.glm", "SL.randomForest", "SL.gam", "SL.polymars", "SL.mean")
test <- CV.SuperLearner(Y = Y, X = X, V = 10, SL.library = SL.library,
  verbose = TRUE, method = "method.NNLS")
test
summary(test)
## Look at the coefficients across folds
coef(test)
# Example with specifying cross-validation options for both
# CV.SuperLearner (cvControl) and the internal SuperLearners (innerCvControl)
test <- CV.SuperLearner(Y = Y, X = X, SL.library = SL.library,
  cvControl = list(V = 10, shuffle = FALSE),
  innerCvControl = list(list(V = 5)),
  verbose = TRUE, method = "method.NNLS")
## examples with snow
library(parallel)
cl <- makeCluster(2, type = "PSOCK") # can use different types here
clusterSetRNGStream(cl, iseed = 2343)
testSNOW <- CV.SuperLearner(Y = Y, X = X, SL.library = SL.library,
  method = "method.NNLS", parallel = cl)
summary(testSNOW)
stopCluster(cl)
## End(Not run)