CV.SuperLearner

Description
Computes the V-fold cross-validated risk estimates for the Super Learner. The function splits the data into V folds, calls SuperLearner on each training split, and predicts on the held-out fold.
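Conceptually, the procedure looks like the following rough sketch (this is not the package's internal code; it assumes Y, X, and SL.library are defined as in the Examples below, and that the fitted object exposes its held-out predictions as SL.predict):

V <- 10
n <- length(Y)
## randomly assign the n observations to V folds
folds <- split(sample(seq_len(n)), rep(seq_len(V), length.out = n))
cv.pred <- rep(NA_real_, n)
for (v in seq_len(V)) {
  ## fit the Super Learner on all folds except fold v
  fit <- SuperLearner(Y = Y[-folds[[v]]], X = X[-folds[[v]], , drop = FALSE],
                      newX = X[folds[[v]], , drop = FALSE],
                      SL.library = SL.library)
  ## store the predictions on the held-out fold
  cv.pred[folds[[v]]] <- fit$SL.predict
}
## cv.pred corresponds to the pred.SL component described under Value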
Usage

CV.SuperLearner(Y, X, SL.library, outside.V, inside.V, shuffle, verbose,
  family, method = "NNLS", id, obsWeights, save.fit.library, trim.logit,
  stratifyCV, ...)
Arguments

Y: The outcome variable.

X: The predictor variables.

SL.library: The library of prediction algorithms to be used in SuperLearner.

outside.V: An integer for the number of folds to split the data into (the outer cross-validation layer).

inside.V: An integer for the number of folds each Super Learner should use (the inner cross-validation layer used to estimate the algorithm weights).

shuffle: A logical value indicating whether the rows of the data should be shuffled before the data splits.

verbose: A logical value to produce additional output.

family: Error distribution of the outcome; currently allows gaussian() or binomial().

method: Loss function for combining the predictions in the library. Currently either "NNLS" (the default), "NNLS2", or "NNloglik". NNLS and NNLS2 are non-negative least squares based on the Lawson-Hanson algorithm and the dual method of Goldfarb and Idnani, respectively; both work for gaussian and binomial outcomes. NNloglik is a non-negative binomial log-likelihood maximization using the BFGS quasi-Newton optimization method. (A minimal sketch of the NNLS idea appears after this list.)

id: Cluster identification variable. For the cross-validation splits used to find the weights for each prediction algorithm, observations from the same cluster are kept together in the same fold.

obsWeights: Observation weights.

save.fit.library: A logical value for whether to save the fit of each algorithm in the library on the full data set. This must be TRUE if predictions on new observations will be requested later.

trim.logit: Only used if method = "NNloglik". The level at which the logit transformation is trimmed for numerical stability.

stratifyCV: A logical value for the cross-validation splits. If TRUE and the family is binomial, the splits will stratify on the outcome to give (roughly) equal proportions of the outcome in all splits. Currently does not work in combination with a cluster id.

...: Additional arguments.
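Because the method argument controls how the library predictions are combined, the following minimal sketch illustrates the idea behind method = "NNLS" using the nnls package (Lawson-Hanson non-negative least squares). The matrix Z here is a placeholder for the cross-validated predictions from each algorithm, and the normalization step mirrors in spirit how the coefficients are rescaled to sum to one:

library(nnls)
set.seed(1)
Z <- cbind(alg1 = rnorm(100), alg2 = rnorm(100))  # placeholder library predictions
Y <- rnorm(100)                                   # placeholder outcome
fit.nnls <- nnls(Z, Y)   # non-negative least squares fit
w <- fit.nnls$x          # non-negative coefficients
w / sum(w)               # normalized weights for combining the library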
Details

See SuperLearner for details on the Super Learner.
Value

CV.fit.SL: A list containing the output from each SuperLearner fit on the V training splits.

pred.SL: The V-fold cross-validated Super Learner predictions for the outcome. These can be used to estimate the honest cross-validated risk.

pred.discreteSL: The V-fold cross-validated discrete Super Learner predictions for the outcome. The discrete Super Learner selects the single algorithm with the minimum internal cross-validated risk estimate. See the output value whichDiscreteSL for the algorithm selected in each fold.

whichDiscreteSL: The prediction algorithm selected in each outside V fold as the discrete Super Learner.

pred.library: The V-fold cross-validated predictions for the outcome from all algorithms in the library.

coef.SL: A matrix of the Super Learner coefficients across the V folds.

folds: A list with the cross-validation splits.

call: The function call.
Author(s)

Eric C. Polley, ecpolley@berkeley.edu
References

van der Laan, M. J., Polley, E. C. and Hubbard, A. E. (2007) Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1), Article 25. http://www.bepress.com/sagmb/vol6/iss1/art25
Examples

## Not run:
## simulate data
set.seed(23432)
## training set
n <- 200
p <- 20
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
colnames(X) <- paste("X",1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + X[, 2]^2 - X[, 3] + X[, 1]*X[, 4] + X[, 5] + X[, 6] - X[, 7] + rnorm(n)
## test set
m <- 1000
newX <- matrix(rnorm(m*p), nrow=m, ncol=p)
colnames(newX) <- paste("X",1:p, sep="")
newX <- data.frame(newX)
newY <- newX[, 1] + newX[, 2]^2 - newX[, 3] + newX[, 1]*newX[, 4] + newX[, 5] + newX[, 6] - newX[, 7] + rnorm(m)
## generate Library and run Super Learner
SL.library <- c("SL.glmnet","SL.glm","SL.randomForest")
test <- SuperLearner(Y=Y, X=X, newX=newX, SL.library=SL.library, verbose=TRUE, V=20)
test
testCV <- CV.SuperLearner(Y=Y, X=X, SL.library=SL.library, verbose=TRUE, outside.V=10, inside.V = 20)
testCV
## compare SuperLearner honest CV risk with discrete super learner CV risk
mean((Y - testCV$pred.SL)^2)
mean((Y - testCV$pred.discreteSL)^2)
apply(testCV$pred.library, 2, function(x) mean((Y - x)^2))
summary(testCV)
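## Inspect which algorithm was selected as the discrete super learner in
## each outer fold, and the Super Learner coefficients across the folds
## (component names as documented in the Value section above):
testCV$whichDiscreteSL
testCV$coef.SL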
## Binary outcome:
set.seed(1)
N <- 200
X <- matrix(rnorm(N*10), N, 10)
X <- as.data.frame(X)
Y <- rbinom(N, 1, plogis(.2*X[, 1] + .1*X[, 2] - .2*X[, 3] + .1*X[, 3]*X[, 4] - .2*abs(X[, 4])))
SL.library <- c("SL.glmnet","SL.glm","SL.randomForest", "SL.knn20", "SL.knn30", "SL.knn40", "SL.knn50", "SL.glmnet.alpha50", "SL.gam", "SL.gam.3")
testCV.NNLS <- CV.SuperLearner(Y=Y, X=X, SL.library=SL.library, verbose=TRUE, outside.V=10, inside.V = 20, method = "NNLS", family = binomial())
summary(testCV.NNLS)
testCV.NNloglik <- CV.SuperLearner(Y=Y, X=X, SL.library=SL.library, verbose=TRUE, outside.V=10, inside.V = 20, method = "NNloglik", family = binomial())
summary(testCV.NNloglik)
## End(Not run)