Description Usage Arguments Details Value Author(s) References Examples

A Prediction Function for the Super Learner. The `SuperLearner`

function takes a training set pair (X,Y) and returns the predicted values based on a validation set.

1 2 3 | ```
SuperLearner(Y, X, newX = NULL, family = gaussian(), SL.library,
method = "method.NNLS", id = NULL, verbose = FALSE,
control = list(), cvControl = list(), obsWeights = NULL, env = parent.frame())
``` |

`Y` |
The outcome in the training data set. Must be a numeric vector. |

`X` |
The predictor variables in the training data set, usually a data.frame. |

`newX` |
The predictor variables in the validation data set. The structure should match X. If missing, uses X for newX. |

`SL.library` |
Either a character vector of prediction algorithms or a list containing character vectors. See details below for examples on the structure. A list of functions included in the SuperLearner package can be found with |

`verbose` |
logical; TRUE for printing progress during the computation (helpful for debugging). |

`family` |
Currently allows |

`method` |
A list (or a function to create a list) containing details on estimating the coefficients for the super learner and the model to combine the individual algorithms in the library. See |

`id` |
Optional cluster identification variable. For the cross-validation splits, |

`obsWeights` |
Optional observation weights variable. As with |

`control` |
A list of parameters to control the estimation process. Parameters include |

`cvControl` |
A list of parameters to control the cross-validation process. Parameters include |

`env` |
Environment containing the learner functions. Defaults to the calling environment. |

`SuperLearner`

fits the super learner prediction algorithm. The weights for each algorithm in `SL.library`

is estimated, along with the fit of each algorithm.

The prescreen algorithms. These algorithms first rank the variables in `X`

based on either a univariate regression p-value of the `randomForest`

variable importance. A subset of the variables in `X`

is selected based on a pre-defined cut-off. With this subset of the X variables, the algorithms in `SL.library`

are then fit.

The SuperLearner package contains a few prediction and screening algorithm wrappers. The full list of wrappers can be viewed with `listWrappers()`

. The design of the SuperLearner package is such that the user can easily add their own wrappers. We also maintain a website with additional examples of wrapper functions at https://github.com/ecpolley/SuperLearnerExtra.

`call` |
The matched call. |

`libraryNames` |
A character vector with the names of the algorithms in the library. The format is 'predictionAlgorithm_screeningAlgorithm' with '_All' used to denote the prediction algorithm run on all variables in X. |

`SL.library` |
Returns |

`SL.predict` |
The predicted values from the super learner for the rows in |

`coef` |
Coefficients for the super learner. |

`library.predict` |
A matrix with the predicted values from each algorithm in |

`Z` |
The Z matrix (the cross-validated predicted values for each algorithm in |

`cvRisk` |
A numeric vector with the V-fold cross-validated risk estimate for each algorithm in |

`family` |
Returns the |

`fitLibrary` |
A list with the fitted objects for each algorithm in |

`cvFitLibrary` |
A list with fitted objects for each algorithm in |

`varNames` |
A character vector with the names of the variables in |

`validRows` |
A list containing the row numbers for the V-fold cross-validation step. |

`method` |
A list with the method functions. |

`whichScreen` |
A logical matrix indicating which variables passed each screening algorithm. |

`control` |
The |

`cvControl` |
The |

`errorsInCVLibrary` |
A logical vector indicating if any algorithms experienced an error within the CV step. |

`errorsInLibrary` |
A logical vector indicating if any algorithms experienced an error on the full data. |

`env` |
Environment passed into function which will be searched to find the learner functions. Defaults to the calling environment. |

`times` |
A list that contains the execution time of the SuperLearner, plus separate times for model fitting and prediction. |

Eric C Polley epolley@uchicago.edu

van der Laan, M. J., Polley, E. C. and Hubbard, A. E. (2008) Super Learner, *Statistical Applications of Genetics and Molecular Biology*, **6**, article 25.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 | ```
## Not run:
## simulate data
set.seed(23432)
## training set
n <- 500
p <- 50
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)
## test set
m <- 1000
newX <- matrix(rnorm(m*p), nrow = m, ncol = p)
colnames(newX) <- paste("X", 1:p, sep="")
newX <- data.frame(newX)
newY <- newX[, 1] + sqrt(abs(newX[, 2] * newX[, 3])) + newX[, 2] -
newX[, 3] + rnorm(m)
# generate Library and run Super Learner
SL.library <- c("SL.glm", "SL.randomForest", "SL.gam",
"SL.polymars", "SL.mean")
test <- SuperLearner(Y = Y, X = X, newX = newX, SL.library = SL.library,
verbose = TRUE, method = "method.NNLS")
test
# library with screening
SL.library <- list(c("SL.glmnet", "All"), c("SL.glm", "screen.randomForest",
"All", "screen.SIS"), "SL.randomForest", c("SL.polymars", "All"), "SL.mean")
test <- SuperLearner(Y = Y, X = X, newX = newX, SL.library = SL.library,
verbose = TRUE, method = "method.NNLS")
test
# binary outcome
set.seed(1)
N <- 200
X <- matrix(rnorm(N*10), N, 10)
X <- as.data.frame(X)
Y <- rbinom(N, 1, plogis(.2*X[, 1] + .1*X[, 2] - .2*X[, 3] +
.1*X[, 3]*X[, 4] - .2*abs(X[, 4])))
SL.library <- c("SL.glmnet", "SL.glm", "SL.knn", "SL.mean")
# least squares loss function
test.NNLS <- SuperLearner(Y = Y, X = X, SL.library = SL.library,
verbose = TRUE, method = "method.NNLS", family = binomial())
test.NNLS
# negative log binomial likelihood loss function
test.NNloglik <- SuperLearner(Y = Y, X = X, SL.library = SL.library,
verbose = TRUE, method = "method.NNloglik", family = binomial())
test.NNloglik
# 1 - AUC loss function
test.AUC <- SuperLearner(Y = Y, X = X, SL.library = SL.library,
verbose = TRUE, method = "method.AUC", family = binomial())
test.AUC
# 2
# adapted from library(SIS)
set.seed(1)
# training
b <- c(2, 2, 2, -3*sqrt(2))
n <- 150
p <- 200
truerho <- 0.5
corrmat <- diag(rep(1-truerho, p)) + matrix(truerho, p, p)
corrmat[, 4] = sqrt(truerho)
corrmat[4, ] = sqrt(truerho)
corrmat[4, 4] = 1
cholmat <- chol(corrmat)
x <- matrix(rnorm(n*p, mean=0, sd=1), n, p)
x <- x
feta <- x[, 1:4]
fprob <- exp(feta) / (1 + exp(feta))
y <- rbinom(n, 1, fprob)
# test
m <- 10000
newx <- matrix(rnorm(m*p, mean=0, sd=1), m, p)
newx <- newx
newfeta <- newx[, 1:4]
newfprob <- exp(newfeta) / (1 + exp(newfeta))
newy <- rbinom(m, 1, newfprob)
DATA2 <- data.frame(Y = y, X = x)
newDATA2 <- data.frame(Y = newy, X=newx)
create.SL.knn <- function(k = c(20, 30)) {
for(mm in seq(length(k))){
eval(parse(text = paste('SL.knn.', k[mm], '<- function(..., k = ', k[mm],
') SL.knn(..., k = k)', sep = '')), envir = .GlobalEnv)
}
invisible(TRUE)
}
create.SL.knn(c(20, 30, 40, 50, 60, 70))
# library with screening
SL.library <- list(c("SL.glmnet", "All"), c("SL.glm", "screen.randomForest"),
"SL.randomForest", "SL.knn", "SL.knn.20", "SL.knn.30", "SL.knn.40",
"SL.knn.50", "SL.knn.60", "SL.knn.70",
c("SL.polymars", "screen.randomForest"))
test <- SuperLearner(Y = DATA2$Y, X = DATA2[, -1], newX = newDATA2[, -1],
SL.library = SL.library, verbose = TRUE, family = binomial())
test
## examples with multicore
set.seed(23432, "L'Ecuyer-CMRG") # use L'Ecuyer for multicore seeds. see ?set.seed for details
## training set
n <- 500
p <- 50
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)
## test set
m <- 1000
newX <- matrix(rnorm(m*p), nrow = m, ncol = p)
colnames(newX) <- paste("X", 1:p, sep="")
newX <- data.frame(newX)
newY <- newX[, 1] + sqrt(abs(newX[, 2] * newX[, 3])) + newX[, 2] - newX[, 3] + rnorm(m)
# generate Library and run Super Learner
SL.library <- c("SL.glm", "SL.randomForest",
"SL.polymars", "SL.mean")
testMC <- mcSuperLearner(Y = Y, X = X, newX = newX, SL.library = SL.library,
method = "method.NNLS")
testMC
## examples with snow
library(parallel)
cl <- makeCluster(2, type = "PSOCK") # can use different types here
clusterSetRNGStream(cl, iseed = 2343)
# make SL functions available on the clusters, use assignment to avoid printing
foo <- clusterEvalQ(cl, library(SuperLearner))
testSNOW <- snowSuperLearner(cluster = cl, Y = Y, X = X, newX = newX,
SL.library = SL.library, method = "method.NNLS")
testSNOW
stopCluster(cl)
## snow example with user-generated wrappers
# If you write your own wrappers and are using snowSuperLearner()
# These new wrappers need to be added to the SuperLearner namespace and exported to the clusters
# Using a simple example here, but can define any new SuperLearner wrapper
my.SL.wrapper <- function(...) SL.glm(...)
# assign function into SuperLearner namespace
environment(my.SL.wrapper) <-asNamespace("SuperLearner")
cl <- makeCluster(2, type = "PSOCK") # can use different types here
clusterSetRNGStream(cl, iseed = 2343)
# make SL functions available on the clusters, use assignment to avoid printing
foo <- clusterEvalQ(cl, library(SuperLearner))
clusterExport(cl, c("my.SL.wrapper")) # copy the function to all clusters
testSNOW <- snowSuperLearner(cluster = cl, Y = Y, X = X, newX = newX,
SL.library = c("SL.glm", "SL.mean", "my.SL.wrapper"), method = "method.NNLS")
testSNOW
stopCluster(cl)
## timing
replicate(5, system.time(SuperLearner(Y = Y, X = X, newX = newX,
SL.library = SL.library, method = "method.NNLS")))
replicate(5, system.time(mcSuperLearner(Y = Y, X = X, newX = newX,
SL.library = SL.library, method = "method.NNLS")))
cl <- makeCluster(2, type = 'PSOCK')
# make SL functions available on the clusters, use assignment to avoid printing
foo <- clusterEvalQ(cl, library(SuperLearner))
replicate(5, system.time(snowSuperLearner(cl, Y = Y, X = X, newX = newX,
SL.library = SL.library, method = "method.NNLS")))
stopCluster(cl)
## End(Not run)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.