A Prediction Function for the Two-stage Super Learner. The twostageSL
function takes a training set pair (X,Y) and returns the predicted values based on a validation set.
Y |
The outcome in the training data set. Must be a numeric vector. |
X |
The predictor variables in the training data set, usually a data.frame. |
newX |
The predictor variables in the validation data set. The structure should match X. If missing, uses X for newX. |
library.2stage |
Candidate prediction algorithms in the two-stage super learner. A list containing the prediction algorithms at stage 1 and stage 2; the prediction algorithms at each stage are either a character vector or a list containing character vectors. See details below for examples on the structure. A list of functions included in the |
library.1stage |
Candidate prediction algorithms in standard super learner. Either a character vector of prediction algorithms or a list containing character vectors. See details below for examples on the structure. A list of functions included in the |
twostage |
logical; TRUE for implementing the two-stage super learner; FALSE for implementing the standard super learner. |
family.1 |
Error distribution of the stage 1 outcome for two-stage super learner. Currently only allows |
family.2 |
Error distribution of the stage 2 outcome for two-stage super learner. Currently only allows |
family.single |
Error distribution of the outcome for standard super learner. Currently only allows |
method |
Details on estimating the coefficients for the two-stage super learner and the model to combine the individual algorithms in the library. Currently, the only built-in option is "method.CC_LS.scale" (default), which is a scaled version of CC_LS. CC_LS.scale uses Goldfarb and Idnani's quadratic programming algorithm to calculate the best convex combination of weights to minimize the squared error loss. In addition, CC_LS.scale divides the quadratic function by a large constant to shrink the large matrix and vector in the quadratic function. |
id |
Optional cluster identification variable. For the cross-validation splits, |
verbose |
logical; TRUE for printing progress during the computation (helpful for debugging). |
control |
A list of parameters to control the estimation process. Parameters include |
cvControl |
A list of parameters to control the cross-validation process. Parameters include |
obsWeights |
Optional observation weights variable. As with |
env |
Environment containing the learner functions. Defaults to the calling environment. |
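The convex-combination idea behind the `method` argument can be illustrated in base R for the special case of two learners, where the constrained least-squares weight has a closed form. This is only a sketch under stated assumptions: the data and the names `z1`, `z2`, and `loss` are hypothetical, and it does not use Goldfarb and Idnani's algorithm or the package's own code. It also shows that dividing the objective by a large constant leaves the minimizing weight unchanged.

```r
## Sketch only: a two-learner special case of the convex-combination
## least-squares problem; this is not the package's CC_LS.scale code.
set.seed(1)
n  <- 200
Y  <- rnorm(n, mean = 5)
z1 <- Y + rnorm(n, sd = 0.5)   # hypothetical CV predictions, learner 1
z2 <- Y + rnorm(n, sd = 2)     # hypothetical CV predictions, learner 2

## Minimise sum((Y - (a * z1 + (1 - a) * z2))^2) over a in [0, 1].
d <- z1 - z2
a <- sum((Y - z2) * d) / sum(d^2)   # unconstrained minimiser
a <- min(max(a, 0), 1)              # clip to [0, 1]
weights <- c(learner1 = a, learner2 = 1 - a)

## Dividing the objective by a large constant leaves the minimiser unchanged.
loss <- function(a, scale = 1) sum((Y - (a * z1 + (1 - a) * z2))^2) / scale
a_scaled <- optimize(loss, interval = c(0, 1), scale = 1e6)$minimum
```

With more than two learners the problem no longer has this simple closed form, which is where a quadratic programming solver comes in.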
twostageSL fits the two-stage super learner prediction algorithm. The weights for each algorithm in library.2stage and library.1stage are estimated, along with the fit of each algorithm.
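As a rough illustration of what a two-stage fit means, the sketch below builds a plain two-part model in base R. It assumes that stage 1 models the probability of a non-zero outcome (matching `family.1 = binomial` in the examples) and stage 2 models the mean of the non-zero outcomes (matching `family.2 = gaussian`), with the two combined by multiplication. The data and object names are hypothetical, and this is not the package's internal code.

```r
## Sketch only: a plain two-part model in base R, assuming stage 1 models
## P(Y > 0) and stage 2 models E[Y | Y > 0]. Not the package's internals.
set.seed(42)
n   <- 500
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
nonzero <- rbinom(n, 1, plogis(1 + dat$X1))
dat$Y   <- ifelse(nonzero == 1, 10 + dat$X1 + dat$X2 + rnorm(n), 0)

## Stage 1: probability that the outcome is non-zero.
fit1 <- glm(I(Y > 0) ~ X1 + X2, data = dat, family = binomial)
## Stage 2: mean of the outcome among the non-zero observations.
fit2 <- glm(Y ~ X1 + X2, data = dat[dat$Y > 0, ], family = gaussian)

## Combined prediction: P(Y > 0 | X) * E[Y | Y > 0, X].
p_hat  <- predict(fit1, newdata = dat, type = "response")
mu_hat <- predict(fit2, newdata = dat, type = "response")
pred   <- p_hat * mu_hat
```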
The prescreen algorithms first rank the variables in X based on either a univariate regression p-value or the randomForest variable importance. A subset of the variables in X is selected based on a pre-defined cut-off. With this subset of the X variables, the algorithms in library.2stage and library.1stage are then fit.
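The ranking-and-cut-off step described above can be sketched in base R as follows. The helper `screen_by_pvalue` and its arguments `cutoff` and `min_vars` are hypothetical names chosen for illustration; the package's own screening wrappers may differ in detail.

```r
## Sketch only: rank predictors by univariate correlation-test p-value and
## keep those below a cut-off. `screen_by_pvalue` is a hypothetical helper.
screen_by_pvalue <- function(Y, X, cutoff = 0.1, min_vars = 2) {
  pvals <- vapply(X, function(x) cor.test(x, Y)$p.value, numeric(1))
  keep  <- pvals <= cutoff
  ## Always keep at least `min_vars` of the best-ranked variables.
  if (sum(keep) < min_vars) {
    keep[order(pvals)[seq_len(min_vars)]] <- TRUE
  }
  keep
}

set.seed(7)
n <- 300
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
Y <- 2 * X$X1 - X$X2 + rnorm(n)
selected   <- screen_by_pvalue(Y, X)
X_screened <- X[, selected, drop = FALSE]   # predictors passed on to the learners
```

The result is a logical vector with one entry per column of X, the same shape as one row of the whichScreen matrix described under the returned values.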
The twostageSL package contains a few prediction and screening algorithm wrappers. The full list of wrappers can be viewed with listWrappers(). The design of the twostageSL package is such that the user can easily add their own wrappers.
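A user-written wrapper might look like the sketch below. It assumes that twostageSL wrappers follow the same convention as SuperLearner wrappers (a function of `Y`, `X`, `newX`, `family`, `obsWeights` that returns a list with `pred` and `fit`); that assumption should be checked against the package source. The wrapper name `SL.median` is hypothetical.

```r
## Sketch only: a hypothetical wrapper, assuming the SuperLearner-style
## convention of returning list(pred = ..., fit = ...).
SL.median <- function(Y, X, newX, family, obsWeights, ...) {
  med  <- median(Y)                       # ignores X and obsWeights
  pred <- rep(med, nrow(newX))            # constant prediction for newX
  fit  <- list(object = med)
  class(fit) <- "SL.median"
  list(pred = pred, fit = fit)
}

## Matching predict method so the fitted object can score new data.
predict.SL.median <- function(object, newdata, ...) {
  rep(object$object, nrow(newdata))
}

## Direct call, outside of twostageSL, to check the wrapper's shape.
X   <- data.frame(X1 = c(1, 2, 3, 4))
out <- SL.median(Y = c(0, 0, 5, 9), X = X, newX = X[1:2, , drop = FALSE],
                 family = gaussian(), obsWeights = rep(1, 4))
```

Under that assumption, the wrapper's name would then be added to a library, for example `c("SL.glm", "SL.median")`.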
An object with S3 class twostageSL containing:
call |
The matched call. |
libraryNames |
A character vector with the names of the algorithms in the library. The format is 'predictionAlgorithm_screeningAlgorithm' with '_All' used to denote the prediction algorithm run on all variables in X. |
library.Num |
Number of prediction algorithms in |
orig.library |
Returns the prediction algorithms and screening algorithms in each stage of |
SL.library |
Returns the prediction algorithms and screening algorithms in |
SL.predict |
The predicted values from the two-stage super learner for the rows in |
coef |
Coefficients for the two-stage super learner. |
library.predict |
A matrix with the predicted values from each algorithm in |
Z |
The Z matrix (the cross-validated predicted values for each algorithm in |
cvRisk |
A numeric vector with the V-fold cross-validated risk estimate for each algorithm in |
family |
Returns the |
fitLibrary |
A list with the fitted objects for each algorithm in |
cvfitLibrary |
A list with fitted objects for each algorithm in |
varNames |
A character vector with the names of the variables in |
validRows |
A list containing the row numbers for the V-fold cross-validation step. |
number0 |
A dataframe indicating the number of zeros in each of the |
method |
A list with the method functions. |
whichScreen |
A logical matrix indicating which variables passed each screening algorithm. |
control |
The |
cvControl |
The |
errorsInCVLibrary |
A logical vector indicating if any algorithms experienced an error within the CV step. |
errorsInLibrary |
A logical vector indicating if any algorithms experienced an error on the full data. |
data |
The data frame containing the predictor variables and the outcome in the training data set. |
env |
Environment passed into function which will be searched to find the learner functions. Defaults to the calling environment. |
times |
A list that contains the execution time of the twostageSL, plus separate times for model fitting and prediction. |
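To make the validRows, Z, and cvRisk components more concrete, the sketch below reproduces their general shape in base R for a single learner (a linear model). It is an illustration under simplifying assumptions (simple random folds, no `id` clustering, no observation weights, squared error loss), not the package's own cross-validation code.

```r
## Sketch only: V-fold row splits and a cross-validated MSE for one learner.
set.seed(2024)
n <- 100
V <- 5
X <- data.frame(X1 = rnorm(n))
Y <- 3 + 2 * X$X1 + rnorm(n)

## One vector of held-out row numbers per fold.
validRows <- split(sample(seq_len(n)), rep(seq_len(V), length.out = n))

## Cross-validated predictions: fit on the other folds, predict the held-out one.
Z <- numeric(n)
for (v in seq_len(V)) {
  test <- validRows[[v]]
  fit  <- lm(Y ~ X1, data = cbind(X, Y = Y)[-test, ])
  Z[test] <- predict(fit, newdata = X[test, , drop = FALSE])
}
cvRisk <- mean((Y - Z)^2)   # cross-validated mean squared error
```

With several learners, Z would have one such column per algorithm and cvRisk one entry per column.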
Ziyue Wu
van der Laan, M. J., Polley, E. C. and Hubbard, A. E. (2008) Super Learner, Statistical Applications in Genetics and Molecular Biology, 6, article 25.
## simulate data
set.seed(123)
## training set
n <- 10000
p <- 5
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- rep(NA,n)
## probability of outcome being non-zero
prob <- plogis(1 + X[,1] + X[,2] + X[,1]*X[,2])
g <- rbinom(n,1,prob)
## assign zero outcome
ind <- g==0
Y[ind] <- 0
## assign non-zero outcome
ind <- g==1
Y[ind] <- 10 + X[ind, 1] + sqrt(abs(X[ind, 2] * X[ind, 3])) + X[ind, 2] - X[ind, 3] + rnorm(sum(ind))
## test set
m <- 1000
newX <- matrix(rnorm(m*p), nrow = m, ncol = p)
colnames(newX) <- paste("X", 1:p, sep="")
newX <- data.frame(newX)
newY <- rep(NA,m)
## probability of outcome being non-zero
newprob <- plogis(1 + newX[,1] + newX[,2] + newX[,1]*newX[,2])
newg <- rbinom(m,1,newprob)
## assign zero outcome
newind <- newg==0
newY[newind] <- 0
## assign non-zero outcome
newind <- newg==1
newY[newind] <- 10 + newX[newind, 1] + sqrt(abs(newX[newind, 2] * newX[newind, 3])) + newX[newind, 2] - newX[newind, 3] + rnorm(sum(newind))
## generate the Library
twostage.library <- list(stage1=c("SL.glm","SL.mean","SL.earth"),
stage2=c("SL.glm","SL.mean","SL.earth"))
onestage.library <- c("SL.glm","SL.mean","SL.earth")
## run the twostage super learner
two <- twostageSL(Y=Y,
X=X,
newX = newX,
library.2stage = twostage.library,
library.1stage = onestage.library,
twostage = TRUE,
family.1=binomial,
family.2=gaussian,
family.single=gaussian,
cvControl = list(V = 5))
two
## run the standard super learner
one <- twostageSL(Y=Y,
X=X,
newX = newX,
library.2stage = twostage.library,
library.1stage = onestage.library,
twostage = FALSE,
family.1=binomial,
family.2=gaussian,
family.single=gaussian,
cvControl = list(V = 5))
one
## library with screening
twostage.library <- list(stage1=list(c("SL.glm","screen.glmnet"),
c("SL.earth","screen.corP"),
c("SL.mean","All")),
stage2=list(c("SL.glm","screen.glmnet"),
c("SL.earth","screen.corP"),
c("SL.mean","All")))
onestage.library <- list(c("SL.glm","screen.glmnet"),
c("SL.earth","screen.corP"),
c("SL.mean","All"))
## run the twostage super learner
two <- twostageSL(Y=Y,
X=X,
newX = newX,
library.2stage = twostage.library,
library.1stage = onestage.library,
twostage = TRUE,
family.1=binomial,
family.2=gaussian,
family.single=gaussian,
cvControl = list(V = 5))
two
## run the standard super learner
one <- twostageSL(Y=Y,
X=X,
newX = newX,
library.2stage = twostage.library,
library.1stage = onestage.library,
twostage = FALSE,
family.1=binomial,
family.2=gaussian,
family.single=gaussian,
cvControl = list(V = 5))
one
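One way to compare the fits from the examples is to compute the mean squared error of SL.predict against the held-out outcome newY. The helper `mse` below is a hypothetical name, not part of the package, and the comparison only runs when the objects `two`, `one`, and `newY` from the examples exist in the workspace.

```r
## Sketch only: `mse` is a hypothetical helper, not part of the package.
mse <- function(pred, obs) mean((as.numeric(pred) - obs)^2)

## If the objects from the examples above exist, compare the two fits.
if (exists("two") && exists("one") && exists("newY")) {
  c(twostage = mse(two$SL.predict, newY),
    standard = mse(one$SL.predict, newY))
}

## Stand-alone check of the helper on toy values.
toy <- mse(c(1, 2, 3), c(1, 2, 5))
```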