Description Usage Arguments Details Value References See Also Examples
View source: R/crossvalidation.R
Cross-validated estimation of the empirical risk for hyper-parameter selection.
1 2 3 4 5 6 7 | ## S3 method for class 'mboost'
cvrisk(object, folds = cv(model.weights(object)),
grid = 1:mstop(object),
papply = mclapply,
fun = NULL, corrected = TRUE, mc.preschedule = FALSE, ...)
cv(weights, type = c("bootstrap", "kfold", "subsampling"),
B = ifelse(type == "kfold", 10, 25), prob = 0.5, strata = NULL)
|
object |
an object of class |
folds |
a weight matrix with number of rows equal to the number
of observations. The number of columns corresponds to
the number of cross-validation runs. Can be computed
using function |
grid |
a vector of stopping parameters the empirical risk is to be evaluated for. |
papply |
(parallel) apply function, defaults to |
fun |
if |
corrected |
if |
mc.preschedule |
preschedule tasks if are parallelized using |
weights |
a numeric vector of weights for the model to be cross-validated. |
type |
character argument for specifying the cross-validation method. Currently (stratified) bootstrap, k-fold cross-validation and subsampling are implemented. |
B |
number of folds, per default 25 for |
prob |
percentage of observations to be included in the learning samples for subsampling. |
strata |
a factor of the same length as |
... |
additional arguments passed to |
The number of boosting iterations is a hyper-parameter of the
boosting algorithms implemented in this package. Honest,
i.e., cross-validated, estimates of the empirical risk
for different stopping parameters mstop
are computed by
this function which can be utilized to choose an appropriate
number of boosting iterations to be applied.
Different forms of cross-validation can be applied, for example
10-fold cross-validation or bootstrapping. The weights (zero weights
correspond to test cases) are defined via the folds
matrix.
cvrisk
runs in parallel on OSes where forking is possible
(i.e., not on Windows) and multiple cores/processors are available.
The scheduling
can be changed by the corresponding arguments of
mclapply
(via the dot arguments).
The function cv
can be used to build an appropriate
weight matrix to be used with cvrisk
. If strata
is defined
sampling is performed in each stratum separately thus preserving
the distribution of the strata
variable in each fold.
An object of class cvrisk
(when fun
wasn't specified), basically a matrix
containing estimates of the empirical risk for a varying number
of bootstrap iterations. plot
and print
methods
are available as well as a mstop
method.
Torsten Hothorn, Friedrich Leisch, Achim Zeileis and Kurt Hornik (2006), The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics, 14(3), 675–699.
Andreas Mayr, Benjamin Hofner, and Matthias Schmid (2012). The
importance of knowing when to stop - a sequential stopping rule for
component-wise gradient boosting. Methods of Information in
Medicine, 51, 178–186.
DOI: http://dx.doi.org/10.3414/ME11-02-0030
Verweij and van Houwelingen (1993). Cross-validation in survival analysis. Statistics in Medicine, 12:2305–2314.
AIC.mboost
for
AIC
based selection of the stopping iteration. Use mstop
to extract the optimal stopping iteration from cvrisk
object.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 | data("bodyfat", package = "TH.data")
### fit linear model to data
model <- glmboost(DEXfat ~ ., data = bodyfat, center = TRUE)
### AIC-based selection of number of boosting iterations
maic <- AIC(model)
maic
### inspect coefficient path and AIC-based stopping criterion
par(mai = par("mai") * c(1, 1, 1, 1.8))
plot(model)
abline(v = mstop(maic), col = "lightgray")
### 10-fold cross-validation
cv10f <- cv(model.weights(model), type = "kfold")
cvm <- cvrisk(model, folds = cv10f, papply = lapply)
print(cvm)
mstop(cvm)
plot(cvm)
### 25 bootstrap iterations (manually)
set.seed(290875)
n <- nrow(bodyfat)
bs25 <- rmultinom(25, n, rep(1, n)/n)
cvm <- cvrisk(model, folds = bs25, papply = lapply)
print(cvm)
mstop(cvm)
plot(cvm)
### same by default
set.seed(290875)
cvrisk(model, papply = lapply)
### 25 bootstrap iterations (using cv)
set.seed(290875)
bs25_2 <- cv(model.weights(model), type="bootstrap")
all(bs25 == bs25_2)
### trees
blackbox <- blackboost(DEXfat ~ ., data = bodyfat)
cvtree <- cvrisk(blackbox, papply = lapply)
plot(cvtree)
### cvrisk in parallel modes:
## Not run: ## parallel:::mclapply only runs properly on unix systems
cvrisk(model)
## End(Not run)
## Not run: ## infrastructure needs to be set up in advance
cl <- makeCluster(25) # e.g. to run cvrisk on 25 nodes via PVM
myApply <- function(X, FUN, ...) {
myFun <- function(...) {
library("mboost") # load mboost on nodes
FUN(...)
}
## further set up steps as required
parLapply(cl = cl, X, myFun, ...)
}
cvrisk(model, papply = myApply)
stopCluster(cl)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.