cv.GAMBoost: Cross-validation for GAMBoost fits


Description

Performs a K-fold cross-validation for GAMBoost in search for the optimal number of boosting steps.
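The procedure can be illustrated with a minimal, self-contained sketch in base R. It is a stand-in under stated assumptions, not the package's implementation. It uses a toy componentwise linear L2 boosting (step size nu = 0.1) on Gaussian data instead of GAMBoost's penalized fits, and squared error instead of the log-likelihood. The helper names boost_path and cv_boost_steps are hypothetical. Each observation is predicted once by a model fitted without its fold, the loss is averaged for each number of steps, and the step count with the smallest average is selected.

```r
# Toy componentwise L2 boosting: a hypothetical stand-in, NOT GAMBoost's algorithm.
# Returns predictions for xte after each of 1..steps boosting steps.
boost_path <- function(xtr, ytr, xte, steps, nu = 0.1) {
  beta <- numeric(ncol(xtr))
  intercept <- mean(ytr)
  pred <- matrix(NA_real_, nrow(xte), steps)
  for (s in seq_len(steps)) {
    r <- ytr - intercept - drop(xtr %*% beta)        # current residuals
    b <- colSums(xtr * r) / colSums(xtr^2)           # LS fit of r on each column (no intercept)
    rss <- colSums((r - sweep(xtr, 2, b, `*`))^2)    # residual sum of squares per column
    j <- which.min(rss)                              # best single covariate
    beta[j] <- beta[j] + nu * b[j]                   # small update of that coefficient only
    pred[, s] <- intercept + drop(xte %*% beta)
  }
  pred
}

# K-fold CV over the number of boosting steps (hypothetical helper).
cv_boost_steps <- function(x, y, K = 10, maxstepno = 100) {
  fold_id <- sample(rep(seq_len(K), length.out = length(y)))  # random fold labels
  loss <- matrix(NA_real_, length(y), maxstepno)
  for (k in seq_len(K)) {
    te <- which(fold_id == k)                        # held-out observations
    p <- boost_path(x[-te, , drop = FALSE], y[-te], x[te, , drop = FALSE], maxstepno)
    loss[te, ] <- (y[te] - p)^2                      # squared error for every step count
  }
  criterion <- colMeans(loss)                        # average loss per step count
  list(criterion = criterion, selected = which.min(criterion))
}

set.seed(1)
x <- matrix(runif(100 * 8, min = -1, max = 1), 100, 8)
y <- -0.5 + 2 * x[, 1] + rnorm(100)
res <- cv_boost_steps(x, y, K = 10, maxstepno = 100)
res$selected
```

The two returned components mirror the criterion and selected components described under Value.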

Usage

cv.GAMBoost(x=NULL,y,x.linear=NULL,subset=NULL,maxstepno=500,
            family=binomial(),weights=rep(1,length(y)),
            calc.hat=TRUE,calc.se=TRUE,trace=FALSE,
            parallel=FALSE,upload.x=TRUE,multicore=FALSE,folds=NULL,
            K=10,type=c("loglik","error","L2"),pred.cutoff=0.5,
            just.criterion=FALSE,...) 

Arguments

x

n * p matrix of covariates with potentially non-linear influence. If this is not given (and argument x.linear is employed), a generalized linear model is fitted.

y

response vector of length n.

x.linear

optional n * q matrix of covariates with linear influence.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

maxstepno

maximum number of boosting steps to evaluate.

family,weights,calc.hat,calc.se

arguments passed to GAMBoost.

trace

logical value indicating whether information on progress should be printed.

parallel

logical value indicating whether computations in the cross-validation folds should be performed in parallel on a compute cluster, using package snowfall. The initialization function of this package, sfInit, should be called before calling cv.GAMBoost.

upload.x

logical value indicating whether x and x.linear should be uploaded to the compute cluster for parallel computation. For large data sets, much time can be saved by uploading these only once (using sfExport(x,x.linear) from package snowfall) and then setting upload.x=FALSE.

multicore

logical or numeric value indicating whether computations in the cross-validation folds should be performed in parallel, using package multicore. If TRUE, package multicore is employed with the default number of cores. A value larger than 1 is taken to be the number of cores to employ.

folds

if not NULL, this has to be a list of length K, each element being a vector with the indices of the observations in the respective fold. This is useful for employing the same folds in repeated runs.

K

number of folds to be used for cross-validation.

type, pred.cutoff

goodness-of-fit criterion: log-likelihood ("loglik"), error rate for binary response data ("error"), or squared error for other response types ("L2"). For binary response data and the "error" criterion, pred.cutoff specifies the cutoff on the predicted probability used for predicting class 1 vs. class 0.

just.criterion

logical value indicating whether a list with the goodness-of-fit information should be returned (TRUE) or a GAMBoost fit with the optimal number of boosting steps (FALSE).

...

further arguments passed to the calls to GAMBoost (e.g. penalty, as in the Examples).
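The three values of type can be made concrete with a minimal sketch that evaluates each criterion for given predictions, assuming a 0/1 response with predicted probabilities (or, for "L2", predicted means). The helper names are hypothetical, and the summation, sign and tie conventions (for instance p > pred.cutoff rather than p >= pred.cutoff) are assumptions that may differ from the definitions used inside cv.GAMBoost. With these conventions, larger is better for the log-likelihood and smaller is better for the other two.

```r
# Hypothetical helpers; conventions are assumptions, not the package's code.

# Log-likelihood of 0/1 responses y under predicted probabilities p ("loglik")
loglik_crit <- function(y, p) sum(y * log(p) + (1 - y) * log(1 - p))

# Misclassification rate: predict class 1 when p exceeds pred.cutoff ("error")
error_crit <- function(y, p, pred.cutoff = 0.5) mean((p > pred.cutoff) != y)

# Sum of squared differences between responses and predicted means ("L2")
l2_crit <- function(y, mu) sum((y - mu)^2)

y <- c(1, 0, 1, 0)
p <- c(0.9, 0.2, 0.4, 0.6)
error_crit(y, p)                     # 0.5: observations 3 and 4 are misclassified
error_crit(y, p, pred.cutoff = 0.3)  # 0.25: only observation 4 is misclassified
loglik_crit(y, p)
l2_crit(c(1.0, 2.5), c(1.5, 2.0))    # 0.5
```

Lowering pred.cutoff shifts predictions toward class 1, which is why the error rate changes while the log-likelihood does not depend on the cutoff at all.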

Value

Either a GAMBoost fit with the optimal number of boosting steps or, if just.criterion=TRUE, a list with the following components:

criterion

vector with the goodness-of-fit criterion for boosting steps 1, ..., maxstepno.

se

vector with standard error estimates for the goodness-of-fit criterion in each boosting step.

selected

index of the optimal boosting step.

folds

list of length K, where the elements are vectors of the indices of observations in the respective folds.
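Because the folds argument and the folds component share the same shape, a fixed set of folds can be generated once and reused across several runs, for example with different values of type. The helper make_folds below is hypothetical; it assigns observations to folds at random without stratification, which is an assumption and not necessarily how cv.GAMBoost forms its own folds.

```r
# Hypothetical helper: random (unstratified) assignment of observations 1..n
# to K folds, returned as a list of K index vectors.
make_folds <- function(n, K = 10) {
  fold_id <- sample(rep(seq_len(K), length.out = n))
  lapply(seq_len(K), function(k) which(fold_id == k))
}

set.seed(123)
folds <- make_folds(100, K = 10)
lengths(folds)  # each of the 10 folds holds 10 indices
```

The resulting list can then be supplied as folds=folds together with K=10 in each call, so that differences between runs are not caused by a different split of the data.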

Author(s)

Harald Binder binderh@uni-mainz.de

See Also

GAMBoost

Examples

## Not run: 
library(GAMBoost)

##  Generate some data

x <- matrix(runif(100*8,min=-1,max=1),100,8)             
eta <- -0.5 + 2*x[,1] + 2*x[,3]^2
y <- rbinom(100,1,binomial()$linkinv(eta))

##  Fit the model with smooth components

gb1 <- GAMBoost(x,y,penalty=400,stepno=100,trace=TRUE,family=binomial()) 

##  10-fold cross-validation with prediction error as a criterion

gb1.crit <- cv.GAMBoost(x,y,penalty=400,maxstepno=100,trace=TRUE,
                        family=binomial(),
                        K=10,type="error",just.criterion=TRUE)

##  Compare AIC and estimated prediction error

which.min(gb1$AIC)          
which.min(gb1.crit$criterion)

## End(Not run)
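The parallel and upload.x arguments can be combined as in the following sketch. It assumes that packages snowfall and GAMBoost are installed and that two CPUs are available. Loading GAMBoost on the workers via sfLibrary is an assumption that may be unnecessary, and x.linear is set to NULL and exported only so that both objects named under upload.x exist on the workers.

```r
library(snowfall)
library(GAMBoost)

## Simulated data as in the example above
x <- matrix(runif(100 * 8, min = -1, max = 1), 100, 8)
eta <- -0.5 + 2 * x[, 1] + 2 * x[, 3]^2
y <- rbinom(100, 1, binomial()$linkinv(eta))
x.linear <- NULL

sfInit(parallel = TRUE, cpus = 2)  # initialize the cluster before cv.GAMBoost
sfLibrary(GAMBoost)                # assumption: make GAMBoost available on the workers
sfExport("x", "x.linear")          # upload the covariates once

cv.res <- cv.GAMBoost(x, y, penalty = 400, maxstepno = 100,
                      family = binomial(), K = 10,
                      parallel = TRUE, upload.x = FALSE,
                      just.criterion = TRUE)
sfStop()                           # shut down the cluster

cv.res$selected
```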

GAMBoost documentation built on May 2, 2019, 12:40 p.m.