Determines the optimal number of boosting steps by cross-validation

Description

Performs K-fold cross-validation for CoxBoost to find the optimal number of boosting steps.

Usage

cv.CoxBoost(time,status,x,subset=1:length(time),maxstepno=100,K=10,
            type=c("verweij","naive"),
            parallel=FALSE,upload.x=TRUE,multicore=FALSE,
            folds=NULL,trace=FALSE,...)

Arguments

time

vector of length n specifying the observed times.

status

censoring indicator, i.e., vector of length n with entries 0 for censored observations and 1 for uncensored observations. If this vector contains elements not equal to 0 or 1, these are taken to indicate events from a competing risk and a model for the subdistribution hazard with respect to event 1 is fitted (see e.g. Fine and Gray, 1999).

x

n * p matrix of covariates.

subset

a vector specifying a subset of observations to be used in the fitting process.

maxstepno

maximum number of boosting steps to evaluate, i.e., the returned “optimal” number of boosting steps will be in the range [0,maxstepno].

K

number of folds to be used for cross-validation. If K is greater than or equal to the number of non-zero elements in status, leave-one-out cross-validation is performed.

type

way of calculating the partial likelihood contribution of the observations in the hold-out folds: "verweij" uses the more appropriate method described in Verweij and van Houwelingen (1993), "naive" uses the approach where the observations that are not in the hold-out folds are ignored (often found in other R packages).

parallel

logical value indicating whether computations in the cross-validation folds should be performed in parallel on a compute cluster. Parallelization is performed via the package snowfall, and the initialization function of this package, sfInit, should be called before calling cv.CoxBoost.

multicore

indicates whether computations in the cross-validation folds should be performed in parallel, using package parallel. If TRUE, package parallel is employed using the default number of cores. A value larger than 1 is taken to be the number of cores that should be employed.

upload.x

logical value indicating whether x should/has to be uploaded to the compute cluster for parallel computation. Uploading this only once (using sfExport(x) from library snowfall) can save much time for large data sets.

folds

if not NULL, this has to be a list of length K, each element being a vector of indices of fold elements. Useful for employing the same folds for repeated runs.

trace

logical value indicating whether progress in estimation should be indicated by printing the number of the cross-validation fold and the index of the covariate updated.

...

miscellaneous parameters for the calls to CoxBoost
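
The folds argument expects a list of K index vectors. A minimal sketch of building such a list with base R follows; the values of n and K and the seed are assumptions chosen for illustration, and this is not necessarily how cv.CoxBoost builds its own folds.

```r
set.seed(42)   # fix the random fold assignment so it can be reproduced
n <- 200       # number of observations (assumed for illustration)
K <- 10        # number of folds

# Assign each observation a fold label 1..K in random order,
# then collect the observation indices belonging to each label
fold.id <- sample(rep(seq_len(K), length.out = n))
folds <- unname(split(seq_len(n), fold.id))

# Every observation index appears in exactly one fold
stopifnot(length(folds) == K,
          identical(sort(unlist(folds)), seq_len(n)))
```

Such a list could then be passed as folds=folds, with K=length(folds), to several cv.CoxBoost calls so that all of them are evaluated on the same partition.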

Value

List with the following components:

mean.logplik

vector of length maxstepno+1 with the mean partial log-likelihood for boosting steps 0 to maxstepno

se.logplik

vector with standard error estimates for the mean partial log-likelihood criterion for each boosting step.

optimal.step

optimal boosting step number, i.e., the step with maximum mean partial log-likelihood.

folds

list of length K, where the elements are vectors of the indices of observations in the respective folds.
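
Because mean.logplik and se.logplik start at boosting step 0, element i of each vector belongs to step i - 1. The sketch below illustrates this indexing on a hypothetical stand-in for the returned list; the numbers are invented, and it assumes that the preferred step is the one with the largest mean partial log-likelihood.

```r
# Hypothetical stand-in for the list returned by cv.CoxBoost;
# the numbers are invented for illustration only.
maxstepno <- 5
cv.like <- list(
  mean.logplik = c(-52.0, -50.5, -49.8, -49.6, -49.9, -50.4),
  se.logplik   = c(1.2, 1.1, 1.0, 1.0, 1.1, 1.2)
)

# Element i of each vector belongs to boosting step i - 1
steps <- 0:maxstepno
stopifnot(length(cv.like$mean.logplik) == maxstepno + 1)

# Assumption: the preferred step maximizes the mean partial log-likelihood
best.step <- steps[which.max(cv.like$mean.logplik)]

# Tabulate the criterion with +/- one standard error for each step
cv.table <- data.frame(
  step  = steps,
  mean  = cv.like$mean.logplik,
  lower = cv.like$mean.logplik - cv.like$se.logplik,
  upper = cv.like$mean.logplik + cv.like$se.logplik
)
print(cv.table[cv.table$step == best.step, ])
```

A table of this kind makes it easy to see how flat the criterion is around the selected step before committing to a final fit.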

Author(s)

Harald Binder binderh@uni-mainz.de

References

Verweij, P. J. M. and van Houwelingen, H. C. (1993). Cross-validation in survival analysis. Statistics in Medicine, 12(24):2305-2314.

See Also

CoxBoost, optimCoxBoostPenalty

Examples

## Not run: 
#   Generate some survival data with 10 informative covariates 
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)


#  10-fold cross-validation

cv.res <- cv.CoxBoost(time=obs.time,status=status,x=x,maxstepno=500,
                      K=10,type="verweij",penalty=100) 

#   examine mean partial log-likelihood in the course of the boosting steps
plot(cv.res$mean.logplik)   

#   Fit with optimal number of boosting steps

cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=cv.res$optimal.step,
                  penalty=100) 
summary(cbfit)


## End(Not run)
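
The parallel argument requires a snowfall cluster to be initialized before cv.CoxBoost is called. The following is a rough sketch of that workflow, not a tested recipe: the wrapper name run.parallel.cv, the number of CPUs, and the seed are assumptions made for illustration, and the call is only attempted if both CoxBoost and snowfall are installed.

```r
# Hypothetical helper: starts a snowfall cluster, runs the cross-validation
# in parallel, and shuts the cluster down again even if an error occurs.
run.parallel.cv <- function(obs.time, status, x, cpus = 2) {
  snowfall::sfInit(parallel = TRUE, cpus = cpus)  # must precede cv.CoxBoost
  on.exit(snowfall::sfStop(), add = TRUE)
  CoxBoost::cv.CoxBoost(time = obs.time, status = status, x = x,
                        maxstepno = 100, K = 10, type = "verweij",
                        parallel = TRUE, upload.x = TRUE, penalty = 100)
}

# Only run when both packages are available
if (requireNamespace("CoxBoost", quietly = TRUE) &&
    requireNamespace("snowfall", quietly = TRUE)) {
  set.seed(1)
  n <- 200; p <- 100
  beta <- c(rep(1, 10), rep(0, p - 10))
  x <- matrix(rnorm(n * p), n, p)
  real.time <- -(log(runif(n))) / (10 * exp(drop(x %*% beta)))
  cens.time <- rexp(n, rate = 1 / 10)
  status <- ifelse(real.time <= cens.time, 1, 0)
  obs.time <- ifelse(real.time <= cens.time, real.time, cens.time)

  cv.par <- run.parallel.cv(obs.time, status, x, cpus = 2)
  print(cv.par$optimal.step)
}
```

On a single machine, setting multicore=TRUE instead may be a simpler alternative, since it does not need a cluster to be initialized beforehand.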