CoxBoost: Fit a Cox model by likelihood based boosting



Description

CoxBoost is used to fit a Cox proportional hazards model by componentwise likelihood based boosting. It is especially suited for models with a large number of predictors and allows for mandatory covariates with unpenalized parameter estimates.

Usage

CoxBoost(time,status,x,unpen.index=NULL,standardize=TRUE,subset=1:length(time),
         weights=NULL,stepno=100,penalty=9*sum(status[subset]==1),
         criterion = c("pscore", "score","hpscore","hscore"),
         stepsize.factor=1,sf.scheme=c("sigmoid","linear"),pendistmat=NULL,
         connected.index=NULL,x.is.01=FALSE,return.score=TRUE,trace=FALSE) 

Arguments

time

vector of length n specifying the observed times.

status

censoring indicator, i.e., vector of length n with entries 0 for censored observations and 1 for uncensored observations. If this vector contains elements not equal to 0 or 1, these are taken to indicate events from a competing risk, and a model for the subdistribution hazard with respect to event 1 is fitted (see e.g. Fine and Gray, 1999; Binder et al., 2009a).

x

n * p matrix of covariates.

unpen.index

vector of length p.unpen with indices of mandatory covariates, where parameter estimation should be performed unpenalized.

standardize

logical value indicating whether covariates should be standardized for estimation. This does not apply to mandatory covariates, i.e., these are not standardized.

subset

a vector specifying a subset of observations to be used in the fitting process.

weights

optional vector of length n, for specifying weights for the individual observations.

penalty

penalty value for the update of an individual element of the parameter vector in each boosting step.

criterion

indicates the criterion to be used for selection in each boosting step. "pscore" corresponds to the penalized score statistics, "score" to the un-penalized score statistics. Different results will only be seen for un-standardized covariates ("pscore" will result in preferential selection of covariates with larger covariance), or if different penalties are used for different covariates. "hpscore" and "hscore" correspond to "pscore" and "score", respectively, except that a heuristic is used to evaluate only a subset of covariates in each boosting step, as described in Binder et al. (2011). This can considerably speed up computation, but may lead to different results.

stepsize.factor

determines the step-size modification factor by which the natural step size of boosting steps is changed after a covariate has been selected in a boosting step. The default (value 1) implies constant penalties; for a value < 1 the penalty for a covariate is increased after it has been selected in a boosting step, and for a value > 1 the penalty is decreased. If pendistmat is given, penalty updates are only performed for covariates that have at least one connection to another covariate.

sf.scheme

scheme for changing step sizes (via stepsize.factor). "linear" corresponds to the scheme described in Binder and Schumacher (2009b), "sigmoid" employs a sigmoid shape.

pendistmat

connection matrix with entries ranging between 0 and 1, with entry (i,j) indicating the certainty of the connection between covariates i and j. Penalty changes due to stepsize.factor < 1 are propagated according to this information: if entry (i,j) is non-zero, the penalty for covariate j is decreased after the penalty for covariate i has been increased, i.e., after covariate i has been selected in a boosting step. This matrix must either have dimension (p - p.unpen) * (p - p.unpen), or the indices of the p.connected connected covariates must be given in connected.index, in which case the matrix must have dimension p.connected * p.connected. Generally, sparse matrices from package Matrix can be used to save memory.

connected.index

indices of the p.connected connected covariates, for which pendistmat provides the connection information for distributing changes in penalties. No overlap with unpen.index is allowed. If NULL, and a connection matrix is given, all covariates are assumed to be connected.

stepno

number of boosting steps (m).

x.is.01

logical value indicating whether (the non-mandatory part of) x contains just values 0 and 1, i.e., binary covariates. If this is the case and indicated by this argument, computations are much faster.

return.score

logical value indicating whether the value of the score statistic (or penalized score statistic, depending on criterion), as evaluated in each boosting step for every covariate, should be returned. The corresponding element scoremat can become very large (and needs much memory) when the number of covariates and boosting steps is large.

trace

logical value indicating whether progress in estimation should be indicated by printing the name of the covariate updated.
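The status coding and the default value of penalty shown in the Usage section can be illustrated with a small hand-made vector. This is only a sketch: the vector and the helper names (n.events, default.penalty, has.competing) are hypothetical and not part of the package.

```r
# Hypothetical status vector: 0 = censored, 1 = event of interest,
# 2 = competing event (any value other than 0 or 1 is treated this way)
status <- c(1, 0, 2, 1, 0, 1, 2, 1)
subset <- seq_along(status)

# Default penalty from the Usage section: 9 times the number of
# events of type 1 among the observations in 'subset'
n.events <- sum(status[subset] == 1)
default.penalty <- 9 * n.events

# TRUE if the vector would trigger the competing risks model
has.competing <- any(!(status %in% c(0, 1)))
```

Because the default depends on subset, fitting on a different subset of observations changes the default penalty as well.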

Details

In contrast to gradient boosting (implemented e.g. in the glmboost routine in the R package mboost, using the CoxPH loss function), CoxBoost is not based on gradients of loss functions, but adapts the offset-based boosting approach from Tutz and Binder (2007) for estimating Cox proportional hazards models. In each boosting step the previous boosting steps are incorporated as an offset in penalized partial likelihood estimation, which is employed to obtain an update for one single parameter, i.e., one covariate, in every boosting step. This results in sparse fits similar to Lasso-like approaches, with many estimated coefficients being zero. The main model complexity parameter, which has to be selected (e.g. by cross-validation using cv.CoxBoost), is the number of boosting steps stepno. The penalty parameter penalty can be chosen rather coarsely, either by hand or using optimCoxBoostPenalty.
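The idea of a componentwise, offset-based boosting step can be sketched in a few lines of base R. This is a simplified illustration, not the code used by the package: it assumes no tied event times, no weights and no mandatory covariates, and it uses one penalized Newton step per covariate, selecting the covariate with the largest penalized score statistic. The function boost.step and all variable names are hypothetical.

```r
set.seed(1)
n <- 100; p <- 20
x <- scale(matrix(rnorm(n * p), n, p))          # standardized covariates
beta.true <- c(1, -1, rep(0, p - 2))
real.time <- rexp(n, rate = exp(drop(x %*% beta.true)))
cens.time <- rexp(n, rate = 0.2)
status <- as.numeric(real.time <= cens.time)
time <- pmin(real.time, cens.time)

# risk.set[i, k] is 1 if observation k is still at risk at time[i]
risk.set <- outer(time, time, "<=") * 1

boost.step <- function(beta, penalty) {
  w <- exp(drop(x %*% beta))                    # previous steps act as offset
  denom <- drop(risk.set %*% w)                 # risk-set sums of exp(offset)
  xbar <- (risk.set %*% (w * x)) / denom        # weighted risk-set means
  x2bar <- (risk.set %*% (w * x^2)) / denom     # weighted second moments
  ev <- status == 1
  U <- colSums((x - xbar)[ev, , drop = FALSE])           # score per covariate
  I <- colSums((x2bar - xbar^2)[ev, , drop = FALSE])     # information
  pscore <- U^2 / (I + penalty)                 # penalized score statistic
  j <- which.max(pscore)                        # covariate to update
  beta[j] <- beta[j] + U[j] / (I[j] + penalty)  # one penalized Newton step
  beta
}

beta.hat <- rep(0, p)
for (m in 1:50) beta.hat <- boost.step(beta.hat, penalty = 9 * sum(status))
which(beta.hat != 0)                            # covariates selected so far
```

Since only one coefficient changes per step, at most as many coefficients as boosting steps can be non-zero, which is where the sparsity of the fit comes from.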

The advantage of the offset-based approach compared to gradient boosting is that the penalty structure is very flexible. In the present implementation this is used for allowing for unpenalized mandatory covariates, which receive a very fast coefficient build-up in the course of the boosting steps, while the other (optional) covariates are subjected to penalization. For example, in a microarray setting, the (many) microarray features would be taken to be optional covariates, and the (few) potential clinical covariates would be taken to be mandatory, by including their indices in unpen.index.

If a group of correlated covariates has influence on the response, e.g. genes from the same pathway, componentwise boosting will often result in a non-zero estimate for only one member of this group. To avoid this, information on the connection between covariates can be provided in pendistmat. If, in addition, a penalty updating scheme with stepsize.factor < 1 is chosen, connected covariates are more likely to be chosen in future boosting steps once a directly connected covariate has been chosen in an earlier boosting step (see Binder and Schumacher, 2009b).
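A connection matrix for a small group of covariates might be set up as follows. This is a sketch under assumptions: the first 10 covariates are treated as one hypothetical pathway in which every pair is connected with certainty 1, the diagonal is set to 0 (whether self-connections matter is not stated above), and the value 0.1 for stepsize.factor is arbitrary. The CoxBoost call only uses arguments from the Usage section and is skipped if the package is not installed.

```r
set.seed(123)
n <- 200; p <- 100
beta <- c(rep(1, 10), rep(0, p - 10))
x <- matrix(rnorm(n * p), n, p)
real.time <- -(log(runif(n))) / (10 * exp(drop(x %*% beta)))
cens.time <- rexp(n, rate = 1 / 10)
status <- ifelse(real.time <= cens.time, 1, 0)
obs.time <- pmin(real.time, cens.time)

# Hypothetical pathway: covariates 1 to 10 are mutually connected
connected.index <- 1:10
p.connected <- length(connected.index)
pendistmat <- matrix(1, p.connected, p.connected)
diag(pendistmat) <- 0            # assumption: no self-connections

if (requireNamespace("CoxBoost", quietly = TRUE)) {
  library(CoxBoost)
  cbfit.path <- CoxBoost(time = obs.time, status = status, x = x,
                         stepno = 100, penalty = 100,
                         stepsize.factor = 0.1,
                         pendistmat = pendistmat,
                         connected.index = connected.index)
  summary(cbfit.path)
}
```

For a large number of connected covariates, the same matrix could be stored as a sparse matrix from package Matrix, as noted in the description of pendistmat.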

Value

CoxBoost returns an object of class CoxBoost.

n, p

number of observations and number of covariates.

stepno

number of boosting steps.

xnames

vector of length p containing the names of the covariates. This information is extracted from x, or names following the scheme V1, V2, ... are used.

coefficients

(stepno+1) * p matrix containing the coefficient estimates for the (standardized) optional covariates for boosting steps 0 to stepno. This will typically be a sparse matrix, built using package Matrix.

scoremat

stepno * p matrix containing the value of the score statistic for each of the optional covariates before each boosting step.

meanx, sdx

vector of mean values and standard deviations used for standardizing the covariates.

unpen.index

indices of the mandatory covariates in the original covariate matrix x.

penalty

If stepsize.factor != 1, stepno * (p - p.unpen) matrix containing the penalties used for every boosting step and every penalized covariate, otherwise a vector containing the unchanged values of the penalty employed in each boosting step.

time

observed times given in the CoxBoost call.

status

censoring indicator given in the CoxBoost call.

event.times

vector with event times from the data given in the CoxBoost call.

linear.predictors

(stepno+1) * n matrix giving the linear predictor for boosting steps 0 to stepno and every observation.

Lambda

matrix with the Breslow estimate for the cumulative baseline hazard for boosting steps 0 to stepno for every event time.

logplik

partial log-likelihood of the fitted model in the final boosting step.
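The components listed above can be inspected directly on a fitted object. The following sketch assumes that the components have the dimensions documented in this section and that no mandatory covariates are used; the object and variable names (fit, final.coef, selected) are hypothetical. The fit is skipped if the package is not installed.

```r
set.seed(7)
n <- 200; p <- 100
beta <- c(rep(1, 10), rep(0, p - 10))
x <- matrix(rnorm(n * p), n, p)
real.time <- -(log(runif(n))) / (10 * exp(drop(x %*% beta)))
cens.time <- rexp(n, rate = 1 / 10)
status <- ifelse(real.time <= cens.time, 1, 0)
obs.time <- pmin(real.time, cens.time)

if (requireNamespace("CoxBoost", quietly = TRUE)) {
  library(CoxBoost)
  fit <- CoxBoost(time = obs.time, status = status, x = x,
                  stepno = 50, penalty = 100)

  # Row stepno + 1 of the coefficient matrix holds the final estimates
  final.coef <- fit$coefficients[fit$stepno + 1, ]
  selected <- fit$xnames[final.coef != 0]
  print(selected)

  # One row per boosting step (0 to stepno), one column per observation
  print(dim(fit$linear.predictors))
  print(fit$logplik)
}
```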

Author(s)

Written by Harald Binder binderh@uni-mainz.de.

References

Binder, H., Benner, A., Bullinger, L., and Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in Medicine, doi: 10.1002/sim.5490.

Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009a). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics, 25:890-896.

Binder, H. and Schumacher, M. (2009b). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics, 10:18.

Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics, 9:14.

Tutz, G. and Binder, H. (2007). Boosting ridge regression. Computational Statistics & Data Analysis, 51(12):6044-6059.

Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94:496-509.

See Also

predict.CoxBoost, cv.CoxBoost.

Examples

#   Generate some survival data with 10 informative covariates 
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)

#   Fit a Cox proportional hazards model by CoxBoost

cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100,penalty=100) 
summary(cbfit)

#   ... with covariates 1 and 2 being mandatory

cbfit.mand <- CoxBoost(time=obs.time,status=status,x=x,unpen.index=c(1,2),
                       stepno=100,penalty=100) 
summary(cbfit.mand)
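As noted in the Details section, the number of boosting steps is the main complexity parameter and can be selected by cross-validation with cv.CoxBoost. The sketch below regenerates the data so that it runs on its own. The arguments maxstepno and K and the component optimal.step are taken from the documentation of cv.CoxBoost as recalled, not from this page, so they should be checked against ?cv.CoxBoost. The code is skipped if the package is not installed.

```r
set.seed(42)
n <- 200; p <- 100
beta <- c(rep(1, 10), rep(0, p - 10))
x <- matrix(rnorm(n * p), n, p)
real.time <- -(log(runif(n))) / (10 * exp(drop(x %*% beta)))
cens.time <- rexp(n, rate = 1 / 10)
status <- ifelse(real.time <= cens.time, 1, 0)
obs.time <- pmin(real.time, cens.time)

if (requireNamespace("CoxBoost", quietly = TRUE)) {
  library(CoxBoost)

  # 10-fold cross-validation over boosting steps 0 to maxstepno
  cv.res <- cv.CoxBoost(time = obs.time, status = status, x = x,
                        maxstepno = 200, K = 10, penalty = 100)

  # Refit with the number of steps chosen by cross-validation
  cbfit.cv <- CoxBoost(time = obs.time, status = status, x = x,
                       stepno = cv.res$optimal.step, penalty = 100)
  summary(cbfit.cv)
}
```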

CoxBoost documentation built on May 1, 2019, 9:32 p.m.
