Formula interface for fitting a Cox proportional hazards model by componentwise likelihood-based boosting (via a call to CoxBoost), where cross-validation can be performed automatically to determine the number of boosting steps (via a call to cv.CoxBoost).
formula: A formula describing the model to be fitted, similar to a call to
data: data frame containing the variables described in the formula.
weights: optional vector specifying weights for the individual observations.
subset: a vector specifying a subset of observations to be used in the fitting process.
mandatory: vector containing the names of the covariates whose effect is to be estimated un-regularized.
cause: cause of interest in a competing risks setting, when the response is specified by
standardize: logical value indicating whether covariates should be standardized for estimation. This does not apply to mandatory covariates, i.e., these are not standardized.
stepno: maximum number of boosting steps to be evaluated when determining the number of boosting steps by cross-validation, otherwise the number of boosting steps itself.
criterion: indicates the criterion to be used for selection in each boosting step.
nu: (roughly) the fraction of the partial maximum likelihood estimate used for the update in each boosting step. This is converted into a penalty for the call to
stepsize.factor: determines the step-size modification factor by which the natural step size of boosting steps should be changed after a covariate has been selected in a boosting step. The default (value
varlink: list for specifying links between covariates, used to re-distribute step sizes when
cv:
trace: logical value indicating whether progress in estimation should be indicated by printing the name of the covariate updated.
...: miscellaneous arguments, passed to the call to
In contrast to gradient boosting (implemented e.g. in the glmboost routine in the R package mboost, using the CoxPH loss function), CoxBoost is not based on gradients of loss functions, but adapts the offset-based boosting approach from Tutz and Binder (2007) for estimating Cox proportional hazards models. In each boosting step the previous boosting steps are incorporated as an offset in penalized partial likelihood estimation, which is employed to obtain an update for a single parameter, i.e., one covariate. This results in sparse fits similar to Lasso-like approaches, with many estimated coefficients being zero. The main model complexity parameter, the number of boosting steps, is automatically selected by cross-validation using a call to cv.CoxBoost. Note that this introduces random variation when iCoxBoost is called repeatedly, so it is advisable to set and save the random number generator state for reproducible results.
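The advice on reproducibility can be illustrated with a short sketch. The first part uses base R only and assumes that the cross-validation folds are drawn with R's random number generator; the fold construction shown here is a stand-in, not the code used by cv.CoxBoost. The second part applies the same pattern to iCoxBoost and runs only if the CoxBoost package is installed; the data frame d, the seed values and stepno = 50 are arbitrary choices for illustration.

```r
n <- 200

# Stand-in for a random 10-fold assignment (assumption: not the package's code)
set.seed(1234)
folds.a <- sample(rep(1:10, length.out = n))
set.seed(1234)
folds.b <- sample(rep(1:10, length.out = n))
print(identical(folds.a, folds.b))  # same seed, same fold assignment

# The same pattern with iCoxBoost, skipped if CoxBoost is not installed
if (requireNamespace("CoxBoost", quietly = TRUE)) {
  library(survival)  # provides Surv()
  library(CoxBoost)
  set.seed(1)
  d <- as.data.frame(matrix(rnorm(n * 20), n, 20))
  real.time <- -log(runif(n)) / (10 * exp(d$V1 + d$V2))
  cens.time <- rexp(n, rate = 1 / 10)
  d$status <- as.numeric(real.time <= cens.time)
  d$time <- pmin(real.time, cens.time)

  # Reset the seed immediately before each fit so both fits see the same folds
  set.seed(1234)
  fit.a <- iCoxBoost(Surv(time, status) ~ ., data = d, stepno = 50)
  set.seed(1234)
  fit.b <- iCoxBoost(Surv(time, status) ~ ., data = d, stepno = 50)
  print(all.equal(coef(fit.a), coef(fit.b)))
}
```

Resetting the seed directly before each call matters: any random draw between set.seed and the fit shifts the generator state and can change the selected number of boosting steps.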
The advantage of the offset-based approach compared to gradient boosting is that the penalty structure is very flexible. In the present implementation this is used for allowing for unpenalized mandatory covariates, which receive a very fast coefficient build-up in the course of the boosting steps, while the other (optional) covariates are subjected to penalization.
For example, in a microarray setting, the (many) microarray features would be taken to be optional covariates, and the (few) potential clinical covariates would be taken to be mandatory, by including their names in mandatory.
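The offset-based update described in this section can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the CoxBoost implementation: the function name toy_cox_boost, the per-covariate penalty vector and its default of nine times the number of events (chosen so that an update is roughly a tenth of a Newton step for standardized covariates) are all hypothetical. In each step, earlier steps enter only through the linear predictor, a penalized score statistic is computed per covariate, and a single coefficient is updated.

```r
# Toy componentwise likelihood-based boosting for a Cox model.
# Illustration only: this is NOT the CoxBoost implementation.
toy_cox_boost <- function(time, status, x, stepno = 20,
                          penalty = rep(9 * sum(status), ncol(x))) {
  beta <- numeric(ncol(x))
  events <- which(status == 1)
  # at.risk[i, j] = 1 if subject j is still at risk at the i-th event time
  at.risk <- 1 * outer(time[events], time, "<=")
  for (step in seq_len(stepno)) {
    # Earlier boosting steps enter only through the offset x %*% beta
    w <- exp(drop(x %*% beta))
    s0 <- drop(at.risk %*% w)            # risk-set sums of the weights
    m1 <- (at.risk %*% (w * x)) / s0     # weighted risk-set means of x
    m2 <- (at.risk %*% (w * x^2)) / s0   # weighted risk-set means of x^2
    score <- colSums(x[events, , drop = FALSE] - m1)
    info <- colSums(m2 - m1^2)
    # Select the covariate with the largest penalized score statistic ...
    k <- which.max(score^2 / (info + penalty))
    # ... and update only that coefficient by one penalized Newton step
    beta[k] <- beta[k] + score[k] / (info[k] + penalty[k])
  }
  beta
}

# Usage on simulated data with two informative covariates
set.seed(42)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p)
real.time <- -log(runif(n)) / (10 * exp(x[, 1] + x[, 2]))
cens.time <- rexp(n, rate = 1 / 10)
status <- as.numeric(real.time <= cens.time)
time <- pmin(real.time, cens.time)
beta.hat <- toy_cox_boost(time, status, x, stepno = 20)
print(round(beta.hat, 2))
```

Because only one coefficient changes per step, at most stepno coefficients can become non-zero, which is where the sparsity comes from. Lowering a covariate's entry in penalty lets its coefficient build up faster. That is a simplified stand-in for the idea of unpenalized mandatory covariates and does not reproduce how CoxBoost treats them internally.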
If a group of correlated covariates has influence on the response, e.g. genes from the same pathway, componentwise boosting will often result in a non-zero estimate for only one member of this group. To avoid this, information on the connection between covariates can be provided in varlink. If then, in addition, a penalty updating scheme with stepsize.factor < 1 is chosen, connected covariates are more likely to be chosen in future boosting steps if a directly connected covariate has been chosen in an earlier boosting step (see Binder and Schumacher, 2009b).
iCoxBoost returns an object of class iCoxBoost, which also has class CoxBoost. In addition to the elements from CoxBoost it has the following elements:
call, formula, terms: call, formula and terms from the formula interface.
cause: cause of interest.
cv.res: result from
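As a rough sketch, the elements listed above could be inspected as follows. The block runs only if the CoxBoost package is installed. The data frame d, the seed and stepno = 50 are arbitrary illustration choices, and the layout of cv.res is not assumed beyond it being present when cross-validation is left at its default.

```r
# Element names taken from the list above
value.names <- c("call", "formula", "terms", "cause", "cv.res")

if (requireNamespace("CoxBoost", quietly = TRUE)) {
  library(survival)  # provides Surv()
  library(CoxBoost)
  set.seed(1)
  n <- 200
  d <- as.data.frame(matrix(rnorm(n * 20), n, 20))
  real.time <- -log(runif(n)) / (10 * exp(d$V1 + d$V2))
  cens.time <- rexp(n, rate = 1 / 10)
  d$status <- as.numeric(real.time <= cens.time)
  d$time <- pmin(real.time, cens.time)

  fit <- iCoxBoost(Surv(time, status) ~ ., data = d, stepno = 50)

  print(class(fit))                 # should include "iCoxBoost" and "CoxBoost"
  print(inherits(fit, "CoxBoost"))  # methods for CoxBoost objects should apply
  print(value.names %in% names(fit))
  print(fit$formula)
  print(fit$cause)
  str(fit$cv.res, max.level = 1)    # overview of the cross-validation result
}
```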
Written by Harald Binder binderh@uni-mainz.de.
Binder, H., Benner, A., Bullinger, L., and Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in Medicine, doi: 10.1002/sim.5490.
Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics, 25:890-896.
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 9:14.
Tutz, G. and Binder, H. (2007). Boosting ridge regression. Computational Statistics & Data Analysis, 51(12):6044-6059.
Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 94:496-509.
predict.iCoxBoost, CoxBoost, cv.CoxBoost.
library(survival)  # provides Surv()
library(CoxBoost)
# Generate some survival data with 2 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,2),rep(0,p-2))
x <- matrix(rnorm(n*p),n,p)
actual.data <- as.data.frame(x)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
actual.data$status <- ifelse(real.time <= cens.time,1,0)
actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time)
# Fit a Cox proportional hazards model by iCoxBoost
cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data)
summary(cbfit)
plot(cbfit)
# ... with covariate V1 being mandatory
cbfit.mand <- iCoxBoost(Surv(time,status) ~ .,data=actual.data,mandatory=c("V1"))
summary(cbfit.mand)
plot(cbfit.mand)