sprinter: Main function for building prognostic models by considering...
In sprinter: Framework for Screening Prognostic Interactions


The function sprinter builds a prognostic model by preselecting interactions and main effects before fitting a regression model.

sprinter(x, 
         time, 
         status= rep(0, nrow(x)),
         mandatory= NULL,
         repetitions = 25, 
         n.inter.candidates =1000, 
         screen.main, 
         screen.inter = fit.rf,
         fit.final = screen.main, 
         args.screen.main = list(), 
         args.screen.inter = list(),
         args.fit.final = args.screen.main, 
         orthogonalize = TRUE, cutoff = 0,
         parallel = FALSE, mc.cores = detectCores(),
         ...)

`x`	n * p matrix of covariates.
`time`	vector of length `n` specifying the observed times.
`status`	censoring indicator, i.e., a vector of length `n` with entries `0` for censored observations and `1` for uncensored observations. This argument is optional in general but necessary for time-to-event data.
`mandatory`	vector with variable names of mandatory covariates, where parameter estimation should be performed unpenalized.
`repetitions`	number of repetitions of the interaction screening approach. Repetitions are performed by creating subsamples and applying the interaction screening approach on each subsample dataset separately.
`n.inter.candidates`	minimal number of potential interaction candidates, which are considered in the final model building step.
`screen.main`	function for screening potential main effects. `fit.uniCox` performs univariate Cox regressions and selects the main effects by their adjusted p-values. `fit.CoxBoost` performs variable selection by fitting a Cox model with likelihood-based boosting. Other methods can be implemented as adaptive functions for use in `sprinter`. For more details see `fit.uniCox`.
`screen.inter`	function for detecting potential interaction candidates. `fit.rf` fits a random forest and `fit.logicReg` performs a logic regression. Other methods can be implemented as adaptive functions for use in `sprinter`. For more details see `fit.rf`.
`fit.final`	function for building the final Cox proportional hazards model. Default is the function set in screen.main.
`args.screen.main`	list of arguments which should be used in the main effects detection step.
`args.screen.inter`	list of arguments which should be used in the interaction screening step.
`args.fit.final`	list of arguments which should be used in the final model building step.
`orthogonalize`	logical value. If `TRUE`, all variables are made orthogonal to those that are assessed as main effects by `screen.main`.
`cutoff`	numeric value, or a function that computes a value from the variable importance. The cutoff is used to select the variables that enter the evaluation of the pairwise inclusion frequencies.
`parallel`	logical value indicating whether the interaction screening step should be performed in parallel.
`mc.cores`	the number of cores to use if `parallel = TRUE`.
`...`	additional arguments.

A call to the sprinter function fits a prognostic model to time-to-event data by combining available statistical components. The modular structure ensures the simultaneous consideration of two important elements in a model: interactions and main effects.
Interactions play an important role in molecular applications, because some effects arise only if two genes are differentially expressed at the same time. Therefore, it is important to consider interaction terms when predicting a clinical outcome. The method used to preselect interactions is set in screen.inter.
The interactions are preselected with respect to potential main effects. Therefore, a main effects model is fitted first, both to extract the existing main effects and to allow a comparison between the main effects model and the final model, which shows the benefit of considering interactions. The method used to fit the main effects model is set in screen.main.
The final model is fitted with the main effects resulting from screen.main and the interactions resulting from screen.inter as covariates. The method used for building the final model is set in fit.final. By default, the same method is used as in screen.main. In the following, the three components of the framework are explained in more detail: (1) fitting a main effects model, (2) adjusting the data and pre-selecting interaction terms, and (3) building the comprehensive model including promising interactions.

For screening main effects, sprinter allows any approach that can handle high-dimensional datasets in time-to-event settings. The following two established approaches have already been prepared for use: (A) univariate Cox regression with adjusted p-values (fit.uniCox) and (B) CoxBoost (fit.CoxBoost). Other approaches can also be implemented (see fit.CoxBoost).
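
As a rough illustration of approach (A), the following sketch fits one Cox regression per covariate with `survival::coxph` and keeps the covariates whose adjusted p-value is small. It is not the code of `fit.uniCox`: the helper name `screen_univariate_cox`, the Benjamini-Hochberg adjustment, the 0.05 threshold, and the simulated data are all assumptions made for illustration.

```r
library(survival)

# Hypothetical helper (not part of sprinter): one Cox model per column of x,
# followed by a multiplicity adjustment of the resulting p-values.
screen_univariate_cox <- function(x, time, status, method = "BH", alpha = 0.05) {
  pvals <- apply(x, 2, function(xj) {
    fit <- coxph(Surv(time, status) ~ xj)
    summary(fit)$coefficients[1, "Pr(>|z|)"]
  })
  padj <- p.adjust(pvals, method = method)
  list(p.adjusted = padj, selected = names(padj)[padj <= alpha])
}

# Toy data: only ID1 influences the hazard.
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("ID", 1:p)))
event.time <- rexp(n, rate = exp(1.0 * x[, "ID1"]) / 20)
cens.time  <- rexp(n, rate = 1 / 40)
time   <- pmin(event.time, cens.time)
status <- as.integer(event.time <= cens.time)

res <- screen_univariate_cox(x, time, status)
res$selected
```

The selected names play the role of the main effects that the later steps adjust for.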

For screening the interaction effects, sprinter offers the random forest method fit.rf and the logic regression method fit.logicReg for pre-selecting interactions. For each variable, a variable importance measure is calculated that considers the underlying interaction structure and reflects the relevance of the variable for the forest or the logic regression, respectively. The variable importance is used to construct the relevant interactions for the model. Before the interactions are pre-selected, the data are modified so that weaker interaction effects that are originally masked by stronger main effects can be detected. To achieve this, the data are orthogonalized by computing residuals with respect to the selected main effects and the mandatory covariates.
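
The orthogonalization idea can be sketched in base R as follows. This is a minimal illustration that assumes ordinary least-squares residuals; the variable names, the toy data, and the use of `lm` are assumptions and do not reproduce the internals of sprinter.

```r
set.seed(2)
n <- 100
x <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("ID", 1:6)))
x[, "ID3"] <- x[, "ID3"] + 0.8 * x[, "ID1"]   # make ID3 correlated with ID1

main <- c("ID1", "ID2")                 # assumed result of the main effects step
rest <- setdiff(colnames(x), main)

# Regress every remaining covariate on the main effects (with intercept)
# and replace it by its residuals.
x.orth <- x
x.orth[, rest] <- residuals(lm(x[, rest] ~ x[, main]))

# The adjusted covariates are uncorrelated with the main effects
# up to numerical error.
max(abs(cor(x.orth[, rest], x[, main])))
```

After this step, a screening method applied to `x.orth` can no longer pick up signal that is already explained linearly by `ID1` or `ID2`.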
To stabilize the results, subsamples are created and the interaction detection approach is performed on each subsampled dataset. As this step can be computationally expensive, it can be parallelized by setting parallel = TRUE. To summarize the results of all subsamples, pairwise variable inclusion frequencies of the constructed interaction terms are computed and the n.inter.candidates most frequent pairs are selected as relevant interaction terms. Other approaches can also be implemented (see fit.rf).
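
The aggregation over subsamples can be pictured with the following sketch. The importance values are simulated rather than taken from a random forest, and the way pairs are formed from all variables above the cutoff is an assumption about the general idea, not a description of the package code.

```r
set.seed(3)
vars <- paste0("ID", 1:8)
repetitions <- 25
cutoff <- 0
n.inter.candidates <- 3

# Stand-in for the screening step: simulated importance values per subsample.
# ID5 and ID6 systematically receive a higher importance.
importance <- replicate(repetitions, {
  vi <- rnorm(length(vars), mean = -0.5)
  vi[5:6] <- vi[5:6] + 3
  setNames(vi, vars)
}, simplify = FALSE)

# In each repetition, keep the variables above the cutoff and form all pairs.
pairs.per.rep <- lapply(importance, function(vi) {
  keep <- names(vi)[vi > cutoff]
  if (length(keep) < 2) return(character(0))
  apply(combn(keep, 2), 2, paste, collapse = ":")
})

# Relative inclusion frequency of each pair, most frequent first.
freq <- sort(table(unlist(pairs.per.rep)) / repetitions, decreasing = TRUE)
head(freq, n.inter.candidates)
```

Pairs that appear in most subsamples end up at the top of `freq`, which is why more repetitions give a more stable ranking at a higher computational cost.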

For building the final model, the user can set the desired method in fit.final. If no method is specified, the same method is used as for building the main effects model. In contrast to the main effects model, the final model is built from the variables selected in the main effects model together with the n.inter.candidates interactions pre-selected in the screening step.
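
A covariate matrix of this kind could be assembled as in the sketch below. It assumes that an interaction between two covariates is represented by their product and that candidates are labelled `"A:B"`; both are illustrative assumptions, as are all object names.

```r
set.seed(4)
n <- 50
x <- matrix(rnorm(n * 8), n, 8, dimnames = list(NULL, paste0("ID", 1:8)))

main.effects     <- c("ID1", "ID2")           # assumed result of screen.main
inter.candidates <- c("ID5:ID6", "ID7:ID8")   # assumed result of screen.inter

# Build one product column per candidate pair "A:B".
inter.cols <- sapply(strsplit(inter.candidates, ":", fixed = TRUE),
                     function(pair) x[, pair[1]] * x[, pair[2]])
colnames(inter.cols) <- inter.candidates

# Covariate matrix that would be handed to the final fitting method.
x.final <- cbind(x[, main.effects, drop = FALSE], inter.cols)
colnames(x.final)
```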

An object of class `sprinter` with the following components:

`n.inter.candidates`	Number of potential interaction candidates considered in the final model building.
`inter.candidates`	Vector of length `n.inter.candidates` with the potential interaction candidates considered in the final model building.
`main.model`	Main effects model. The class depends on the function used in `screen.main`.
`final.model`	Final model. The class depends on the function used in `fit.final`.
`xnames`	vector of the variable names used in the final model.

Written by Isabell Hoffmann isabell.hoffmann@uni-mainz.de.

Sariyar, M., Hoffmann, I., and Binder, H. (2014). Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data. BMC Bioinformatics 15:58.

rfsrc, coxph, CoxBoost

## Not run: 
##---------------------------
## Survival analysis
##---------------------------

#############################
# Fit a Cox proportional hazards model by CoxBoost by considering 
# interactions after screening interactions by random forest
# system.time:
#   user  system elapsed 
# 370.97    2.32  374.31
# For a faster run, reduce the number of repetitions.
#############################

# Create survival data with interactions:
simulation <- simul.int(287578,n = 200, p = 500,
                          beta.int = 1.0,
                          beta.main = 0.9, 
                          censparam = 1/20, 
                          lambda = 1/20)
data <- simulation$data

# Showing True Effects:
simulation$info

# Perform the sprinter approach:
set.seed(123)
testcb <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    status= data$obs.status,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.CoxBoost, 
                    fit.final = fit.CoxBoost, 
                    args.screen.main = list(seed=123,stepno = 10, K = 10, 
                                            criterion ='pscore', nu = 0.05),
                    parallel = FALSE)
summary(testcb)



##########
# Fit a Cox proportional hazards model by considering 
# interactions after screening interactions by random forest
# and selecting relevant effects by univariate Cox regression: 
# system.time:
#   user  system elapsed 
# 374.50    1.53  376.68 
# For a faster run, reduce the number of repetitions.
##########

# Create survival data with interactions:
data <- simul.int(287578,n = 200, p = 500,
                          beta.int = 1.0,
                          beta.main = 0.9, 
                          censparam = 1/20, 
                          lambda = 1/20)[[1]]


# Perform the sprinter approach:
set.seed(123)
testunicox <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    status= data$obs.status,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniCox, 
                    fit.final = fit.uniCox, 
                    parallel = FALSE)


summary(testunicox)


# true coefficients:
# ID1   ID2   ID5:ID6   ID7:ID8
# 0.9  -0.9      1         -1


##---------------------------
## Continuous outcome
##---------------------------

# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by random forest:
sprinter.glm.rf.con <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm, 
                    fit.final = fit.uniGlm, 
                    parallel = FALSE)

# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by logic regression:
sprinter.glm.logicR.con <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm,
                    screen.inter = fit.logicReg,
                    fit.final = fit.uniGlm, 
                    args.screen.inter = list(type = 2),
                    parallel = FALSE)

# selection of main effects by GAMBoost 
#  and pre-selections of interactions 
# by random forest:
sprinter.gamboost.rf.con <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.GAMBoost, 
                    args.screen.main = list(stepno = 10),
                    fit.final = fit.GAMBoost, 
                    parallel = FALSE)
                    

##---------------------------
## Binary outcome 
##---------------------------
x <- matrix(runif(200*500,min=-1,max=1),200,500)  
colnames(x) <- paste('ID', 1:500, sep = '')
eta <- -0.5 + 2*x[,1] - 2*x[,3] + 2 * x[,3]*x[,4]
y <- rbinom(200,1,binomial()$linkinv(eta))


# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by random forest:
sprinter.glm.rf.bin <- sprinter( x=x[,1:500],  
                    time = y,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm, 
                    fit.final = fit.uniGlm, 
                    args.screen.main = list(family = binomial()),
                    parallel = FALSE)
                    
# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by logic regression:
sprinter.glm.logicR.bin <- sprinter( x=x[,1:500],  
                    time = y,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm,
                    screen.inter = fit.logicReg,
                    fit.final = fit.uniGlm, 
                    args.screen.inter = list(type = 3),
                    parallel = FALSE)
     

# selection of main effects by GAMBoost and pre-selection of 
# interactions by random forest:

sprinter.GAMBoost.rf.bin <- sprinter( x=x,  
                    time = y,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.GAMBoost, 
                    fit.final = fit.GAMBoost, 
                    args.screen.main = list(family = binomial()),
                    parallel = FALSE)
                    
               

## End(Not run)