sprinter: Main function for building prognostic models by considering...

Description

The function sprinter builds a prognostic model by preselecting interactions and main effects before fitting a regression model.

Usage

sprinter(x, 
         time, 
         status= rep(0, nrow(x)),
         mandatory= NULL,
         repetitions = 25, 
         n.inter.candidates =1000, 
         screen.main, 
         screen.inter = fit.rf,
         fit.final = screen.main, 
         args.screen.main = list(), 
         args.screen.inter = list(),
         args.fit.final = args.screen.main, 
         orthogonalize = TRUE, cutoff = 0,
         parallel = FALSE, mc.cores = detectCores(),
         ...)

Arguments

x

n * p matrix of covariates.

time

vector of length n specifying the observed times.

status

censoring indicator, i.e., vector of length n with entries 0 for censored observations and 1 for uncensored observations. This optional argument is necessary for time-to-event data.

mandatory

vector of variable names of mandatory covariates, whose parameters are estimated without penalization.

repetitions

number of repetitions of the interaction screening approach. Repetitions are performed by creating subsamples and applying the interaction screening approach on each subsample dataset separately.

n.inter.candidates

minimum number of potential interaction candidates that are considered in the final model building step.

screen.main

function for screening potential main effects. fit.uniCox performs univariate Cox regressions and selects the main effects by their adjusted p-values.
fit.CoxBoost performs variable selection by fitting a Cox model by likelihood-based boosting. Other methods can be implemented as adapter functions for use in sprinter. For more details see fit.uniCox.

screen.inter

function for detecting potential interaction candidates. fit.rf fits a random forest and fit.logicReg fits a logic regression. Other methods can be implemented as adapter functions for use in sprinter. For more details see fit.rf.

fit.final

function for building the final Cox proportional hazards model. Default is the function set in screen.main.

args.screen.main

list of arguments which should be used in the main effects detection step.

args.screen.inter

list of arguments which should be used in the interaction screening step.

args.fit.final

list of arguments which should be used in the final model building step.

orthogonalize

logical value. If TRUE, all variables are made orthogonal to those that are assessed as main effects by screen.main.

cutoff

numeric value, or a function that computes such a value from the variable importance. The cutoff is used to select variables for evaluating the pairwise inclusion frequencies.

parallel

logical value indicating whether the interaction screening step should be performed in parallel.

mc.cores

the number of cores to use if parallel = TRUE.

...

additional arguments.

Details

A call to the sprinter function fits a prognostic model to time-to-event data by combining available statistical components. The modular structure ensures that two important elements of a model are considered simultaneously: interactions and main effects.
Interactions play an important role in molecular applications, because some effects arise only if two genes are differentially expressed at the same time. Therefore, it is important to consider interaction terms when predicting a clinical outcome. The method used to preselect interactions is set in screen.inter.
The interactions are preselected with respect to potential main effects. Therefore, a main effects model is fitted first, both to extract the existing main effects and to allow a comparison between the main effects model and the final model, which shows the benefit of considering interactions. The method used to fit the main effects model is set in screen.main.
The final model is fitted with the main effects resulting from screen.main and the interactions resulting from screen.inter as covariates. The method used for building the final model is set in fit.final. By default, the same method is used as in screen.main. In the following, the three components of the framework are explained in more detail: (1) fitting a main effects model, (2) adjusting the data and preselecting interaction terms, and (3) building the comprehensive model including promising interactions.

For screening main effects, sprinter allows any approach that can handle high-dimensional time-to-event data. The following two established approaches have already been prepared for use: (A) univariate Cox regression with adjusted p-values (fit.uniCox) and (B) CoxBoost (fit.CoxBoost). Other approaches can also be implemented (see fit.CoxBoost).

For screening the interaction effects, sprinter offers the random forest method fit.rf and logic regression fit.logicReg for preselecting interactions. For each variable, a variable importance measure is calculated that takes the underlying interaction structure into account and reflects the relevance of the variable for the forest or the logic regression, respectively. The variable importance is used to construct the relevant interactions for the model. Before preselecting the interactions, the data are modified so that weaker interaction effects that are originally masked by stronger main effects can be detected. To achieve this, the data are orthogonalized by computing residuals with respect to the selected main effects and the mandatory covariates.
To stabilize the results, subsamples are created and the interaction detection approach is applied to each subsampled dataset. As this step can be computationally expensive, it can be parallelized by setting parallel = TRUE. To summarize the results of all subsamples, pairwise variable inclusion frequencies of the constructed interaction terms are computed and the n.inter.candidates most frequent pairs are selected as relevant interaction terms. Other approaches can also be implemented (see fit.rf).
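
The orthogonalization idea can be illustrated in base R. This is only a sketch under the assumption that ordinary least-squares residuals are used; the residuals computed inside sprinter may be defined differently. The data and the choice of ID1 and ID2 as main effects are made up for the illustration.

```r
# Illustrative sketch, not the package's internal code.
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 6), n, 6)
colnames(x) <- paste0("ID", 1:6)

main <- c("ID1", "ID2")                 # hypothetical main effects / mandatory covariates
others <- setdiff(colnames(x), main)

# lm() with a matrix response fits one least-squares regression per column;
# residuals() then returns a matrix with one column per remaining covariate.
fit <- lm(x[, others] ~ x[, main])
x.orth <- x
x.orth[, others] <- residuals(fit)

# The residualized columns are uncorrelated with the main effects
# (up to floating-point error).
max(abs(cor(x.orth[, others], x[, main])))
```

Screening on x.orth instead of x means a variable can only score well through signal that the main effects do not already explain linearly.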
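
The aggregation over subsamples can be sketched as follows. The importance scores here are random stand-ins for the output of a screening method, and the way pairs are formed (all pairs of variables whose importance exceeds the cutoff) is an assumption made for illustration, not necessarily the exact rule used by fit.rf or fit.logicReg.

```r
# Illustrative sketch of pairwise inclusion frequencies across subsamples.
set.seed(1)
vars <- paste0("ID", 1:8)
repetitions <- 25
cutoff <- 0
n.inter.candidates <- 3

# Stand-in for the screening step: one named importance vector per subsample.
# In practice these would come from a random forest or logic regression fit.
importance <- replicate(repetitions, setNames(rnorm(length(vars)), vars),
                        simplify = FALSE)

# For each subsample, keep variables above the cutoff and form all pairs.
pairs.per.rep <- lapply(importance, function(vi) {
  keep <- names(vi)[vi > cutoff]
  if (length(keep) < 2) return(character(0))
  apply(combn(keep, 2), 2, paste, collapse = ":")
})

# Relative frequency with which each pair was formed across subsamples.
freq <- sort(table(unlist(pairs.per.rep)) / repetitions, decreasing = TRUE)
inter.candidates <- names(freq)[seq_len(min(n.inter.candidates, length(freq)))]
inter.candidates
```

This sketch keeps exactly the top n.inter.candidates pairs; since the argument is documented as a minimum number, the package may retain additional pairs, for example when frequencies are tied.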

For building the final model, the user can set the desired method in fit.final. If no method is specified, the same method is used as for building the main effects model. In contrast to the main effects model, the final model is built from the variables selected in the main effects model together with the n.inter.candidates preselected interactions of the screening step.
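
As a rough picture of the covariates passed to the final fit, the following base R sketch assembles a matrix from selected main effects and preselected interaction pairs. Coding an interaction as the elementwise product of two columns is an assumption made here, and the variable names are hypothetical; the object built inside sprinter may differ.

```r
# Illustrative sketch of a final-model covariate matrix.
set.seed(1)
x <- matrix(rnorm(50 * 6), 50, 6)
colnames(x) <- paste0("ID", 1:6)

main.effects <- c("ID1", "ID2")              # hypothetical screening result
inter.candidates <- c("ID3:ID4", "ID5:ID6")  # hypothetical preselected pairs

# Turn each "a:b" label into the elementwise product of columns a and b.
inter.cols <- sapply(strsplit(inter.candidates, ":", fixed = TRUE),
                     function(p) x[, p[1]] * x[, p[2]])
colnames(inter.cols) <- inter.candidates

# Main effects and interaction columns side by side.
x.final <- cbind(x[, main.effects], inter.cols)
dim(x.final)
```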

Value

An object of class (sprinter) with the following components:

n.inter.candidates

Number of potential interaction candidates considered in the final model building.

inter.candidates

Vector of length n.inter.candidates with the potential interaction candidates considered in the final model building.

main.model

Main effects model. The class depends on the function used in screen.main.

final.model

Final model. The class depends on the function used in fit.final.

xnames

vector of the variable names used in the final model.

Author(s)

Written by Isabell Hoffmann isabell.hoffmann@uni-mainz.de.

References

Sariyar, M., Hoffmann, I., Binder, J. (2014). Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data. BMC Bioinformatics 15:58.

See Also

rfsrc, coxph, CoxBoost

Examples

## Not run: 
##---------------------------
## Survival analysis
##---------------------------

#############################
# Fit a Cox proportional hazards model by CoxBoost, considering 
# interactions preselected by random forest
# system.time:
#   user  system elapsed 
# 370.97    2.32  374.31
# For a faster run, reduce repetitions.
#############################

# Create survival data with interactions:
simulation <- simul.int(287578,n = 200, p = 500,
                          beta.int = 1.0,
                          beta.main = 0.9, 
                          censparam = 1/20, 
                          lambda = 1/20)
data <- simulation$data

# Showing True Effects:
simulation$info

# Perform the sprinter approach:
set.seed(123)
testcb <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    status= data$obs.status,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.CoxBoost, 
                    fit.final = fit.CoxBoost, 
                    args.screen.main = list(seed=123,stepno = 10, K = 10, 
                                            criterion ='pscore', nu = 0.05),
                    parallel = FALSE)
summary(testcb)



##########
# Fit a Cox proportional hazards model, considering 
# interactions preselected by random forest
# and selecting relevant effects by univariate Cox regression: 
# system.time:
#   user  system elapsed 
# 374.50    1.53  376.68 
# For a faster run, reduce repetitions.
##########

# Create survival data with interactions:
data <- simul.int(287578,n = 200, p = 500,
                          beta.int = 1.0,
                          beta.main = 0.9, 
                          censparam = 1/20, 
                          lambda = 1/20)[[1]]


# Perform the sprinter approach:
set.seed(123)
testunicox <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    status= data$obs.status,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniCox, 
                    fit.final = fit.uniCox, 
                    parallel = FALSE)


summary(testunicox)


# true coefficients:
# ID1   ID2   ID5:ID6   ID7:ID8
# 0.9  -0.9      1         -1


##---------------------------
## Continuous outcome
##---------------------------

# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by random forest:
sprinter.glm.rf.con <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm, 
                    fit.final = fit.uniGlm, 
                    parallel = FALSE)

# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by logic regression:
sprinter.glm.logicR.con <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm,
                    screen.inter = fit.logicReg,
                    fit.final = fit.uniGlm, 
                    args.screen.inter = list(type = 2),
                    parallel = FALSE)

# selection of main effects by GAMBoost 
#  and pre-selections of interactions 
# by random forest:
sprinter.gamboost.rf.con <- sprinter( x=data[,1:500],  
                    time = data$obs.time,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.GAMBoost, 
                    args.screen.main = list(stepno = 10),
                    fit.final = fit.GAMBoost, 
                    parallel = FALSE)
                    

##---------------------------
## Binary outcome 
##---------------------------
x <- matrix(runif(200*500,min=-1,max=1),200,500)  
colnames(x) <- paste('ID', 1:500, sep = '')
eta <- -0.5 + 2*x[,1] - 2*x[,3] + 2 * x[,3]*x[,4]
y <- rbinom(200,1,binomial()$linkinv(eta))


# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by random forest:
sprinter.glm.rf.bin <- sprinter( x=x[,1:500],  
                    time = y,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm, 
                    fit.final = fit.uniGlm, 
                    args.screen.main = list(family = binomial()),
                    parallel = FALSE)
                    
# selection of main effects by univariate generalized 
# linear models and pre-selections of interactions 
# by logic regression:
sprinter.glm.logicR.bin <- sprinter( x=x[,1:500],  
                    time = y,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.uniGlm,
                    screen.inter = fit.logicReg,
                    fit.final = fit.uniGlm, 
                    args.screen.inter = list(type = 3),
                    parallel = FALSE)
     

# selection of main effects by GAMBoost and pre-selection of 
# interactions by random forest:

sprinter.GAMBoost.rf.bin <- sprinter( x=x,  
                    time = y,
                    repetitions = 10,
                    mandatory = c("ID1","ID2"),
                    n.inter.candidates = 1000, 
                    screen.main = fit.GAMBoost, 
                    fit.final = fit.GAMBoost, 
                    args.screen.main = list(family = binomial()),
                    parallel = FALSE)
                    
               

## End(Not run)

sprinter documentation built on May 1, 2019, 8:20 p.m.