The function sprinter builds a prognostic model by preselecting interactions and main effects before fitting a regression model.
sprinter(x,
time,
status= rep(0, nrow(x)),
mandatory= NULL,
repetitions = 25,
n.inter.candidates =1000,
screen.main,
screen.inter = fit.rf,
fit.final = screen.main,
args.screen.main = list(),
args.screen.inter = list(),
args.fit.final = args.screen.main,
orthogonalize = TRUE, cutoff = 0,
parallel = FALSE, mc.cores = detectCores(),
...)
x |
n * p matrix of covariates. |
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
mandatory |
vector with variable names of mandatory covariates, where parameter estimation should be performed unpenalized. |
repetitions |
number of repetitions of the interaction screening approach. Repetitions are performed by creating subsamples and applying the interaction screening approach on each subsample dataset separately. |
n.inter.candidates |
minimal number of potential interaction candidates, which are considered in the final model building step. |
screen.main |
function for screening potential main effects. |
screen.inter |
function for detecting potential interaction candidates. |
fit.final |
function for building the final Cox proportional hazards model. Default is the function set in screen.main. |
args.screen.main |
list of arguments which should be used in the main effects detection step. |
args.screen.inter |
list of arguments which should be used in the interaction screening step. |
args.fit.final |
list of arguments which should be used in the final model building step. |
orthogonalize |
logical value. If true all variables are made orthogonal to those that are assessed as main effects by |
cutoff |
value or function to evaluate a value according to the variable importance. The cutoff is used to select variables for evaluating the pairwise inclusion frequencies. |
parallel |
logical value indicating whether the interaction screening step should be performed in parallel. |
mc.cores |
the number of cores to use if parallel = TRUE. |
... |
additional arguments. |
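The cutoff argument accepts either a fixed value or a function of the variable importance. The following base R sketch illustrates the difference between the two forms. The helper select_by_cutoff and the importance values are hypothetical and are not part of sprinter; the exact way the package evaluates a cutoff function is an assumption here.

```r
# Hypothetical helper (not part of sprinter): keep the variables whose
# importance exceeds a cutoff given either as a number or as a function.
select_by_cutoff <- function(vim, cutoff = 0) {
  # If cutoff is a function, evaluate it on the importance vector first
  threshold <- if (is.function(cutoff)) cutoff(vim) else cutoff
  names(vim)[vim > threshold]
}

# Made-up importance values for five variables
vim <- c(ID1 = 0.40, ID2 = -0.02, ID3 = 0.00, ID4 = 0.15, ID5 = 0.03)

# Fixed cutoff: keep every variable with positive importance
fixed_sel <- select_by_cutoff(vim, cutoff = 0)

# Data-driven cutoff: keep variables above the median importance
median_sel <- select_by_cutoff(vim, cutoff = function(v) median(v))

print(fixed_sel)
print(median_sel)
```

A function-valued cutoff adapts to the scale of the importance measure, which can differ between random forest and logic regression.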
A call to the sprinter function fits a prognostic model to time-to-event data by combining available statistical components.
The modular structure ensures that two important elements of a model, interactions and main effects, are considered simultaneously.
Interactions play an important role in molecular applications,
because some effects arise only if two genes are differentially expressed at the same time.
Therefore, it is important to consider interaction terms when predicting a clinical outcome.
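Why a main-effects-only view can miss such effects can be seen in a small simulation. This is an illustrative sketch in base R with a continuous outcome instead of time-to-event data; the variable names gene1 and gene2 and all parameter values are assumptions made for the example.

```r
set.seed(2014)
n <- 2000
gene1 <- rnorm(n)
gene2 <- rnorm(n)

# Outcome driven only by the product of the two genes plus noise
y <- gene1 * gene2 + rnorm(n, sd = 0.5)

cor_main1 <- cor(y, gene1)          # marginal association with gene1
cor_main2 <- cor(y, gene2)          # marginal association with gene2
cor_inter <- cor(y, gene1 * gene2)  # association with the interaction term

print(round(c(gene1 = cor_main1, gene2 = cor_main2,
              interaction = cor_inter), 2))
```

In this setting each gene on its own is almost uncorrelated with the outcome, while the product term is strongly associated with it.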
The method used to preselect interactions is set in screen.inter.
The interactions are preselected with respect to potential main effects.
Therefore, a main effects model is fitted first, both to extract the existing main effects and to allow a comparison of the main effects model with the final model, which shows the benefit of considering interactions.
The method used to fit the main effects model is set in screen.main.
The final model is fitted with the main effects resulting from screen.main and the interactions resulting from screen.inter as covariates. The method used for building the final model is set in fit.final. By default, the same method is used as in screen.main.
In the following, the three components of the framework are explained in more detail:
(1) fitting a main effects model,
(2) adjusting the data and pre-selecting interaction terms, and
(3) building the comprehensive model including promising interactions.
For screening main effects, sprinter allows any approach that can handle high-dimensional datasets in time-to-event settings.
The following two established approaches have already been prepared for usage:
(A) univariate Cox regression with adjusted p-values (fit.uniCox) and (B) CoxBoost (fit.CoxBoost). Other approaches can also be implemented (see fit.CoxBoost).
For screening the interaction effects, sprinter offers the random forest method fit.rf and logic regression fit.logicReg for pre-selecting interactions.
For each variable, a variable importance measure is calculated that takes the underlying interaction structure into account and reflects the relevance of the variable for the forest or the logic regression, respectively.
The variable importance is used to construct the relevant interactions for the model.
Before pre-selecting the interactions, the data are modified so that weaker interaction effects that are originally overlaid by stronger main effects can be detected.
To achieve this, the data are orthogonalized by computing residuals corresponding to the selected main effects and the mandatory covariates.
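The idea of orthogonalizing by residuals can be sketched in base R. This is a minimal illustration on simulated data, assuming a plain linear regression of each remaining covariate on the selected main effects; it is not taken from the package source, and sprinter's internal adjustment may differ in detail.

```r
set.seed(1)
n <- 100

# Two covariates treated as selected main effects, three further covariates
main <- matrix(rnorm(n * 2), n, 2,
               dimnames = list(NULL, c("ID1", "ID2")))
other <- matrix(rnorm(n * 3), n, 3,
                dimnames = list(NULL, c("ID3", "ID4", "ID5")))

# Make ID3 strongly correlated with ID1 so the adjustment is visible
other[, "ID3"] <- other[, "ID3"] + 2 * main[, "ID1"]

# Regress every remaining covariate on the main effects (multivariate lm)
# and keep the residuals as the adjusted covariates
other_orth <- residuals(lm(other ~ main))

cor_before <- cor(other[, "ID3"], main[, "ID1"])
cor_after  <- cor(other_orth[, "ID3"], main[, "ID1"])
print(round(c(before = cor_before, after = cor_after), 3))
```

After the adjustment the remaining covariates no longer carry linear information about the main effects, so a screening method applied to them is not dominated by those effects.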
To stabilize the results, subsamples are created and the interaction detection approach is applied to each subsampled dataset.
As this step can be computationally expensive, it can be parallelized by setting parallel = TRUE.
To summarize the results of all subsamples, pairwise variable inclusion frequencies of the constructed interaction terms are computed and the n.inter.candidates most frequent pairs are selected as relevant interaction terms.
Other approaches can also be implemented (see fit.rf).
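The aggregation over subsamples can be sketched in base R as follows. The per-subsample selections are written out by hand as a stand-in for the output of a screening method, and the helper pairs_of is hypothetical; only the counting of pairs and the choice of the most frequent ones is illustrated, not the package's own implementation.

```r
# Hypothetical screening results: variables that passed the importance
# cutoff in each of five subsamples (hard-coded, no forest is fitted)
selected_per_subsample <- list(
  c("ID5", "ID6", "ID1"),
  c("ID5", "ID6"),
  c("ID5", "ID6", "ID3"),
  c("ID1", "ID3"),
  c("ID5", "ID6", "ID1")
)

# All unordered pairs within one selected set, encoded as "a:b"
pairs_of <- function(sel) {
  if (length(sel) < 2) return(character(0))
  combn(sort(sel), 2, FUN = paste, collapse = ":")
}

# Count how often each pair occurs and convert to inclusion frequencies
pair_counts <- table(unlist(lapply(selected_per_subsample, pairs_of)))
inclusion_freq <- sort(pair_counts / length(selected_per_subsample),
                       decreasing = TRUE)

# Keep the most frequent pairs as interaction candidates
n.inter.candidates <- 2
n_keep <- min(n.inter.candidates, length(inclusion_freq))
top_pairs <- names(inclusion_freq)[seq_len(n_keep)]

print(inclusion_freq)
print(top_pairs)
```

Pairs that are selected together in many subsamples rank highest, which makes the candidate list less sensitive to a single unlucky subsample.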
For building the final model, the user can set the desired method in fit.final. If no method is specified, the same method is used as for building the main effects model.
In contrast to the main effects model, the final model is built from the variables selected in the main effects model together with the n.inter.candidates pre-selected interactions from the screening step.
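How main effects and pre-selected interactions can be combined into one covariate matrix is sketched below in base R. The selected variable names are assumed for illustration, and forming interactions as elementwise products of two columns is an assumption of this sketch rather than a statement about the package internals.

```r
set.seed(7)
x <- matrix(rnorm(20 * 8), 20, 8,
            dimnames = list(NULL, paste0("ID", 1:8)))

# Assumed results of the two earlier steps
main_effects     <- c("ID1", "ID2")
inter_candidates <- c("ID5:ID6", "ID7:ID8")

# Turn each "a:b" label into the elementwise product of the two columns
inter_cols <- sapply(strsplit(inter_candidates, ":", fixed = TRUE),
                     function(p) x[, p[1]] * x[, p[2]])
colnames(inter_cols) <- inter_candidates

# Covariate matrix for the final model: main effects plus interactions
x_final <- cbind(x[, main_effects, drop = FALSE], inter_cols)
print(colnames(x_final))
```

The resulting matrix can be handed to any fitting routine that accepts a covariate matrix, which is what makes the final step exchangeable.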
An object of class (sprinter) with the following components:
n.inter.candidates |
Number of potential interaction candidates considered in the final model building. |
inter.candidates |
Vector of length |
main.model |
Main effects model. The class depends on the function used in |
final.model |
Final model. The class depends on the function used in |
xnames |
vector of the variable names used in the final model. |
Written by Isabell Hoffmann isabell.hoffmann@uni-mainz.de.
Sariyar, M., Hoffmann, I., Binder, J. (2014). Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data. BMC Bioinformatics 15:58.
## Not run:
##---------------------------
## Survival analysis
##---------------------------
#############################
# Fit a Cox proportional hazards model by CoxBoost by considering
# interactions after screening interactions by random forest
# system.time:
# user system elapsed
# 370.97 2.32 374.31
# For a faster run set repetitions down!
#############################
# Create survival data with interactions:
simulation <- simul.int(287578,n = 200, p = 500,
beta.int = 1.0,
beta.main = 0.9,
censparam = 1/20,
lambda = 1/20)
data <- simulation$data
# Showing True Effects:
simulation$info
# Perform the sprinter approach:
set.seed(123)
testcb <- sprinter( x=data[,1:500],
time = data$obs.time,
status= data$obs.status,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.CoxBoost,
fit.final = fit.CoxBoost,
args.screen.main = list(seed=123,stepno = 10, K = 10,
criterion ='pscore', nu = 0.05),
parallel = FALSE)
summary(testcb)
##########
# Fit a Cox proportional hazards model by considering
# interactions after screening interactions by random forest
# and selecting relevant effects by univariate Cox regression:
# system.time:
# user system elapsed
# 374.50 1.53 376.68
# For a faster run set repetitions down!
##########
# Create survival data with interactions:
data <- simul.int(287578,n = 200, p = 500,
beta.int = 1.0,
beta.main = 0.9,
censparam = 1/20,
lambda = 1/20)[[1]]
# Perform the sprinter approach:
set.seed(123)
testunicox <- sprinter( x=data[,1:500],
time = data$obs.time,
status= data$obs.status,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.uniCox,
fit.final = fit.uniCox,
parallel = FALSE)
summary(testunicox)
# true coefficients:
# ID1 ID2 ID5:ID6 ID7:ID8
# 0.9 -0.9 1 -1
##---------------------------
## Continuous outcome
##---------------------------
# selection of main effects by univariate generalized
# linear models and pre-selections of interactions
# by random forest:
sprinter.glm.rf.con <- sprinter( x=data[,1:500],
time = data$obs.time,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.uniGlm,
fit.final = fit.uniGlm,
parallel = FALSE)
# selection of main effects by univariate generalized
# linear models and pre-selections of interactions
# by logic regression:
sprinter.glm.logicR.con <- sprinter( x=data[,1:500],
time = data$obs.time,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.uniGlm,
screen.inter = fit.logicReg,
fit.final = fit.uniGlm,
args.screen.inter = list(type = 2),
parallel = FALSE)
# selection of main effects by GAMBoost
# and pre-selections of interactions
# by random forest:
sprinter.gamboost.rf.con <- sprinter( x=data[,1:500],
time = data$obs.time,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.GAMBoost,
args.screen.main = list(stepno = 10),
fit.final = fit.GAMBoost,
parallel = FALSE)
##---------------------------
## Binary outcome
##---------------------------
x <- matrix(runif(200*500,min=-1,max=1),200,500)
colnames(x) <- paste('ID', 1:500, sep = '')
eta <- -0.5 + 2*x[,1] - 2*x[,3] + 2 * x[,3]*x[,4]
y <- rbinom(200,1,binomial()$linkinv(eta))
# selection of main effects by univariate generalized
# linear models and pre-selections of interactions
# by random forest:
sprinter.glm.rf.bin <- sprinter( x=x[,1:500],
time = y,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.uniGlm,
fit.final = fit.uniGlm,
args.screen.main = list(family = binomial()),
parallel = FALSE)
# selection of main effects by univariate generalized
# linear models and pre-selections of interactions
# by logic regression:
sprinter.glm.logicR.bin <- sprinter( x=x[,1:500],
time = y,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.uniGlm,
screen.inter = fit.logicReg,
fit.final = fit.uniGlm,
args.screen.inter = list(type = 3),
parallel = FALSE)
# selection of main effects by GAMBoost and pre-selection of
# interactions by random forest:
sprinter.GAMBoost.rf.bin <- sprinter( x=x,
time = y,
repetitions = 10,
mandatory = c("ID1","ID2"),
n.inter.candidates = 1000,
screen.main = fit.GAMBoost,
fit.final = fit.GAMBoost,
args.screen.main = list(family = binomial()),
parallel = FALSE)
## End(Not run)