screen.inter: Adaptive function for screening interactions


Description

fit.logicReg and fit.rf are functions for screening interactions in high-dimensional datasets. They are intended to be passed to the argument screen.inter of the function sprinter, and they return a variable importance measure for each variable.

Usage

fit.rf(nr, data, indices, seed.interselect, ...)

fit.rf.select(nr, data, indices, seed.interselect, n.select, ...)

fit.logicReg(nr, data, indices, seed.interselect,
       type,
       nleaves,
       ntrees, ...)

fit.logicReg.select(nr, data, indices, seed.interselect,
       type,
       nleaves,
       ntrees,
       n.select, ...)

Arguments

nr

number of the current resampling run.

data

data frame containing the y-outcome and the x-variables of the model. The data are orthogonalized with respect to the clinical covariates and the main effects identified in the main effects detection step.

indices

indices to build the resample dataset.

seed.interselect

seed for random number generator.

n.select

number of variables preselected before the random forest (fit.rf.select) or the logic regression (fit.logicReg.select) is performed.

type

type of model to be fitted. For survival data, choose (4) proportional hazards model (Cox regression), (5) exponential survival model, or (0) your own scoring function.

nleaves

maximum number of leaves to be fit in all trees combined.

ntrees

number of logic trees to be fit.

...

further arguments passed to methods.

Details

The functions fit.logicReg and fit.rf are adapted for use in the function sprinter in order to screen interactions. They evaluate a variable importance measure for each variable, which sprinter uses to pre-select relevant interaction candidates. Within sprinter, the identified candidates are combined pairwise and offered as possible predictors for the final model.
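
The pre-selection and pairwise combination described above can be illustrated with a small base R sketch. The importance values and variable names are made up for illustration, and treating values larger than zero as relevant is an assumption taken from the convention on this page. How sprinter performs these steps internally is not shown here.

```r
# Hypothetical importance vector, as a screening function might return it
vim <- c(x1 = 0.40, x2 = 0, x3 = 0.10, x4 = -0.02, x5 = 0.30)

# Variables with importance larger than zero are interaction candidates
candidates <- names(vim)[vim > 0]

# All pairwise combinations of the candidates, one pair per row
pairs <- t(combn(candidates, 2))
interactions <- paste(pairs[, 1], pairs[, 2], sep = ":")
interactions
# "x1:x3" "x1:x5" "x3:x5"
```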

fit.rf

This function fits a random forest for survival data and judges each variable by its permutation accuracy importance. For more information about fitting the random forest, see rfsrc.

fit.rf.select

This function fits a random forest for survival data on a restricted data set. The number of covariates in this restricted data set is set by n.select: the n.select variables with the smallest univariate p-values, evaluated by Cox regression, are selected.
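
The restriction step can be sketched as follows. This is a minimal illustration and not the package's own code: the helper name select.by.cox is hypothetical, and the outcome column names time and status are assumptions about the layout of the data frame.

```r
library(survival)

# Hypothetical helper, not part of sprinter. Assumes `data` has outcome
# columns `time` and `status`; all other columns are candidate variables.
select.by.cox <- function(data, n.select) {
  x.names <- setdiff(names(data), c("time", "status"))
  # Univariate Cox regression p-value for each candidate variable
  pvals <- vapply(x.names, function(v) {
    fit <- coxph(Surv(data$time, data$status) ~ data[[v]])
    summary(fit)$coefficients[1, "Pr(>|z|)"]
  }, numeric(1))
  # Names of the n.select variables with the smallest p-values
  names(sort(pvals))[seq_len(min(n.select, length(pvals)))]
}

# Simulated toy data in which only x1 affects survival
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("x", 1:6)))
toy <- data.frame(time = rexp(n, rate = exp(0.8 * x[, 1])),
                  status = rbinom(n, 1, 0.7), x)
selected <- select.by.cox(toy, n.select = 3)
selected
```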

fit.logicReg

Before the logic regression is fitted, all continuous variables are converted to binary variables by splitting at the median. The logic regression is then fitted to the binary data set. The variable importance measure is one if the variable is included in the model and zero if not. To obtain this information about the variables in a multiple model, setting select = 2 is obligatory.
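
The median split can be sketched in base R as follows. The helper binarize.median is hypothetical and not part of sprinter, and the choice of a strict comparison (values above the median become 1) is an assumption, since the page does not say how values equal to the median are handled.

```r
# Hypothetical helper, not part of sprinter: split a numeric vector at its
# median. Values strictly above the median become 1, all others 0.
binarize.median <- function(x) as.integer(x > median(x, na.rm = TRUE))

# Made-up example data with two continuous variables
x <- data.frame(age = c(35, 50, 62, 41, 70),
                marker = c(0.2, 1.5, 0.9, 3.1, 0.4))

# Apply the split column by column
x.bin <- as.data.frame(lapply(x, binarize.median))
x.bin
```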

fit.logicReg.select

This function performs logic regression on a restricted data set. The number of covariates in this restricted data set is set by n.select: the n.select variables with the smallest univariate p-values, evaluated by Cox regression, are selected.

Implementing new functions for the argument screen.inter

New functions for screening interactions must return one importance measure per variable, as a vector of length p. Importance values larger than zero are interpreted as relevant for the model.
Such a function must accept the following arguments:

nr value giving the index of the current resampling run.
data data frame containing the y-outcome and x-variables in the model.
indices indices to build the resample dataset.
seed.interselect seed for random number generator.

Following this convention, other functions can be implemented and used for screening potential interaction candidates.
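
A function following this interface might look like the sketch below. It is an illustration only: the name fit.unicox and the argument alpha are hypothetical, and the code assumes that data has outcome columns time and status, that indices is a list with one vector of row indices per resampling run, and that seed.interselect holds one seed per run. None of these details are confirmed on this page.

```r
library(survival)

# Hypothetical screening function, not part of sprinter.
# Assumptions: `data` has columns `time` and `status`, `indices` is a list
# of row-index vectors (one per run), `seed.interselect` has one seed per run.
fit.unicox <- function(nr, data, indices, seed.interselect, alpha = 0.05, ...) {
  # Not needed for this deterministic method; shown for stochastic ones
  set.seed(seed.interselect[nr])
  resample <- data[indices[[nr]], , drop = FALSE]
  x.names <- setdiff(names(resample), c("time", "status"))

  # Univariate Cox regression p-value for each x-variable
  pvals <- vapply(x.names, function(v) {
    fit <- coxph(Surv(resample$time, resample$status) ~ resample[[v]])
    summary(fit)$coefficients[1, "Pr(>|z|)"]
  }, numeric(1))

  # Importance 1 marks a variable as relevant, 0 as not relevant
  vim <- as.numeric(pvals < alpha)
  vim[is.na(vim)] <- 0
  names(vim) <- x.names
  vim
}

# Simulated toy data and two bootstrap resamples
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
toy <- data.frame(time = rexp(n, rate = exp(x[, 1])),
                  status = rbinom(n, 1, 0.8), x)
idx <- list(sample(n, replace = TRUE), sample(n, replace = TRUE))
vim <- fit.unicox(nr = 1, data = toy, indices = idx,
                  seed.interselect = c(11, 12))
vim
```

The returned vector has one entry per x-variable, which matches the length-p requirement stated above.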

Value

fit.rf and fit.logicReg return a vector of length p, containing the variable importance of each variable in the data set.

fit.rf evaluates the permutation accuracy importance (PAM) as the measure of variable importance. fit.logicReg indicates whether a variable is included in the model (1) or not (0).

Author(s)

Written by Isabell Hoffmann isabell.hoffmann@uni-mainz.de.

References

Ruczinski I, Kooperberg C, LeBlanc ML (2003). Logic Regression, Journal of Computational and Graphical Statistics, 12, 475-511.

Breiman L (2001). Random Forests, Machine Learning, 45, 5-32.

See Also

logreg, rfsrc


sprinter documentation built on May 1, 2019, 8:20 p.m.