var.sel: Variable Selection for High-Order Interaction Detection


View source: R/HIH.R

Description

For high-order interaction detection in high-dimensional data, var.sel proceeds in two steps. First, random forests are grown on subsets of the features (subRFs; see the subRF function), each returning a Pairwise Minimal Depth matrix (subPMD). Second, from each subPMD, the features with small values in their rows are selected according to the rule given in subQtl. Optionally, weighted random forests can be grown iteratively, using weights derived from the Pairwise Minimal Depth.
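The selection step for a single subRF can be sketched on a toy PMD matrix. This is a minimal sketch, not the package's code: the PMD values are invented, no forest is actually grown, and reading "only off diagonal elements that have smaller values" as "off-diagonal entries at or below the row median" is an assumption made here for illustration.

```r
# Toy 5 x 5 Pairwise Minimal Depth (PMD) matrix; the values are invented.
vars <- paste0("x", 1:5)
pmd <- matrix(c(1, 2, 3, 4, 5,
                2, 2, 3, 3, 4,
                4, 5, 3, 5, 6,
                3, 3, 4, 4, 2,
                5, 6, 6, 5, 5),
              nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# digpmd[i]: the i-th diagonal element of the PMD matrix.
digpmd <- diag(pmd)

# btpmd[i]: average of the "smaller" off-diagonal entries in row i.
# ASSUMPTION: "smaller" means at or below the median of that row's off-diagonal entries.
btpmd <- sapply(seq_along(vars), function(i) {
  off <- pmd[i, -i]
  mean(off[off <= median(off)])
})
names(btpmd) <- vars

# Default rules copied from the Usage section.
wt     <- function(btpmd, digpmd) { log(1 / btpmd / digpmd) }
subQtl <- function(btpmd, digpmd) { which(btpmd < quantile(btpmd, probs = 0.1)) }

sel    <- subQtl(btpmd, digpmd)
sl.var <- names(sel)                # names of the selected variables
sl.w   <- wt(btpmd, digpmd)[sel]    # weights of the selected variables
sl.var
```

In var.sel this step is repeated for each of the itrSub subRFs, and the selected sets are then combined.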

Usage

var.sel(formula, data,
        subvars = function(data){ceiling((ncol(data)-1)/5)},
        w.initial = "vimp",
        wt = function(btpmd,digpmd){log(1/btpmd/digpmd)},
        itrSub = 5, wtSub = TRUE,
        itrWt = 1,
        subQtl = function(btpmd,digpmd){
          which(btpmd < quantile(btpmd, probs = 0.1))},
        verbose = TRUE,
        obj = NULL)

Arguments

formula

A symbolic description of the model to be fit.

data

Data frame containing the y-outcome and x-variables.

subvars

A function of data returning the number of variables sampled for fitting each random forest (subRF).
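Because the default subvars is a function of the data, the subset size scales with the number of predictors. Assuming data holds the outcome plus p predictors (as the ncol(data) - 1 in the default suggests), the default samples one fifth of the predictors, rounded up. The toy data frame below is hypothetical.

```r
# Default subset-size rule copied from the Usage section.
subvars <- function(data) { ceiling((ncol(data) - 1) / 5) }

# Toy data frame: one outcome column y plus 200 predictors (values are irrelevant here).
toy <- as.data.frame(matrix(0, nrow = 2, ncol = 201))
names(toy) <- c("y", paste0("x", 1:200))

subvars(toy)  # ceiling(200 / 5) = 40 predictors per subRF
```

By the same arithmetic, the rule ceiling((ncol(data)-1)/2) used in the Examples gives ceiling(199/2) = 100 for the 200-column express[,1:200].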

w.initial

Either a probability vector according to which features are sampled for each subRF (see w0 in the subRF function), or a string: "vimp" uses variable importance, and "md" uses minimal depth from the maximal subtree.
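To illustrate how a probability vector steers which features enter a subRF, here is a minimal sketch assuming features are drawn without replacement with probabilities proportional to the weights; the actual sampling inside subRF may differ. The feature names and weights are hypothetical.

```r
set.seed(42)
features <- paste0("x", 1:10)

# Hypothetical probability vector: the first three features are 5 times as likely to be drawn.
w0 <- c(rep(5, 3), rep(1, 7))
w0 <- w0 / sum(w0)  # normalise so the entries sum to 1

# ASSUMPTION: the feature subset is drawn without replacement using these probabilities.
chosen <- sample(features, size = 4, replace = FALSE, prob = w0)
chosen
```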

wt

A function computing variable weights from the PMD matrix, where btpmd[i] is the average of the ith row of the PMD matrix, computed using only the off-diagonal elements with smaller values, and digpmd[i] is the ith diagonal element of the PMD matrix.
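The default wt, log(1/btpmd/digpmd), equals -log(btpmd * digpmd), so a variable with a smaller average pairwise depth and a smaller diagonal depth receives a larger weight. A quick check on made-up depth values:

```r
# Default weight function copied from the Usage section.
wt <- function(btpmd, digpmd) { log(1 / btpmd / digpmd) }

# Made-up depths for three variables.
btpmd  <- c(a = 1.5, b = 3.0, c = 6.0)
digpmd <- c(a = 1.0, b = 2.0, c = 4.0)

w <- wt(btpmd, digpmd)
w

# Same as -log(btpmd * digpmd): weights decrease as depths increase.
all.equal(w, -log(btpmd * digpmd))
```

Note that the weights are negative whenever btpmd * digpmd > 1; only their ordering is illustrated here.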

itrSub

Number of subRFs to grow.

wtSub

Logical. Should weighted random forests be grown?

itrWt

Number of iterations of each weighted subRF.

subQtl

A function of btpmd and digpmd returning the indices of the features to select from each subRF. Here digpmd holds the diagonal elements of the PMD matrix, and btpmd[i] is the average of the ith row of the PMD matrix, computed using only the off-diagonal elements with smaller values.
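Because subQtl receives both btpmd and digpmd and returns indices, a custom rule can combine them. The rule below is a hypothetical example (not a package default) that keeps features in the lowest 20% of btpmd and additionally requires a below-median diagonal depth; the depth values are made up.

```r
# Hypothetical alternative to the default subQtl rule.
my_subQtl <- function(btpmd, digpmd) {
  which(btpmd < quantile(btpmd, probs = 0.2) & digpmd < median(digpmd))
}

# Made-up depth values for ten features.
btpmd  <- c(1.0, 1.2, 4.0, 5.0, 3.5, 2.8, 6.0, 4.4, 3.9, 5.5)
digpmd <- c(0.5, 3.0, 2.0, 1.0, 2.5, 1.5, 3.5, 2.2, 1.8, 2.9)

# Features 1 and 2 pass the btpmd cut, but only feature 1 has a below-median digpmd.
my_subQtl(btpmd, digpmd)
```

Such a function would then be supplied as var.sel(y~., data = ..., subQtl = my_subQtl).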

verbose

Set to TRUE for verbose output.

obj

Initial object of class (rfsrc, grow). If NULL, a new object is grown from formula and data.

Value

var.sl

Names of variables selected.

var.sl.list

A list with one element per subRF. The jth element has two components: sl.w, the weights (computed with wt) of the selected variables, and sl.var, the names of the selected variables. The final result var.sl is the union of all sl.var sets.
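With a mock var.sl.list shaped as described above (the variable names and weights are invented), the union that forms var.sl can be reproduced as follows. Whether var.sel returns the names in this particular order is not stated.

```r
# Mock var.sl.list with three subRFs; names and weights are invented.
var.sl.list <- list(
  list(sl.w = c(x1 = 0.9, x7 = 0.4), sl.var = c("x1", "x7")),
  list(sl.w = c(x7 = 0.8),           sl.var = "x7"),
  list(sl.w = c(x3 = 0.5, x1 = 0.2), sl.var = c("x3", "x1"))
)

# Union of the selected-variable sets across all subRFs.
var.sl <- Reduce(union, lapply(var.sl.list, `[[`, "sl.var"))
var.sl
```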

itrwt

A list with one element per weighted subRF: the jth element is the weight path of the jth subRF and has itrWt elements. See the output of the wt.itr function.

Author(s)

Yifan Sha and Min Lu

References

Ishwaran H. (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.

Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random survival forests for high-dimensional data. Statist. Anal. Data Mining, 4:115-132.

Examples

data(express)
o <- var.sel(y~., data = express[,1:200],
             subvars = function(data){ceiling((ncol(data)-1)/2)},
             w.initial = "vimp",
             wt = function(btpmd,digpmd){log(1/btpmd/digpmd)},
             itrSub = 3, wtSub = TRUE,
             itrWt = 2,
             subQtl = function(btpmd,digpmd){
                    which(btpmd < quantile(btpmd, probs = 0.1))},
             verbose = TRUE)
o$var.sl
o$itrwt[[3]][[2]] # variable weights in the second iteration of the 3rd subRF

yifansha/highinthunt documentation built on July 2, 2020, 6:29 p.m.