iRF: iteratively grows weighted random forests, finds stable feature interactions


View source: R/iRF.R

Description

Using repeated calls to iRF::randomForest, this function iteratively grows weighted ensembles of decision trees. Optionally, stable feature interactions can be returned for selected iterations by analyzing feature usage on the decision paths of large leaf nodes. For details on the iRF algorithm, see https://arxiv.org/abs/1706.08457.

Usage

  iRF(x, y, xtest=NULL, ytest=NULL, 
      n.iter=5, 
      ntree=500, 
      n.core=1,
      mtry.select.prob = rep(1/ncol(x), ncol(x)),
      keep.impvar.quantile=NULL, 
      interactions.return=NULL,
      wt.pred.accuracy=FALSE, 
      cutoff.unimp.feature = 0,
      rit.param=list(depth=5, ntree=100, nchild=2, 
                     class.id=1, class.cut=NULL), 
      varnames.grp=NULL,
      n.bootstrap=30,
      bootstrap.forest=TRUE, 
      verbose=TRUE, 
      ...
     )
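
A minimal sketch of a typical call (illustrative, not from the original manual page): fit iRF on simulated binary classification data and request interactions from the final iteration. The names n, p, x, y, and fit are placeholders.

  # Simulated data: 200 observations, 50 numeric predictors, binary response.
  n <- 200; p <- 50
  x <- matrix(rnorm(n * p), nrow=n)
  y <- as.factor(rbinom(n, size=1, prob=0.5))
  fit <- iRF(x=x, y=y,
             n.iter=5,
             ntree=500,
             interactions.return=5,  # evaluate interactions at iteration 5
             n.bootstrap=30)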

Arguments

x, xtest

numeric matrices of predictors

y, ytest

response vectors

n.iter

number of weighted random forest iterations

ntree

number of trees to grow in each iteration

n.core

number of cores across which tree growing and reading should be distributed

mtry.select.prob

initial feature sampling weights for the first random forest fit; defaults to equal weights. See the sketch below for a non-uniform example
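
For instance (an illustrative sketch, not from the original page), a subset of predictors can be upweighted in the first iteration:

  # Illustrative: give the first 10 predictors 5x the initial sampling weight.
  w <- rep(1, ncol(x))
  w[1:10] <- 5
  fit <- iRF(x, y, mtry.select.prob = w / sum(w))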

keep.impvar.quantile

a non-negative fraction q. If provided, all variables with Gini importance in the top 100*q percentile are retained during random split-variable selection in the next iteration

interactions.return

a numeric vector specifying which iterations to evaluate interactions for. Note: interaction computation is computationally intensive, particularly when n.bootstrap is large.

wt.pred.accuracy

Should leaf nodes be sampled proportional to both size and accuracy (TRUE) or just size (FALSE)?

cutoff.unimp.feature

a non-negative fraction r. If provided, only features with Gini importance score in the top 100*(1-r) percentile are used to find feature interactions

rit.param

a named list containing the following entries:

depth: depth of the random intersection trees

ntree: number of random intersection trees

nchild: number of children in each split of a random intersection tree

class.id: which class of observations (0 or 1) is used to find class-specific interactions; defaults to 1. Ignored for regression forests

class.cut: threshold used to select the leaf nodes passed to RIT. Any leaf node with prediction greater than this threshold is used as input; if NULL, all leaf nodes of the regression iRF are used. Ignored for classification forests

A sketch overriding these defaults appears below.
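
As an illustrative example (the specific values are not recommendations), the RIT defaults can be overridden entry by entry:

  # Illustrative: deeper intersection trees with three children per split,
  # searching for interactions specific to class 1 observations.
  fit <- iRF(x, y,
             interactions.return=5,
             rit.param=list(depth=6, ntree=500, nchild=3,
                            class.id=1, class.cut=NULL))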

varnames.grp

If features can be grouped based on their demographics or correlation patterns, use these groups of features, or "hyper-features", to conduct random intersection trees, as in the sketch below
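
A small sketch (the group labels are hypothetical): map each column of x to a group label so that RIT operates on hyper-features rather than raw columns.

  # Illustrative grouping: columns 1-3 measure signal "A", columns 4-6 signal "B".
  grp <- rep(c("A", "B"), each=3)
  fit <- iRF(x[, 1:6], y, interactions.return=5, varnames.grp=grp)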

n.bootstrap

number of bootstrap replicates used to calculate stability scores of interactions obtained by RIT

bootstrap.forest

Should a new random forest be constructed for each bootstrap sample when evaluating stability? Setting this to FALSE results in faster runtime.

verbose

Display progress messages and intermediate outputs on screen?

...

additional arguments passed to iRF::randomForest

Value

A list containing the following items:

rf.list

A list of n.iter objects of class randomForest

interaction

A list of length n.iter. Each element contains a named numeric vector of stability scores; the names are candidate interactions (feature names separated by "_"), i.e., features and feature combinations that appear frequently on the decision paths of large leaf nodes. A sketch of how to inspect these components follows.
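
For illustration (a sketch assuming interactions were requested for iteration 5, as in the examples above), the returned components might be inspected like this:

  # The random forest fit at iteration 5:
  rf5 <- fit$rf.list[[5]]
  # Stability scores of candidate interactions found at iteration 5,
  # with names like "X1_X3" for an interaction between features X1 and X3:
  stab <- fit$interaction[[5]]
  head(sort(stab, decreasing=TRUE))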

Author(s)

Sumanta Basu sumbose@berkeley.edu, Karl Kumbier kkumbier@berkeley.edu

See Also

randomForest, readForest

