tdmModSortedRFimport: Sort the input variables decreasingly by their RF-importance.


View source: R/tdmModelingUtils.r

Description

Builds a random forest (RF) with importance=TRUE in order to rank the input variables. The forest is usually kept small (50 trees by default, see SRF.ntree) to speed up computation. Missing values are replaced using na.roughfix. The function then decides which input variables to keep and returns them in SRF$input.variables.
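
The ranking step described above can be sketched with the randomForest package directly. This is only an illustration under stated assumptions, not the TDMR source: the helper name rank_by_rf_importance is hypothetical, and the choice of importance type 1 (permutation importance) is an assumption, since this page does not say which importance measure is used.

```r
library(randomForest)

# Hypothetical helper (not part of TDMR): fit a small RF and return the
# input variables' importance, sorted in decreasing order.
rank_by_rf_importance <- function(d_train, response.variable, input.variables,
                                  ntree = 50) {
  # Build the model formula "response ~ input1 + input2 + ..."
  form <- reformulate(input.variables, response = response.variable)
  rf <- randomForest(form, data = d_train, ntree = ntree,
                     importance = TRUE,        # needed for permutation importance
                     na.action = na.roughfix)  # median/mode imputation of NAs
  # Assumption: use importance type 1 (mean decrease in accuracy / %IncMSE)
  imp <- importance(rf, type = 1)[, 1]
  sort(imp, decreasing = TRUE)
}

set.seed(1)
d <- iris
d$Sepal.Width[c(3, 50, 120)] <- NA   # inject a few missing values
s_imp <- rank_by_rf_importance(d, "Species", setdiff(names(d), "Species"))
print(s_imp)
```

The result is a named numeric vector, so names(s_imp) gives the variables in order of decreasing importance.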

Usage

tdmModSortedRFimport(d_train, response.variable, input.variables, opts)

Arguments

d_train

training set

response.variable

the target column from d_train to use for the RF-model

input.variables

the input columns from d_train to use for the RF-model

opts

options, here we use the elements [defaults in brackets]:

  • SRF.kind:
    ="xperc": keep a certain importance percentage, starting from the most important variable
    ="ndrop": drop a certain number of least important variables
    ="nkeep": keep a certain number of most important variables
    ="none": do not call tdmModSortedRFimport at all (see tdmRegress.r and tdmClassify.r)

  • SRF.ndrop: [0] how many variables to drop (if SRF.kind=="ndrop")

  • SRF.XPerc: [0.95] if >=0, keep that importance percentage, starting with the most important variables (if SRF.kind=="xperc")

  • SRF.calc: [TRUE] =TRUE: calculate the importance and save it to SRF.file; =FALSE: load it from SRF.file (SRF.file = Output/<filename>.SRF.<response.variable>.Rdata)

  • SRF.ntree: [50] number of RF trees

  • SRF.verbose: [2]

  • SRF.maxS: [40] how many variables to show in plot

  • SRF.minlsi: [1] a lower bound for the length of SRF$input.variables

  • RF.sampsize: sampsize for the RF, set before calling this function via tdmModAdjustSampsize(opts$SRF.samp,...)

  • GD.DEVICE: if !="non", then make a bar plot on the current graphics device

  • CLS.CLASSWT: class weight vector to use in random forest training
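
To make the three SRF.kind modes concrete, here is a minimal base-R sketch of how variables could be selected from an importance vector. It is an assumption-laden illustration, not the TDMR implementation: the helper select_by_importance and its nkeep argument are hypothetical, importances are assumed to be non-negative, and perc is assumed to be expressed in percent (0 to 100).

```r
# Hypothetical helper (not part of TDMR): pick variables from a named
# importance vector according to one of the three selection modes.
select_by_importance <- function(imp, kind = c("xperc", "ndrop", "nkeep"),
                                 xperc = 0.95, ndrop = 0,
                                 nkeep = length(imp), minlsi = 1) {
  kind <- match.arg(kind)
  s_imp <- sort(imp, decreasing = TRUE)   # most important variable first
  n <- length(s_imp)
  n_keep <- switch(kind,
    xperc = {
      # smallest prefix whose cumulative importance share reaches xperc
      idx <- which(cumsum(s_imp) / sum(s_imp) >= xperc)
      if (length(idx) == 0) n else idx[1]
    },
    ndrop = n - ndrop,                    # drop the ndrop least important
    nkeep = nkeep                         # keep the nkeep most important
  )
  n_keep <- max(minlsi, min(n, n_keep))   # respect the lower bound minlsi
  kept <- names(s_imp)[seq_len(n_keep)]
  dropped <- setdiff(names(s_imp), kept)
  list(input.variables = kept,
       s_input = names(s_imp),
       s_imp1 = s_imp,
       s_dropped = dropped,
       lsd = length(dropped),
       # assumption: share of total importance in the dropped variables, in percent
       perc = 100 * sum(s_imp[dropped]) / sum(s_imp))
}

imp <- c(b = 30, d = 5, a = 50, c = 15)   # made-up importance values
res <- select_by_importance(imp, kind = "xperc", xperc = 0.9)
res$input.variables                        # "a" "b" "c"
res$s_dropped                              # "d"
res_ndrop <- select_by_importance(imp, kind = "ndrop", ndrop = 1)
res_nkeep <- select_by_importance(imp, kind = "nkeep", nkeep = 2)
```

With these made-up values, "xperc" with 0.9 and "ndrop" with 1 both keep a, b and c, while "nkeep" with 2 keeps only a and b. The returned list mirrors the element names documented under Value so the two can be compared side by side.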

Value

SRF, a list with the following elements:

input.variables

the vector of input variables which remain after importance processing. These are sorted by decreasing importance.

s_input

all input.variables, sorted by decreasing importance

s_imp1

the importance for s_input

s_dropped

vector with the names of the dropped variables

lsd

length of s_dropped

perc

the percentage of total importance which is in the dropped variables

opts

the options list passed in, to which some defaults might have been added

Author(s)

Wolfgang Konen (wolfgang.konen@th-koeln.de), Patrick Koch


TDMR documentation built on March 3, 2020, 1:06 a.m.