tdmPreGroupLevels: Group the levels of the factor variable in 'dset[,colname]'.

View source: R/tdmPreprocUtils.r

Description

This function reduces the number of levels of a factor variable that has too many levels. It counts the cases in each level and sorts the levels by decreasing count. It then considers two ways of binding the least frequent levels together into a new level "OTHER": (1) such that the remaining untouched levels cover more than the fraction opts$PRE.Xpgroup (e.g. 0.99, i.e. 99%) of all cases, or (2) such that the total number of new levels is opts$PRE.MaxLevel. Of these two choices, it takes the one that binds more levels into "OTHER".
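The grouping rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the TDMR source: the helper name `group_levels_sketch` and its arguments `xpgroup` and `max_level` are hypothetical stand-ins for opts$PRE.Xpgroup and opts$PRE.MaxLevel, and it assumes that "OTHER" itself counts toward the maximum number of levels.

```r
# Hypothetical sketch of the grouping rule; NOT the TDMR implementation.
group_levels_sketch <- function(x, xpgroup = 0.99, max_level = 32) {
  x <- as.character(x)
  # Count the cases per level and order the levels by decreasing count.
  counts <- sort(table(x), decreasing = TRUE)
  n_lev <- length(counts)
  cum_frac <- cumsum(counts) / sum(counts)

  # Choice 1: keep the fewest top levels that cover more than xpgroup of all cases.
  keep_cov <- which(cum_frac > xpgroup)[1]
  if (is.na(keep_cov)) keep_cov <- n_lev

  # Choice 2: keep at most max_level levels in total
  # (assumption: "OTHER" occupies one of them).
  keep_max <- if (n_lev > max_level) max_level - 1 else n_lev

  # Take the choice that binds more levels into "OTHER",
  # i.e. the one that keeps fewer levels untouched.
  n_keep <- min(keep_cov, keep_max)
  if (n_keep < n_lev) {
    kept <- names(counts)[seq_len(n_keep)]
    x[!is.na(x) & !(x %in% kept)] <- "OTHER"
  }
  factor(x)
}

# Toy data: five levels with 50, 30, 15, 4 and 1 cases.
x_demo <- c(rep("a", 50), rep("b", 30), rep("c", 15), rep("d", 4), rep("e", 1))
res_cov <- group_levels_sketch(x_demo, xpgroup = 0.9, max_level = 32)
res_max <- group_levels_sketch(x_demo, xpgroup = 0.999, max_level = 3)
print(table(res_cov))
print(table(res_max))
```

In the first call the coverage criterion is the stricter one and binds the two rarest levels; in the second call the level limit is stricter and binds the three rarest levels.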

Usage

tdmPreGroupLevels(dset, colname, opts)

Arguments

dset

data frame

colname

name of column to be re-grouped

opts

list of options; the elements needed here are

  • PRE.Xpgroup [0.99]

  • PRE.MaxLevel [32] (32 is the maximum number of levels allowed for randomForest)

Value

dset, a data frame with dset[,colname] re-grouped


TDMR documentation built on March 3, 2020, 1:06 a.m.