feature.boruta.comp: Boruta Feature Selection


View source: R/feature.boruta.R

Description

Wrapper around the Boruta package. Boruta is a so-called all-relevant feature selection wrapper that works with any classifier that outputs a variable importance measure (VIM). This function ensures that the input data is supplied correctly and can optionally run convenience functions that, for example, produce a regression formula.

Usage

feature.boruta.comp(target, predictors, fixNA = F, roughFix = F,
  variables = F, selected = F, formula = F, tentative = F,
  pValue = 0.01, mcAdj = T, maxRuns = 100, doTrace = 0,
  holdHistory = T, getImp = Boruta::getImpRfZ, verbose = F, ...)

Arguments

target

Response vector; factor for classification, numeric vector for regression.

predictors

data.frame with predictors.

fixNA

boolean switch that decides how NA values in the predictors and the target variable are handled. FALSE leaves NA values in place. TRUE removes every observation that contains an NA value. Default is FALSE.

roughFix

boolean switch that decides whether the Boruta::TentativeRoughFix method will be used to resolve potentially remaining undecided variables. Default is FALSE.

variables

boolean switch that decides whether the variables of all three categories (Confirmed, Tentative, Rejected) will be appended to the returned Boruta object. Default is FALSE.

selected

boolean switch that decides whether the confirmed and tentative variables will be added as a combined vector to the Boruta object. Default is FALSE. Only works when variables is TRUE.

formula

boolean switch that decides whether a formula will be appended to the returned Boruta object. This formula will relate the target with all confirmed predictors. Depending on the tentative switch the tentative variables might be added as well. Defaults to FALSE.

tentative

boolean switch that decides whether tentative attributes will be considered for a formula. Default is FALSE.

pValue

Confidence level. Default value should be used. Default is 0.01.

mcAdj

If set to TRUE, a multiple comparisons adjustment using the Bonferroni method will be applied. Default value should be used; older (1.x and 2.x) versions of Boruta were effectively using FALSE. Default value is TRUE.

maxRuns

Maximal number of importance source runs. You may increase it to resolve attributes left tentative. Default is 100.

doTrace

Verbosity level. 0 means no tracing, 1 means reporting decision about each attribute as soon as it is justified, 2 means same as 1, plus reporting each importance source run. Default is 0.

holdHistory

If set to TRUE, the full history of importance is stored and returned as the ImpHistory element of the result. Setting it to FALSE reduces the memory footprint of Boruta when this side data is not needed, especially when the number of attributes is huge; however, it disables plotting of the resulting Boruta objects and the use of the Boruta::TentativeRoughFix function. Default is TRUE (see Usage).

getImp

Function used to obtain attribute importance. The default is getImpRfZ, which runs random forest from the ranger package and gathers Z-scores of mean decrease accuracy measure. It should return a numeric vector of a size identical to the number of columns of its first argument, containing importance measure of respective attributes. Any order-preserving transformation of this measure will yield the same result. It is assumed that more important attributes get higher importance. +-Inf are accepted, NaNs and NAs are treated as 0s, with a warning. Default is Boruta::getImpRfZ.

verbose

boolean switch that decides whether the error output will provide more verbose information. Default is FALSE.
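The contract stated for getImp (a numeric vector with one entry per column of its first argument, higher meaning more important) can be illustrated with a small base R sketch. The function name getImpAbsCor and the use of absolute correlation as an importance measure are assumptions made purely for illustration; they are not part of this package or of Boruta. The sketch also assumes that the response is passed as the second argument, as it is for Boruta's own getImp* functions.

```r
# Hypothetical importance function (name and measure are illustrative assumptions).
# It returns one non-negative score per column of x, as the getImp contract requires.
getImpAbsCor <- function(x, y, ...) {
  # Convert the response to numeric so that factor targets are handled as well.
  y_num <- as.numeric(y)
  imp <- vapply(x, function(col) {
    # Constant columns produce an NA correlation plus a warning; silence the warning.
    abs(suppressWarnings(cor(as.numeric(col), y_num, use = "complete.obs")))
  }, numeric(1))
  # Map undefined scores to 0 explicitly instead of relying on a downstream warning.
  imp[is.na(imp)] <- 0
  unname(imp)
}

set.seed(1)
x <- data.frame(a = rnorm(50), b = rnorm(50), c = rep(1, 50))
y <- x$a + rnorm(50, sd = 0.1)

imp <- getImpAbsCor(x, y)
print(imp)
```

Such a function could then be passed as getImp = getImpAbsCor. Whether a correlation-based score is a sensible importance source for a given data set is a separate question that this sketch does not address.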

Details

The method first saves the name of the original target parameter so that it can be reused for formula creation later on. If the fixNA switch is TRUE, all observations containing NA values are eliminated. If this affects all observations, an error is produced. Before the Boruta algorithm is executed, the input parameters target and predictors are checked via the feature.boruta.fixNA method. Should any issues with the input be found (wrong data types, differing lengths, NAs), an appropriate error is thrown.

Next, the actual Boruta::Boruta algorithm is executed with the provided parameters. Boruta then iteratively compares the importance of the original attributes with that of shadow attributes. Attributes that perform significantly worse than the shadow attributes are rejected; those that perform significantly better are confirmed. Since the Boruta algorithm might not converge within the given maxRuns iterations, Boruta::TentativeRoughFix can be used to resolve attributes that are still undecided (provided roughFix is TRUE).

Finally, depending on the values of the variables and formula switches, a formula is created and/or the confirmed, rejected and tentative attributes are appended to the returned Boruta object.
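The steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper drop_na_observations and its error message are hypothetical, and the Boruta run is only attempted when the Boruta and ranger packages are installed.

```r
# Hypothetical helper that mimics the fixNA = TRUE step (not the package's own code):
# keep only observations without NA in either the predictors or the target.
drop_na_observations <- function(target, predictors) {
  keep <- complete.cases(predictors) & !is.na(target)
  if (!any(keep)) {
    stop("All observations contain NA values; nothing left to analyse.")
  }
  list(target = target[keep], predictors = predictors[keep, , drop = FALSE])
}

# Small illustration: rows 3, 4 and 5 each contain an NA and are dropped.
predictors <- data.frame(x1 = c(1, 2, NA, 4, 5), x2 = c(5, 4, 3, NA, 1))
target <- c(10, 20, 30, 40, NA)
clean <- drop_na_observations(target, predictors)
n_kept <- nrow(clean$predictors)

# Run Boruta on cleaned data, then resolve leftover tentative attributes.
if (requireNamespace("Boruta", quietly = TRUE) &&
    requireNamespace("ranger", quietly = TRUE)) {
  set.seed(42)
  dat <- iris
  dat$Sepal.Length[c(3, 7)] <- NA  # inject NAs so the cleaning step has work to do
  cleaned <- drop_na_observations(dat$Species, dat[, -5])
  result <- Boruta::Boruta(cleaned$predictors, cleaned$target,
                           maxRuns = 100, doTrace = 0)
  # Corresponds to roughFix = TRUE: only needed when something is still undecided.
  if (any(result$finalDecision == "Tentative")) {
    result <- Boruta::TentativeRoughFix(result)
  }
  print(result$finalDecision)
}
```

Guarding the TentativeRoughFix call avoids invoking it when no tentative attributes remain.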

Value

Boruta object as it is also returned by the underlying Boruta::Boruta method. This default return value can include several extensions, depending on parameters such as formula:

target

The name of the target vector.

variables

Variable names of all three categories (Confirmed, Tentative, Rejected)

selected

Variable names of confirmed and tentative variables in one vector.

formula

Formula of the form target ~ predictors, where the predictors are the confirmed variables and, if tentative is TRUE, the tentative variables as well.
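How the variables, selected and formula extensions relate to the attribute decisions can be sketched in base R. The decisions below are made up for illustration and merely shaped like the finalDecision element of a Boruta object; the predictor names x1 to x4 and the helper build_formula are hypothetical, and the package's own helpers may work differently.

```r
# Hypothetical decisions, shaped like the finalDecision element of a Boruta object.
decisions <- factor(
  c("Confirmed", "Confirmed", "Rejected", "Tentative"),
  levels = c("Tentative", "Confirmed", "Rejected")
)
names(decisions) <- c("x1", "x2", "x3", "x4")

# "variables": attribute names grouped by category.
variables <- split(names(decisions), decisions)

# "selected": confirmed and tentative names in one vector.
selected <- c(variables$Confirmed, variables$Tentative)

# "formula": target ~ confirmed predictors, optionally including tentative ones.
# Note that reformulate() fails if no predictor is kept at all.
build_formula <- function(target_name, decisions, tentative = FALSE) {
  keep <- decisions == "Confirmed" | (tentative & decisions == "Tentative")
  reformulate(names(decisions)[keep], response = target_name)
}

f_confirmed <- build_formula("SalePrice", decisions)
f_with_tentative <- build_formula("SalePrice", decisions, tentative = TRUE)

print(f_confirmed)
print(f_with_tentative)
```

With tentative = FALSE only x1 and x2 enter the formula; with tentative = TRUE the undecided x4 is included as well.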

References

Miron B. Kursa, Witold R. Rudnicki (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), p. 1-13. URL: http://www.jstatsoft.org/v36/i11/

See Also

Boruta::Boruta

feature.boruta.fixNA

feature.boruta.tentative

feature.boruta.variables

feature.boruta.selected

feature.boruta.formula

feature.boruta.checkInputParams

Examples

KaggleHouse:::feature.boruta(
  target = data_train_na$SalePrice, predictors = data_train_na[-81],
  fixNA = TRUE, roughFix = TRUE, verbose = TRUE
)

MarcoNiemann/kaggle_house documentation built on May 7, 2019, 2:50 p.m.