feature.boruta.comp: Boruta Feature Selection
In MarcoNiemann/kaggle_house: Analysing House Prices in Ames, Iowa and Build a Sales Price Prediction Model

Description

Wrapper around the Boruta package. Boruta is a so-called all-relevant feature selection wrapper, capable of working with any classifier that outputs a variable importance measure (VIM). This function ensures that the input data is provided correctly and can optionally run convenience functions, e.g. to produce a regression formula from the selected variables.

Usage

feature.boruta.comp(target, predictors, fixNA = F, roughFix = F,
  variables = F, selected = F, formula = F, tentative = F,
  pValue = 0.01, mcAdj = T, maxRuns = 100, doTrace = 0,
  holdHistory = T, getImp = Boruta::getImpRfZ, verbose = F, ...)

Arguments

`target`	Response vector; factor for classification, numeric vector for regression.
`predictors`	`data.frame` with predictors.
`fixNA`	`boolean` switch that decides how `NA` values in the `predictors` and `target` variables are handled. `FALSE` leaves the data untouched (no observations are removed), so any `NA`s will be reported as an error by the input check. `TRUE` removes every observation that contains an `NA` value. Default is `FALSE`.
`roughFix`	`boolean` switch that decides whether the `Boruta::TentativeRoughFix` method will be used to resolve attributes that are still tentative after the Boruta run. Requires `holdHistory = TRUE`. Default is `FALSE`.
`variables`	`boolean` switch that decides whether the variables of all three categories (Confirmed, Tentative, Rejected) will be appended to the returned `Boruta` object. Default is `FALSE`.
`selected`	`boolean` switch that decides whether the confirmed and tentative variables will be added as a combined vector to the `Boruta` object. Only takes effect when `variables` is `TRUE`. Default is `FALSE`.
`formula`	`boolean` switch that decides whether a formula will be appended to the returned `Boruta` object. This formula relates the `target` to all confirmed `predictors`; if `tentative` is `TRUE`, the tentative variables are added as well. Default is `FALSE`.
`tentative`	`boolean` switch that decides whether tentative attributes will be considered for a formula. Default is `FALSE`.
`pValue`	Significance level (p-value threshold) of the tests that confirm or reject attributes. The default value should normally be used. Default is 0.01.
`mcAdj`	If set to `TRUE`, a multiple comparisons adjustment using the Bonferroni method will be applied. The default value should normally be used; older (1.x and 2.x) versions of Boruta effectively used `FALSE`. Default is `TRUE`.
`maxRuns`	Maximal number of importance source runs. You may increase it to resolve attributes left tentative. Default is 100.
`doTrace`	Verbosity level. 0 means no tracing, 1 means reporting decision about each attribute as soon as it is justified, 2 means same as 1, plus reporting each importance source run. Default is 0.
`holdHistory`	If set to `TRUE`, the full history of importance is stored and returned as the `ImpHistory` element of the result. Setting it to `FALSE` decreases the memory footprint of Boruta when this side data is not needed, especially when the number of attributes is huge; however, it disables plotting of the resulting Boruta object and the use of the `Boruta::TentativeRoughFix` function. Default is `TRUE`.
`getImp`	Function used to obtain attribute importance. `Boruta::getImpRfZ` runs a random forest from the `ranger` package and gathers Z-scores of the mean decrease accuracy measure. The function should return a numeric vector with one element per column of its first argument, containing the importance measure of the respective attributes. Any order-preserving transformation of this measure yields the same result. More important attributes are assumed to receive higher importance. `+-Inf` values are accepted; `NaN`s and `NA`s are treated as 0s, with a warning. Default is `Boruta::getImpRfZ`.
`verbose`	`boolean` switch that decides whether the error output will provide more verbose information. Default is `FALSE`.
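
As a rough illustration of what `fixNA = TRUE` is described to do, the base-R sketch below drops every observation in which the target or any predictor is `NA`, and stops if no observation is left. The helper name `drop_na_observations` and the toy data are hypothetical; this is not the package's own `feature.boruta.fixNA` implementation.

```r
# Hypothetical helper illustrating fixNA = TRUE (not the package's code).
drop_na_observations <- function(target, predictors) {
  stopifnot(length(target) == nrow(predictors))
  # keep rows where the target and every predictor column are non-missing
  keep <- !is.na(target) & stats::complete.cases(predictors)
  if (!any(keep)) {
    stop("every observation contains at least one NA value")
  }
  list(target = target[keep],
       predictors = predictors[keep, , drop = FALSE])
}

# Toy data: rows 2 and 3 contain NAs and are removed
y <- c(10, 20, NA, 40)
X <- data.frame(x1 = c(1, NA, 3, 4), x2 = c("a", "b", NA, "d"))
res <- drop_na_observations(y, X)
print(res$predictors)
```

Removing whole rows (rather than imputing) keeps `target` and `predictors` aligned, which is what the subsequent length check expects.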

Details

The method first saves the name of the original `target` argument so that it can be reused for formula creation later on. If the `fixNA` switch is `TRUE`, all observations containing `NA` values are eliminated; if this would remove every observation, an error is raised. Before the Boruta algorithm is executed, the key input parameters `target` and `predictors` are checked via the `feature.boruta.checkInputParams` method. If any issues with the input are found (wrong data types, differing lengths, `NA`s), an appropriate error is thrown. Next, the actual `Boruta::Boruta` algorithm is executed with the provided parameters. Boruta then iteratively compares the importance of shadow attributes (randomly permuted copies of the original attributes) with that of the original attributes. Attributes performing significantly worse than the shadow attributes are rejected; those performing significantly better are confirmed. Since the Boruta algorithm might not decide every attribute within the given `maxRuns` iterations, `Boruta::TentativeRoughFix` can be used to resolve the attributes that are still tentative (if `roughFix` is `TRUE`). Finally, depending on the values of the `variables`, `selected` and `formula` switches, a formula is created and/or the confirmed/tentative/rejected attributes are appended to the returned `Boruta` object.
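
The shadow-attribute comparison described above can be sketched in base R under strong simplifying assumptions: a fixed number of runs, all attributes kept in every run, and the absolute Pearson correlation with the target standing in for the random-forest Z-scores that `Boruta::getImpRfZ` computes. The names `boruta_sketch` and `abs_cor_importance` are hypothetical. The hit counting, one-sided binomial tests, optional Bonferroni adjustment and median-based rough fix follow the general Boruta scheme, but this is not the `Boruta::Boruta` implementation (which, for instance, drops rejected attributes from later runs).

```r
# ASSUMPTION: |Pearson correlation| with the target replaces the
# random-forest Z-score importance used by Boruta::getImpRfZ.
abs_cor_importance <- function(x, y) {
  vapply(x, function(col) abs(stats::cor(col, y)), numeric(1))
}

# Simplified Boruta-style selection (illustrative sketch only).
boruta_sketch <- function(predictors, target, runs = 50, pValue = 0.01,
                          mcAdj = TRUE, roughFix = FALSE,
                          getImp = abs_cor_importance) {
  p <- ncol(predictors)
  att <- names(predictors)
  hits <- setNames(numeric(p), att)
  imp_hist <- matrix(NA_real_, nrow = runs, ncol = p,
                     dimnames = list(NULL, att))
  shadow_max <- numeric(runs)
  for (r in seq_len(runs)) {
    # shadow attributes: copies of every predictor with shuffled values
    shadows <- as.data.frame(lapply(predictors, sample))
    names(shadows) <- paste0("shadow_", att)
    imp <- getImp(cbind(predictors, shadows), target)
    imp_hist[r, ] <- imp[seq_len(p)]
    shadow_max[r] <- max(imp[-seq_len(p)])
    # a hit: the attribute beat the best shadow attribute in this run
    hits <- hits + (imp[seq_len(p)] > shadow_max[r])
  }
  # one-sided binomial tests of the hit counts against the 50% chance level
  adj <- if (mcAdj) "bonferroni" else "none"
  p_conf <- stats::p.adjust(stats::pbinom(hits - 1, runs, 0.5, lower.tail = FALSE),
                            method = adj)
  p_rej <- stats::p.adjust(stats::pbinom(hits, runs, 0.5), method = adj)
  decision <- ifelse(p_conf < pValue, "Confirmed",
                     ifelse(p_rej < pValue, "Rejected", "Tentative"))
  if (roughFix) {
    # rough fix: median attribute importance vs median best-shadow importance
    tent <- decision == "Tentative"
    med_imp <- apply(imp_hist, 2, stats::median)
    decision[tent] <- ifelse(med_imp[tent] > stats::median(shadow_max),
                             "Confirmed", "Rejected")
  }
  out <- factor(unname(decision),
                levels = c("Tentative", "Confirmed", "Rejected"))
  names(out) <- att
  out
}

# Toy data: only `signal` is related to the target
set.seed(1)
n <- 200
X <- data.frame(signal = rnorm(n), noise1 = rnorm(n), noise2 = rnorm(n))
y <- 2 * X$signal + rnorm(n, sd = 0.5)
print(boruta_sketch(X, y, runs = 50, roughFix = TRUE))
```

In this toy setup the strongly correlated `signal` column is expected to be confirmed, while the decisions for the noise columns depend on the random draws. With `roughFix = TRUE`, no attribute is left tentative.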

Value

A `Boruta` object, as returned by the underlying `Boruta::Boruta` function. Depending on parameters such as `formula`, this return value can include several extensions:

`target`	The name of the target vector.
`variables`	Variable names of all three categories (Confirmed, Tentative, Rejected).
`selected`	Variable names of the confirmed and tentative variables in one vector.
`formula`	Formula of the form `target ~ confirmed predictors`, with the tentative predictors added if `tentative` is `TRUE`.
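
The `variables`, `selected` and `formula` extensions listed above can be derived from Boruta's final per-attribute decision. The sketch below assumes a named factor with the levels Tentative, Confirmed and Rejected (the shape of a `Boruta` object's `finalDecision` element) and uses `stats::reformulate` to build the formula. `boruta_extensions` and the example attribute names are hypothetical; the Boruta package itself also offers helpers such as `getSelectedAttributes()` and `getConfirmedFormula()`.

```r
# Hypothetical helper: build the documented extensions from a decision factor.
boruta_extensions <- function(decision, target_name, tentative = FALSE) {
  # one character vector of attribute names per decision level
  variables <- split(names(decision), decision)
  # confirmed and tentative attributes combined (the `selected` extension)
  selected <- c(variables$Confirmed, variables$Tentative)
  # formula: target ~ confirmed (+ tentative if requested)
  term_labels <- if (tentative) selected else variables$Confirmed
  if (length(term_labels) == 0) {
    stop("no attributes available to build a formula")
  }
  formula <- stats::reformulate(term_labels, response = target_name)
  list(target = target_name, variables = variables,
       selected = selected, formula = formula)
}

decision <- factor(c("Confirmed", "Confirmed", "Rejected", "Tentative"),
                   levels = c("Tentative", "Confirmed", "Rejected"))
names(decision) <- c("OverallQual", "GrLivArea", "PoolQC", "Fence")
ext <- boruta_extensions(decision, "SalePrice")
print(ext$formula)
```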

References

Miron B. Kursa, Witold R. Rudnicki (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), pp. 1-13. URL: http://www.jstatsoft.org/v36/i11/

See Also

Boruta::Boruta

feature.boruta.fixNA

feature.boruta.tentative

feature.boruta.variables

feature.boruta.selected

feature.boruta.formula

feature.boruta.checkInputParams

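Examples

 # data_train_na is assumed to hold the Kaggle training data,
 # with SalePrice in column 81 (excluded from the predictors)
 KaggleHouse:::feature.boruta.comp(
   target = data_train_na$SalePrice, predictors = data_train_na[-81],
   fixNA = TRUE, roughFix = TRUE, verbose = TRUE
 )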

MarcoNiemann/kaggle_house documentation built on May 7, 2019, 2:50 p.m.
