Boruta | R Documentation |
Boruta is an all-relevant feature selection wrapper algorithm, capable of working with any classification method that outputs a variable importance measure (VIM); by default, Boruta uses Random Forest. The method performs a top-down search for relevant features by comparing the importance of the original attributes with the importance achievable at random, estimated using their permuted copies, and progressively eliminates irrelevant features to stabilise that test.
Boruta(x, ...)

## Default S3 method:
Boruta(
  x,
  y,
  pValue = 0.01,
  mcAdj = TRUE,
  maxRuns = 100,
  doTrace = 0,
  holdHistory = TRUE,
  getImp = getImpRfZ,
  ...
)

## S3 method for class 'formula'
Boruta(formula, data, ...)
x |
data frame of predictors. |
... |
additional parameters passed to |
y |
response vector; factor for classification, numeric vector for regression, |
pValue |
confidence level. Default value should be used. |
mcAdj |
if set to |
maxRuns |
maximal number of importance source runs. You may increase it to resolve attributes left Tentative. |
doTrace |
verbosity level. 0 means no tracing, 1 means reporting decision about each attribute as soon as it is justified, 2 means the same as 1, plus reporting each importance source run, 3 means the same as 2, plus reporting of hits assigned to yet undecided attributes. |
holdHistory |
if set to |
getImp |
function used to obtain attribute importance.
The default is getImpRfZ, which runs random forest from the |
formula |
alternatively, formula describing model to be analysed. |
data |
in which to interpret formula. |
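As a rough illustration of the two calling conventions listed above, the sketch below runs Boruta once through the default method (x and y) and once through the formula method. The use of the built-in iris data, the seed, and maxRuns = 50 are arbitrary choices made for this sketch; the code is guarded so that it does nothing when the Boruta package is not installed.

```r
# Sketch only: iris, the seed and maxRuns = 50 are arbitrary choices.
if (requireNamespace("Boruta", quietly = TRUE)) {
  library(Boruta)
  set.seed(42)

  # Default method: predictors as a data frame, response as a factor.
  x <- iris[, -5]
  y <- iris$Species
  bor_xy <- Boruta(x, y, maxRuns = 50, doTrace = 0, holdHistory = TRUE)

  # Formula method: the same problem stated as a formula plus data.
  bor_f <- Boruta(Species ~ ., data = iris, maxRuns = 50)

  print(bor_xy)
  print(bor_f)
}
```

Both calls should describe the same four predictors; the decisions may differ slightly between runs because the importance source is randomised.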
Boruta iteratively compares importances of attributes with importances of shadow attributes, created by shuffling original ones.
Attributes that have significantly worse importance than shadow ones are consecutively dropped.
On the other hand, attributes that are significantly better than shadows are marked as Confirmed.
Shadows are re-created in each iteration.
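The shadow mechanism described above can be sketched in base R. This is a toy illustration, not Boruta's implementation: the helper names make_shadows and toy_importance are hypothetical, the importance measure is a stand-in (absolute correlation with the response rather than a random forest score), and no statistical test is applied to the hit counts.

```r
# Toy data: two informative predictors and one pure-noise predictor.
set.seed(1)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n), noise = rnorm(n))
y <- 2 * x$a - x$b + rnorm(n, sd = 0.5)

# Shadow attributes: each original column shuffled independently, which
# keeps its marginal distribution but breaks any link with the response.
# (Hypothetical helper, not part of the Boruta package.)
make_shadows <- function(x) {
  shadows <- as.data.frame(lapply(x, sample))
  names(shadows) <- paste0("shadow_", names(x))
  shadows
}

# Stand-in importance, an assumption of this sketch and NOT what Boruta
# uses: absolute correlation of each column with the response.
toy_importance <- function(x, y) {
  vapply(x, function(col) abs(cor(col, y)), numeric(1))
}

# Repeat the comparison; shadows are re-created in every run.
runs <- 20
hits <- setNames(integer(ncol(x)), names(x))
for (i in seq_len(runs)) {
  extended <- cbind(x, make_shadows(x))
  imp <- toy_importance(extended, y)
  shadow_max <- max(imp[grepl("^shadow_", names(imp))])
  # An attribute scores a hit when it beats the best shadow in this run.
  hits <- hits + as.integer(imp[names(x)] > shadow_max)
}
print(hits)
```

Informative attributes should collect a hit in nearly every run, while the noise attribute beats the best shadow only occasionally.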
The algorithm stops when only Confirmed attributes are left, or when it reaches maxRuns importance source runs.
If the second scenario occurs, some attributes may be left without a decision; they are marked Tentative.
You may try to increase maxRuns or lower pValue to resolve them, but in some cases their importances fluctuate too much for Boruta to converge.
Instead, you can use the TentativeRoughFix function, which performs another, weaker test to make a final decision, or simply treat them as undecided in further analysis.
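A minimal sketch of the two ways of handling Tentative attributes mentioned above. The low maxRuns value is chosen here only to make undecided attributes more likely; whether any actually remain Tentative depends on the run. The code is guarded so that it does nothing when the Boruta package is not installed.

```r
# Sketch only: maxRuns = 15 is deliberately small for illustration.
if (requireNamespace("Boruta", quietly = TRUE)) {
  library(Boruta)
  set.seed(777)
  data(srx)

  # Few importance source runs, so some attributes may stay Tentative.
  bor <- Boruta(Y ~ ., data = srx, maxRuns = 15)
  print(table(bor$finalDecision))

  # Option 1: force a decision on whatever is still Tentative.
  # (If nothing is Tentative, the object is returned with a warning.)
  bor_fixed <- TentativeRoughFix(bor)
  print(table(bor_fixed$finalDecision))

  # Option 2: keep Tentative attributes as "not rejected" downstream.
  print(getSelectedAttributes(bor, withTentative = TRUE))
}
```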
An object of class Boruta, which is a list with the following components:
finalDecision |
a factor of three values: |
ImpHistory |
a data frame of importances of attributes gathered in each importance source run.
Besides the predictors' importances, it contains the maximal, mean and minimal importance of shadow attributes in each run.
Rejected attributes get |
timeTaken |
time taken by the computation. |
impSource |
string describing the source of importance, equal to a comment attribute of the |
call |
the original call of the |
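The components listed above can be inspected directly on the returned list. The sketch below assumes the srx data shipped with the package and that the shadow summaries are the last three columns of ImpHistory; the latter is an assumption of this sketch rather than something stated here. The code is guarded so that it does nothing when the Boruta package is not installed.

```r
# Sketch only: inspects the components of a Boruta result object.
if (requireNamespace("Boruta", quietly = TRUE)) {
  library(Boruta)
  set.seed(777)
  data(srx)
  bor <- Boruta(Y ~ ., data = srx)

  # Decision reached for each attribute.
  print(bor$finalDecision)
  print(table(bor$finalDecision))

  # Importance history: one row per importance source run.
  print(dim(bor$ImpHistory))
  # Assumed to be the shadow summary columns (see the lead-in).
  print(tail(colnames(bor$ImpHistory), 3))

  # Remaining bookkeeping components.
  print(bor$timeTaken)
  print(bor$impSource)
  print(bor$call)
}
```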
Miron B. Kursa, Witold R. Rudnicki (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), pp. 1-13. doi: 10.18637/jss.v036.i11
set.seed(777)

#Boruta on the "small redundant XOR" problem; read ?srx for details
data(srx)
Boruta(Y~.,data=srx)->Boruta.srx

#Results summary
print(Boruta.srx)

#Result plot
plot(Boruta.srx)

#Attribute statistics
attStats(Boruta.srx)

#Using alternative importance source, rFerns
Boruta(Y~.,data=srx,getImp=getImpFerns)->Boruta.srx.ferns
print(Boruta.srx.ferns)

#Verbose
Boruta(Y~.,data=srx,doTrace=2)->Boruta.srx

## Not run:
#Boruta on the iris problem extended with artificial irrelevant features
#Generate said features
iris.extended<-data.frame(iris,apply(iris[,-5],2,sample))
names(iris.extended)[6:9]<-paste("Nonsense",1:4,sep="")
#Run Boruta on this data
Boruta(Species~.,data=iris.extended,doTrace=2)->Boruta.iris.extended
#Nonsense attributes should be rejected
print(Boruta.iris.extended)
## End(Not run)

## Not run:
#Boruta on the HouseVotes84 data from mlbench
library(mlbench); data(HouseVotes84)
na.omit(HouseVotes84)->hvo
#Takes some time, so be patient
Boruta(Class~.,data=hvo,doTrace=2)->Bor.hvo
print(Bor.hvo)
plot(Bor.hvo)
plotImpHistory(Bor.hvo)
## End(Not run)

## Not run:
#Boruta on the Ozone data from mlbench
library(mlbench); data(Ozone)
library(randomForest)
na.omit(Ozone)->ozo
Boruta(V4~.,data=ozo,doTrace=2)->Bor.ozo
cat('Random forest run on all attributes:\n')
print(randomForest(V4~.,data=ozo))
cat('Random forest run only on confirmed attributes:\n')
print(randomForest(ozo[,getSelectedAttributes(Bor.ozo)],ozo$V4))
## End(Not run)

## Not run:
#Boruta on the Sonar data from mlbench
library(mlbench); data(Sonar)
#Takes some time, so be patient
Boruta(Class~.,data=Sonar,doTrace=2)->Bor.son
print(Bor.son)
#Shows important bands
plot(Bor.son,sort=FALSE)
## End(Not run)