View source: R/variable_selection_boruta.R
var.sel.boruta | R Documentation |
Variable selection using the Boruta function in the R package Boruta.
var.sel.boruta(
  x,
  y,
  pValue = 0.01,
  maxRuns = 100,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  importance = "impurity_corrected",
  case.weights = NULL
)
x | matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed). |
y | vector with values of phenotype variable (Note: will be converted to factor if classification mode is used). |
pValue | confidence level (default: 0.01 based on Boruta package) |
maxRuns | maximal number of importance source runs (default: 100 based on Boruta package) |
ntree | number of trees. |
mtry.prop | proportion of variables that should be used at each split. |
nodesize.prop | proportion of minimal number of samples in terminal nodes. |
no.threads | number of threads used for parallel execution. |
method | implementation to be used ("ranger"). |
type | mode of prediction ("regression", "classification" or "probability"). |
importance | Variable importance mode ('none', 'impurity', 'impurity_corrected' or 'permutation'). Default is 'impurity_corrected'. |
case.weights | Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees. |
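Because mtry.prop and nodesize.prop are proportions rather than counts, it can help to see what absolute values they imply for a given data set. The following is a minimal sketch in base R. The helper prop.to.count is hypothetical, and the rounding rule (round down, but never below 1) is an assumption rather than something taken from the package source.

```r
# Hypothetical helper: turn a proportion argument into an absolute count.
# Assumption: round down and use at least 1; the package may round differently.
prop.to.count <- function(prop, total) {
  max(1, floor(prop * total))
}

no.samples <- 100  # number of rows in x
no.var <- 200      # number of columns in x

# Counts implied by the defaults mtry.prop = 0.2 and nodesize.prop = 0.1
mtry <- prop.to.count(0.2, no.var)
nodesize <- prop.to.count(0.1, no.samples)

mtry      # 40 variables tried at each split
nodesize  # 10 samples as minimal terminal node size
```

Under these assumptions, the defaults scale with the dimensions of x, so the same call behaves sensibly on data sets of different sizes.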
This function selects only those variables that the Boruta implementation marks as Confirmed; Tentative and Rejected variables are not selected.
For more details see Boruta.
Note that this function uses the ranger implementation for variable selection.
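To make the connection to the underlying package concrete, here is a rough sketch of a direct call to Boruta with the same pValue and maxRuns defaults. It is not the code of var.sel.boruta: the toy data are invented for illustration, and the sketch uses Boruta's ranger-based getImpRfZ importance source, whereas var.sel.boruta defaults to 'impurity_corrected' importance.

```r
library(Boruta)

set.seed(42)

# Toy regression data (illustrative only): y depends on X1 and X2
x <- as.data.frame(matrix(rnorm(100 * 10), nrow = 100))
colnames(x) <- paste0("X", 1:10)
y <- 2 * x$X1 - 3 * x$X2 + rnorm(100, sd = 0.5)

# Direct Boruta call; getImpRfZ is Boruta's ranger-based importance source
# and ntree is passed through to it
bor <- Boruta(x = x, y = y, pValue = 0.01, maxRuns = 100,
              getImp = getImpRfZ, ntree = 500)

# Keep only confirmed variables, dropping tentative ones
confirmed <- getSelectedAttributes(bor, withTentative = FALSE)
confirmed

# Overview of all decisions (Tentative, Confirmed, Rejected)
table(bor$finalDecision)
```

Setting withTentative = FALSE mirrors the behaviour described above, where only confirmed variables are selected.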
List with the following components:
info
data.frame with information on each variable:
run.x = original variable importance (VIM) in run x (includes min, mean and max of VIM of shadow variables)
decision = Boruta decision (Confirmed, Rejected or Tentative)
selected = variable has been selected
var
vector of selected variables
info.shadow.var
data.frame with information about the minimal, mean and maximal shadow variables of each run
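The components above can be inspected with ordinary data.frame operations. The sketch below uses a small mocked stand-in for the info component so that it runs without the package. The values are invented, and it assumes that variable names are stored as row names and that selected is a logical column.

```r
# Mocked stand-in for res$info; values are illustrative, not real output.
# Assumptions: row names hold variable names, 'selected' is logical.
info <- data.frame(
  run.1 = c(5.2, 0.1, 1.4),
  run.2 = c(4.9, -0.3, 1.8),
  decision = c("Confirmed", "Rejected", "Tentative"),
  selected = c(TRUE, FALSE, FALSE),
  row.names = c("X1", "X2", "X3")
)

# Variables flagged as selected (the var component lists these)
selected.var <- rownames(info)[info$selected]

# Tentative variables are not selected but may deserve a closer look
tentative.var <- rownames(info)[info$decision == "Tentative"]

selected.var
tentative.var
```

With a real result, res$info would take the place of the mocked data.frame.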
# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables
res = var.sel.boruta(x = data[, -1], y = data[, 1])
res$var