var.sel.boruta: Variable selection using Boruta function.

View source: R/variable_selection_boruta.R

var.sel.boruta    R Documentation

Variable selection using Boruta function.

Description

Variable selection using the Boruta function in the R package Boruta.

Usage

var.sel.boruta(
  x,
  y,
  pValue = 0.01,
  maxRuns = 100,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  importance = "impurity_corrected",
  case.weights = NULL
)

Arguments

x

matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed).

y

vector with values of phenotype variable (Note: will be converted to factor if classification mode is used).

pValue

confidence level (default: 0.01 based on Boruta package)

maxRuns

maximal number of importance source runs (default: 100 based on Boruta package)

ntree

number of trees.

mtry.prop

proportion of variables that should be used at each split.

nodesize.prop

minimal number of samples in terminal nodes, given as a proportion of the total number of samples.

no.threads

number of threads used for parallel execution.

method

implementation to be used ("ranger").

type

mode of prediction ("regression", "classification" or "probability").

importance

Variable importance mode ('none', 'impurity', 'impurity_corrected' or 'permutation'). Default is 'impurity_corrected'.

case.weights

Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
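The two proportion arguments (mtry.prop and nodesize.prop) are relative to the dimensions of x. The following base R sketch illustrates how they might translate into absolute counts for a given data set. The helper prop_to_counts is hypothetical and not part of Pomona; rounding down with floor() and the lower bound of 1 are assumptions of this sketch, not a statement about the package internals.

```r
# Hypothetical helper, not part of Pomona: converts the proportion
# arguments into absolute counts. floor() and the minimum of 1 are
# assumptions made for illustration.
prop_to_counts <- function(x, mtry.prop = 0.2, nodesize.prop = 0.1) {
  # number of candidate variables per split, relative to the columns of x
  mtry <- max(1, floor(mtry.prop * ncol(x)))
  # minimal terminal node size, relative to the rows of x
  nodesize <- max(1, floor(nodesize.prop * nrow(x)))
  list(mtry = mtry, min.node.size = nodesize)
}

# 100 samples in rows, 200 predictors in columns
x <- matrix(0, nrow = 100, ncol = 200)
counts <- prop_to_counts(x)
counts$mtry           # 40 variables tried at each split
counts$min.node.size  # at least 10 samples per terminal node
```

With the default proportions, a larger data set therefore gets a larger mtry and a larger minimal node size without changing the arguments.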

Details

This function selects only variables that Boruta classifies as Confirmed; variables with a Tentative or Rejected decision are not selected. For more details see Boruta. Note that this function uses the ranger implementation of random forests for variable selection.
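The confirmed-only rule can be illustrated by calling Boruta directly. This is a minimal sketch, assuming the Boruta and ranger packages are installed; it does not go through var.sel.boruta and is not meant to reproduce Pomona's internal code. The toy data (10 predictors, two of which drive y) are made up for illustration.

```r
library(Boruta)

set.seed(42)

# toy data: 100 samples, 10 predictors, only X1 and X2 influence y
n <- 100
x <- data.frame(matrix(rnorm(n * 10), nrow = n))
y <- 2 * x$X1 - 3 * x$X2 + rnorm(n, sd = 0.5)

# run Boruta with the same pValue and maxRuns defaults as listed under Usage
b <- Boruta(x = x, y = y, pValue = 0.01, maxRuns = 100)

# decision per variable: Tentative, Confirmed or Rejected
table(b$finalDecision)

# keep only Confirmed variables (Tentative ones are dropped)
selected <- getSelectedAttributes(b, withTentative = FALSE)
selected
```

Setting withTentative = TRUE would also return undecided variables, which is a more permissive rule than the one described above.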

Value

List with the following components:

  • info: data.frame with information on each variable

    • run.x = original variable importance (VIM) in run x (includes min, mean and max of VIM of shadow variables)

    • decision = Boruta decision (Confirmed, Rejected or Tentative)

    • selected = whether the variable has been selected

  • var: vector of selected variables

  • info.shadow.var: data.frame with the minimal, mean and maximal importance of the shadow variables in each run

Examples

# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables
res = var.sel.boruta(x = data[, -1], y = data[, 1])
res$var


silkeszy/Pomona documentation built on March 31, 2022, 11:13 p.m.