var.sel.boruta: Variable selection using Boruta function.
In silkeszy/Pomona: Identification of relevant variables in omics data sets using Random Forests

View source: R/variable_selection_boruta.R

var.sel.boruta

R Documentation

Variable selection using Boruta function.

Description

Variable selection using the Boruta function in the R package Boruta.

Usage

var.sel.boruta(
  x,
  y,
  pValue = 0.01,
  maxRuns = 100,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  importance = "impurity_corrected",
  case.weights = NULL
)

Arguments

`x`	matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed).
`y`	vector with values of phenotype variable (Note: will be converted to factor if classification mode is used).
`pValue`	confidence level (default: 0.01 based on Boruta package)
`maxRuns`	maximal number of importance source runs (default: 100 based on Boruta package)
`ntree`	number of trees (default: 500).
`mtry.prop`	proportion of variables considered as split candidates at each split (default: 0.2).
`nodesize.prop`	minimal number of samples in terminal nodes, given as a proportion of the total number of samples (default: 0.1).
`no.threads`	number of threads used for parallel execution (default: 1).
`method`	random forest implementation to be used (default: "ranger").
`type`	mode of prediction ("regression", "classification" or "probability"; default: "regression").
`importance`	variable importance mode ("none", "impurity", "impurity_corrected" or "permutation"; default: "impurity_corrected").
`case.weights`	Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
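
The proportion arguments are relative settings, so it can help to see what absolute values they would imply for a given data set. The following is a minimal sketch in base R, assuming the proportions are multiplied by the number of variables and samples and rounded down; `prop.to.counts` is a hypothetical helper for illustration and is not part of Pomona, whose internal conversion may differ.

```r
# Hypothetical helper (not part of Pomona): translates the proportion
# arguments into absolute random forest settings, assuming a simple
# "multiply and round down" rule with a lower bound of 1.
prop.to.counts = function(no.var, no.samples,
                          mtry.prop = 0.2, nodesize.prop = 0.1) {
  list(
    # number of candidate variables per split
    mtry = max(1, floor(mtry.prop * no.var)),
    # minimal number of samples in a terminal node
    min.node.size = max(1, floor(nodesize.prop * no.samples))
  )
}

# Example: 200 predictor variables measured on 100 samples
settings = prop.to.counts(no.var = 200, no.samples = 100)
settings$mtry           # 40
settings$min.node.size  # 10
```

Under this assumption, the default values would correspond to 40 candidate variables per split and terminal nodes of at least 10 samples for such a data set.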

Details

This function selects only those variables that Boruta confirms as relevant. For more details see the documentation of Boruta. Note that this function uses the ranger implementation of random forests for variable selection.

Value

List with the following components:

`info`	data.frame with information on each variable:
- run.x = original variable importance (VIM) in run x (includes min, mean and max of VIM of shadow variables)
- decision = Boruta decision (Confirmed, Rejected or Tentative)
- selected = variable has been selected
`var`	vector of selected variables
`info.shadow.var`	data.frame with information about minimal, mean and maximal shadow variables of each run
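
The relationship between the decision column and the selected variables can be sketched with a small mock object in base R. The data frame below is made up for illustration only: the variable names, the importance values and the use of row names to identify variables are assumptions, not output of `var.sel.boruta`.

```r
# Mock of the documented 'info' component (all values are invented)
info = data.frame(
  run.1 = c(5.2, 0.3, 1.1),
  decision = c("Confirmed", "Rejected", "Tentative"),
  row.names = c("var1", "var2", "var3"),
  stringsAsFactors = FALSE
)

# Only variables with decision "Confirmed" count as selected;
# "Tentative" and "Rejected" variables are left out.
info$selected = info$decision == "Confirmed"

# Names of the selected variables (here taken from the row names)
var = rownames(info)[info$selected]
var
```

In this mock, only the confirmed variable ends up in `var`, which mirrors the rule stated in the Details section.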

Examples

# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables
res = var.sel.boruta(x = data[, -1], y = data[, 1])
res$var

silkeszy/Pomona documentation built on March 31, 2022, 11:13 p.m.
