var.sel.rfe: Variable selection using recursive feature elimination.
In silkeszy/Pomona: Identification of relevant variables in omics data sets using Random Forests

Description

Compares random forests based on nested subsets of the variables and selects those variables leading to the forest with the smallest prediction error within a tolerance.

Usage

var.sel.rfe(
  x,
  y,
  prop.rm = 0.2,
  recalculate = TRUE,
  tol = 10,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  importance = "impurity_corrected",
  case.weights = NULL
)

Arguments

`x`	matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed).
`y`	vector with values of phenotype variable (Note: will be converted to factor if classification mode is used).
`prop.rm`	proportion of variables removed at each step (the default of 0.2 matches the default used in the R package `varSelRF`).
`recalculate`	logical stating if importance should be recalculated at each iteration (default: TRUE)
`tol`	tolerated loss in performance relative to the best forest, in percent (the smallest subset whose percent loss is less than `tol` is selected).
`ntree`	number of trees.
`mtry.prop`	proportion of variables that should be used at each split.
`nodesize.prop`	minimal number of samples in terminal nodes, given as a proportion of the number of samples.
`no.threads`	number of threads used for parallel execution.
`method`	implementation to be used ("ranger").
`type`	mode of prediction ("regression", "classification" or "probability").
`importance`	Variable importance mode ('none', 'impurity', 'impurity_corrected' or 'permutation'). Default is 'impurity_corrected'.
`case.weights`	Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
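
To illustrate how `prop.rm` shapes the sequence of nested subsets, the following minimal sketch computes the subset sizes for a given number of variables. The helper `subset.sizes` is hypothetical and not part of Pomona. It assumes that the number of removed variables is rounded down and that at least one variable is removed per step; the package may round differently.

```r
# Hypothetical helper (not part of Pomona): sizes of the nested variable
# subsets when a fixed proportion of the remaining variables is dropped
# at each step.
# Assumption: the number of dropped variables is rounded down and at least
# one variable is dropped per step.
subset.sizes <- function(no.var, prop.rm = 0.2) {
  stopifnot(no.var >= 1, prop.rm > 0, prop.rm < 1)
  sizes <- no.var
  n <- no.var
  while (n > 1) {
    n <- n - max(1, floor(prop.rm * n))
    sizes <- c(sizes, n)
  }
  sizes
}

subset.sizes(10, prop.rm = 0.2)
# 10 8 7 6 5 4 3 2 1
```

Larger values of `prop.rm` give fewer forests to fit but a coarser grid of candidate subset sizes.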

Details

Note: Unlike the approach implemented in the R package varSelRF, this function recalculates importance scores at each step (controlled by `recalculate`, default: TRUE). The tolerance step is based on the pickSizeTolerance function in the R package caret.
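
The tolerance step can be sketched in base R as follows. This is an illustration modelled on the logic of `pickSizeTolerance` for an error measure that should be minimised, not the package's own code. The function `pick.size.tolerance`, the data.frame `runs` and all numbers in it are made up for the example, and the non-strict comparison (`<=`) is an assumption.

```r
# Sketch of a tolerance-based choice of subset size for an error measure
# that should be minimised (e.g. MSE).
# `runs` is a hypothetical data.frame with columns n (number of variables)
# and mse (prediction error), mirroring the documented `info.runs`.
pick.size.tolerance <- function(runs, tol = 10) {
  best <- min(runs$mse)
  # percent loss of each run relative to the best run
  loss <- (runs$mse - best) / best * 100
  # smallest subset whose loss stays within the tolerance (assumed: <=)
  min(runs$n[loss <= tol])
}

# made-up error values for five nested subsets
runs <- data.frame(n   = c(200, 160, 128, 103, 83),
                   mse = c(1.00, 0.95, 0.90, 0.97, 1.20))

pick.size.tolerance(runs, tol = 10)  # 103
pick.size.tolerance(runs, tol = 0)   # 128
```

With `tol = 0` the rule reduces to picking the forest with the smallest error; a larger tolerance trades a small loss in performance for a smaller set of variables.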

Value

List with the following components:

`info`	data.frame with information for each variable:
  - included.until.subset = number of the smallest subset that contains the variable
  - selected = whether the variable has been selected
`var`	vector of selected variables
`info.runs`	data.frame with information for each run:
  - n = number of variables
  - mse = mean squared error
  - rsq = R^2
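
As a rough illustration of how the returned list might be inspected, the sketch below builds a mock result with the documented components and extracts the selected variables and the best run. All values are made up, and storing the variable names as row names of `info` is an assumption that should be checked against an actual result.

```r
# Mock result with the documented components; all values are made up and
# the use of variable names as row names of `info` is an assumption.
res <- list(
  info = data.frame(
    included.until.subset = c(3, 3, 2, 1),
    selected = c(TRUE, TRUE, FALSE, FALSE),
    row.names = c("v1", "v2", "v3", "v4")
  ),
  var = c("v1", "v2"),
  info.runs = data.frame(n   = c(4, 3, 2),
                         mse = c(1.10, 1.05, 1.00),
                         rsq = c(0.45, 0.48, 0.50))
)

# names of the selected variables, recovered from the info table
selected.vars <- rownames(res$info)[res$info$selected]

# run with the smallest prediction error
best.run <- res$info.runs[which.min(res$info.runs$mse), ]
```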

Examples

# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables
res = var.sel.rfe(x = data[, -1], y = data[, 1], prop.rm = 0.2, recalculate = TRUE)
res$var

silkeszy/Pomona documentation built on March 31, 2022, 11:13 p.m.
