var.sel.r2vim: Variable selection using recurrent relative variable importance (r2VIM)

View source: R/variable_selection_r2vim.R

var.sel.r2vim    R Documentation

Variable selection using recurrent relative variable importance (r2VIM).

Description

Generates several random forests using all variables and different random number seeds. In each run, every variable's importance score is divided by the absolute value of the minimal importance score in that run. This gives relative importance scores. A variable is selected if its minimal relative importance score over all runs is >= factor.
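The selection rule can be illustrated with a minimal base R sketch. It assumes the importance scores of all runs have already been collected in a matrix with one row per variable and one column per run, and that each run's minimal score is nonzero. The helper name r2vim_select is hypothetical and is not part of Pomona. The sketch does not fit any forests itself.

```r
# Hypothetical helper (not part of Pomona): applies the r2VIM selection rule
# to precomputed importance scores (rows = variables, columns = runs).
r2vim_select <- function(vim, factor = 1) {
  # Divide each run's scores by the absolute value of that run's minimal score
  rel.vim <- sweep(vim, 2, abs(apply(vim, 2, min)), "/")
  # Keep variables whose smallest relative score over all runs is >= factor
  rel.vim.min <- apply(rel.vim, 1, min)
  rownames(vim)[rel.vim.min >= factor]
}

# Toy scores for 4 variables in 2 runs (minimal scores are -1 and -2)
vim <- matrix(c(5, -1, 0.5, 2,
                4, -2, 1, 3), nrow = 4,
              dimnames = list(c("a", "b", "c", "d"), c("run.1", "run.2")))
r2vim_select(vim, factor = 1)   # "a" "d"
```

Variable c has positive scores in both runs. It is still dropped because its relative score (0.5 in each run) stays below factor = 1.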

Usage

var.sel.r2vim(
  x,
  y,
  no.runs = 10,
  factor = 1,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  importance = "impurity_corrected",
  case.weights = NULL
)

Arguments

x

matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed).

y

vector with values of phenotype variable (Note: will be converted to factor if classification mode is used).

no.runs

number of random forests to be generated.

factor

minimal relative importance score (over all runs) that a variable must reach to be selected.

ntree

number of trees.

mtry.prop

proportion of variables randomly sampled as split candidates at each split.

nodesize.prop

minimal number of samples in terminal nodes, given as a proportion of the number of samples.

no.threads

number of threads used for parallel execution.

method

implementation to be used ("ranger").

type

mode of prediction ("regression", "classification" or "probability").

importance

variable importance mode ("none", "impurity", "impurity_corrected" or "permutation"). Default is "impurity_corrected".

case.weights

Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
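The two proportion arguments are relative to the dimensions of x. The sketch below shows one plausible way to convert them into absolute values, such as ranger's mtry and min.node.size arguments. The helper rf_counts_from_props is hypothetical. Rounding down with floor() and the lower bound of 1 are assumptions, not confirmed Pomona behavior.

```r
# Hypothetical helper (not part of Pomona): converts mtry.prop and
# nodesize.prop into absolute counts. Rounding down and the lower bound
# of 1 are assumptions of this sketch.
rf_counts_from_props <- function(x, mtry.prop = 0.2, nodesize.prop = 0.1) {
  list(
    mtry = max(1, floor(mtry.prop * ncol(x))),              # variables tried per split
    min.node.size = max(1, floor(nodesize.prop * nrow(x)))  # minimal samples per terminal node
  )
}

# Toy predictor matrix: 100 samples (rows) x 200 variables (columns)
x <- matrix(rnorm(100 * 200), nrow = 100, ncol = 200)
counts <- rf_counts_from_props(x)
counts$mtry            # 40
counts$min.node.size   # 10
```

Expressing both settings as proportions keeps them meaningful when the number of variables or samples in x changes.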

Details

Note: This function is a reimplementation of the R package RFVarSelGWAS.

Value

List with the following components:

  • info: data.frame with information for each variable

    • vim.run.x = original variable importance (VIM) in run x

    • rel.vim.run.x = relative VIM in run x

    • rel.vim.min = minimal relative VIM over all runs

    • rel.vim.med = median relative VIM over all runs

    • selected = whether the variable has been selected

  • var: vector of selected variables
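The following base R sketch makes the layout of the return value concrete. It assembles a list with info and var components from a matrix of raw importance scores, with one row per variable and one column per run. The helper build_r2vim_info is hypothetical. Storing selected as a logical column and the exact column order are assumptions of this sketch.

```r
# Hypothetical helper (not part of Pomona): builds a result shaped like the
# documented return value from raw importance scores.
build_r2vim_info <- function(vim, factor = 1) {
  runs <- seq_len(ncol(vim))
  # Relative VIM: each run's scores divided by |minimal score| of that run
  rel.vim <- sweep(vim, 2, abs(apply(vim, 2, min)), "/")
  colnames(vim) <- paste0("vim.run.", runs)
  colnames(rel.vim) <- paste0("rel.vim.run.", runs)
  info <- data.frame(vim, rel.vim,
                     rel.vim.min = apply(rel.vim, 1, min),
                     rel.vim.med = apply(rel.vim, 1, median))
  # Assumption: 'selected' is stored as a logical column
  info$selected <- info$rel.vim.min >= factor
  list(info = info, var = rownames(info)[info$selected])
}

vim <- matrix(c(5, -1, 0.5, 2,
                4, -2, 1, 3), nrow = 4,
              dimnames = list(c("a", "b", "c", "d"), NULL))
res <- build_r2vim_info(vim, factor = 1)
res$info
res$var   # "a" "d"
```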

Examples

library(Pomona)

# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables
res = var.sel.r2vim(x = data[, -1], y = data[, 1], no.runs = 5, factor = 1)
res$var


silkeszy/Pomona documentation built on March 31, 2022, 11:13 p.m.