var.sel.vita: Variable selection using the Vita approach.

View source: R/variable_selection_vita.R

var.sel.vita    R Documentation

Variable selection using the Vita approach.

Description

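This function computes p-values for the variable importance measures (VIMs) of a random forest. The p-values are based on an empirical null distribution derived from the non-positive VIMs, as described in Janitza et al. (2015): the negative VIMs, their values mirrored at zero, and any VIMs equal to zero together form the null distribution. Variables with a p-value of 0 or below p.t are selected. Note that this function uses the importance_pvalues function of the R package ranger.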
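The mirrored-null idea can be sketched in base R as follows. This is a minimal illustration, assuming `vim` is a named numeric vector of importance values; the helper name `janitza_pvalues` is hypothetical, and the code is intended to follow the procedure described in Janitza et al. (2015) rather than reproduce ranger's `importance_pvalues` line by line. Each p-value is the share of null values strictly greater than the observed VIM, so a VIM larger than every null value receives a p-value of exactly 0.

```r
# Hypothetical helper (not part of Pomona or ranger): empirical p-values
# from a null distribution built out of the non-positive VIMs.
janitza_pvalues <- function(vim) {
  neg  <- vim[vim < 0]   # negative VIMs, assumed to stem from null variables
  zero <- vim[vim == 0]  # VIMs equal to zero are also treated as null values
  if (length(neg) == 0) {
    stop("no negative VIMs: the empirical null distribution cannot be built")
  }
  # mirror the negative VIMs at zero to obtain a symmetric null distribution
  null_vims <- c(neg, -neg, zero)
  # share of null values strictly greater than each observed VIM
  p <- 1 - stats::ecdf(null_vims)(vim)
  names(p) <- names(vim)
  p
}

# toy importance values for six variables
vim <- c(a = 5, b = 0.5, c = -1, d = -0.2, e = 0, f = 0.1)
janitza_pvalues(vim)
```

With few negative VIMs the null distribution is coarse, which is why the approach is aimed at high-dimensional data with many uninformative variables.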

Usage

var.sel.vita(
  x,
  y,
  p.t = 0.05,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  importance = "impurity_corrected"
)

Arguments

x

matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed).

y

vector with the values of the phenotype variable (Note: it is converted to a factor if classification mode is used).

p.t

threshold for p-values (all variables with a p-value of 0 or below p.t are selected).

ntree

number of trees.

mtry.prop

proportion of variables randomly sampled as split candidates at each split.

nodesize.prop

minimal number of samples in terminal nodes, given as a proportion of the number of samples.

no.threads

number of threads used for parallel execution.

method

implementation to be used ("ranger").

type

mode of prediction ("regression", "classification" or "probability").

importance

variable importance mode ("none", "impurity", "impurity_corrected" or "permutation"). Default is "impurity_corrected".
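The proportion arguments mtry.prop and nodesize.prop must be turned into absolute counts before a forest is grown. A hypothetical helper `proportions_to_counts` could look like the sketch below; the use of `floor()` with a lower bound of 1 is an assumption for illustration and is not taken from the Pomona source. The output names mirror the corresponding ranger parameters `mtry` and `min.node.size`.

```r
# Hypothetical helper: convert the proportion arguments into absolute counts.
# Rounding down with a minimum of 1 is an assumption, not Pomona's code.
proportions_to_counts <- function(x, mtry.prop = 0.2, nodesize.prop = 0.1) {
  n_vars    <- ncol(x)  # variables are in columns
  n_samples <- nrow(x)  # samples are in rows
  list(
    mtry          = max(1, floor(mtry.prop * n_vars)),        # split candidates per split
    min.node.size = max(1, floor(nodesize.prop * n_samples))  # minimal terminal node size
  )
}

# 100 samples and 500 variables, as in the example further below
x <- matrix(0, nrow = 100, ncol = 500)
counts <- proportions_to_counts(x)
counts
```

Expressing both settings as proportions keeps the defaults meaningful across data sets of very different dimensions.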

Value

List with the following components:

  • info: data.frame with information for each variable

    • vim = variable importance (VIM)

    • CI_lower = lower confidence interval boundary

    • CI_upper = upper confidence interval boundary

    • pvalue = empirical p-value

    • selected = indicator of whether the variable has been selected

  • var: vector of selected variables
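A minimal sketch of how a list of this shape could be assembled from VIMs and p-values, applying the selection rule stated for p.t. The helper `build_selection` and its inputs are hypothetical; CI_lower and CI_upper are left out because their computation is not described on this page, and encoding `selected` as 0/1 is an assumption.

```r
# Hypothetical helper: build the info/var components from VIMs and p-values.
build_selection <- function(vim, pvalue, p.t = 0.05) {
  # selection rule from the p.t argument: p-value equal to 0 or below p.t
  selected <- as.integer(pvalue == 0 | pvalue < p.t)
  info <- data.frame(vim = vim, pvalue = pvalue, selected = selected,
                     row.names = names(vim))
  list(info = info, var = rownames(info)[info$selected == 1])
}

vim    <- c(x1 = 2.3, x2 = 0.4, x3 = -0.1)
pvalue <- c(0, 0.03, 0.9)
res <- build_selection(vim, pvalue)
res$var
```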

References

Janitza, S., Celik, E. & Boulesteix, A.-L. (2015). A computationally fast variable importance test for random forest for high dimensional data. Technical Report 185, University of Munich, https://epub.ub.uni-muenchen.de/25587

Examples

library(Pomona)

# simulate a toy data set (first column: phenotype, remaining columns: predictors)
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 500)

# select variables
res = var.sel.vita(x = data[, -1], y = data[, 1], p.t = 0.05)
res$var


silkeszy/Pomona documentation built on March 31, 2022, 11:13 p.m.