var.sel.perm: Variable selection using a permutation approach.

View source: R/variable_selection_perm.R

var.sel.perm    R Documentation

Variable selection using a permutation approach.

Description

Selects variables whose importance scores are larger than the scores calculated after permuting the phenotype. For each variable, the output includes a p-value, calculated as the proportion of permutations that yield an equal or larger importance score.
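A minimal base-R sketch of this scheme follows. It is not Pomona's code: so that it runs without ranger, the random forest importance is replaced by a hypothetical stand-in, `toy.importance()` (absolute Pearson correlation of each column with the phenotype). The p-value counts permutations with an equal or larger importance. The selection step keeps variables whose p-value is 0 or below `p.t`, mirroring the `p.t` argument. `perm.select()` is also a hypothetical name.

```r
# Hypothetical stand-in for a random forest VIM (NOT what var.sel.perm uses):
# absolute correlation of each predictor column with the phenotype.
toy.importance <- function(x, y) {
  abs(apply(x, 2, function(col) cor(col, y)))
}

# Nonparametric permutation p-values: recompute importance after permuting
# the phenotype and count how often the permuted score is >= the original.
perm.select <- function(x, y, no.perm = 100, p.t = 0) {
  vim.original <- toy.importance(x, y)
  # rows = variables, columns = permutations
  vim.perm <- sapply(seq_len(no.perm), function(i) toy.importance(x, sample(y)))
  pvalue <- rowSums(vim.perm >= vim.original) / no.perm
  # selection rule: p-value equal to 0 or below p.t
  selected <- pvalue == 0 | pvalue < p.t
  list(pvalue = pvalue, var = names(pvalue)[selected])
}

set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x", 1:5)))
y <- 2 * x[, "x1"] + rnorm(n)   # only x1 is associated with y
res <- perm.select(x, y, no.perm = 100)
res$pvalue
res$var
```

Because `x1` carries a strong signal, no permuted score reaches its original score. Its p-value is therefore 0, so it is selected at the default `p.t = 0`.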

Usage

var.sel.perm(
  x,
  y,
  no.perm = 100,
  p.t = 0,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  parametric = FALSE,
  importance = "impurity_corrected",
  case.weights = NULL
)

Arguments

x

matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed).

y

vector with values of phenotype variable (Note: will be converted to factor if classification mode is used).

no.perm

number of permutations of the phenotype.

p.t

threshold for p-values (all variables with a p-value of 0 or below p.t are selected).

ntree

number of trees.

mtry.prop

proportion of variables randomly sampled as split candidates at each split.

nodesize.prop

minimal number of samples in terminal nodes, given as a proportion of the number of samples.

no.threads

number of threads used for parallel execution.

method

implementation to be used ("ranger").

type

mode of prediction ("regression", "classification" or "probability").

parametric

logical; whether the parametric permutation approach of Altmann et al. (2010), based on the normal distribution, should be used (default: FALSE).

importance

Variable importance mode ('none', 'impurity', 'impurity_corrected' or 'permutation'). Default is 'impurity_corrected'.

case.weights

Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
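Some of the arguments above describe quantities that the underlying forest needs in a different form. `mtry.prop` and `nodesize.prop` are proportions, while ranger's `mtry` and `min.node.size` arguments take counts. `case.weights` changes how bootstrap samples are drawn. The sketch below illustrates both ideas with hypothetical helpers. Rounding down with a minimum of 1 is an assumption and is not taken from the Pomona source. The weighted draw uses base R's `sample()`, not ranger's internal sampler.

```r
# Hypothetical helper: turn proportions into the absolute counts used by
# ranger's mtry and min.node.size (floor with a minimum of 1 is assumed).
prop.to.abs <- function(n.var, n.samples, mtry.prop = 0.2, nodesize.prop = 0.1) {
  list(mtry = max(1, floor(mtry.prop * n.var)),
       min.node.size = max(1, floor(nodesize.prop * n.samples)))
}

# Hypothetical helper: one bootstrap sample of observation indices in which
# observation i is drawn with probability proportional to case.weights[i].
weighted.bootstrap <- function(case.weights) {
  n <- length(case.weights)
  sample(seq_len(n), size = n, replace = TRUE, prob = case.weights)
}

prop.to.abs(n.var = 200, n.samples = 100)   # mtry = 40, min.node.size = 10

set.seed(3)
w <- c(1, 1, 1, 10)   # observation 4 is ten times as likely to be drawn
draws <- replicate(1000, weighted.bootstrap(w))
table(draws) / length(draws)   # roughly 1/13 for 1-3 and 10/13 for 4
```

Under this assumed conversion, the default proportions applied to a data set with 200 variables and 100 samples give 40 split candidates and a minimal node size of 10.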

Details

Note: This function is a reimplementation of the approach in the R package rfPermute and the parametric permutation approach by Altmann et al. (2010).
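The parametric variant replaces counting with a fitted null distribution. Below is a minimal sketch that fits a normal null to each variable's permuted importances, as the `parametric` argument describes. It assumes `vim.perm` holds variables in rows and permutations in columns. `parametric.pvalues()` is a hypothetical name, and only the normal case is shown.

```r
# Parametric p-value: fit a normal distribution (mean, sd) to each
# variable's permuted importances and return P(VIM >= original VIM)
# under that fitted null.
parametric.pvalues <- function(vim.original, vim.perm) {
  mu <- rowMeans(vim.perm)
  sigma <- apply(vim.perm, 1, sd)
  pnorm(vim.original, mean = mu, sd = sigma, lower.tail = FALSE)
}

set.seed(2)
vim.perm <- matrix(rnorm(3 * 50, mean = 0, sd = 1), nrow = 3)
vim.original <- c(5, 0, -5)   # far above, near, and far below the null
round(parametric.pvalues(vim.original, vim.perm), 3)
```

Unlike the counting approach, this can return p-values below 1/no.perm. That is why a nonzero threshold such as `p.t = 0.05` is used with `parametric = TRUE` in the examples.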

Value

List with the following components:

  • info: data.frame with information for each variable:

    • vim.original = original variable importance (VIM)

    • vim.perm.x = VIM in permutation x

    • pvalue = proportion of permutations with a larger VIM than the original VIM (nonparametric), or the probability of observing the original or a larger VIM under the fitted normal null distribution of importance scores (parametric)

    • selected = whether the variable has been selected

  • var: vector of selected variables
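The shape of this return value can be reproduced with a small hypothetical helper, `assemble.result()`. It takes original importances and a variables-by-permutations matrix, using toy inputs rather than real output of `var.sel.perm`. The `>=` comparison follows the Description, and the `vim.perm.1`, `vim.perm.2`, … column names follow the `vim.perm.x` pattern above. Other details of the real object may differ.

```r
# Hypothetical helper: build a list shaped like the documented return value.
assemble.result <- function(vim.original, vim.perm, p.t = 0) {
  colnames(vim.perm) <- paste0("vim.perm.", seq_len(ncol(vim.perm)))
  # nonparametric p-value: share of permutations with VIM >= original VIM
  pvalue <- rowSums(vim.perm >= vim.original) / ncol(vim.perm)
  selected <- pvalue == 0 | pvalue < p.t
  info <- data.frame(vim.original = vim.original, vim.perm,
                     pvalue = pvalue, selected = selected,
                     row.names = names(vim.original))
  list(info = info, var = names(vim.original)[selected])
}

# Toy inputs: 3 variables, 4 permutations
vim.original <- c(a = 0.9, b = 0.2, c = 0.05)
vim.perm <- rbind(c(0.1, 0.2, 0.1, 0.3),
                  c(0.3, 0.1, 0.25, 0.1),
                  c(0.1, 0.2, 0.0, 0.3))
res <- assemble.result(vim.original, vim.perm)
res$info
res$var   # "a"
```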

References

Altmann, A., Tolosi, L., Sander, O. & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure, Bioinformatics 26:1340-1347.

Examples

library(Pomona)

# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables based on nonparametric permutation approach
res = var.sel.perm(x = data[, -1], y = data[, 1], no.perm = 10, p.t = 0)
res$var

# select variables based on parametric permutation approach
res.par = var.sel.perm(x = data[, -1], y = data[, 1], no.perm = 10, p.t = 0.05, parametric = TRUE)
res.par$var


silkeszy/Pomona documentation built on March 31, 2022, 11:13 p.m.