var.sel.perm: Variable selection using a permutation approach.
In silkeszy/Pomona: Identification of relevant variables in omics data sets using Random Forests

View source: R/variable_selection_perm.R

Description

Selects variables whose importance scores are larger than the scores obtained after permuting the phenotype. For each variable, the output includes a p-value calculated as the proportion of permutations with an equal or larger importance score.
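
The p-value calculation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the absolute Pearson correlation stands in for a Random Forest importance score so that the snippet runs without additional packages, and `toy.importance`, the toy data and the variable names `v1` to `v5` are hypothetical.

```r
set.seed(1)

# toy data: 50 samples, 5 predictors; only the first predictor drives y
x <- matrix(rnorm(50 * 5), nrow = 50,
            dimnames = list(NULL, paste0("v", 1:5)))
y <- 2 * x[, 1] + rnorm(50)

# stand-in importance score (hypothetical helper, NOT a Random Forest VIM)
toy.importance <- function(x, y) abs(cor(x, y))[, 1]

no.perm <- 100
vim.original <- toy.importance(x, y)

# importance after permuting the phenotype: one column per permutation
vim.perm <- sapply(seq_len(no.perm),
                   function(i) toy.importance(x, sample(y)))

# p-value: proportion of permutations with an equal or larger score
pvalue <- rowMeans(vim.perm >= vim.original)
```

With `no.perm` permutations the smallest nonzero p-value is `1 / no.perm`, so a small number of permutations gives only a coarse resolution.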

Usage

var.sel.perm(
  x,
  y,
  no.perm = 100,
  p.t = 0,
  ntree = 500,
  mtry.prop = 0.2,
  nodesize.prop = 0.1,
  no.threads = 1,
  method = "ranger",
  type = "regression",
  parametric = FALSE,
  importance = "impurity_corrected",
  case.weights = NULL
)

Arguments

`x`	matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed).
`y`	vector with values of phenotype variable (Note: will be converted to factor if classification mode is used).
`no.perm`	number of permutations
`p.t`	threshold for p-values (all variables with a p-value = 0 or < p.t will be selected)
`ntree`	number of trees.
`mtry.prop`	proportion of variables that should be used at each split.
`nodesize.prop`	minimal number of samples in terminal nodes, given as a proportion of the sample size.
`no.threads`	number of threads used for parallel execution.
`method`	implementation to be used ("ranger").
`type`	mode of prediction ("regression", "classification" or "probability").
`parametric`	logical indicating whether the parametric permutation approach of Altmann et al. (2010), which is based on a normal distribution, should be used (default: FALSE).
`importance`	Variable importance mode ('none', 'impurity', 'impurity_corrected' or 'permutation'). Default is 'impurity_corrected'.
`case.weights`	Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
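
`mtry.prop` and `nodesize.prop` are proportions, whereas Random Forest implementations such as ranger take absolute counts (`mtry`, `min.node.size`). The sketch below shows one plausible conversion. The rounding (floor, with a lower bound of 1) is an assumption made for illustration and is not taken from the package source; `no.samples` and `no.vars` are hypothetical data dimensions.

```r
# dimensions of a hypothetical data set
no.samples <- 100
no.vars <- 200

# default proportions from the usage section
mtry.prop <- 0.2
nodesize.prop <- 0.1

# assumed conversion from proportions to counts (floor, at least 1)
mtry <- max(1, floor(mtry.prop * no.vars))
min.node.size <- max(1, floor(nodesize.prop * no.samples))
```

Under these assumptions the defaults correspond to 40 candidate variables per split and terminal nodes of at least 10 samples.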

Details

Note: This function is a reimplementation of the approach in the R package rfPermute and the parametric permutation approach by Altmann et al. (2010).
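
The idea behind the parametric variant can be sketched as follows: a normal distribution is fitted to the importance scores from the permutations, and the p-value is the upper tail probability of the original score under that distribution. This is a minimal base R sketch with hypothetical numbers for a single variable, and fitting by sample mean and standard deviation is an assumption rather than a statement about the package internals.

```r
set.seed(2)

# hypothetical importance scores for one variable
vim.original <- 3.1                          # score on the unpermuted phenotype
vim.perm <- rnorm(100, mean = 1, sd = 0.5)   # scores from 100 permutations

# fit a normal null distribution by its sample moments
mu <- mean(vim.perm)
sigma <- sd(vim.perm)

# parametric p-value: P(VIM >= vim.original) under the fitted normal
pvalue.par <- pnorm(vim.original, mean = mu, sd = sigma, lower.tail = FALSE)

# nonparametric counterpart for comparison
pvalue.nonpar <- mean(vim.perm >= vim.original)
```

A fitted normal tail is strictly positive in practice, so the parametric p-value can resolve values below `1 / no.perm`, which the nonparametric proportion cannot.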

Value

List with the following components:

`info`	data.frame with information for each variable:
- vim.original = original variable importance (VIM)
- vim.perm.x = VIM in permutation x
- pvalue = proportion of permutations with a larger VIM than original VIM (nonparametric) or probability of observing the original or a larger VIM, given the fitted null importance distribution based on normal distributions (parametric)
- selected = variable has been selected
`var`	vector of selected variables
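
The relation between `pvalue`, `selected` and `var` follows the selection rule given for `p.t` (p-value equal to 0 or below the threshold). A minimal sketch with made-up p-values is shown below. The data frame only mimics the documented columns, and storing variable names as row names is an assumption; `select.vars` is a hypothetical helper.

```r
# hypothetical p-values for four variables
info <- data.frame(pvalue = c(0, 0.03, 0.2, 0.049),
                   row.names = c("v1", "v2", "v3", "v4"))

# documented rule: select a variable if its p-value is 0 or below p.t
select.vars <- function(info, p.t) {
  rownames(info)[info$pvalue == 0 | info$pvalue < p.t]
}

var.strict <- select.vars(info, p.t = 0)      # only p-values of exactly 0
var.relaxed <- select.vars(info, p.t = 0.05)  # p-values of 0 or below 0.05
```

With `p.t = 0` only variables that were never matched by any permutation are kept, which is why the default threshold still yields a selection.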

References

Altmann, A., Tolosi, L., Sander, O. & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure, Bioinformatics 26:1340-1347.

Examples

# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables based on nonparametric permutation approach
res = var.sel.perm(x = data[, -1], y = data[, 1], no.perm = 10, p.t = 0)
res$var

# select variables based on parametric permutation approach
res.par = var.sel.perm(x = data[, -1], y = data[, 1], no.perm = 10, p.t = 0.05, parametric = TRUE)
res.par$var

silkeszy/Pomona documentation built on March 31, 2022, 11:13 p.m.
