View source: R/variable_selection_perm.R
var.sel.perm | R Documentation |
Selects variables whose importance scores are larger than the scores obtained after permuting the phenotype. For each variable, a p-value is calculated as the proportion of permutations with an equal or larger importance score.
var.sel.perm( x, y, no.perm = 100, p.t = 0, ntree = 500, mtry.prop = 0.2, nodesize.prop = 0.1, no.threads = 1, method = "ranger", type = "regression", parametric = FALSE, importance = "impurity_corrected", case.weights = NULL )
x |
matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed). |
y |
vector with values of phenotype variable (Note: will be converted to factor if classification mode is used). |
no.perm |
number of permutations |
p.t |
threshold for p-values (all variables with a p-value = 0 or < p.t will be selected) |
ntree |
number of trees. |
mtry.prop |
proportion of variables that should be used at each split. |
nodesize.prop |
minimal number of samples in terminal nodes, given as a proportion of the total number of samples. |
no.threads |
number of threads used for parallel execution. |
method |
implementation to be used ("ranger"). |
type |
mode of prediction ("regression", "classification" or "probability"). |
parametric |
logical indicating whether the parametric permutation approach of Altmann et al. (2010), which is based on the normal distribution, should be used (default: FALSE). |
importance |
Variable importance mode ('none', 'impurity', 'impurity_corrected' or 'permutation'). Default is 'impurity_corrected'. |
case.weights |
Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees. |
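As a rough illustration of how the proportion-valued arguments could translate into absolute tuning values, the sketch below converts mtry.prop and nodesize.prop into counts for a data set with 200 variables and 100 samples. The helper prop.to.count and its rounding rule are assumptions made for this example only; the package may convert the proportions differently.

```r
# Hypothetical helper (not part of the package): convert a proportion into
# an absolute count and never return less than 1.
prop.to.count = function(prop, total) {
  max(1, floor(prop * total))
}

no.var = 200      # number of predictor variables (columns of x)
no.samples = 100  # number of samples (rows of x)

# Defaults from the usage line: mtry.prop = 0.2, nodesize.prop = 0.1
mtry = prop.to.count(0.2, no.var)
nodesize = prop.to.count(0.1, no.samples)

print(c(mtry = mtry, nodesize = nodesize))
```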
Note:
This function is a reimplementation of the approach in the R package rfPermute and the parametric permutation approach by Altmann et al. (2010).
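The parametric idea can be sketched in base R as follows: a normal distribution is fitted to the importance scores from the permutations, and the p-value is the upper-tail probability of the original score under that fit. The numbers below are made up for illustration and the variable names are not taken from the package; this is a minimal sketch of the general technique, not the package's code.

```r
set.seed(1)

# Toy values (made up): importance of one variable in the original data
# and in 10 permutations of the phenotype.
vim.original = 2.5
vim.perm = rnorm(10, mean = 0, sd = 1)

# Fit a normal null distribution to the permuted importance scores.
null.mean = mean(vim.perm)
null.sd = sd(vim.perm)

# Probability of observing the original or a larger importance score
# under the fitted null distribution.
pvalue.par = pnorm(vim.original, mean = null.mean, sd = null.sd,
                   lower.tail = FALSE)
print(pvalue.par)
```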
List with the following components:
info
data.frame with information for each variable:
vim.original = original variable importance (VIM)
vim.perm.x = VIM in permutation x
pvalue = proportion of permutations with a larger VIM than original VIM (nonparametric) or probability of observing the original or a larger VIM, given the fitted null importance distribution based on normal distributions (parametric)
selected = variable has been selected
var
vector of selected variables
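To make the structure of the returned list more concrete, the sketch below builds a small data.frame with the column layout described above and derives the nonparametric p-value, the selected flag and the vector of selected variables from it. All importance values are made up, and the comparison uses the "equal or larger" convention from the description; whether the package uses exactly this comparison is an assumption.

```r
# Toy importance values (made up): 3 variables, 4 permutations.
info = data.frame(
  vim.original = c(5.0, 1.0, 0.2),
  vim.perm.1 = c(0.5, 1.2, 0.3),
  vim.perm.2 = c(0.1, 0.8, 0.1),
  vim.perm.3 = c(0.3, 1.5, 0.4),
  vim.perm.4 = c(0.2, 0.9, 0.2),
  row.names = c("var1", "var2", "var3")
)

# Columns holding the importance scores from the permutations.
perm.cols = grep("^vim\\.perm\\.", colnames(info))
no.perm = length(perm.cols)

# Nonparametric p-value: proportion of permutations with an equal or
# larger importance score than the original one (row-wise comparison).
vim.perm = as.matrix(info[, perm.cols])
info$pvalue = rowSums(vim.perm >= info$vim.original) / no.perm

# Selection rule from the p.t argument: p-value = 0 or < p.t.
p.t = 0
info$selected = info$pvalue == 0 | info$pvalue < p.t

# Vector of selected variables.
var = rownames(info)[info$selected]
print(info[, c("vim.original", "pvalue", "selected")])
print(var)
```

With p.t = 0, only variables whose original importance exceeds every permuted score are selected, so a larger no.perm gives a stricter and more finely resolved criterion.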
Altmann, A., Tolosi, L., Sander, O. & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure, Bioinformatics 26:1340-1347.
Examples:
# simulate toy data set
data = simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)

# select variables based on nonparametric permutation approach
res = var.sel.perm(x = data[, -1], y = data[, 1], no.perm = 10, p.t = 0)
res$var

# select variables based on parametric permutation approach
res.par = var.sel.perm(x = data[, -1], y = data[, 1], no.perm = 10, p.t = 0.05, parametric = TRUE)
res.par$var