importance_pvalues: ranger variable importance p-values
In ranger: A Fast Implementation of Random Forests

importance_pvalues

R Documentation

ranger variable importance p-values

Description

Compute variable importance with p-values. For high dimensional data, the fast method of Janitza et al. (2016) can be used. The permutation approach of Altmann et al. (2010) is computationally intensive but can be used with all kinds of data. See below for details.

Usage

importance_pvalues(
  x,
  method = c("janitza", "altmann"),
  num.permutations = 100,
  formula = NULL,
  data = NULL,
  ...
)

Arguments

`x`	`ranger` or `holdoutRF` object.
`method`	Method to compute p-values. Use "janitza" for the method by Janitza et al. (2016) or "altmann" for the non-parametric method by Altmann et al. (2010).
`num.permutations`	Number of permutations. Used in the "altmann" method only.
`formula`	Object of class formula or character describing the model to fit. Used in the "altmann" method only.
`data`	Training data of class data.frame or matrix. Used in the "altmann" method only.
`...`	Further arguments passed to `ranger()`. Used in the "altmann" method only.

Details

The method of Janitza et al. (2016) uses a clever trick: With an unbiased variable importance measure, the importance values of non-associated variables vary randomly around zero. Thus, all non-positive importance values are assumed to correspond to these non-associated variables and they are used to construct a distribution of the importance under the null hypothesis of no association to the response. Since only the non-positive values of this distribution can be observed, the positive values are created by mirroring the negative distribution. See Janitza et al. (2016) for details.

The method of Altmann et al. (2010) uses a simple permutation test: The distribution of the importance under the null hypothesis of no association to the response is created by several replications of permuting the response, growing an RF and computing the variable importance. The authors recommend 50-100 permutations. However, much larger numbers have to be used to estimate more precise p-values. We add 1 to the numerator and denominator to avoid zero p-values.

Value

Variable importance and p-value for each variable.

Author(s)

Marvin N. Wright

References

Janitza, S., Celik, E. & Boulesteix, A.-L., (2016). A computationally fast variable importance test for random forests for high-dimensional data. Adv Data Anal Classif \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s11634-016-0276-4")}.
Altmann, A., Tolosi, L., Sander, O. & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure, Bioinformatics 26:1340-1347.

Examples

## Janitza's p-values with corrected Gini importance
n <- 50
p <- 400
dat <- data.frame(y = factor(rbinom(n, 1, .5)), replicate(p, runif(n)))
rf.sim <- ranger(y ~ ., dat, importance = "impurity_corrected")
importance_pvalues(rf.sim, method = "janitza")

## Permutation p-values 
## Not run: 
rf.iris <- ranger(Species ~ ., data = iris, importance = 'permutation')
importance_pvalues(rf.iris, method = "altmann", formula = Species ~ ., data = iris)

## End(Not run)

ranger documentation built on April 4, 2025, 6:12 a.m.

ranger index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ranger
A Fast Implementation of Random Forests

importance_pvalues: ranger variable importance p-values
In ranger: A Fast Implementation of Random Forests

ranger variable importance p-values

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to importance_pvalues in ranger...

R Package Documentation

Browse R Packages

We want your feedback!

ranger A Fast Implementation of Random Forests

importance_pvalues: ranger variable importance p-values In ranger: A Fast Implementation of Random Forests

ranger variable importance p-values

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to importance_pvalues in ranger...

R Package Documentation

Browse R Packages

We want your feedback!

ranger
A Fast Implementation of Random Forests

importance_pvalues: ranger variable importance p-values
In ranger: A Fast Implementation of Random Forests