This R-package computes the Conditional Permutation Importance (CPI; Strobl, 2008)
using an alternative implementation that is both faster and more
stable (Debeer & Strobl, 2020). The (C)PI can
be computed for random forests fit using (a) the original impurity
reduction method (`randomForest`-package), and (b) the Conditional
Inference framework (`party`-package). In addition, a plotting method for
the resulting `VarImp`-object is included.
The package can be installed using the `devtools`-package:
install.packages("devtools")
devtools::install_github("ddebeer/permimp")
The workhorse is the `permimp`-function. For its documentation:
?permimp
For documentation about the plotting function:
```r
?plot.VarImp
```

## Example

```r
library(party)
library(randomForest)
library(permimp)

set.seed(542863)

# Keep only complete cases for the response and Solar.R
airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))

# (b) Conditional Inference forest (party)
cfAirq5 <- cforest(Ozone ~ ., data = airq,
                   control = cforest_unbiased(mtry = 3, ntree = 1000,
                                              minbucket = 5, minsplit = 10))
permimp_cf <- permimp(cfAirq5, conditional = TRUE)
plot(permimp_cf, type = "box", interval = "quantile")

# (a) Impurity-reduction forest (randomForest)
rfAirq5 <- randomForest(Ozone ~ ., data = airq, mtry = 3, ntree = 1000,
                        importance = TRUE, keep.forest = TRUE,
                        keep.inbag = TRUE)
permimp_rf <- permimp(rfAirq5, conditional = TRUE)
plot(permimp_rf, horizontal = TRUE)
```
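Because the same function covers both the conditional and the unconditional importance, the two can be compared on one forest. The following is a minimal sketch rather than part of the package documentation: it refits a smaller forest (200 trees, chosen only to keep the run short) and assumes that the returned `VarImp`-object stores the per-predictor importances in a `values` element.

```r
library(party)
library(permimp)

set.seed(542863)
airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))

# Smaller forest than in the example above, only to keep this sketch quick
cf <- cforest(Ozone ~ ., data = airq,
              control = cforest_unbiased(mtry = 3, ntree = 200,
                                         minbucket = 5, minsplit = 10))

# Unconditional (marginal) and conditional permutation importance
pi_marginal    <- permimp(cf, conditional = FALSE, progressBar = FALSE)
pi_conditional <- permimp(cf, conditional = TRUE,  progressBar = FALSE)

# Side-by-side view; `$values` is assumed to hold one importance per predictor
round(cbind(PI = pi_marginal$values, CPI = pi_conditional$values), 2)
```

Predictors whose importance drops noticeably under `conditional = TRUE` are likely candidates for being important mainly through their association with other predictors.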
For forests with large trees, parallel processing may speed up the computations.
Parallel processing is possible via the `cl` argument. Under the hood, the
`pblapply` function from the `pbapply`-package is used.

Tip: when using parallel processing, set `progressBar = FALSE`. The additional communication
between the nodes for updating the progress bar will slow down the computations.
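A parallel run might look like the sketch below. It assumes that `cl` accepts a cluster object created with `parallel::makeCluster()`, as `pblapply` itself does; the choice of 2 workers and the reduced number of trees are illustrative only, and the packages must be installed where the workers run.

```r
library(parallel)
library(party)
library(permimp)

set.seed(542863)
airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))
cf <- cforest(Ozone ~ ., data = airq,
              control = cforest_unbiased(mtry = 3, ntree = 200,
                                         minbucket = 5, minsplit = 10))

# Start a small cluster; 2 workers is an arbitrary choice for this sketch
cl <- makeCluster(2)

# Progress bar switched off to avoid extra communication between the nodes
permimp_par <- permimp(cf, conditional = TRUE, cl = cl, progressBar = FALSE)

# Always release the workers when done
stopCluster(cl)

plot(permimp_par, horizontal = TRUE)
```

For small forests such as this one, the overhead of starting the cluster can outweigh the gain, so timing both variants on your own data is worthwhile.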