
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Example of global variable importance

In this vignette, we present a global variable importance measure based on Partial Dependence Profiles (PDP), applied to a random forest regression model.

library("ggplot2")

1 Dataset

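We work on the apartments dataset from the DALEX package. The models below are trained on apartments and explained on the held-out apartmentsTest data, which DALEX also provides.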

library("DALEX")
data(apartments)
head(apartments)

2 Random forest regression model

Next, we fit a random forest regression model and wrap it in an explainer with the explain() function from DALEX.

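library("randomForest")
# Fix the random seed so the forest (and the results below) are reproducible
set.seed(123)
# Fit the forest on the training data with four numeric predictors
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms, data = apartments)
# Explain on the test set: columns 2:5 hold the four predictors, m2.price is the response
explainer_rf <- explain(apartments_rf_model,
                        data = apartmentsTest[,2:5], y = apartmentsTest$m2.price)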
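Conceptually, an explainer bundles everything that model-agnostic tools need: the fitted model, the data used for explanations, the true response, and a way to obtain predictions (a DALEX explainer stores these, among other fields, as model, data, y and predict_function). The base-R sketch below mimics that idea with a hypothetical make_explainer() helper on simulated toy data; it is an illustration only, not DALEX's explain().

```r
# Hypothetical minimal "explainer": a plain list bundling what model-agnostic
# tools need. This is NOT DALEX's explainer class, only an illustration.
make_explainer <- function(model, data, y, predict_function = predict) {
  list(model = model, data = data, y = y, predict_function = predict_function)
}

# Simulated toy data (hypothetical): the target depends on x1 only
set.seed(1)
n <- 200
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$target <- 3 * toy$x1 + rnorm(n, sd = 0.1)

toy_model <- lm(target ~ x1 + x2, data = toy)
# Pass only the predictor columns as data, and the response separately as y
toy_explainer <- make_explainer(toy_model, data = toy[, c("x1", "x2")], y = toy$target)

# A downstream tool only needs to call the stored predict function on the data
yhat <- toy_explainer$predict_function(toy_explainer$model, toy_explainer$data)
head(yhat)
```

Keeping the data and response separate from the model is what lets the same profile and importance code run on any model type, which is how the random forest and linear model later in this vignette can be compared.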

3 Calculate Partial Dependence Profiles

Let's look at the Partial Dependence Profiles calculated with the DALEX::model_profile() function. The PDP can also be calculated with DALEX::variable_profile() or ingredients::partial_dependence().

# Partial dependence profiles for every variable in the explainer's data
profiles <- model_profile(explainer_rf)
plot(profiles)
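A partial dependence profile for one variable is obtained by setting that variable to a value z in every row of the data, predicting, and averaging the predictions; repeating this over a grid of z values traces the curve. The base-R sketch below illustrates the idea with a hypothetical partial_dependence_sketch() helper on simulated data; it is not DALEX's implementation.

```r
# Minimal partial dependence sketch (base R, not DALEX's implementation):
# for each grid value z of one variable, overwrite that column with z in every
# row, predict, and average the predictions.
partial_dependence_sketch <- function(model, data, variable, grid_size = 20) {
  grid <- seq(min(data[[variable]]), max(data[[variable]]), length.out = grid_size)
  pd <- vapply(grid, function(z) {
    modified <- data
    modified[[variable]] <- z
    mean(predict(model, modified))
  }, numeric(1))
  data.frame(x = grid, yhat = pd)
}

# Simulated toy data (hypothetical)
set.seed(1)
n <- 200
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$target <- 3 * toy$x1 + rnorm(n, sd = 0.1)
toy_model <- lm(target ~ x1 + x2, data = toy)

pd_x1 <- partial_dependence_sketch(toy_model, toy[, c("x1", "x2")], "x1")
head(pd_x1)
```

For a linear model the profile is a straight line whose slope equals the fitted coefficient of the variable, which makes the sketch easy to check; for a random forest the profile can bend, which is what makes the plots above informative.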

4 Calculate measure of global variable importance

Next, we calculate a measure of global variable importance from the oscillations of the PDP: the more a variable's profile moves, the more important the variable.

library("vivo")
# Summarise each variable's profile as a single importance score
measure <- global_variable_importance(profiles)
plot(measure)
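One simple way to turn a profile into an importance score, assumed here for illustration (vivo's exact formula may differ), is the mean absolute deviation of the PDP values from their average: a flat profile scores 0, and a profile that moves a lot scores high. The sketch applies a hypothetical oscillation() helper to three made-up profiles and sorts the scores.

```r
# Assumed oscillation score: mean absolute deviation of the PDP values from
# their mean (an illustration, not necessarily vivo's formula)
oscillation <- function(yhat) mean(abs(yhat - mean(yhat)))

# Hypothetical PDP values for three variables on a common grid
grid <- seq(0, 1, length.out = 11)
pdps <- list(
  steep = 5 * grid,    # strong effect
  mild  = 1 * grid,    # weak effect
  flat  = rep(2, 11)   # no effect
)

# Larger score = more important; sort to get the ranking
importance <- sort(vapply(pdps, oscillation, numeric(1)), decreasing = TRUE)
importance
```

A score of this kind depends only on how much the averaged prediction changes, not on its level, so a variable whose profile is shifted up or down but flat still gets a score of 0.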

For the random forest, the most important variable is surface, followed by no.rooms, floor, and construction.year.

5 Comparison of the importance of variables for two or more models

Let's create a linear regression model and its explainer.

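# Fit a linear model with the same four predictors as the random forest
apartments_lm_model <- lm(m2.price ~ construction.year + surface + floor +
                            no.rooms, data = apartments)
# Explain it on the same test data, so the two models are directly comparable
explainer_lm <- explain(apartments_lm_model,
                        data = apartmentsTest[,2:5], y = apartmentsTest$m2.price)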

We calculate the Partial Dependence Profiles and the importance measure for the linear model.

profiles_lm <- model_profile(explainer_lm)

measure_lm <- global_variable_importance(profiles_lm)
# Plot both models' measures together, drawn as lines
plot(measure_lm, measure, type = "lines")

The plot shows the importance ranking of the variables for each model, so the two models can be compared directly.
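Two models fitted to the same data can rank variables differently, because each PDP reflects what that particular model has learned. The self-contained sketch below uses simulated data (hypothetical) with a U-shaped effect of x1 and a weak linear effect of x2, and compares a straight-line lm with a quadratic lm. Hypothetical pdp_values() and oscillation() helpers stand in for model_profile() and global_variable_importance(); the oscillation formula is an assumption, not necessarily vivo's.

```r
# Simulated data (hypothetical): x1 acts through a U-shape, x2 through a weak trend
set.seed(42)
n <- 500
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$target <- 10 * (toy$x1 - 0.5)^2 + 0.5 * toy$x2 + rnorm(n, sd = 0.1)

linear_model   <- lm(target ~ x1 + x2, data = toy)
flexible_model <- lm(target ~ poly(x1, 2) + x2, data = toy)

# Hypothetical helper: PDP values of one variable on an evenly spaced grid
pdp_values <- function(model, data, variable, grid_size = 20) {
  grid <- seq(min(data[[variable]]), max(data[[variable]]), length.out = grid_size)
  vapply(grid, function(z) {
    data[[variable]] <- z          # modifies a local copy only
    mean(predict(model, data))
  }, numeric(1))
}
# Assumed importance score: mean absolute deviation of the PDP from its mean
oscillation <- function(yhat) mean(abs(yhat - mean(yhat)))

predictors <- toy[, c("x1", "x2")]
# Rows: variables; columns: models
importance <- sapply(
  list(linear = linear_model, flexible = flexible_model),
  function(m) sapply(c("x1", "x2"), function(v) oscillation(pdp_values(m, predictors, v)))
)
importance
```

The straight-line model cannot follow the U-shape, so its profile for x1 is nearly flat and x1 typically gets a low score, while the quadratic model gives x1 the highest score. A gap of this kind between models is what the comparison plot is meant to reveal.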




vivo documentation built on Sept. 7, 2020, 5:09 p.m.