knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Example of local variable importance

In this vignette, we present a local variable importance measure based on Ceteris Paribus profiles for random forest regression model.

library("ggplot2")

1 Dataset

We work on Apartments dataset from DALEX package.

library("DALEX")
data(apartments)
head(apartments)

2 Random forest regression model

Now, we define a random forest regression model and use explain from DALEX.

library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model,
                        data = apartmentsTest[,2:5], y = apartmentsTest$m2.price)

3 New observation

We need to specify an observation. Let consider a new apartment with the following attributes. Moreover, we calculate predict value for this new observation.

new_apartment <- data.frame(construction.year = 1998, surface = 88, floor = 2L, no.rooms = 3)
predict(apartments_rf_model, new_apartment)

4 Calculate Ceteris Paribus profiles

Let see the Ceteris Paribus Plots calculated with DALEX::predict_profile() function. The CP also can be calculated with DALEX::individual_profile() or ingredients::ceteris_paribus().

library("ingredients")
profiles <- predict_profile(explainer_rf, new_apartment)
plot(profiles) + show_observations(profiles)

5 Calculate measure of local variable importance

Now, we calculated a measure of local variable importance via oscillation based on Ceteris Paribus profiles. We use variant with all parameters equals to TRUE.

library("vivo")
measure <- local_variable_importance(profiles, apartments[,2:5], 
            absolute_deviation = TRUE, point = TRUE, density = TRUE)
plot(measure)

For the new observation the most important variable is surface, then floor, construction.year and no.rooms.

6 Comparison of two or more methods of calculating the importance of variables

We calculated local variable importance for different parameters and we can plot together, on bar plot or lines plot.

measure_2 <- local_variable_importance(profiles, apartments[,2:5], 
            absolute_deviation = FALSE, point = TRUE, density = TRUE)
measure_3 <- local_variable_importance(profiles, apartments[,2:5], 
            absolute_deviation = FALSE, point = TRUE, density = FALSE)
plot(measure, measure_2, measure_3, color = "_label_method_")
plot(measure, measure_2, measure_3, color = "_label_method_", type = "lines")

7 Comparison of the importance of variables for two or more models

Let created a linear regression model and explain object.

apartments_lm_model <- lm(m2.price ~ construction.year + surface + floor +
                                      no.rooms, data = apartments)
explainer_lm <- explain(apartments_lm_model,
                        data = apartmentsTest[,2:5], y = apartmentsTest$m2.price)

We calculated Ceteris Paribus profiles and measure.

profiles_lm <- predict_profile(explainer_lm, new_apartment)

measure_lm <- local_variable_importance(profiles_lm, apartments[,2:5], 
            absolute_deviation = TRUE, point = TRUE, density = TRUE)
plot(measure, measure_lm, color = "_label_model_", type = "lines")

Now we can see the order of importance of variables by model for selected observation.



ModelOriented/vivo documentation built on Sept. 29, 2020, 10:53 p.m.