knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In this vignette, we present a global variable importance measure based on Partial Dependence Profiles (PDP) for a random forest regression model.
library("ggplot2")
We work with the apartments dataset from the DALEX package.
library("DALEX")
data(apartments)
head(apartments)
Now, we fit a random forest regression model on apartments and wrap it with the explain() function from DALEX, using apartmentsTest as the data for the explanation.
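library("randomForest")
# Fit the random forest on the training data (apartments)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms,
                                    data = apartments)
# Build the explainer on the test data; columns 2:5 are the four predictors
explainer_rf <- explain(apartments_rf_model,
                        data = apartmentsTest[, 2:5],
                        y = apartmentsTest$m2.price)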
Let's look at the Partial Dependence Profiles computed with the DALEX::model_profile() function. PDPs can also be computed with DALEX::variable_profile() or ingredients::partial_dependence().
profiles <- model_profile(explainer_rf)
plot(profiles)
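To make the definition behind these curves concrete, here is a minimal base-R sketch of a partial dependence profile: for each grid value of one variable, that variable is set to the grid value in every row of the data and the model's predictions are averaged. The helper pdp_manual() and the toy data and model are hypothetical and only illustrate the definition; DALEX's own implementation (grid construction, subsampling of observations) may differ.

```r
# Hypothetical helper: partial dependence of `model` on variable `var`
# evaluated at each value in `grid`, averaged over all rows of `data`.
pdp_manual <- function(model, data, var, grid) {
  vapply(grid, function(value) {
    data_mod <- data
    data_mod[[var]] <- value                    # fix the variable at the grid value in every row
    mean(predict(model, newdata = data_mod))    # average prediction = PDP value at this grid point
  }, numeric(1))
}

# Toy data and model, made up for illustration (not the apartments data)
set.seed(1)
toy <- data.frame(x1 = runif(200, 0, 10), x2 = runif(200, 0, 10))
toy$y <- 3 * toy$x1 + 0.5 * toy$x2 + rnorm(200)
toy_lm <- lm(y ~ x1 + x2, data = toy)

grid_x1 <- seq(0, 10, length.out = 11)
pdp_x1 <- pdp_manual(toy_lm, toy[, c("x1", "x2")], "x1", grid_x1)
pdp_x1
```

For a linear model the resulting profile is a straight line whose slope equals the fitted coefficient of x1, which is a quick sanity check for the helper.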
Next, we calculate a global variable importance measure based on the oscillations of the PDP curves.
library("vivo")
measure <- global_variable_importance(profiles)
plot(measure)
The most important variable is surface, then no.rooms, floor, and construction.year.
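The ranking comes from comparing how strongly each variable's PDP oscillates. As a minimal sketch, assuming the oscillation is the mean absolute deviation of a profile from its own mean (vivo's global_variable_importance() may normalise or weight it differently), a ranking of this kind can be reproduced by hand on toy data. The helpers pdp_manual() and oscillation() and the toy data are hypothetical.

```r
# Hypothetical helper: PDP of `var` over `grid` (average prediction with `var` fixed)
pdp_manual <- function(model, data, var, grid) {
  vapply(grid, function(value) {
    data_mod <- data
    data_mod[[var]] <- value
    mean(predict(model, newdata = data_mod))
  }, numeric(1))
}

# Assumed oscillation measure: mean absolute deviation of the profile from its mean
oscillation <- function(pdp) mean(abs(pdp - mean(pdp)))

# Toy data: x1 has a much stronger effect on y than x2
set.seed(1)
toy <- data.frame(x1 = runif(200, 0, 10), x2 = runif(200, 0, 10))
toy$y <- 3 * toy$x1 + 0.5 * toy$x2 + rnorm(200)
toy_lm <- lm(y ~ x1 + x2, data = toy)

vars <- c("x1", "x2")
importance <- sapply(vars, function(v) {
  grid <- seq(min(toy[[v]]), max(toy[[v]]), length.out = 101)  # evenly spaced grid over the observed range
  oscillation(pdp_manual(toy_lm, toy[, vars], v, grid))
})
sort(importance, decreasing = TRUE)  # most important variable first
```

A flat profile gives an oscillation of zero, so a variable the model ignores lands at the bottom of the ranking.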
Let's create a linear regression model and its explainer object.
apartments_lm_model <- lm(m2.price ~ construction.year + surface + floor + no.rooms,
                          data = apartments)
explainer_lm <- explain(apartments_lm_model,
                        data = apartmentsTest[, 2:5],
                        y = apartmentsTest$m2.price)
We calculate the Partial Dependence Profiles and the importance measure for the linear model.
profiles_lm <- model_profile(explainer_lm)
measure_lm <- global_variable_importance(profiles_lm)
plot(measure_lm, measure, type = "lines")
Now we can compare the order of variable importance between the two models.
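For the linear model the PDP has a closed form, which helps explain what drives its ranking. Writing the model as f(x) = β_0 + Σ_k β_k x_k and averaging over the n observations, the profile of variable j is a straight line. Assuming, as a sketch rather than vivo's exact formula, that the oscillation is the mean absolute deviation of the profile over grid points z_1, …, z_G with mean \bar{z}:

```latex
\begin{aligned}
\mathrm{PD}_j(z) &= \frac{1}{n}\sum_{i=1}^{n} f\bigl(z, x_{i,-j}\bigr)
  = \beta_0 + \beta_j z + \sum_{k \neq j} \beta_k \bar{x}_k, \\
\mathrm{PD}_j(z_g) - \overline{\mathrm{PD}}_j &= \beta_j\,(z_g - \bar{z}),
  \qquad \overline{\mathrm{PD}}_j = \frac{1}{G}\sum_{h=1}^{G}\mathrm{PD}_j(z_h), \\
\mathrm{osc}_j &= \frac{1}{G}\sum_{g=1}^{G}\bigl|\mathrm{PD}_j(z_g) - \overline{\mathrm{PD}}_j\bigr|
  = |\beta_j|\,\frac{1}{G}\sum_{g=1}^{G}\bigl|z_g - \bar{z}\bigr|.
\end{aligned}
```

Under this assumption, the linear model's importance of a variable is its coefficient magnitude scaled by the spread of its grid, whereas the random forest's profiles can be non-linear, so the two rankings need not agree.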