Let's walk through an example of using the DALEX package with a classification model for the survival problem on the Titanic dataset.
Here we use the titanic_imputed dataset available in the DALEX package. Note that this data was copied from the stablelearner package and changed for practicality.
library("DALEX")
head(titanic_imputed)
Ok, now it's time to create a model. Let's use the Random Forest model.
# prepare model
library("ranger")
model_titanic_rf <- ranger(survived ~ gender + age + class + embarked + fare + sibsp + parch,
                           data = titanic_imputed,
                           probability = TRUE)
model_titanic_rf
The third step (optional but useful) is to create a DALEX explainer for the random forest model.
library("DALEX")
# column 8 of titanic_imputed is the target (survived), so it is dropped from data and passed as y
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              label = "Random Forest")
Use the feature_importance() function to present the importance of particular features. Note that passing type = "difference" normalizes the dropout losses so that they all start at 0; the call below does not set type and so uses the default.
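To make the idea of a dropout loss concrete, here is a minimal base R sketch of permutation-based importance. It is an illustration under stated assumptions, not the internals of feature_importance(): the mtcars data, the linear model, the RMSE loss and the helper name permutation_importance are all stand-ins chosen so the example runs without extra packages.

```r
# Sketch of permutation-based feature importance in base R.
# Assumptions: mtcars and lm() stand in for the Titanic data and the ranger model;
# permutation_importance() is a hypothetical helper, not part of ingredients.
set.seed(42)
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
features <- c("wt", "hp", "qsec")

# Loss function: root mean squared error
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Loss of the model on the unpermuted data
loss_full <- rmse(mtcars$mpg, predict(model, mtcars))

# For each feature: shuffle its column B times and average the resulting loss
permutation_importance <- function(model, data, y, features, loss, B = 20) {
  sapply(features, function(f) {
    mean(replicate(B, {
      permuted <- data
      permuted[[f]] <- sample(permuted[[f]])
      loss(y, predict(model, permuted))
    }))
  })
}

dropout_raw <- permutation_importance(model, mtcars, mtcars$mpg, features, rmse)

# "Difference" normalization: subtract the full-model loss so the baseline is 0
dropout_diff <- dropout_raw - loss_full
print(sort(dropout_diff, decreasing = TRUE))
```

A larger difference means that shuffling the feature hurts the model more, which is the sense in which the feature is important.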
library("ingredients")
fi_rf <- feature_importance(explain_titanic_rf)
head(fi_rf)
plot(fi_rf)
As we see, the most important feature is gender. The next three important features are class, age and fare. Let's see the link between the model response and these features.
Such univariate relations can be calculated with partial_dependence().
Kids 5 years old and younger have much higher survival probability.
pp_age <- partial_dependence(explain_titanic_rf, variables = c("age", "fare"))
head(pp_age)
plot(pp_age)
cp_age <- conditional_dependence(explain_titanic_rf, variables = c("age", "fare"))
plot(cp_age)
ap_age <- accumulated_dependence(explain_titanic_rf, variables = c("age", "fare"))
plot(ap_age)
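As a rough illustration of what a partial dependence profile measures, the sketch below fixes one variable at each value of a grid for every observation and averages the model predictions. It assumes mtcars and a linear model as stand-ins, and partial_dependence_sketch is a hypothetical helper written for this example, not a function from ingredients.

```r
# Sketch of a partial dependence profile in base R.
# Assumptions: mtcars and lm() replace the Titanic data and the ranger model;
# partial_dependence_sketch() is a hypothetical helper.
model <- lm(mpg ~ wt + hp + I(hp^2), data = mtcars)

partial_dependence_sketch <- function(model, data, variable, grid) {
  sapply(grid, function(value) {
    modified <- data
    modified[[variable]] <- value   # set the variable to one grid value for all rows
    mean(predict(model, modified))  # average prediction over the whole dataset
  })
}

hp_grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 10)
pd_hp <- partial_dependence_sketch(model, mtcars, "hp", hp_grid)
print(data.frame(hp = hp_grid, mean_prediction = round(pd_hp, 2)))
```

Conditional and accumulated profiles differ in how they treat correlated variables, so this averaging scheme should be read as a sketch of the partial dependence case only.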
Let's explain the model prediction for an 8-year-old male from 1st class who embarked from Southampton.
First, Ceteris Paribus profiles for numerical variables.
new_passanger <- data.frame(
  class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew",
                                   "restaurant staff", "victualling crew")),
  gender = factor("male", levels = c("female", "male")),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Southampton", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)
sp_rf <- ceteris_paribus(explain_titanic_rf, new_passanger)
plot(sp_rf) + show_observations(sp_rf)
And for selected categorical variables. Note that sibsp is numerical but is presented here as a categorical variable.
plot(sp_rf, variables = c("class", "embarked", "gender", "sibsp"), variable_type = "categorical")
It looks like the most important features for this passenger are age and gender. Overall, his odds of survival are higher than for the average passenger, mainly because of his young age and despite his being male.
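The Ceteris Paribus profiles used above answer a "what if" question for a single observation: one variable is varied over a grid while all others are held fixed. A minimal base R sketch of that idea follows; mtcars, the linear model and the helper ceteris_paribus_sketch are assumptions made for illustration and are not the ingredients implementation.

```r
# Sketch of a Ceteris Paribus profile for one observation in base R.
# Assumptions: mtcars and lm() are stand-ins; ceteris_paribus_sketch() is hypothetical.
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
observation <- mtcars["Mazda RX4", ]

ceteris_paribus_sketch <- function(model, observation, variable, grid) {
  # Replicate the observation once per grid value, then overwrite one variable
  profile <- observation[rep(1, length(grid)), , drop = FALSE]
  profile[[variable]] <- grid
  data.frame(value = grid, prediction = as.numeric(predict(model, profile)))
}

wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 8)
cp_wt <- ceteris_paribus_sketch(model, observation, "wt", wt_grid)
print(cp_wt)
```

Because the stand-in model is linear, this profile is a straight line; for a random forest the same procedure can reveal non-linear, observation-specific responses.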
# sample from titanic_imputed, the data the model was trained on (no missing values)
passangers <- select_sample(titanic_imputed, n = 100)
sp_rf <- ceteris_paribus(explain_titanic_rf, passangers)
clust_rf <- cluster_profiles(sp_rf, k = 3)
head(clust_rf)
plot(sp_rf, alpha = 0.1) +
  show_aggregated_profiles(clust_rf, color = "_label_", size = 2)
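One plausible way to group individual profiles into k clusters and summarise each group by its average profile is sketched below in base R. This is an assumption-laden illustration: hierarchical clustering with Ward linkage is chosen here for simplicity and is not necessarily the algorithm that cluster_profiles() uses internally, and mtcars with a linear model containing an interaction stands in for the Titanic setup.

```r
# Sketch: cluster per-observation profiles and average them within clusters.
# Assumptions: mtcars and lm() are stand-ins; hclust() with "ward.D2" is an
# illustrative choice, not a statement about cluster_profiles() internals.
model <- lm(mpg ~ wt * hp, data = mtcars)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

# One row per observation, one column per grid point of wt
profiles <- t(sapply(seq_len(nrow(mtcars)), function(i) {
  rows <- mtcars[rep(i, length(wt_grid)), , drop = FALSE]
  rows$wt <- wt_grid
  as.numeric(predict(model, rows))
}))

# Group similar profiles into k = 3 clusters
clusters <- cutree(hclust(dist(profiles), method = "ward.D2"), k = 3)

# Aggregated profile per cluster: mean prediction at each grid point
aggregated <- aggregate(as.data.frame(profiles),
                        by = list(cluster = clusters),
                        FUN = mean)
print(table(clusters))
print(round(aggregated, 2))
```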
sessionInfo()