knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
Let's see an example of using the DALEX package with classification models for the survival problem on the Titanic dataset.
Here we use the titanic_imputed dataset, available in the DALEX package. Note that this data was copied from the stablelearner package and modified for practicality.
Ok, now it's time to create a model. Let's use the Random Forest model.
# prepare model
library("DALEX")   # provides the titanic_imputed dataset
library("ranger")
model_titanic_rf <- ranger(survived ~ gender + age + class + embarked + fare + sibsp + parch,
                           data = titanic_imputed,
                           probability = TRUE)
model_titanic_rf
The third step (optional but useful) is to create a DALEX explainer for the random forest model.
library("DALEX")
# column 8 of titanic_imputed is the target (survived)
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[, -8],
                              y = titanic_imputed[, 8],
                              label = "Random Forest")
Use the feature_importance() explainer to present the importance of particular features. Note that passing type = "difference" normalizes the dropouts so that they all start at 0.
library("ingredients")
fi_rf <- feature_importance(explain_titanic_rf)
head(fi_rf)
plot(fi_rf)
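To make the idea behind permutation-based importance concrete, here is a minimal base-R sketch on simulated data. It is not the ingredients implementation: the toy variables x1, x2, x3, the linear model, and the rmse helper are all assumptions made for illustration. The last step mirrors the "difference" normalization described above by subtracting the loss on the intact data.

```r
# Toy data (hypothetical): y depends strongly on x1, weakly on x2, not on x3.
set.seed(1)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 3 * df$x1 + 0.5 * df$x2 + rnorm(n, sd = 0.1)

fit  <- lm(y ~ x1 + x2 + x3, data = df)
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Loss of the model on the intact data.
loss_full <- rmse(df$y, predict(fit, df))

# Loss after shuffling one column at a time, which breaks its link with y.
permuted_loss <- sapply(c("x1", "x2", "x3"), function(v) {
  shuffled <- df
  shuffled[[v]] <- sample(shuffled[[v]])
  rmse(df$y, predict(fit, shuffled))
})

# Raw dropout losses, and the same losses shifted so the baseline is 0.
raw_importance        <- permuted_loss
difference_importance <- permuted_loss - loss_full
print(round(difference_importance, 3))
```

The larger the increase in loss after shuffling a variable, the more the model relies on it.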
As we can see, the most important feature is gender. The next three most important features include age and fare. Let's see the link between the model response and these features.
Such univariate relations can be calculated with partial_dependence().
Kids 5 years old and younger have a much higher survival probability.
pp_age <- partial_dependence(explain_titanic_rf, variables = c("age", "fare"))
head(pp_age)
plot(pp_age)
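As a rough illustration of what a partial dependence profile computes, the sketch below fixes one variable at each grid value for every row and averages the model predictions. It uses simulated data and a logistic regression rather than the Titanic model; the data-generating formula and the helper partial_dependence_by_hand are hypothetical and are not part of ingredients.

```r
# Simulated data (hypothetical): survival chance falls with age, rises with fare.
set.seed(2)
n  <- 400
df <- data.frame(age = runif(n, 1, 70), fare = runif(n, 5, 100))
df$survived <- rbinom(n, 1, plogis(1 - 0.05 * df$age + 0.02 * df$fare))

fit <- glm(survived ~ age + fare, data = df, family = binomial())

# For each grid value: overwrite the variable in every row, predict, average.
partial_dependence_by_hand <- function(model, data, variable, grid) {
  sapply(grid, function(value) {
    modified <- data
    modified[[variable]] <- value
    mean(predict(model, modified, type = "response"))
  })
}

age_grid <- seq(5, 65, by = 10)
pd_age   <- partial_dependence_by_hand(fit, df, "age", age_grid)
print(data.frame(age = age_grid, mean_prediction = round(pd_age, 3)))
```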
cp_age <- conditional_dependence(explain_titanic_rf, variables = c("age", "fare"))
plot(cp_age)

ap_age <- accumulated_dependence(explain_titanic_rf, variables = c("age", "fare"))
plot(ap_age)
Let's look at an explanation of the model prediction for an 8-year-old male passenger from the 1st class who embarked from Southampton.
First, Ceteris Paribus profiles for numerical variables:
new_passanger <- data.frame(
  class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew",
                                   "restaurant staff", "victualling crew")),
  gender = factor("male", levels = c("female", "male")),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Southampton", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)

sp_rf <- ceteris_paribus(explain_titanic_rf, new_passanger)
plot(sp_rf) + show_observations(sp_rf)
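The idea of a Ceteris Paribus profile can be sketched by hand: take a single observation, vary one variable over a grid while holding everything else fixed, and record the prediction at each grid point. The example below is an illustration on simulated data with a logistic regression, not the ingredients implementation; the data, the model, and the passenger values are assumptions.

```r
# Simulated data and model (hypothetical stand-ins for the Titanic example).
set.seed(3)
n  <- 400
df <- data.frame(age = runif(n, 1, 70), fare = runif(n, 5, 100))
df$survived <- rbinom(n, 1, plogis(1 - 0.05 * df$age + 0.02 * df$fare))
fit <- glm(survived ~ age + fare, data = df, family = binomial())

# The single observation to explain.
passenger <- data.frame(age = 8, fare = 72)

# Replicate the passenger once per grid value, then vary only age.
age_grid <- seq(1, 70, length.out = 30)
profile  <- passenger[rep(1, length(age_grid)), ]
profile$age <- age_grid
profile$prediction <- predict(fit, profile, type = "response")

print(head(profile))
```

Every row of profile shares the passenger's fare, so any change in prediction is attributable to age alone.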
And now for selected categorical variables. Note that sibsp is numerical, but here it is presented as a categorical variable.
plot(sp_rf, variables = c("class", "embarked", "gender", "sibsp"), variable_type = "categorical")
It looks like the most important features for this passenger are age and gender. Overall, his odds of survival are higher than for the average passenger, mainly because of his young age and despite his being male.
# sample from titanic_imputed, the data the model was trained on
passangers <- select_sample(titanic_imputed, n = 100)

sp_rf <- ceteris_paribus(explain_titanic_rf, passangers)
clust_rf <- cluster_profiles(sp_rf, k = 3)
head(clust_rf)

plot(sp_rf, alpha = 0.1) +
  show_aggregated_profiles(clust_rf, color = "_label_", size = 2)
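Clustering profiles amounts to grouping observations whose Ceteris Paribus curves have a similar shape and then averaging the curves within each group. The base-R sketch below shows one way to do this with hierarchical clustering on simulated data. It is an assumption-laden illustration: the data, the model with an interaction term, and the choice of hclust with default settings are mine, and cluster_profiles() may use a different procedure.

```r
# Simulated data and model (hypothetical); the interaction makes curve shapes differ.
set.seed(4)
n  <- 300
df <- data.frame(age = runif(n, 1, 70), fare = runif(n, 5, 100))
df$survived <- rbinom(n, 1, plogis(1 - 0.05 * df$age + 0.02 * df$fare))
fit <- glm(survived ~ age * fare, data = df, family = binomial())

sampled  <- df[sample(nrow(df), 100), c("age", "fare")]
age_grid <- seq(1, 70, length.out = 20)

# One row per sampled observation, one column per grid value of age.
profiles <- t(sapply(seq_len(nrow(sampled)), function(i) {
  obs <- sampled[rep(i, length(age_grid)), ]
  obs$age <- age_grid
  predict(fit, obs, type = "response")
}))

# Group similar curves into 3 clusters, then average the curves per cluster.
clusters   <- cutree(hclust(dist(profiles)), k = 3)
aggregated <- apply(profiles, 2, function(col) tapply(col, clusters, mean))
print(round(aggregated[, 1:5], 3))
```

Each row of aggregated is the mean profile of one cluster, which is the kind of curve drawn on top of the individual profiles.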