knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.width = 7,
  fig.height = 3.5,
  warning = FALSE,
  message = FALSE
)
We address the problem of insufficient interpretability of explanations for domain experts. We solve this issue by introducing the describe() function, which automatically generates natural language descriptions of explanations generated with the ingredients package.
The ingredients package allows for generating prediction validation and prediction perturbation explanations. They allow for both global and local model explanation.
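describe() generates a natural language description for explanations generated with the ingredients package; the examples below apply it to the output of feature_importance(), ceteris_paribus(), and aggregate_profiles().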
To demonstrate automatic descriptions, we first load the data set and build a random forest model that classifies which of the passengers survived the sinking of the Titanic. Then, using the DALEX package, we generate an explainer for the model. Lastly, we select a random passenger whose prediction should be explained.
library("DALEX")
library("ingredients")
library("ranger")

# Random forest that predicts the probability of survival
model_titanic_rf <- ranger(survived ~ ., data = titanic_imputed, probability = TRUE)

# Column 8 of titanic_imputed is the target, survived
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[, -8],
                              y = titanic_imputed[, 8],
                              label = "Random Forest")

# Pick one random passenger (without the target column)
passanger <- titanic_imputed[sample(nrow(titanic_imputed), 1), -8]
passanger
Now we are ready to generate various explanations and describe them with describe().
A feature importance explanation shows the importance of all the model's variables. As it is a global explanation technique, no passenger needs to be specified.
importance_rf <- feature_importance(explain_titanic_rf)
plot(importance_rf)
describe() reports which variables are the most important. The nonsignificance_treshold argument sets the level above which variables become significant. For a higher threshold, fewer variables will be described as significant.
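The call that produced this description is not preserved in the text, so the following is only a minimal sketch of what it may look like. It repeats the setup from above so that it runs on its own. The seed and the threshold value 0.3 are illustrative assumptions, not values taken from the source.

```r
library("DALEX")
library("ingredients")
library("ranger")

set.seed(1)  # assumption: fixed seed so the model and permutations are reproducible

model_titanic_rf <- ranger(survived ~ ., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[, -8],
                              y = titanic_imputed[, 8],
                              label = "Random Forest",
                              verbose = FALSE)
importance_rf <- feature_importance(explain_titanic_rf)

# Description with the default threshold
description_default <- describe(importance_rf)
description_default

# Description with a higher threshold (0.3 is an arbitrary illustrative value)
description_strict <- describe(importance_rf, nonsignificance_treshold = 0.3)
description_strict
```

Comparing the two printed descriptions is a quick way to see how the threshold changes which variables are reported as significant.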
Ceteris Paribus profiles show how the model's prediction changes as a specified variable changes, while all other variables are held fixed.
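The idea can be sketched by hand in base R. This illustrates the concept only and is not how ingredients implements ceteris_paribus(); the built-in mtcars data and the linear model are stand-ins chosen so that the snippet runs without extra packages.

```r
# Stand-in model: fuel consumption explained by weight and horsepower
model <- lm(mpg ~ wt + hp, data = mtcars)

# The single observation whose prediction we want to explain
obs <- mtcars[1, c("wt", "hp")]

# Grid of values for the variable we perturb (wt); hp stays fixed
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)

# Copy the observation once per grid value and overwrite only wt
profile <- obs[rep(1, length(grid)), ]
profile$wt <- grid

# The profile is the model's prediction along the grid
profile$prediction <- predict(model, newdata = profile)
profile
```

Each row differs from the original observation in wt only, so any change in the prediction column is attributable to that variable.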
perturbed_variable <- "class"
cp_rf <- ceteris_paribus(explain_titanic_rf, passanger, variables = perturbed_variable)
plot(cp_rf, variable_type = "categorical")
For a user with no experience, interpreting the above plot may not be straightforward. Thus we generate a natural language description to make it easier.
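The call that generated the description is not preserved in the text. A minimal sketch, assuming describe() is called on the profile with its default arguments, is given here. The setup is repeated so that the snippet runs on its own, and the first row is used instead of a random passenger so that the result is reproducible; both choices are assumptions made for illustration.

```r
library("DALEX")
library("ingredients")
library("ranger")

set.seed(1)  # assumption: fixed seed for a reproducible forest

model_titanic_rf <- ranger(survived ~ ., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[, -8],
                              y = titanic_imputed[, 8],
                              label = "Random Forest",
                              verbose = FALSE)

# Assumption: first passenger instead of a random one
passanger <- titanic_imputed[1, -8]

cp_rf <- ceteris_paribus(explain_titanic_rf, passanger, variables = "class")

# Default description of the profile
description_cp <- describe(cp_rf)
description_cp
```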
Natural language descriptions should be flexible in order to provide the desired level of complexity and specificity. Thus various parameters can modify the description being generated.
describe(cp_rf,
         display_numbers = TRUE,
         label = "the probability that the passenger will survive")
Please note that describe() can handle only one variable at a time, so it is recommended to specify which variable should be described.
describe(cp_rf,
         display_numbers = TRUE,
         label = "the probability that the passenger will survive",
         variables = perturbed_variable)
Continuous variables are described as well.
perturbed_variable_continuous <- "age"
cp_rf <- ceteris_paribus(explain_titanic_rf, passanger)
plot(cp_rf, variables = perturbed_variable_continuous)
describe(cp_rf, variables = perturbed_variable_continuous)
Ceteris Paribus profiles are described only for a single observation. If we want to assess the influence of a variable across more than one observation, we need to describe partial dependence profiles.
pdp <- aggregate_profiles(cp_rf, type = "partial")
plot(pdp, variables = "fare")
describe(pdp, variables = "fare")

pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
plot(pdp, variables = perturbed_variable)
describe(pdp, variables = perturbed_variable)
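A partial dependence profile is the average of Ceteris Paribus profiles over many observations. The aggregation can be sketched by hand in base R. As before, this is an illustration of the concept rather than the ingredients implementation, and mtcars with a linear model is a stand-in chosen so that the snippet needs no extra packages.

```r
# Stand-in model: fuel consumption explained by weight and horsepower
model <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of values for the variable of interest (wt)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)

# For each grid value: set wt to that value for every car,
# predict, and average the predictions over all cars
partial_dependence <- sapply(grid, function(value) {
  perturbed <- mtcars
  perturbed$wt <- value
  mean(predict(model, newdata = perturbed))
})

data.frame(wt = grid, average_prediction = partial_dependence)
```

With a single observation the average is taken over one profile only, so the result coincides with that observation's Ceteris Paribus profile; the aggregate becomes informative when profiles for many observations are averaged.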