This document explains how to use `FairnessTest` to test the fairness of a model for a single instance and for multiple instances.

```{r}
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width = 7,
  fig.height = 4
)
options(width = 130)
```

We install the development version from GitHub with:
```{r, results='hide', message=FALSE}
devtools::install_github("RifatMehreen/moccf")
```

```{r}
library(dplyr)
library(plyr)
library(tidyverse)
library(Rtsne)
library(mlr3pipelines)
library(mlr3learners)
options(rgl.useNULL = TRUE)
library(rgl)
library(Rmpfr)
library(checkmate)
library(R6)
library(paradox)
library(data.table)
library(miesmuschel)
library(fairml)
library(counterfactuals)
library(randomForest)
library(iml)
library(ggforce)
library(moccf)
```

To demonstrate the `FairnessTest` workflow for a classification task, we test the fairness of a classification model trained on the UCI adult dataset.
As training data we use the UCI adult dataset from the `fairml` package.
The data set contains 30162 observations of 14 variables: 13 features and the binary target variable `income`:

```{r}
column_descr = data.frame(
  rbind(
    cbind("age", "Age (years)"),
    cbind("workclass", "Working class level"),
    cbind("education", "Education level"),
    cbind("education_num", "Education (years)"),
    cbind("marital_status", "Marital status of an individual"),
    cbind("occupation", "Occupation"),
    cbind("relationship", "Relationship the person holds"),
    cbind("race", "Racial identity"),
    cbind("sex", "Gender"),
    cbind("capital_gain", "Capital gain"),
    cbind("capital_loss", "Capital loss"),
    cbind("hours_per_week", "Working hours per week"),
    cbind("native_country", "Country of origin"),
    cbind("income", "Income level")
  )
)
names(column_descr) <- c("Variable", "Description")
knitr::kable(column_descr, escape = FALSE, format = "html", table.attr = "style='width:100%;'")
```

```{r}
data(adult, package = "fairml")
```

We pre-process the dataset by dropping rows with missing values and removing duplicate rows.

```{r}
adult <- adult %>% drop_na()
adult <- adult %>% distinct()
```

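
To make the effect of these two steps concrete, here is a minimal sketch on a made-up toy data frame (`toy` and its values are hypothetical and not part of the adult data). It uses the base R functions `complete.cases()` and `duplicated()`, which for a plain data frame like this one should keep the same rows as `drop_na()` followed by `distinct()`:

```r
# Toy data (hypothetical): row 3 has a missing value, row 4 duplicates row 2
toy <- data.frame(
  age = c(39, 50, NA, 50, 28),
  sex = c("Male", "Female", "Male", "Female", "Female"),
  stringsAsFactors = TRUE
)

# Step 1: drop rows that contain any missing value
toy_complete <- toy[complete.cases(toy), ]

# Step 2: drop rows that exactly repeat an earlier row
toy_clean <- toy_complete[!duplicated(toy_complete), ]

nrow(toy)        # 5 rows before cleaning
nrow(toy_clean)  # 3 rows after cleaning
```

Because rows are removed, row indices used later (such as the index of `x_interest`) refer to the cleaned data, not to the raw data.
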
First, we train a model to predict `income`. Note that we leave one observation out of the training data; this observation is our `x_interest`.

```{r}
set.seed(142)
rf = randomForest(income ~ ., data = adult[-91L, ])
```

An `iml::Predictor` object serves as a wrapper for different model types. It contains the model and the data for its analysis.

```{r}
predictor = Predictor$new(rf, type = "prob")
```

For `x_interest`, the model predicts:

```{r}
x_interest = adult[91L, ]
predictor$predict(x_interest)
```

Next, we examine whether the model makes similar predictions for counterfactuals that differ from `x_interest` in the level of the protected attribute, here `sex = Male`.
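
The idea can be illustrated with a much simpler stand-in. The sketch below is not the counterfactual search that `moccf` performs: it only flips the protected attribute of one instance and compares the two predictions. The toy data, the logistic regression model, and all variable names are assumptions made for this illustration.

```r
set.seed(1)
n <- 200

# Synthetic toy data (hypothetical, for illustration only)
toy <- data.frame(
  age = round(runif(n, 18, 65)),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE),
               levels = c("Female", "Male"))
)
p <- plogis(-3 + 0.05 * toy$age + 0.8 * (toy$sex == "Male"))
toy$high_income <- rbinom(n, 1, p)

# Fit a simple model, leaving the first row out as the instance of interest
fit <- glm(high_income ~ age + sex, data = toy[-1, ], family = binomial())
x_toy <- toy[1, ]

# Naive "counterfactual": same instance with the other level of sex
x_flipped <- x_toy
x_flipped$sex <- factor(
  setdiff(levels(toy$sex), as.character(x_toy$sex)),
  levels = levels(toy$sex)
)

# Compare the predicted probabilities before and after the flip
p_orig <- predict(fit, newdata = x_toy, type = "response")
p_flip <- predict(fit, newdata = x_flipped, type = "response")
diff_naive <- unname(p_flip - p_orig)
diff_naive
```

A difference far from zero suggests that the prediction depends on the protected attribute. A naive flip like this can produce implausible instances, which is one reason to search for plausible counterfactuals instead.
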
Since we want to test the fairness of the classification model, we initialize a `FairnessTest` object.

```{r}
fairness_object = FairnessTest$new(predictor, df = adult, sensitive_attribute = "sex", n_generations = 175)
```

```{r}
# Cached variant: build the object only if no saved copy exists, then load it.
# The existence check uses the same file that is saved and read.
res_path = "how-to-use-FairnessTest-res/fairness_object.RDS"
if (!file.exists(res_path)) {
  fairness_object = moccf::FairnessTest$new(predictor, df = adult, sensitive_attribute = "sex", n_generations = 175)
  saveRDS(fairness_object, res_path)
}
fairness_object = readRDS(res_path)
```

The resulting `fairness_object` holds the counterfactuals that are found and provides several methods for fairness testing and visualization.

```{r}
class(fairness_object)
print(fairness_object)
```

The counterfactuals are generated with the `$generate_counterfactuals()` method.

```{r}
cfactuals = fairness_object$generate_counterfactuals(x_interest, desired_level = "Male", desired_prob = c(0.5, 1), fixed_features = "race")
cfactuals
```

The `$get_cfactuals_count()` method returns the number of plausible counterfactuals that were generated.

```{r}
fairness_object$get_cfactuals_count()
```

We can use the `$get_prediction_difference()` method to find the differences between the predictions for `x_interest` and for the `cfactuals`.

```{r}
fairness_object$get_prediction_difference(x_interest)
```

The `$get_mpd()` method returns the mean of the prediction differences.

```{r}
fairness_object$get_mpd()
```

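
As a rough illustration of what a mean prediction difference measures, the following sketch computes one by hand. The probabilities are made-up numbers, not output of the package, and the sketch assumes the difference is taken as the counterfactual prediction minus the prediction for `x_interest`; the package's exact definition (sign and class used) may differ.

```r
# Hypothetical predicted probabilities of the positive class (not real output)
pred_x_interest <- 0.30
pred_cfactuals <- c(0.45, 0.38, 0.52, 0.41)

# Difference between each counterfactual's prediction and that of x_interest
pred_diff <- pred_cfactuals - pred_x_interest

# Mean prediction difference: the average of the individual differences
mpd <- mean(pred_diff)
mpd
```

With these numbers the mean prediction difference is 0.14. A value close to zero would mean that, on average, the counterfactuals receive about the same prediction as `x_interest`.
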
The `$prediction_percentages()` method reports the percentage of the generated counterfactuals that are predicted to belong to each class.

```{r}
fairness_object$prediction_percentages(x_interest)
```

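
The kind of summary described here can be sketched in base R with `table()` and `prop.table()`. The predicted classes below are hypothetical values chosen for illustration, and the output format of the package method may look different.

```r
# Hypothetical predicted classes for six counterfactuals (not real output)
pred_class <- factor(
  c(">50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"),
  levels = c("<=50K", ">50K")
)

# Share of counterfactuals in each predicted class, in percent
pct <- round(100 * prop.table(table(pred_class)), 1)
pct
```

Here two of the six counterfactuals (33.3%) fall in the first class and four (66.7%) in the second.
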
It is also possible to plot the distribution of the data instances, `x_interest`, and the generated plausible `cfactuals` with the `$plot_tSNE()` method. The black circled point in the plot represents `x_interest`.

```{r}
fairness_object$plot_tSNE(x_interest, factor_variable = "race")
```

We can also test the fairness of the model for more than one instance. Here, two instances are used for the test.

```{r}
adult_sample = adult[c(91, 95), ]
result = fairnesstest_moc(
  adult, adult_sample, "income",
  sen_attribute = "sex", desired_level = "Male", fixed_features = "race",
  n_generation = 175, desired_prob = 1, model = "randomForest"
)
```

```{r}
result[[1]]
result[[2]]
```