compas: Modified COMPAS dataset
In fairness: Algorithmic Fairness Metrics

compas is a landmark dataset for studying algorithmic (un)fairness. The data were used to predict recidivism (whether a defendant will reoffend) in the USA. The COMPAS tool was meant to overcome human biases and offer an algorithmic, fair way to predict recidivism in a diverse population. However, the algorithm ended up propagating existing social biases and thus offered an unfair algorithmic solution to the problem. In this dataset, a model to predict recidivism has already been fit, and its predicted probabilities and predicted status (yes/no) for recidivism have been appended to the original data.

compas

A data frame with 6172 rows and 9 variables:

Two_yr_Recidivism: factor, yes/no for recidivism within two years. This is the outcome (target) variable in this dataset.
Number_of_Priors: numeric, number of prior offenses, normalized to mean = 0 and standard deviation = 1
Age_Above_FourtyFive: factor, yes/no for whether age is above 45 years
Age_Below_TwentyFive: factor, yes/no for whether age is below 25 years
Female: factor, female/male for gender
Misdemeanor: factor, yes/no for whether misdemeanor(s) were recorded
ethnicity: factor, Caucasian, African American, Asian, Hispanic, Native American or Other
probability: numeric, predicted probability of recidivism, ranging from 0 to 1
predicted: numeric, predicted status for recidivism, coded 0 = no and 1 = yes
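Because `predicted` is stored as a 0/1 numeric column, group-wise summaries need only base R. The sketch below computes the share of individuals predicted to reoffend within each level of a grouping column such as `ethnicity`, which is the kind of comparison a demographic-parity check makes. The helper `positive_rate_by_group` and the toy rows are hypothetical illustrations, not part of the fairness package API.

```r
# Hypothetical helper (not part of the fairness package): share of rows
# predicted as recidivists (predicted == 1) within each level of `group`.
positive_rate_by_group <- function(data, group, preds = "predicted") {
  # the mean of a 0/1 column is the proportion of 1s in each group
  tapply(data[[preds]], data[[group]], mean)
}

# Toy rows mimicking the compas column layout; these are NOT real COMPAS records.
toy <- data.frame(
  ethnicity = factor(c("Caucasian", "Caucasian", "Other", "Other")),
  predicted = c(1, 0, 1, 1)
)

rates <- positive_rate_by_group(toy, "ethnicity")
print(rates)
```

Assuming the fairness package is installed, the same helper could be applied to the full data with `data("compas", package = "fairness")` followed by `positive_rate_by_group(compas, "ethnicity")`.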

The dataset was downloaded from Kaggle https://www.kaggle.com/danofer/compass and has since been modified (e.g. ethnicity was originally one-hot encoded, the number of priors has been normalized, variables have been renamed, and a prediction model was fit and its predicted probabilities and predicted status were appended to the original dataset).
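The normalization applied to `Number_of_Priors` is a z-score transformation, z = (x - mean(x)) / sd(x). The snippet below illustrates it on a few made-up prior counts (hypothetical values, not drawn from the dataset), assuming the usual sample standard deviation was used; base R's `scale()` gives the same result.

```r
# Made-up raw prior counts for illustration only (not COMPAS records)
raw_priors <- c(0, 1, 2, 5, 12)

# z-score by hand: centre on the mean, divide by the sample standard deviation
z_priors <- (raw_priors - mean(raw_priors)) / sd(raw_priors)

# scale() centres and scales the same way; it returns a one-column matrix,
# so as.vector() drops the matrix structure and its attributes
z_scale <- as.vector(scale(raw_priors))

print(round(z_priors, 3))
```

Because this page does not report the original mean and standard deviation, the normalized values in `compas` cannot be converted back to raw counts from this documentation alone.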

fairness documentation built on April 14, 2021, 5:09 p.m.