compas: Modified COMPAS dataset


Description

compas is a landmark dataset for studying algorithmic (un)fairness. The data were used to predict recidivism (whether a person with a criminal record will reoffend) in the USA. The COMPAS tool was meant to overcome human bias and offer a fair, algorithmic way to predict recidivism in a diverse population. However, the algorithm ended up propagating existing social biases and thus offered an unfair solution to the problem. In this dataset, a model to predict recidivism has already been fit, and the predicted probabilities and predicted status (0/1 for no/yes) have been appended to the original data.

Usage


Format

A data frame with 6172 rows and 9 variables:

Two_yr_Recidivism

factor, yes/no for whether recidivism occurred. This is the outcome (target) variable in this dataset.

Number_of_Priors

numeric, number of priors, normalized to mean = 0 and standard deviation = 1

Age_Above_FourtyFive

factor, yes/no for age above 45 years or not

Age_Below_TwentyFive

factor, yes/no for age below 25 years or not

Female

factor, female/male for gender

Misdemeanor

factor, yes/no for having recorded misdemeanor(s) or not

ethnicity

factor, Caucasian, African American, Asian, Hispanic, Native American or Other

probability

numeric, predicted probabilities for recidivism, ranges from 0 to 1

predicted

numeric, predicted values for recidivism, 0/1 for no/yes
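As a rough illustration of how these columns fit together, the sketch below builds a tiny stand-in data frame with the same nine column names and types, then compares the observed outcome with the stored 0/1 prediction. The data frame `toy`, all of its values, and the exact factor level spellings are made up for illustration; they are not taken from compas itself.

```r
# Toy stand-in for compas: same column names and types, invented values.
# Level spellings follow the descriptions above and are an assumption.
yes_no <- c("no", "yes")
toy <- data.frame(
  Two_yr_Recidivism    = factor(c("yes", "no", "no", "yes"), levels = yes_no),
  Number_of_Priors     = c(1.2, -0.7, -0.7, 0.2),
  Age_Above_FourtyFive = factor(c("no", "yes", "no", "no"), levels = yes_no),
  Age_Below_TwentyFive = factor(c("yes", "no", "no", "no"), levels = yes_no),
  Female               = factor(c("male", "female", "male", "male")),
  Misdemeanor          = factor(c("no", "yes", "yes", "no"), levels = yes_no),
  ethnicity            = factor(c("Caucasian", "African American", "Hispanic", "Other")),
  probability          = c(0.81, 0.22, 0.64, 0.35),
  predicted            = c(1, 0, 1, 0)
)

# Recode the factor outcome as 0/1 so it is comparable with `predicted`.
observed <- as.integer(toy$Two_yr_Recidivism == "yes")

# Cross-tabulate observed outcome against the stored prediction.
confusion <- table(observed = observed, predicted = toy$predicted)

# Share of rows where the prediction matches the observed outcome.
accuracy <- mean(observed == toy$predicted)

print(confusion)
print(accuracy)
```

The same two steps (recode the factor outcome, then tabulate against `predicted`) should carry over to the real data frame, provided its level labels are checked first with `levels()`.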

Source

The dataset was downloaded from Kaggle (https://www.kaggle.com/danofer/compass) and has since been modified: ethnicity, originally one-hot encoded, was collapsed into a single factor; the number of priors was normalized; variables were renamed; and a prediction model was fit, with its predicted probabilities and predicted status appended to the original dataset.
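Two of these modifications can be sketched in base R. This is not the package authors' preprocessing code, only a minimal illustration of the general technique. The data frame `raw`, its indicator column names, and the choice to treat rows with no indicator set as "Caucasian" are all assumptions made for the example.

```r
# Hypothetical raw data: a count of priors plus one-hot ethnicity indicators.
raw <- data.frame(
  Number_of_Priors = c(0, 3, 1, 8),
  African_American = c(1, 0, 0, 0),
  Hispanic         = c(0, 1, 0, 0),
  Other            = c(0, 0, 0, 1)
)
indicator_cols <- c("African_American", "Hispanic", "Other")

# For each row, find the indicator column that holds the 1.
idx <- max.col(as.matrix(raw[indicator_cols]), ties.method = "first")
label <- indicator_cols[idx]

# Assumption: rows with no indicator set belong to the reference level.
label[rowSums(raw[indicator_cols]) == 0] <- "Caucasian"
raw$ethnicity <- factor(label)

# Normalize the number of priors to mean 0 and standard deviation 1.
raw$Number_of_Priors_z <- as.numeric(scale(raw$Number_of_Priors))

print(raw[c("ethnicity", "Number_of_Priors_z")])
```

Note that `scale()` uses the sample standard deviation (denominator n - 1), so `sd()` of the result is 1.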


fairness documentation built on April 14, 2021, 5:09 p.m.