compas | R Documentation |
compas is a landmark dataset for studying algorithmic (un)fairness. This data was used to
predict recidivism (whether a criminal will reoffend or not) in the USA. The prediction tool was meant to
overcome human biases and offer a fair, algorithmic way to predict recidivism in a diverse population.
However, the algorithm ended up propagating existing social biases and thus offered an unfair
solution to the problem. In this dataset, a model to predict recidivism has already been fit, and its predicted
probabilities and predicted status (yes/no) for recidivism have been appended to the original data.
compas
A data frame with 6172 rows and 9 variables:
factor, yes/no for recidivism or no recidivism; this is the outcome (target) in this dataset
numeric, number of priors, normalized to mean = 0 and standard deviation = 1
factor, yes/no for age above 45 years or not
factor, yes/no for age below 25 years or not
factor, female/male for gender
factor, yes/no for having recorded misdemeanor(s) or not
factor, Caucasian, African American, Asian, Hispanic, Native American or Other
numeric, predicted probabilities for recidivism, ranges from 0 to 1
numeric, predicted values for recidivism, 0/1 for no/yes
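The last two variables make it possible to compare the model's predictions with the observed outcome across groups. The following is a minimal sketch on a toy data frame, not on compas itself: the column names (recid, ethnicity, probability, predicted), the group labels, the values, and the 0.5 cutoff are all assumptions made for illustration.

```r
# Toy data frame mimicking the kinds of columns described above.
# Column names, group labels, and values are hypothetical, not taken from compas.
toy <- data.frame(
  recid       = factor(c("yes", "no", "no", "yes", "no", "no"),
                       levels = c("no", "yes")),
  ethnicity   = factor(c("A", "A", "A", "B", "B", "B")),
  probability = c(0.80, 0.60, 0.20, 0.70, 0.30, 0.10)
)

# Assumed cutoff of 0.5 to turn probabilities into a 0/1 predicted status.
toy$predicted <- as.integer(toy$probability >= 0.5)

# False positive rate per group: the share of observed non-reoffenders
# that the model nevertheless predicted as 1.
negatives <- toy[toy$recid == "no", ]
fpr <- tapply(negatives$predicted, negatives$ethnicity, mean)
print(fpr)
```

A gap between groups in a rate like this is one common way to quantify the kind of unfairness the dataset is meant to illustrate. Other metrics can be computed from the same columns.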
The dataset was downloaded from Kaggle (https://www.kaggle.com/danofer/compass) and has undergone modifications: ethnicity was originally encoded using one-hot encoding, the number of priors has been normalized, variables have been renamed, and a prediction model was fit whose predicted probabilities and predicted status were appended to the original dataset.