compas | R Documentation
A collection of criminal offenders screened in Florida (US) during 2013-14.
data(compas)
The data contains 5855 observations and the following variables:
age, a continuous variable containing the age (in years) of the person;
juv_fel_count, a continuous variable containing the number of juvenile felonies;
decile_score, a continuous variable containing the decile of the COMPAS score;
juv_misd_count, a continuous variable containing the number of juvenile misdemeanors;
juv_other_count, a continuous variable containing the number of prior juvenile convictions that are not considered either felonies or misdemeanors;
v_decile_score, a continuous variable containing the predicted decile of the COMPAS score;
priors_count, a continuous variable containing the number of prior crimes committed;
sex, a factor with levels "Female" and "Male";
two_year_recid, a factor with two levels "Yes" and "No" (whether the person has recidivated within two years);
race, a factor encoding the race of the person;
c_jail_in, a numeric variable containing the date on which the person entered jail (normalized between 0 and 1);
c_jail_out, a numeric variable containing the date on which the person was released from jail (normalized between 0 and 1);
c_offense_date, a numeric variable containing the date on which the offense was committed;
screening_date, a numeric variable containing the date on which the person was screened (normalized between 0 and 1);
in_custody, a numeric variable containing the date on which the person was placed in custody (normalized between 0 and 1);
out_custody, a numeric variable containing the date on which the person was released from custody (normalized between 0 and 1).
The data set has been pre-processed as in Komiyama et al. (2018), with the following exceptions:
the race variable has not been reduced to a binary factor with levels "African-American" and "not African-American";
the variables type_of_assessment and v_type_of_assessment have been dropped from the analysis because they take the same value for all observations;
variables like c_jail_in and c_jail_out that encode dates have been jointly rescaled to preserve the temporal ordering of events.
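The preprocessing code itself is not part of this page. As a minimal sketch of what joint rescaling could look like, assuming a min-max scaling computed over all date columns at once, the toy example below contrasts it with rescaling each column separately. The dates are hypothetical and are not taken from the data set.

```r
# Toy data: hypothetical dates, not taken from the compas data set.
dates = data.frame(
  c_jail_in  = as.Date(c("2013-01-10", "2013-06-01", "2014-03-15")),
  c_jail_out = as.Date(c("2013-01-20", "2013-06-05", "2014-04-01"))
)

# convert the dates to numbers (days since 1970-01-01).
days = as.data.frame(lapply(dates, as.numeric))

# joint rescaling: one minimum and one maximum shared by all date columns.
lo = min(unlist(days))
hi = max(unlist(days))
joint = as.data.frame(lapply(days, function(x) (x - lo) / (hi - lo)))

# separate rescaling, for contrast: each column uses its own minimum and maximum.
separate = as.data.frame(lapply(days,
  function(x) (x - min(x)) / (max(x) - min(x))))

# jointly rescaled dates keep "entered jail before release" in every row.
all(joint$c_jail_in < joint$c_jail_out)        # TRUE
# separately rescaled dates do not: the earliest entry and the earliest
# release are both mapped to 0.
all(separate$c_jail_in < separate$c_jail_out)  # FALSE
```

With a shared minimum and maximum the transformation is the same increasing function for every column, so comparisons between columns remain meaningful after rescaling.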
In that paper, two_year_recid is the response variable, sex and race are the sensitive attributes, and the remaining variables are used as predictors.
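To reproduce the binary coding of race used in that paper, the factor can be collapsed by hand. The sketch below works on a toy factor: "Caucasian" and "Hispanic" are hypothetical stand-ins for the other levels, and only the two target labels come from the text above.

```r
# Toy factor: hypothetical values, not taken from the compas data set.
race = factor(c("African-American", "Caucasian", "Hispanic",
                "African-American"))

# collapse every other level into "not African-American".
race2 = factor(
  ifelse(race == "African-American",
         "African-American", "not African-American"),
  levels = c("African-American", "not African-American"))

# count the observations in each of the two levels.
table(race2)
```

Assuming the race variable in the data set uses the label "African-American" for that group, the same expression could be applied to compas$race.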
Angwin J, Larson J, Mattu S, Kirchner L (2016). "Machine Bias: There's Software Used Around the Country to Predict Future Criminals." https://www.propublica.org
# load fairml, the package that provides compas, nclm() and frrm().
library(fairml)
data(compas)
# convert the response back to a numeric variable.
compas$two_year_recid = as.numeric(compas$two_year_recid) - 1
# short-hand variable names.
r = compas[, "two_year_recid"]
s = compas[, c("sex", "race")]
p = compas[, setdiff(names(compas), c("two_year_recid", "sex", "race"))]
# fit a model with nclm(), with the unfairness argument set to 0.05.
m = nclm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
# fit a model with frrm(), using the same unfairness value.
m = frrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
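The conversion as.numeric(...) - 1 used above relies on the integer codes of the factor levels, so which outcome becomes 1 depends on the order of the levels. The sketch below illustrates this on two toy factors (hypothetical values, not taken from the data set) and shows an explicit comparison that does not depend on that order.

```r
# Toy factors: hypothetical values, not taken from the compas data set.
y1 = factor(c("No", "Yes", "Yes"), levels = c("No", "Yes"))
y2 = factor(c("No", "Yes", "Yes"), levels = c("Yes", "No"))

# as.numeric() returns the integer level codes (1, 2, ...), so subtracting 1
# gives 0 for the first level and 1 for the second.
as.numeric(y1) - 1        # 0 1 1: "Yes" is coded as 1
as.numeric(y2) - 1        # 1 0 0: "Yes" is coded as 0

# an explicit comparison gives the same coding for both level orders.
as.numeric(y1 == "Yes")   # 0 1 1
as.numeric(y2 == "Yes")   # 0 1 1
```

Checking levels(compas$two_year_recid) before the conversion shows which level ends up coded as 1.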