adult | R Documentation |
Predict whether income exceeds $50K per year using the U.S. 1994 Census data.
data(adult)
The data contain 30162 observations and 14 variables. See the UCI Machine Learning Repository for details.
The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:
the data do not include the test sample from the UCI repository;
the variables "capital_gain" and "capital_loss" have been scaled by 1/1000.
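Because of the 1/1000 scaling, the two capital variables are not on the scale used in the UCI repository. A minimal sketch of undoing the scaling follows. It uses a made-up two-row data frame (toy) rather than the real data, and it assumes the columns are named "capital_gain" and "capital_loss" as quoted above; check names(adult) before applying it to the actual data set.

```r
# Toy stand-in for the real data: the values are invented for illustration.
toy = data.frame(capital_gain = c(2.174, 0), capital_loss = c(0, 1.902))

# Columns assumed to carry the 1/1000 scaling described above.
scaled = c("capital_gain", "capital_loss")

# Multiply by 1000 to return to the original, unscaled values.
toy[, scaled] = toy[, scaled] * 1000
toy
```

The same two assignment lines should apply to adult itself, provided the column names match.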
In that paper, income is the response variable, sex and race are the sensitive attributes and the remaining variables are used as predictors.
The data contain the following variables:
age as a numeric variable;
workclass, a factor with 8 levels encoding the type of employment ("Private", "Self-emp-not-inc", "Federal-gov", etc.);
education, a factor with 10 levels from "Preschool" to "Doctorate";
education-num, the number of years in education;
marital-status, a factor with 7 levels from "Married-civ-spouse" to "Divorced" and "Never-married";
occupation, a factor with 14 levels encoding the field of employment ("Tech-support", "Craft-repair", etc.);
relationship, a factor with 6 levels ("Wife", "Own-child", etc.);
race, a factor with levels "White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other" and "Black";
sex, a factor with levels "Female" and "Male";
capital-gain as a numeric variable;
capital-loss as a numeric variable;
native-country as a factor with two levels "United-States" and "Non-United-States";
hours-per-week as a numeric variable.
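The variable list above can be checked against a loaded data frame by tabulating each column's type and, for factors, its number of levels. The sketch below is one way to do that; describe_columns is a hypothetical helper written for this illustration (it is not part of any package), and toy is a made-up three-row data frame that borrows only the sex and race levels listed above.

```r
# Hypothetical helper: one row per column, with its class and, for factors,
# the number of levels (NA for non-factor columns).
describe_columns = function(df) {
  data.frame(
    variable = names(df),
    type = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    levels = vapply(df, function(x) if (is.factor(x)) nlevels(x) else NA_integer_,
                    integer(1), USE.NAMES = FALSE)
  )
}

# Toy stand-in for the real data: the rows are invented for illustration.
toy = data.frame(
  age = c(39, 50, 38),
  sex = factor(c("Male", "Male", "Female"), levels = c("Female", "Male")),
  race = factor(c("White", "White", "Black"),
                levels = c("White", "Asian-Pac-Islander", "Amer-Indian-Eskimo",
                           "Other", "Black"))
)

describe_columns(toy)
```

With the data set loaded, describe_columns(adult) should give a table that can be compared line by line with the descriptions above.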
UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/adult
data(adult)
# short-hand variable names.
r = adult[, "income"]
s = adult[, c("sex", "race")]
p = adult[, setdiff(names(adult), c("income", "sex", "race"))]
## Not run:
m = zlrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
## End(Not run)