| adult | R Documentation |
Predict whether income exceeds $50K per year using the U.S. 1994 Census data.
data(adult)
The data contain 30162 observations and 14 variables. See the UCI Machine Learning Repository for details.
The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:
the data do not include the test sample from the UCI repository;
the variables "capital_gain" and "capital_loss" have
been scaled by 1/1000.
In that paper, income is the response variable, sex and
race are the sensitive attributes and the remaining variables are
used as predictors.
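Since the two capital variables are stored in thousands of dollars, it may be convenient to convert them back before reporting summaries. The following is a minimal sketch on a made-up data frame (`toy` is hypothetical and its values are invented, not taken from the census data); it assumes the underscore column names quoted above, which should be checked against `names(adult)` before applying the same steps to the real data.

```r
# Hypothetical stand-in for a few rows of the data; values are made up.
toy = data.frame(
  capital_gain = c(0, 2.174, 14.084),  # thousands of dollars (scaled by 1/1000)
  capital_loss = c(0, 0, 1.902)
)

# Undo the 1/1000 scaling to recover dollar amounts.
scaled = c("capital_gain", "capital_loss")  # assumed column names
dollars = toy
dollars[scaled] = toy[scaled] * 1000

dollars
```

Keeping the scaled copy for model fitting and using the rescaled copy only for reporting avoids changing the inputs a fitted model expects.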
The data contain the following variables:
age as a numeric variable;
workclass, a factor with 8 levels encoding the type of
employment ("Private", "Self-emp-not-inc",
"Federal-gov", etc.);
education, a factor with 10 levels from "Preschool" to
"Doctorate";
education-num, the number of years in education;
marital-status, a factor with 7 levels including
"Married-civ-spouse", "Divorced" and
"Never-married";
occupation, a factor with 14 levels encoding the field of
employment ("Tech-support", "Craft-repair", etc.);
relationship, a factor with 6 levels ("Wife",
"Own-child", etc.);
race, a factor with levels "White",
"Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other"
and "Black";
sex, a factor with levels "Female" and "Male";
capital-gain as a numeric variable;
capital-loss as a numeric variable;
native-country as a factor with two levels
"United-States" and "Non-United-States";
hours-per-week as a numeric variable.
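One way to check the variable types and factor levels listed above is to tabulate the class of each column and then extract the levels of the factors. The sketch below does this on a hypothetical three-row data frame (`toy`, with invented values and an assumed underscore spelling for `native_country`); the same two calls should work on `adult` once it is loaded.

```r
# Hypothetical stand-in with one numeric and two factor columns; values are made up.
toy = data.frame(
  age = c(39, 50, 38),
  sex = factor(c("Male", "Male", "Female"), levels = c("Female", "Male")),
  native_country = factor(
    c("United-States", "United-States", "Non-United-States"),
    levels = c("United-States", "Non-United-States")
  )
)

# Class of each column (first class only, in case a column has several).
col_class = sapply(toy, function(x) class(x)[1])

# Levels of every factor column, as a named list.
factor_levels = lapply(toy[col_class == "factor"], levels)

col_class
factor_levels
```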
UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/adult
library(fairml)  # assumed: the package providing zlrm() and the adult data
data(adult)
# short-hand variable names.
r = adult[, "income"]
s = adult[, c("sex", "race")]
p = adult[, setdiff(names(adult), c("income", "sex", "race"))]
## Not run:
m = zlrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
## End(Not run)