national.longitudinal.survey: Income and Labour Market Activities
In fairml: Fair Models in Machine Learning

Description

Results of a survey run by the U.S. Bureau of Labor Statistics to gather information on the labour market activities and other life events of several groups.

Usage

data(national.longitudinal.survey)

Format

The data set contains 4908 observations and the following variables:

age, a numeric variable containing the interviewee's age in years;
race, a factor with 20 levels denoting various racial/ethnic origins;
gender, a factor with levels "Male" and "Female";
grade90, a factor containing the highest completed school grade from "3RD GRADE" to "8TH YR COL OR MORE", with 18 levels;
income06, a numeric variable, income in 2006 in 10000-USD units;
income96, a numeric variable, income in 1996 in 10000-USD units;
income90, a numeric variable, income in 1990 in 10000-USD units;
partner, a factor encoding whether the interviewee has a partner, with levels "No" and "Yes";
height, a numeric variable, the height of the interviewee;
weight, a numeric variable, the weight of the interviewee;
famsize, a numeric variable, the number of family members;
genhealth, a factor with levels "Excellent", "Very Good", "Good", "Fair", "Poor" encoding the general health status of the interviewee;
illegalact, a numeric variable containing the number of illegal acts committed by the interviewee;
charged, a numeric variable containing the number of illegal acts for which the interviewee has been charged;
jobsnum90, a numeric variable, the number of different jobs ever reported;
afqt89, a numeric variable, the percentile score of the "Profiles, Armed Forces Qualification Test" (AFQT);
typejob90, a factor with 13 levels encoding different job types;
jobtrain90, a factor with levels "No" and "Yes" encoding whether the job was classified as training.
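
The variable list above can be checked against the loaded data. The following is a minimal sketch, assuming the fairml package is installed; the helper name is.fac is introduced here for illustration only.

```r
# assumes the fairml package is installed; the data set ships with it.
library(fairml)
data(national.longitudinal.survey)
nn = national.longitudinal.survey

# number of observations and variables.
dim(nn)

# split the column names into factors and numeric variables.
is.fac = sapply(nn, is.factor)
names(nn)[is.fac]
names(nn)[!is.fac]

# number of levels of each factor (race, grade90, typejob90, ...).
sapply(nn[, is.fac], nlevels)
```

Calling str(nn) gives a more compact overview of the same information.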

Note

The data set has been pre-processed differently from Komiyama et al. (2018). In particular:

the variables income96 and income06 have been retained as alternative responses;
the variables height, weight, race, partner and famsize have been retained;
the variables grade90 and genhealth are coded as ordered factors because they do not make sense on a numeric scale.

In that paper, income90 is the response variable, and gender and age are the sensitive attributes.
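
The coding choices described in this note can be inspected directly. This is a sketch under the assumption that the fairml package is installed; income90.usd is a hypothetical name used only here.

```r
# assumes the fairml package is installed.
library(fairml)
data(national.longitudinal.survey)
nn = national.longitudinal.survey

# grade90 and genhealth are documented as ordered factors.
is.ordered(nn$grade90)
is.ordered(nn$genhealth)

# the stored order of the levels.
levels(nn$genhealth)

# incomes are stored in 10000-USD units; rescale to plain USD if needed.
income90.usd = nn$income90 * 10000
summary(income90.usd)
```

Keeping the incomes in 10000-USD units leaves the response on a scale comparable to the other numeric variables, so rescaling is only needed for reporting.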

References

U.S. Bureau of Labor Statistics.
https://www.bls.gov/nls/

Examples

library(fairml)
data(national.longitudinal.survey)

# short-hand name for the data set.
nn = national.longitudinal.survey
# remove alternative response variables.
nn = nn[, setdiff(names(nn), c("income96", "income06"))]
# response, sensitive attributes and predictors.
r = nn[, "income90"]
s = nn[, c("gender", "age")]
p = nn[, setdiff(names(nn), c("income90", "gender", "age"))]

m = nclm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)

m = frrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
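
Since income96 and income06 are retained as alternative responses, the same kind of model can be fitted to one of them. The sketch below assumes the fairml package is installed and reuses the frrm() call from the example above; the names nn2, r96, s2, p2 and m96 are illustrative, and dropping all three income variables from the predictors is a choice made here, not something the documentation prescribes.

```r
# assumes the fairml package is installed.
library(fairml)
data(national.longitudinal.survey)
nn2 = national.longitudinal.survey

# use income in 1996 as the response instead of income90.
r96 = nn2[, "income96"]
s2 = nn2[, c("gender", "age")]
# drop every income variable and the sensitive attributes from the predictors.
incomes = c("income90", "income96", "income06")
p2 = nn2[, setdiff(names(nn2), c(incomes, "gender", "age"))]

m96 = frrm(response = r96, sensitive = s2, predictors = p2, unfairness = 0.05)
summary(m96)
```

Whether income90 should instead be kept as a predictor of income96 depends on the analysis; it is excluded here only to mirror the original example.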

fairml documentation built on June 8, 2025, 11:38 a.m.