national.longitudinal.survey | R Documentation |
Results of a survey run by the U.S. Bureau of Labor Statistics to gather information on the labour market activities and other life events of several groups.
data(national.longitudinal.survey)
The data set contains 4908 observations and the following variables:
age, a numeric variable containing the interviewee's age in years;
race, a factor with 20 levels denoting various racial/ethnic origins;
gender, a factor with levels "Male" and "Female";
grade90, a factor containing the highest completed school grade from "3RD GRADE" to "8TH YR COL OR MORE", with 18 levels;
income06, a numeric variable, income in 2006 in 10000-USD units;
income96, a numeric variable, income in 1996 in 10000-USD units;
income90, a numeric variable, income in 1990 in 10000-USD units;
partner, a factor encoding whether the interviewee has a partner, with levels "No" and "Yes";
height, a numeric variable, the height of the interviewee;
weight, a numeric variable, the weight of the interviewee;
famsize, a numeric variable, the number of family members;
genhealth, a factor with levels "Excellent", "Very Good", "Good", "Fair", "Poor" encoding the general health status of the interviewee;
illegalact, a numeric variable containing the number of illegal acts committed by the interviewee;
charged, a numeric variable containing the number of illegal acts for which the interviewee has been charged;
jobsnum90, a numeric value, the number of different jobs ever reported;
afqt89, a numeric value, the percentile score of the "Profiles, Armed Forces Qualification Test" (AFQT);
typejob90, a factor with 13 levels encoding different job types;
jobtrain90, a factor with levels "No" and "Yes" encoding whether the job was classified as training.
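As a minimal sketch of how the income units listed above might be handled, the following rescales incomes recorded in 10000-USD units to plain USD. The toy data frame and its values are invented for illustration; only the column names are taken from the variable list.

```r
# toy stand-in for the three income columns; the values are invented.
toy = data.frame(
  income90 = c(1.25, 0.80, 3.10),
  income96 = c(1.90, 1.05, 4.00),
  income06 = c(2.40, 1.30, 5.25)
)
# incomes are stored in units of 10000 USD, so multiply by 1e4 to get USD.
income.cols = c("income90", "income96", "income06")
toy.usd = toy
toy.usd[income.cols] = toy[income.cols] * 1e4
toy.usd$income90
```

The same rescaling should apply to the real columns once the data set is loaded, assuming they are stored as described.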
The data set has been pre-processed differently from Komiyama et al. (2018). In particular:
the variables income96 and income06 have been retained as alternative responses;
the variables height, weight, race, partner and famsize have been retained;
the variables grade90 and genhealth are coded as ordered factors because they do not make sense on a numeric scale.
In that paper, income90 is the response variable, and gender and age are the sensitive attributes.
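The ordered-factor coding of grade90 and genhealth mentioned above can be illustrated in base R. This is only a sketch: the responses are invented, and the level order is assumed to follow the order in which the genhealth levels are listed in this page.

```r
# genhealth levels in the order listed above (assumed to run best to worst).
health.levels = c("Excellent", "Very Good", "Good", "Fair", "Poor")
# invented responses, for illustration only.
x = c("Good", "Poor", "Excellent", "Fair")
genhealth = factor(x, levels = health.levels, ordered = TRUE)
# ordered factors support comparisons that follow the level order ...
worse.than.good = genhealth > "Good"
# ... while as.integer() only returns rank codes, not a meaningful scale.
codes = as.integer(genhealth)
```

Comparisons such as the one above rely only on the ordering of the levels, which is why an ordered factor is a more natural fit than a numeric score for variables of this kind.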
U.S. Bureau of Labor Statistics.
https://www.bls.gov/nls/
# nclm() and frrm() below are provided by the fairml package.
library(fairml)
data(national.longitudinal.survey)
# short-hand name for the data set.
nn = national.longitudinal.survey
# remove alternative response variables.
nn = nn[, setdiff(names(nn), c("income96", "income06"))]
# response, sensitive attributes and predictors.
r = nn[, "income90"]
s = nn[, c("gender", "age")]
p = nn[, setdiff(names(nn), c("income90", "gender", "age"))]
m = nclm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
m = frrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
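Since income96 and income06 are retained as alternative responses, the column selection in the examples above could be repeated with a different response. The sketch below wraps that selection in a helper; split_roles() is hypothetical (it is not part of any package), and the toy data frame uses invented values with only a few of the real column names.

```r
# hypothetical helper: split a data frame into response, sensitive
# attributes and predictors by column name.
split_roles = function(data, response, sensitive, exclude = character(0)) {
  # drop the columns that should play no role at all.
  data = data[, setdiff(names(data), exclude), drop = FALSE]
  list(
    r = data[, response],
    s = data[, sensitive, drop = FALSE],
    p = data[, setdiff(names(data), c(response, sensitive)), drop = FALSE]
  )
}
# toy data with invented values; only the column names match the data set.
toy = data.frame(
  income90 = c(1.2, 0.8, 3.1), income96 = c(1.9, 1.0, 4.0),
  income06 = c(2.4, 1.3, 5.2), gender = factor(c("Male", "Female", "Female")),
  age = c(25, 27, 31), famsize = c(3, 2, 5)
)
# use income96 as the response and leave out the other two income variables.
roles = split_roles(toy, response = "income96", sensitive = c("gender", "age"),
                    exclude = c("income90", "income06"))
names(roles$p)
```

The resulting r, s and p elements have the same shape as the objects of the same names in the examples above.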