health.retirement: Health and Retirement Survey
In fairml: Fair Models in Machine Learning

Description

The University of Michigan Health and Retirement Study (HRS) longitudinal dataset.

Usage

data(health.retirement)

Format

The data set contains 38653 observations and 27 variables.

Note

The data set has been minimally pre-processed: the redundant variables HISPANIC and BITHYR were removed, along with the patient ID PID. A single patient was recorded twice: the duplicate has been removed. However, incomplete observations have been left in the data set.
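
Since incomplete observations are left in, it can help to see where the missing values sit before fitting a model. The following is a minimal base R sketch; the helper name missing.summary and the toy data frame are hypothetical illustrations, not part of fairml, and the commented-out call assumes fairml is installed.

```r
# Hypothetical helper (not part of fairml): per-variable counts of missing
# values, plus the number of complete rows and the total number of rows.
missing.summary = function(data) {
  list(
    per.variable = colSums(is.na(data)),
    complete.rows = sum(complete.cases(data)),
    total.rows = nrow(data)
  )
}

# Toy data frame standing in for the real data, for illustration only.
toy = data.frame(age = c(61, NA, 70), bmi = c(25.1, 30.2, NA),
                 score = c(0, 2, 1))
missing.summary(toy)

# With fairml installed, the same call applies to the real data:
# library(fairml)
# data(health.retirement)
# missing.summary(health.retirement)
```

Comparing complete.rows with total.rows shows how many observations a complete-case analysis would discard.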

The response is score, a count of the number of dependencies in daily activities; marriage, gender, race, race.ethnicity and age are the sensitive attributes. The remaining variables are used as predictors.

The data contain the following variables:

year, the year of retirement as a numeric variable;
age, the age as a numeric variable;
educa, the number of years in education as a numeric variable;
networth, household net worth as a numeric variable;
cognition_catnew, cognition assessment as a numeric variable;
bmi, body mass index as a numeric variable;
hlthrte, a numeric health rating;
bloodp, blood pressure diagnosis as a numeric variable;
diabetes, diabetes diagnosis as a numeric variable;
cancer, cancer diagnosis as a numeric variable;
lung, lung disease diagnosis as a numeric variable;
heart, heart condition diagnosis as a numeric variable;
stroke, stroke diagnosis as a numeric variable;
pchiat, psychiatric condition diagnosis as a numeric variable;
arthrit, arthritis diagnosis as a numeric variable;
fall, recently falling as a numeric variable;
pain, pain conditions as a numeric variable;
A1c_adj, biomarker for hemoglobin A1C;
CRP_adj, biomarker for C-reactive protein;
CYSC_adj, biomarker for Cystatin C;
HDL_adj, biomarker for HDL cholesterol;
TC_adj, biomarker for total cholesterol;
score, the number of dependencies in daily activities as a numeric (count) variable;
gender, a factor with levels "Female" and "Male";
marriage, a factor with levels "Married/Partner" and "Not Married";
race, a factor with levels "Black", "Other" and "White";
race.ethnicity, a factor with levels "Hispanic", "NHB", "NHW" and "Other".
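
The factor levels listed above can be checked against a loaded copy of the data. The sketch below uses only base R; the helper name check.levels and the toy data frame are hypothetical, and the level names are copied from the variable list rather than read from the package.

```r
# Expected factor levels, copied from the variable list above.
expected.levels = list(
  gender = c("Female", "Male"),
  marriage = c("Married/Partner", "Not Married"),
  race = c("Black", "Other", "White"),
  race.ethnicity = c("Hispanic", "NHB", "NHW", "Other")
)

# Hypothetical helper (not part of fairml): for each listed variable, TRUE if
# the column is a factor whose levels match the expected set, FALSE otherwise.
check.levels = function(data, expected) {
  sapply(names(expected), function(v)
    is.factor(data[[v]]) && setequal(levels(data[[v]]), expected[[v]]))
}

# Toy data frame for illustration only: race lacks the "Other" level and
# race.ethnicity is stored as character, so both should be flagged.
toy = data.frame(
  gender = factor(c("Female", "Male")),
  marriage = factor(c("Not Married", "Married/Partner")),
  race = factor(c("Black", "White")),
  race.ethnicity = c("NHB", "NHW"),
  stringsAsFactors = FALSE
)
check.levels(toy, expected.levels)

# With fairml installed:
# library(fairml)
# data(health.retirement)
# check.levels(health.retirement, expected.levels)
```

The comparison uses setequal() so that the order of the levels does not matter.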

References

https://hrs.isr.umich.edu/about

Examples

library(fairml)
data(health.retirement)

# complete data analysis.
health.retirement = health.retirement[complete.cases(health.retirement), ]
# short-hand variable names.
r = health.retirement[, "score"]
s = health.retirement[, c("marriage", "gender", "race", "age")]
# r is a plain vector, so names(r) is NULL: exclude the response by name.
p = health.retirement[, setdiff(names(health.retirement), c("score", names(s)))]
# drop the second race variable.
p = p[, colnames(p) != "race.ethnicity"]

## Not run: 
# setting lambda = 0.1 is very helpful in making model estimation succeed.
m = fgrrm(response = r, sensitive = s, predictors = p,
      family = "poisson", unfairness = 0.05, lambda = 0.1)
summary(m)

## End(Not run)

fairml documentation built on June 8, 2025, 11:38 a.m.