knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE )
This vignette serves as an example of training biologial aging measures using the
NHANES III (1988 - 1994) and projecting into NHANES IV (1999 - 2018) dataset. It also provides
documentation for fit parameters contained in the BioAge
package. The
cleaned NHANES dataset is loaded as the dataset NHANES3
and NHANES4
.
The original KDM bioage and phenoage values are saved as kdm0
and
phenoage0
as part of NHANES dataset.
You can install the released version of BioAge from (https://github.com/dayoonkwon/BioAge) with:
library(BioAge) #topic of vignette library(dplyr)
I train in the NHANES III and project biological aging measures into the
NHANES IV by using the hd_nhanes
, kdm_nhanes
, and phenoage_nhanes
function of the BioAge
package.
#specify bioamarkers included in the algorithms biomarkers = c("albumin","alp","lymph","mcv","lncreat","lncrp","hba1c","wbc","rdw") #HD using NHANES (separate training for men and women) hd = hd_nhanes(biomarkers) #KDM bioage using NHANES (separate training for men and women) kdm = kdm_nhanes(biomarkers) #phenoage uinsg NHANES phenoage = phenoage_nhanes(biomarkers)
The projected data and estimated models are saved as part of the
list structure. The dataset can be drawn by typing data
. The model can be drawn by typing fit
.
#assemble NHANES IV dataset with projected biological aging measures for analysis data = merge(hd$data, kdm$data) %>% merge(., phenoage$data)
In the figure below, the graphs titled "KDM Biological Age" and "Levine Phenotypic Age" show measures based on the original biomarker sets published in Levine 2013 J Geron A and Levine et al. 2018 AGING. The remaining graphs shows the new measures computed with the biomarker set specified within this code.
#select biological age variables agevar = c("kdm0","phenoage0","kdm","phenoage","hd","hd_log") #prepare labels label = c("KDM\nBiological Age", "Levine\nPhenotypic Age", "Modified-KDM\nBiological Age", "Modified-Levine\nPhenotypic Age", "Homeostatic\nDysregulation", "Log\nHomeostatic\nDysregulation") #plot age vs bioage plot_ba(data, agevar, label)
The figure plots associations among the different biological aging measures. Cells below the diagonal show scatter plots of the measures listed above the cell (x-axis) and to the right (y-axis). Cells above the diagonal show the Pearson correlations for the measures listed below the cell and to the left. For this analysis, KDM Biological Age and Levine Phenotypic Age measures are differenced from chronological age (i.e. plotted values = BA-CA).
#select biological age variables agevar = c("kdm_advance0","phenoage_advance0","kdm_advance","phenoage_advance","hd","hd_log") #prepare lables #values should be formatted for displaying along diagonal of the plot #names should be used to match variables and order is preserved label = c( "kdm_advance0"="KDM\nBiological Age\nAdvancement", "phenoage_advance0"="Levine\nPhenotypic Age\nAdvancement", "kdm_advance"="Modified-KDM\nBiological Age\nAdvancement", "phenoage_advance"="Modified-Levine\nPhenotypic Age\nAdvancement", "hd" = "Homeostatic\nDysregulation", "hd_log" = "Log\nHomeostatic\nDysregulation") #use variable name to define the axis type ("int" or "float") axis_type = c( "kdm_advance0"="float", "phenoage_advance0"="float", "kdm_advance"="float", "phenoage_advance"="flot", "hd"="flot", "hd_log"="float") #plot BAA corplot plot_baa(data,agevar,label,axis_type)
table_surv(data, agevar, label)
The linear regression models and sample sizes in “Table 2” and “Table 3” below are saved as part of the list structure.
Regression model can be drawn by typing table
.
Sample size can be drawn by typing n
.
table2 = table_health(data,agevar,outcome = c("health","adl","lnwalk","grip_scaled"), label) #pull table table2$table #pull sample sizes table2$n
table3 = table_ses(data,agevar,exposure = c("edu","annual_income","poverty_ratio"), label) #pull table table3$table #pull sample sizes table3$n
In this example, the projection dataset is the 2016 Venous Blood Study dataset from the US Health and Retirement Study (data are not included in the package but are available from the HRS). For this analysis, HRS data were previously cleaned and units of measure and variable names were harmonized to match the NHANES data included with the package. All algorithms were trained using the NHANES III data and projected into the HRS using the hd_calc
, kdm_calc
, and phenoage_calc
functions of the BioAge
package.
#The HRS dataset is loaded from my local drive that has previously been downloaded and cleaned newdata = HRS %>% select(hhidpn, sex, age, raracem, albumin, alp, lymphpct, mcv, creat, lncreat, crp, lncrp, hba1c, wbc, rdw, glucose) %>% rename(sampleID = hhidpn, gender = sex, race = raracem, lymph = lymphpct) %>% mutate(gender = ifelse(gender == "Women", 2, ifelse(gender == "Men", 1, NA)), albumin_gL = albumin * 10, creat_umol = creat * 88.4017, lncreat_umol = log(creat_umol), crp = crp / 10, lncrp = log(crp), glucose_mmol = glucose*0.0555) %>% group_by(gender) %>% mutate_at(vars(albumin:creat,crp,hba1c:glucose), list(~ifelse((. > (mean(., na.rm = TRUE) + 5 * sd(., na.rm = TRUE)))| (. < (mean(., na.rm = TRUE) - 5 * sd(., na.rm = TRUE))), NA, .))) %>% ungroup() %>% mutate(lncreat = ifelse(is.na(creat), NA, lncreat), lncrp = ifelse(is.na(crp), NA, lncrp), albumin_gL = ifelse(is.na(albumin), NA, albumin_gL), creat_umol = ifelse(is.na(creat), NA, creat_umol), lncreat_umol = ifelse(is.na(creat), NA, lncreat_umol), glucose_mmol = ifelse(is.na(glucose), NA, glucose_mmol))
For HD, the constructed variable is based on a malhanobis distance statistic, which is theoretically the distance between observations and a hypothetically healthy, young cohort. In this example, I train separately for men and women who are between the ages of 20 and 30 and not pregnant, and have observe biomarker data within clinically accpetable distributions. For clinical guidelines, I relied upon the ranges reported by the Mayo Clinic website.
#projecting HD into the HRS using NHANES III (seperate training for gender) hd_fem = hd_calc(data = newdata %>% filter(gender == 2), reference = NHANES3 %>% filter(gender == 2 & age >= 20 & age <= 30 & pregnant == 0) %>% mutate(albumin = ifelse(albumin >= 3.5 & albumin <= 5, albumin, NA), alp = ifelse(alp >= 37 & alp <= 98, alp, NA), lymph = ifelse(lymph >= 20 & lymph <= 40, lymph, NA), mcv = ifelse(mcv >= 78 & mcv <= 101, mcv, NA), creat = ifelse(creat >= 0.6 & creat <= 1.1, creat, NA), lncreat = ifelse(is.na(creat), NA, lncreat), crp = ifelse(crp < 2, crp, NA), lncrp = ifelse(is.na(crp), NA, lncrp), hba1c = ifelse(hba1c >= 4 & hba1c <= 5.6, hba1c, NA), wbc = ifelse(wbc >= 4.5 & wbc <= 11, wbc, NA), rdw = ifelse(rdw >= 11.5 & rdw <= 14.5, rdw, NA)), biomarkers) hd_male = hd_calc(data = newdata %>% filter(gender == 1), reference = NHANES3 %>% filter(gender == 1 & age >= 20 & age <= 30 & pregnant == 0) %>% mutate(albumin = ifelse(albumin >= 3.5 & albumin <= 5, albumin, NA), alp = ifelse(alp >= 45 & alp <= 115, alp, NA), lymph = ifelse(lymph >= 20 & lymph <= 40, lymph, NA), mcv = ifelse(mcv >= 82 & mcv <= 102, mcv, NA), creat = ifelse(creat >= 0.8 & creat <= 1.3, creat, NA), lncreat = ifelse(is.na(creat), NA, lncreat), crp = ifelse(crp < 2, crp, NA), lncrp = ifelse(is.na(crp), NA, lncrp), hba1c = ifelse(hba1c >= 4 & hba1c <= 5.6, hba1c, NA), wbc = ifelse(wbc >= 4.5 & wbc <= 11, wbc, NA), rdw = ifelse(rdw >= 11.5 & rdw <= 14.5, rdw, NA)), biomarkers) #pull the HD dataset hd_data = rbind(hd_fem$data, hd_male$data)
Having estimated biological aging models using NHANES III in "Step 1", I can project KDM bioage and phenoage into the HRS data by running kdm_calc
and phenoage_calc
and supplying a fit
argument.
#projecting KDM bioage into the HRS using NHANES III (seperate training for gender) kdm_fem = kdm_calc(data = newdata %>% filter (gender ==2), biomarkers, fit = kdm$fit$female, s_ba2 = kdm$fit$female$s_b2) kdm_male = kdm_calc(data = newdata %>% filter (gender ==1), biomarkers, fit = kdm$fit$male, s_ba2 = kdm$fit$male$s_b2) #pull the KDM dataset kdm_data = rbind(kdm_fem$data, kdm_male$data)
phenoage_hrs = phenoage_calc(data = newdata %>% mutate(albumin = albumin_gL, lncreat = lncreat_umol), biomarkers, fit = phenoage$fit, orig = TRUE) #this calculate original phenoage phenoage_data = phenoage_hrs$data #pull the full dataset newdata = left_join(newdata, hd_data[, c("sampleID", "hd", "hd_log")], by = "sampleID") %>% left_join(., kdm_data[, c("sampleID", "kdm", "kdm_advance")], by = "sampleID") %>% left_join(., phenoage_data[, c("sampleID","phenoage0","phenoage_advance0", "phenoage","phenoage_advance")], by = "sampleID")
summary(newdata %>% select(phenoage0, kdm, phenoage, hd, hd_log))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.