This vignette (1) uses the NHANES package to download publicly curated data with biomarkers; and (2) splits the data in half for a training and validation set. Once the data is prepared, it uses the bioage package to (1) train biological age on the training subset; and (2) uses the estimated fit to calculate biological age for the test data.
This vignette uses the package nhanesA (not developed or associated with the bioage package) to draw publicly available data collected by the US Governemnt's National Health and Nutrition Survey (NHANES).
As noted in the code comments, this vignette uses the unassociated (but excellent) library, "nhanesA".
#load library to pull nhanes data #nhanesA is a nonaffiliated library, but is connected to a #commonly used repository that contains #biomarkers library(nhanesA) #draw standard biochemestry profile from NHANES 2001-2002 #the excelelnet nhanesA library has information on how to #select and identify variables nhdata = nhanes('L40_B') #merge this with demographic data, which includes age #SEQN is the individual identifier for nhanes data nhdata = merge(nhdata,nhanes('DEMO_B'),by='SEQN') #show data print(head(nhdata[,c('LBXSAL','LBDSAPSI','LBXSBU', 'LBXSCH','LBDSCR', #biomarkers 'RIDAGEYR' #age (in years) )]))
In this section I split the 2002 NHANES data to illustrate both how to train data, and how to use estimated parameters to generate out-of-sample biological age values.
#select the first 1/2 of the data train = nhdata[1:ceiling(nrow(nhdata)/2),] #put the rest of the data in validation dataset validate = nhdata[!nhdata$SEQN %in% train$SEQN,]
To train and validate the data, I use 5 biomarkers as oultined below:
|Variable Name|Description| |:------------|:----------| |LBXSAL|Albumin (g/dL)| |LBDSAPSI|Alkaline Phosphate (U/L)| |LBXSBU|Blood Urea Nitrogen (mg/dL)| |LBXSCH|Cholesterol (mg/dL)| |LBDSCR|Creatnine (mg/dL)|
Note: While this example includes individuals of all ages and pregnant women, this approach will likely give poor results for biological age. Researchers should therefore exclude these individuals in their analysis.
library(bioage) kdmtrain = kdm_calc(data=train, agevar='RIDAGEYR', biomarkers=c('LBXSAL','LBDSAPSI','LBXSBU', 'LBXSCH','LBDSCR') )
With the trained data, we can export the training parameters as a dataset, or extract the trained dataset.
The biological age variable is named as the concatenation of agevar with the prefix of 'bio'.
Since the agevar is "RIDAGEYR", the calculated biological age is 'bioRIDAGEYR'. Here is a peak at the data with the biological age variable:
newdata = extract_data(kdmtrain) print(head(newdata[,c('LBXSAL','LBDSAPSI','LBXSBU', 'LBXSCH','LBDSCR', #biomarkers 'RIDAGEYR', #age (in years) 'bioRIDAGEYR' #biological age )]))
We can also extract and print the parameters for the KDM algorithm. These parameters have special meanings relative to the algorithm as explained in papers discussing biological aging.
params = extract_fit(kdmtrain) print(params)
Finall, we can use the trained parameters to estimate biological ages for out of sample data. For this, the kdm_calc fucntion needs to be given training data, as follows:
kdmvalidate = kdm_calc(data=validate, agevar='RIDAGEYR', biomarkers=c('LBXSAL','LBDSAPSI','LBXSBU', 'LBXSCH','LBDSCR'), fit=kdmtrain$fit #this argument supplies the fit ) #the same functions apply as to the trianing data new_validate = extract_data(kdmvalidate) print(head(new_validate[,c('LBXSAL','LBDSAPSI','LBXSBU', 'LBXSCH','LBDSCR', #biomarkers 'RIDAGEYR', #age (in years) 'bioRIDAGEYR' #biological age )]))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.