Ring et al. (2017) HTTK-Pop: Generating subpopulations"
In httk: High-Throughput Toxicokinetics

Please send questions to ring.caroline@epa.gov

from "Identifying populations sensitive to environmental chemicals by simulating toxicokinetic variability"

Caroline L.Ring, Robert G.Pearce, R. Woodrow Setzer, Barbara A.Wetmore, and John F.Wambaugh

Environment International, Volume 106, September 2017, Pages 105-118

https://doi.org/10.1016/j.envint.2017.06.004

Abstract

The thousands of chemicals present in the environment (USGAO, 2013) must be triaged to identify priority chemicals for human health risk research. Most chemicals have little of the toxicokinetic (TK) data that are necessary for relating exposures to tissue concentrations that are believed to be toxic. Ongoing efforts have collected limited, in vitro TK data for a few hundred chemicals. These data have been combined with biomonitoring data to estimate an approximate margin between potential hazard and exposure. The most “at risk” 95th percentile of adults have been identified from simulated populations that are generated either using standard “average” adult human parameters or very specific cohorts such as Northern Europeans. To better reflect the modern U.S. population, we developed a population simulation using physiologies based on distributions of demographic and anthropometric quantities from the most recent U.S. Centers for Disease Control and Prevention National Health and Nutrition Examination Survey (NHANES) data. This allowed incorporation of inter-individual variability, including variability across relevant demographic subgroups. Variability was analyzed with a Monte Carlo approach that accounted for the correlation structure in physiological parameters. To identify portions of the U.S. population that are more at risk for specific chemicals, physiologic variability was incorporated within an open-source high-throughput (HT) TK modeling framework. We prioritized 50 chemicals based on estimates of both potential hazard and exposure. Potential hazard was estimated from in vitro HT screening assays (i.e., the Tox21 and ToxCast programs). Bioactive in vitro concentrations were extrapolated to doses that produce equivalent concentrations in body tissues using a reverse dosimetry approach in which generic TK models are parameterized with: 1) chemical-specific parameters derived from in vitro measurements and predicted from chemical structure; and 2) with physiological parameters for a virtual population. For risk-based prioritization of chemicals, predicted bioactive equivalent doses were compared to demographic-specific inferences of exposure rates that were based on NHANES urinary analyte biomonitoring data. The inclusion of NHANES-derived inter-individual variability decreased predicted bioactive equivalent doses by 12% on average for the total population when compared to previous methods. However, for some combinations of chemical and demographic groups the margin was reduced by as much as three quarters. This TK modeling framework allows targeted risk prioritization of chemicals for demographic groups of interest, including potentially sensitive life stages and subpopulations.

This vignette provides the code used to generate the virtual populations for the ten subpopulations of interest, plus a non-obese adult subpopulation.

HTTK Version

This vignette was created with httk v1.5. Although we attempt to maintain backward compatibility, if you encounter issues with the latest release of httk and cannot easily address the changes, historical versions of httk are available from: https://cran.r-project.org/src/contrib/Archive/httk/

Prepare for session

R package knitr generates html and PDF documents from this RMarkdown file, Each bit of code that follows is known as a "chunk". We start by telling knitr how we want our chunks to look.

knitr::opts_chunk$set(collapse = TRUE, comment = '#>')

Clear the memory

It is a bad idea to let variables and other information from previous R sessions float around, so we first remove everything in the R memory.

rm(list=ls())

eval = execute.vignette

If you are using the RMarkdown version of this vignette (extension, .RMD) you will be able to see that several chunks of code in this vignette have the statement "eval = execute.vignette". The next chunk of code, by default, sets execute.vignette = FALSE. This means that the code is included (and necessary) but was not run when the vignette was built. We do this because some steps require extensive computing time and the checks on CRAN limit how long we can spend building the package. If you want this vignette to work, you must run all code, either by cutting and pasting it into R. Or, if viewing the .RMD file, you can either change execute.vignette to TRUE or press "play" (the green arrow) on each chunk in RStudio.

# Set whether or not the following chunks will be executed (run):
execute.vignette <- FALSE

Load the relevant libraries

To use the code in this vignette, you'll first need to load a few packages. If you get the message "Error in library(X) : there is no package called 'X'" then you will need to install that package:

From the R command prompt:

install.packages("X")

Or, if using RStudio, look for 'Install Packages' under 'Tools' tab.

library("httk")
library("data.table")

Because we'll be writing some files (and then reading them in other vignettes), be aware of where your current working directory is -- the files will be written there.

Set up subpopulation specs

Here, we set the number of individuals in each virtual population (1000). We also specify a list of names for the virtual populations. Then we specify corresponding lists of gender, age limits, and BMI categories. HTTK-Pop default values will be used for other population specifications (e.g. race/ethnicity, kidney function categories).

nsamp<-1000
#List subpop names
ExpoCast.group<-list("Total",
                     "Age.6.11",
                     "Age.12.19",
                     "Age.20.65",
                     "Age.GT65",
                     "BMIgt30",
                     "BMIle30",
                     "Females",
                     "Males",
                     "ReproAgeFemale",
                     "Age.20.50.nonobese")
#List subpop gender specifications
gendernum <- c(rep(list(NULL),7), 
               list(list(Male=0, Female=1000)), 
               list(list(Male=1000, Female=0)), 
               list(list(Male=0, Female=1000)), 
               list(NULL))
#List subpop age limits in years
agelim<-c(list(c(0,79),
               c(6,11),
               c(12,19),
               c(20,65),
               c(66,79)),
          rep(list(c(0,79)),4),
          list(c(16,49)),
          list(c(20,50)))
#List subpop weight categories
bmi_category <- c(rep(list(c('Underweight', 
                             'Normal',
                             'Overweight',
                             'Obese')),
                      5),
                  list('Obese', c('Underweight','Normal', 'Overweight')),
                  rep(list(c('Underweight', 
                             'Normal',
                             'Overweight',
                             'Obese')),
                      3),
                  list(c('Underweight', 'Normal', 'Overweight')))

Generate populations

First, define the loop body as a function; then use parallel::clusterMap to parallelize it. Warning: This might take a couple of minutes to run.

tmpfun <- function(gendernum, agelim, bmi_category, ExpoCast_grp,
                   nsamp, method){
  result <- tryCatch({
                     pops <- httk::httkpop_generate(
                  method=method,
                  nsamp = nsamp,
                  gendernum = gendernum,
                  agelim_years = agelim,
                  weight_category = bmi_category)

                  filepart <- switch(method,
                                     'virtual individuals' = 'vi',
                                     'direct resampling' = 'dr')
saveRDS(object=pops,
          file=paste0(paste('data/httkpop',
                      filepart, ExpoCast_grp, 
                      sep='_'),
                      '.Rdata'))
return(getwd())
}, error = function(err){
  print(paste('Error occurred:', err))
  return(1)
})
}

cluster <- parallel::makeCluster(2, # Reduced from 40 to 2 cores 
                       outfile='subpopulations_parallel_out.txt')

evalout <- parallel::clusterEvalQ(cl=cluster,
             {library(data.table)
              library(httk)})
parallel::clusterExport(cl = cluster,
              varlist = 'tmpfun')
#Set seeds on all workers for reproducibility
parallel::clusterSetRNGStream(cluster, 
                              010180)
out_vi <- parallel::clusterMap(cl=cluster,
                  fun = tmpfun,
                  gendernum=gendernum,
                  agelim=agelim,
                  bmi_category=bmi_category,
                  ExpoCast_grp = ExpoCast.group,
                  MoreArgs = list(nsamp = nsamp,
                                  method = 'virtual individuals'))
out_dr <- parallel::clusterMap(cl=cluster,
                  fun = tmpfun,
                  gendernum=gendernum,
                  agelim=agelim,
                  bmi_category=bmi_category,
                  ExpoCast_grp = ExpoCast.group,
                  MoreArgs = list(nsamp = nsamp,
                                  method = 'direct resampling'))
parallel::stopCluster(cluster)