The OzoneExposure package was created to facilitate the fitting and comparison of two models for the effects of ozone exposure in humans. These two models are from the following publications:
William F. McDonnell, Paul W. Stewart, Marjo V. Smith, Chong S. Kim & Edward S. Schelegle (2012) Prediction of lung function response for populations exposed to a wide range of ozone conditions, Inhalation Toxicology, 24:10, 619-633, DOI: 10.3109/08958378.2012.705919
Edward S. Schelegle, William C. Adams, William F. Walby & M. Susan Marion (2012) Modelling of individual subject ozone exposure response kinetics, Inhalation Toxicology, 24:7, 401-415
For convenience, we have named the models according to the last name of the first author, e.g. the McDonnell model and the Schelegle model.
Both models predict changes in the FEV1 measure (delta FEV1, or dFEV1 hereafter) over time after exposure to different ozone concentrations. The data used to fit these models are from a series of experiments conducted at UC-Davis and by the EPA. Each model has been coded in Stan, a language for probabilistic inference (see mc-stan.org), to speed up the underlying calculations and hence the model fitting process. Each model is compiled when the package is installed, so installation takes a little while, but everything thereafter should be quick.
We discovered that both models can confound a traditional numerical optimizer because of multiple local maxima in the response surface. Neither model, therefore, can be reliably optimized in a single pass, and each needs a slightly different approach to find its global optimum.
For the Schelegle model: This model has only 4 parameters when fit via the log-likelihood: DOS, K, A, and the residual error (sigma). We therefore found a grid search to be a feasible way of exploring the response surface. In short, we form a grid of possible parameter combinations, run the model at each combination, and calculate a fit statistic (e.g., the AIC) for each. Since the model code is compiled for computational speed, each iteration takes roughly 0.001 seconds on a reasonably fast laptop. Moreover, the grid search can be spread over multiple CPU cores, which can speed up the model fit further.
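To illustrate the idea, a grid search over these parameters might be sketched as follows. This is a sketch of the approach rather than the package's internal code; schelegle_aic() is a hypothetical helper returning the AIC for one parameter combination, model_data is a placeholder for the prepared input data, and the parameter ranges are purely illustrative.

# Sketch only: schelegle_aic() and model_data are hypothetical placeholders,
# and the parameter ranges below are illustrative.
param_grid = expand.grid(
  dos   = seq(0.5, 2.0, length.out = 10),
  k     = seq(0.01, 0.20, length.out = 10),
  a     = seq(1, 20, length.out = 10),
  sigma = seq(0.5, 5.0, length.out = 10)
)
# Evaluate the fit statistic for every combination, spreading the work over
# CPU cores (mclapply forks, so use mc.cores = 1 on Windows)
param_grid$aic = unlist(parallel::mclapply(
  seq_len(nrow(param_grid)),
  function(i) schelegle_aic(model_data, param_grid[i, ]),
  mc.cores = 2
))
# The combination with the lowest AIC is the candidate global optimum
param_grid[which.min(param_grid$aic), ]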
For the McDonnell model: Since this model includes a random-effect parameter for each individual, as well as 8-10 unique fixed parameters, the number of possible parameter combinations quickly grows beyond what a grid search can handle. The model is therefore fit repeatedly from different random starting points, with the optimum from each start recorded. Given a large number of random starts, different optima can be found and compared to identify the one that gives the best model fit.
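Again as a sketch of the strategy rather than the package's internal code, a multi-start optimization could look like the following, where mcdonnell_negloglik() is a hypothetical function returning the negative log-likelihood for a numeric parameter vector and model_data is a placeholder for the prepared input data.

# Sketch only: mcdonnell_negloglik() and model_data are hypothetical placeholders.
n_starts = 100
n_par    = 9    # illustrative count of fixed parameters
fits = lapply(seq_len(n_starts), function(i) {
  start = runif(n_par, min = -1, max = 1)   # a random starting point
  optim(start, mcdonnell_negloglik, data = model_data, method = "BFGS")
})
# Keep the run that achieved the smallest negative log-likelihood
best = fits[[which.min(sapply(fits, function(f) f$value))]]
best$par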
Both models have been coded so that they can be fit to exactly the same input data. This makes it easier to compare the models when subjects are removed from the data set.
First, the data have been archived as SAS-formatted files with the following names: z9312_epa.sas7bdat, z9312_kim.sas7bdat, and z9312_ucd.sas7bdat.
To read these into R, we first provide the file names, with the complete file path, to the function readSAS,
library(OzoneExposure)
ozone_files = c("~/Downloads/z9312_epa.sas7bdat",
                "~/Downloads/z9312_kim.sas7bdat",
                "~/Downloads/z9312_ucd.sas7bdat")
ozone_data = readSAS(ozone_files)
The readSAS function will take multiple file names concatenated together, allowing us to read one or all of the data into R at once.
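For example, a single file can be read on its own (the path below assumes the same download location used above),

ucd_data = readSAS("~/Downloads/z9312_ucd.sas7bdat")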
The data are formatted for SAS, with multiple rows per individual/experiment representing different dFEV1 measurement times, but with the intervals for changes in O3 or exercise represented by multiple columns. To fit the models to these data, we need to change the format. The function mungeSAS will re-format the data, but any subsetting of individuals or experiments should be done first.
For example, let's say we are only interested in the data from the "ARO" experiment,
aro_data = subset(ozone_data, STUDY == "ARO")
We can now re-format these data,
aro_model_data = mungeSAS(aro_data)
The reformatted data are in a list, with multiple matrices representing the different variables (e.g., O3, Ve, or BSA levels).
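A quick way to see what mungeSAS produced is to inspect the list structure,

# The exact element names depend on the package; this simply inspects the object
names(aro_model_data)
str(aro_model_data, max.level = 1)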
The list object created by mungeSAS can directly be fed into a model, for example the McDonnell model,
result = fit_McDonnell(aro_model_data, n_optim = 100)
min(result$aic)
or the Schelegle model,
result2 = fit_Schelegle(aro_model_data, n_interval = 5)
result2[which.min(result2$aic),]
We can then explore which combinations of parameters give us the lowest AIC values,
par(mfrow = c(2, 2))
sapply(result2[, c("dos", "k", "a")], function(x) plot(result2$aic ~ x))
Each model produces an estimated AIC value for each parameter combination, as well as the log-likelihood. The models can be compared directly via the AIC. However, you should be aware that the McDonnell model will be penalized rather strongly for having many more parameters than the Schelegle model.
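For example, the best AIC from each of the fits above can be placed side by side,

# Lower AIC indicates the better-supported model
c(McDonnell = min(result$aic), Schelegle = min(result2$aic))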
Writing the models in Stan involved several trade-offs. The Stan models are compiled, which makes the computations much faster than native R code. However, if you want to modify the models, you need to follow these steps:
1. Make a copy of the Stan source file for the model you want to change and edit it.
2. Compile the modified file with rstan::stan_model().
3. Supply the compiled model to the model= argument of the function being run, e.g. fit_Schelegle.
Most of the computation in either model happens in the functions block of the Stan code, and users would be best served by focusing their modifications there.
In this example, we use the existing models as a starting point. We can find where the Stan code for these has been saved on our machine with:
system.file("stan_files", "schelegle.stan", package = "OzoneExposure")
We can make a copy of this file and then modify that copy as we want. Suppose we saved the modified model to our home directory as "schelegle_modified.stan"; we could then compile it with,
my_mod = rstan::stan_model("~/schelegle_modified.stan")
Then, we can use this modified model in the fitting algorithm by substituting the default model,
fit = fit_Schelegle(aro_model_data, model = my_mod)
Please note: in the above workflow, the modified model will need to be re-compiled for every new R session.
For reference on how to write Stan code, we recommend the resources provided by the Stan development team at mc-stan.org.