Introduction to the package

The OzoneExposure package was created to facilitate the fitting and comparison of two model for the effects of ozone exposure in humans. These two models are from the following publications:

For convenience, we have named the models according to the last name of the first author, e.g. the McDonnell model and the Schelegle model.

Both models predict changes in the FEV1 measure (delta FEV1 or dFEV1 hereafter) over time after exposure to different ozone concentrations. The data used to fit these models is from a series of experiments conducted at UC-Davis and by the EPA. Each model has been coded in Stan, a language for probabilistic inference (see mc-stan.org), to increase the speed at which calculations are done and hence to speed up the model fitting process. Each model is compiled on package install, so the installation process takes a little while, but thereafter should be pretty quick.

Model fitting procedures

We discovered that both models can confound a traditional numerical optimizer due to multiple local maxima present in the response surface. Therefore, neither model can be reliably optimized in a single shot. However, each model needs a slightly different approach in order to find the global optima.

For the Schelegle model: This model only has 4 parameters when fit using the log-likelihood: DOS, K, A, and the residual error (sigma). Therefore, we found that a grid search was possible to explore the response surface. In short, we form a grid of possible parameter combinations, then run the model with these parameters set and calculate the fit statistic (e.g., the AIC) for each combination. Since the model code is compiled for computational speed, each iteration of this takes roughly 0.001 seconds on a reasonably fast laptop. Moreover, the task of the grid search can use multiple CPU cores, which has the potential to speed up the model fit.

For the McDonnell model: Since this model utilizes a random effect parameter for each individual, as well as 8-10 unique fixed parameters, the number of possible combinations of parameters quickly escalates to the point where a grid search is not possible. Therefore, the model is fit repeatedly from different random points, with the optima for each random start recorded. Given a high number of random starting points, different optima can be found and compared to identify the one that gives the best model fit.

Using the package

Quick start

Reading data

These models have been coded in a way that they can be fit to the exact same input data. This has been done to facilitate comparisons between models when subjects are removed from the data set.

First, the data have been archived as SAS formatted files with the following names:

To read these into R, we first provide the file names, with the complete file path, to the function readSAS,

library(OzoneExposure)
ozone_files = c(
    "~/Downloads/z9312_epa.sas7bdat",
    "~/Downloads/z9312_kim.sas7bdat",
    "~/Downloads/z9312_ucd.sas7bdat")

ozone_data = readSAS(ozone_files)

The readSAS function will take multiple file names concatenated together, allowing us to read one or all of the data into R at once.

Subsetting and Formatting Data

The data are formatted for SAS, with multiple rows per individual/experiment representing different dFEV1 measurement times, but with the intervals for changes in O3 or exercise represented by multiple columns. To fit the models to these data, we need to change the format. The function mungeSAS will re-format the data, but we should first do any subsetting of individuals or experiments first.

For example, lets say we are only interested in the data from the "ARO" experiment,

aro_data = subset(ozone_data, STUDY == "ARO")

We can now re-format these data,

aro_model_data = mungeSAS(aro_data)

The reformatted data are in a list, with multiple matrices representing the different variables (e.g., O3, Ve, or BSA levels).

Fitting the Models

The list object created by mungeSAS can directly be fed into a model, for example the McDonnell model;

result = fit_McDonnell(aro_model_data, n_optim = 100)
min(result$aic)

or the Schelegle model,

result2 = fit_Schelegle(aro_model_data, n_interval = 5)
result2[which.min(result2$aic),]

We can then explore which combination of parameters give us the lowest AIC values,

par(mfrow=c(2,2))
sapply(result2[,c("dos","k","a")], function(x) plot(result2$aic ~ x))

Model comparison

Each model produces an estimated AIC value for each parameter combination, as well as the log-likelihood. The models can be compared directly via the AIC. However, you should be aware the the McDonnell model will be penalized rather strongly for having many more parameters than the Schelegle model.

Model expansion and modification

Writing the models in Stan involved several trade-offs. The Stan models are compiled, which makes the computations much faster compared to native R code. However, if you want to modify the models, you need to follow these steps:

  1. Create .stan code for the model
  2. Compile the model using stan::stan_model()
  3. Provide this model as the model= argument for the function being run, e.g. fit_Schelegle

Most of the computations in either model is happening in the functions block of the Stan code, and users would be best focusing their modifications there.

Example: Modifying the model

In this example, we are going to start with the existing models as a starting point. We can find where the Stan code for these have been saved on our machine with:

system.file("stan_files", "schelegle.stan", package = "OzoneExposure")

We can make a copy of this file, and then start modifying that copy as we want. Suppose we saved this modified model to our home directory and called it "schelegle_modified.stan", we could compile it with,

my_mod= rstan::stan_model("~/schelegle_modified.stan")

Then, we can use this modified model in the fitting algorithm by substituting the default model,

fit = fit_Schelegle(avo_model_data, model = my_mod)

Please note: in the above workflow, the modified model will need to be re-compiled for every new R session.

For reference on how to write Stan code, we recommend the resources provided by the Stan development team here.



dsidavis/Ozone2 documentation built on June 26, 2019, 7:35 a.m.