mungeSAS: Format ozone exposure data from SAS for Stan models
In dsidavis/Ozone2: Modeling of Human Ozone Exposure


This function takes a data.frame produced by readSAS and formats it for use by a Stan model.

mungeSAS(df)

`df`	`data.frame` from `readSAS` (or similarly formatted).

A list, with the following:

`max_timepts`	integer, maximum number of measurement timepoints across all observations
`max_n_dFEV1`	integer, maximum number of dFEV1 measurements in any single observation
`n_obs`	integer, the number of observations
`n_ind`	integer, the number of individuals
`n_dFEV1`	integer vector, the number of dFEV1 measurement points for each observation, i.e., how often dFEV1 was measured per observation
`n_timepts`	integer vector, the number of other measurements for each observation
`ind`	integer vector, with one value per observation specifying the individual ID for that observation
`age`	numeric vector, with one value per observation specifying the individual's age at that observation. There is one per observation, as individuals age between experiments
`BMI`	numeric vector, with one value per observation
`BSA`	numeric vector, with one value per observation
`Ve`	matrix of dims [max_timepts, n_obs], with each column corresponding to the values for a single experimental observation. Values are padded with 0 where the number of observed points is less than max_timepts
`Cm`	matrix, O3 measurements in the same layout as Ve
`Cs`	matrix, O3 slope measurements in the same layout as Ve
`Time`	matrix, integer values of the time, in minutes, at which each Ve, Cm, etc. was measured. Unused entries are padded with zeros.
`dFEV1_measure_idx`	matrix, the indices of the measurement times above at which dFEV1 was measured
`dFEV1`	matrix, measured delta FEV1 at each time point
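
The zero-padded layout described above can be illustrated with a minimal base R sketch. The values, the `pad` helper, and the `idx` vector are made up for illustration and are not part of the package; the reading of `dFEV1_measure_idx` as row indices into `Time` is an assumption based on the description above.

```r
# Hypothetical ragged input: Ve and Time readings for three observations
ve_list   <- list(c(10, 12, 11), c(9, 9.5), c(14, 13, 12, 15))
time_list <- list(c(0, 30, 60), c(0, 60), c(0, 20, 40, 60))

n_timepts   <- lengths(ve_list)   # valid entries per observation
max_timepts <- max(n_timepts)

# Illustrative helper (not from the package): pad a vector with zeros
pad <- function(x, n) c(x, rep(0, n - length(x)))

# One column per observation, giving dims [max_timepts, n_obs]
Ve   <- sapply(ve_list, pad, n = max_timepts)
Time <- sapply(time_list, pad, n = max_timepts)

# Recover the observed values for observation 2, skipping the padding
ve_obs2 <- Ve[seq_len(n_timepts[2]), 2]

# Assumed use of measurement indices: rows of Time where dFEV1 was taken
idx <- c(2, 4)
dfev1_times_obs3 <- Time[idx, 3]
```

Because the padding value 0 can also be a legitimate measurement (for example, a time of 0 minutes), `n_timepts` and `n_dFEV1` are what distinguish real entries from padding.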

Matt Espe


## The function is currently defined as
function (df) 
{
    # One data.frame per experimental observation (unique ID/STUDY/Lab/EXPOSURE)
    tmp = split(df, paste(df$ID, df$STUDY, df$Lab, df$EXPOSURE))
    t_vars = lapply(tmp, extract_t_vars)
    ind_vars = as.data.frame(do.call(rbind, lapply(tmp, extract_ind_vars)))
    dFEV1 = lapply(tmp, extract_dFEV1)
    t_vars = collapse_results(t_vars)
    # Count the non-missing entries in each column of every time-varying variable
    n_timepts = lapply(t_vars, function(x) apply(x, 2, function(x) sum(!is.na(x))))
    n_timepts = apply(do.call(rbind, n_timepts), 2, unique)
    t_vars = lapply(t_vars, trim_vars)
    # Pad the remaining missing values with 0
    t_vars = lapply(t_vars, function(x) {
        x[is.na(x)] = 0
        x
    })
    dFEV1 = collapse_results(dFEV1)
    dFEV1$DELFEV1[is.na(dFEV1$DELFEV1)] = 0
    n_obs = length(tmp)
    n_ind = length(unique(ind_vars$ID))
    n_dFEV1 = sapply(tmp, nrow)
    # Assemble the list of data passed to the Stan model
    list(max_timepts = max(n_timepts), max_n_dFEV1 = max(n_dFEV1), 
        n_obs = n_obs, n_ind = n_ind, n_dFEV1 = n_dFEV1, n_timepts = n_timepts, 
        ind = to_id(ind_vars$ID), age = as.numeric(ind_vars$AGE), 
        BMI = as.numeric(ind_vars$BMI), BSA = as.numeric(apply(t_vars$BSA, 
            2, function(x) unique(x[x != 0]))), Ve = t(as.matrix(t_vars$Ve)), 
        Cm = t(as.matrix(t_vars$O3_mean)), Cs = t(as.matrix(t_vars$O3_slope)), 
        Time = t(as.matrix(t_vars$T)), dFEV1_measure_idx = t(dFEV1$TIME_ID), 
        dFEV1 = t(dFEV1$DELFEV1))
}
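
The grouping step at the top of the function can be tried in isolation. The following is a minimal sketch on a made-up data.frame; the column names mirror those used in the definition above, but the values are invented and the package helpers (`extract_t_vars`, `collapse_results`, and so on) are not reproduced here.

```r
# Toy data: two individuals, three experimental observations in total
toy <- data.frame(
  ID       = c(1, 1, 1, 1, 2, 2),
  STUDY    = c("A", "A", "A", "A", "A", "A"),
  Lab      = c("x", "x", "x", "x", "x", "x"),
  EXPOSURE = c(1, 1, 2, 2, 1, 1),
  DELFEV1  = c(NA, -0.1, NA, -0.3, NA, 0.05)
)

# Same grouping key as in mungeSAS: one list element per observation
tmp <- split(toy, paste(toy$ID, toy$STUDY, toy$Lab, toy$EXPOSURE))

n_obs <- length(tmp)               # number of observations
n_ind <- length(unique(toy$ID))    # number of individuals
rows_per_obs <- sapply(tmp, nrow)  # rows in each observation

# Missing dFEV1 values are replaced with 0, as in the function body
toy$DELFEV1[is.na(toy$DELFEV1)] <- 0
```

Note that an individual who takes part in several exposures contributes several observations, which is why `n_obs` and `n_ind` differ here.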