mungeSAS: Format ozone exposure data from SAS for Stan models

Description

This function takes a data.frame produced by readSAS and formats it as a list that can be passed as data to a Stan model.

Usage

mungeSAS(df)

Arguments

df

data.frame from readSAS (or similarly formatted).

Value

A list, with the following:

max_timepts

integer, maximum number of measurement timepoints across all observations

max_n_dFEV1

integer, maximum number of dFEV1 measurements across all observations

n_obs

integer, the number of observations

n_ind

integer, the number of individuals

n_dFEV1

integer vector, the number of dFEV1 measurement points for each observation, i.e., how often dFEV1 was measured per observation

n_timepts

integer vector, the number of timepoints at which the other (non-dFEV1) variables were measured for each observation

ind

integer vector, with one value per observation specifying the individual ID for that observation

age

numeric vector, with one value per observation specifying the individual's age at that observation. Age is stored per observation rather than per individual because individuals age between experiments

BMI

numeric vector, with one value per observation

BSA

numeric vector, with one value per observation

Ve

matrix of dims [max_timepts, n_obs], with each column corresponding to the values for a single experimental observation. Values are padded with 0 where the number of observed points is less than max_timepts

Cm

matrix, O3 measurements in similar configuration as Ve

Cs

matrix, O3 slope measurements, similar configuration as Ve

Time

matrix, integer values of the time, in minutes, at which each Ve, Cm, etc. was measured. Unused entries are padded with zeros.

dFEV1_measure_idx

matrix, the indices of the timepoints (see Time above) at which dFEV1 was measured

dFEV1

matrix, measured delta FEV1 at each time point
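
The zero-padded layout described for Ve, Cm, Cs and Time can be illustrated with a small sketch. This is not the package's implementation: pad_to_matrix is a hypothetical helper and the values are invented. It only shows how ragged per-observation vectors might end up as columns of a [max_timepts, n_obs] matrix, alongside the per-observation counts.

```r
# Hypothetical helper (not part of Ozone2): pad ragged per-observation
# vectors with a fill value and store them as columns of a matrix.
pad_to_matrix <- function(x, pad = 0) {
  n <- lengths(x)                       # real values per observation
  max_n <- max(n)
  m <- matrix(pad, nrow = max_n, ncol = length(x))
  for (i in seq_along(x)) {
    m[seq_len(n[i]), i] <- x[[i]]       # fill the top of column i
  }
  list(values = m, n = unname(n), max_n = max_n)
}

# Invented data: three observations with 3, 2 and 4 timepoints.
ve_raw <- list(obs1 = c(10.2, 11.5, 12.1),
               obs2 = c(9.8, 10.4),
               obs3 = c(8.9, 9.1, 9.7, 10.0))
padded <- pad_to_matrix(ve_raw)

dim(padded$values)  # rows play the role of max_timepts, columns of n_obs
padded$n            # plays the role of n_timepts
```

A model that receives such a matrix would typically loop over only the first n[i] rows of column i, so the padding zeros are never treated as data.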

Author(s)

Matt Espe

Examples

##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--	or do  help(data=index)  for the standard data sets.

## The function is currently defined as
function (df) 
{
    tmp = split(df, paste(df$ID, df$STUDY, df$Lab, df$EXPOSURE))
    t_vars = lapply(tmp, extract_t_vars)
    ind_vars = as.data.frame(do.call(rbind, lapply(tmp, extract_ind_vars)))
    dFEV1 = lapply(tmp, extract_dFEV1)
    t_vars = collapse_results(t_vars)
    n_timepts = lapply(t_vars, function(x) apply(x, 2, function(x) sum(!is.na(x))))
    n_timepts = apply(do.call(rbind, n_timepts), 2, unique)
    t_vars = lapply(t_vars, trim_vars)
    t_vars = lapply(t_vars, function(x) {
        x[is.na(x)] = 0
        x
    })
    dFEV1 = collapse_results(dFEV1)
    dFEV1$DELFEV1[is.na(dFEV1$DELFEV1)] = 0
    n_obs = length(tmp)
    n_ind = length(unique(ind_vars$ID))
    n_dFEV1 = sapply(tmp, nrow)
    list(max_timepts = max(n_timepts), max_n_dFEV1 = max(n_dFEV1), 
        n_obs = n_obs, n_ind = n_ind, n_dFEV1 = n_dFEV1, n_timepts = n_timepts, 
        ind = to_id(ind_vars$ID), age = as.numeric(ind_vars$AGE), 
        BMI = as.numeric(ind_vars$BMI), BSA = as.numeric(apply(t_vars$BSA, 
            2, function(x) unique(x[x != 0]))), Ve = t(as.matrix(t_vars$Ve)), 
        Cm = t(as.matrix(t_vars$O3_mean)), Cs = t(as.matrix(t_vars$O3_slope)), 
        Time = t(as.matrix(t_vars$T)), dFEV1_measure_idx = t(dFEV1$TIME_ID), 
        dFEV1 = t(dFEV1$DELFEV1))
  }
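
The first step of the function body above splits the rows into one group per experimental observation, keyed on ID, STUDY, Lab and EXPOSURE. The sketch below reproduces only that grouping step on an invented data frame. The behaviour of to_id is not shown in this page, so the match() call is an assumed stand-in that maps IDs to consecutive integers.

```r
# Invented data frame; column names follow the function body above.
toy <- data.frame(
  ID       = c("A", "A", "A", "B", "B", "A"),
  STUDY    = c("s1", "s1", "s1", "s1", "s1", "s2"),
  Lab      = c("x", "x", "x", "x", "x", "x"),
  EXPOSURE = c("hi", "hi", "lo", "hi", "hi", "hi"),
  stringsAsFactors = FALSE
)

# One group per unique ID/STUDY/Lab/EXPOSURE combination.
groups <- split(toy, paste(toy$ID, toy$STUDY, toy$Lab, toy$EXPOSURE))
n_obs <- length(groups)

# Individual ID of each observation (taken from the first row of its group).
obs_id <- vapply(groups, function(g) g$ID[1], character(1))
n_ind <- length(unique(obs_id))

# Assumed stand-in for to_id(): map IDs to consecutive integers.
ind <- unname(match(obs_id, unique(obs_id)))
```

Here individual "A" contributes three observations and "B" one, so n_obs is larger than n_ind and ind repeats the same integer for every observation from the same individual. Because the key is built with paste() and a space separator, values that themselves contain spaces could in principle collide.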

dsidavis/Ozone2 documentation built on June 26, 2019, 7:35 a.m.