Introduction

Rerr is a package that fits excess relative risk (ERR) models in survival analysis, while allowing for measurement error in exposures, with or without effect modifications.

Measurement errors are assumed to be given by a Monte Carlo dosimetry system (MCDS), which consists of multiple dose replications that reflect the distribution of the true dose. In linear dose effect models, where ERR models belong, parameter estimates are asymptotically unbiased, when the average dose of the MCDS is used for model fitting. Rerr corrects the effect of measurement errors by adjusting the confidence intervals of parameter estimates.

ERR models are widely used in Radiation epidemiology. The general form is

$$ h(t) = \mathrm{background}(t)\left(1 + \sum_i b_i X_i(t)\mathrm{modifier}i(t)\right)$$ $$ \mathrm{background}(t) = \exp\left(a_0 + \sum_j a_j C_j(t)\right) $$ $$ \mathrm{modifier}_i(t) = \exp\left(\sum{k_i} a_{k_i} A_{k_i}(t)\right) $$

where $t$ is time, $h(t)$ is the hazard function, $X_i$ are exposure variables, $C_j$ are baseline covariates and $A_{k_i}$ are effect modifer variables. In Rerr, we model time dependent variables by constructing person-time tables.

Currently Rerr only allow one of the $X_i$'s to be measured with error. This document will walk you through the package from installation to analysis of a simple survival dataset, simulated by a function provided by this package.

Installation

The package can be downloaded from Github at here. After unzipping the downloaded file, assuming the full path to the folder of the package is path_to_foler, type the following command in R console to install the package (replace path_to_folder with the real path):

install.packages(path_to_folder, repos=NULL, type="source")

Quick Start

Here we give an example, by simulating a simple survival study. The underlying hazard function is chosen as a ERR model that depends on two exposures, as well as age and sex. In this study, the MCDS is given by a matrix, where each column corresponds to one dose replication. The survival data is generated from one dose replication from the MCDS, while we use the remaining dose replications for model fitting and confidence interval correction.

Model specification

The ERR model is specified using a formula object in R. The simulation example provided in this document is based on a hypothetical ERR model, formulated as follows (ommiting the paramters)

$$ h(t) = \exp\left(1 + log^2(age/40) + sex\right)\left(1 + x_1 \exp(sex) + x_2\right) $$

where $x_1, x_2$ are some time-dependent or time-independent exposure variables. The corresponding formula for model specification is

case ~ I(log(age/60)^2) + sex + ERR( U(x1) | sex) + ERR(x2) + offset(pt)

As can be seen,

There are two more variables introduced in the model specification, case and offset(pt). offset(pt) specify that in the returned data.frame (from simulateERR using this model), pt represents the amount of effective person-time in each period (due to early termination or censoring), and case is the outcome indicator variable. These two variables are standard variables used in Poisson table used to analyze survival data. For details, refer to Holford 1980, Laird and Olivier 1981.

Data preparation

Now we simulate a cohort consists of 5,000 people. For simplicity we assume that each person is followed for 10 years.

library(Rerr)

n <- 5000                              # size of the cohort
n.year <- 10                           # years of follow-up
n.rep <- 400                           # number of dose realizations to generate
id <- rep(1:n, each = n.year)          # id for py table
start.age <- sample(20:50, n, TRUE)    # age of enrollment of each person
age <- unlist(lapply(start.age, function(x)
    x + 1:n.year - 1))                 # age for py table
sex <- rep(rbinom(n, 1, 0.5),
           each = n.year)              # sex for py table
time <- rep(1:n.year - 1, n)           # time variable for py table
censor.time <- rep(n.year, n * n.year) # censor time for py table
x1 <- rchisq(n * n.year, 1)            # exposure 1 for py table
x2 <- rchisq(n * n.year, 1)            # exposure 2 for py table

To generate a Monte Carlo dosimetry system (MCDS), we use the SUMA model proposed by Stram and Kopecky 2003, with only multiplicative errors. We assume the variance is 0.2 for the shared error and 0.5 for the unshared error. Here we generated 400 dose replications, as defined by n.rep.

sm.var <- 0.2                # variance of shared multiplicative error
sm.log.se <- log(1 + sm.var) # standard error on log scale
sm.log.mu <- - 1/2 * sm.var  # mean on log scale
um.var <- 0.5                # variance of unshared multiplicative error
um.log.se <- log(1 + um.var) # standard error on log scale
um.log.mu <- - 1/2 * um.var  # mean on log scale
x1.reps <- do.call(cbind, lapply(1:n.rep, function(i) {
    x1 * exp(rnorm(1, sm.log.mu, sm.log.se)) *
        exp(rnorm(n * n.year, um.log.mu, um.log.se))}))

We use the first dose replication for x1 as the true dose for the simulation of the survival data.

data <- data.frame(sex = sex,  # data frame used for simulation
                   age = age,
                   x1 = x1.reps[, 1],
                   x2 = x2,
                   id = id,
                   time = time,
                   censor.time = censor.time)
data[1:10, ]

Shown above is the structure of the data.frame object constructed for the survival data simulation.

Simulation and Inference

We use the function simulateERR to generate the outcome data using the specified ERR model. We also specify the parameters for the model, following the order of the variables in the formula object for the model.

model <- case ~ I(log(age/40)^2) + sex +
    ERR(U(x1) | sex) + ERR(x2) +
    offset(py)
theta <- c(-5, log(2), 1, 2, 0.5, 1)

data.new <- simulateERR(model, data, theta,
                        id = id,
                        time = time,
                        censor.time = censor.time)
number.case <- sum(data.new$case)

number.case

data.new[1:10,]

We can see that there are r number.case cases. There is a index column in the generated data.frame object, data.new. This column indexes the rows of the original data that are kept. The reason of having such column is, whenever an event occurs, the simulated exposure in the 'future' should not exist (this column is only useful for simulations). This column is necessary for aligning the dose replications to the generated data.frame.

We fit the model using the average dose in the remaining dose replications, and then calculate the confidence intervals of the parameters. We use three functions, parseERR, fitERR, and inferERR.

rep.index <- 1:n.rep
rep.index <- rep.index[-1]
x1.mean <- apply(x1.reps[, rep.index], 1, mean) 
data.new$x1 <- x1.mean[data.new$index]
data.err <- parseERR(model, data.new)
xm <- fitERR(data.err)
xm$conv
xm$theta
theta.CI <- inferERR(data.err, xm$theta,
                     x1.reps[data.new$index, rep.index],
                     naive = TRUE)
theta.CI

This concludes the simple simulation. As we can see here, for x1 the naive confidence interval is narrower compared to the corrected one, since it does not account for the variance introduced by shared error component in x1.



zhuozhang/Rerr documentation built on May 4, 2019, 11:22 p.m.