Rerr is a package that fits excess relative risk (ERR) models in survival analysis, while allowing for measurement error in exposures, with or without effect modifications.
Measurement errors are assumed to be given by a Monte Carlo dosimetry
system (MCDS), which consists of multiple dose replications that
reflect the distribution of the true dose. In linear dose effect
models, where ERR models belong, parameter estimates are
asymptotically unbiased, when the average dose of the MCDS is used for
model fitting. Rerr
corrects the effect of measurement errors by
adjusting the confidence intervals of parameter estimates.
ERR models are widely used in Radiation epidemiology. The general form is
$$ h(t) = \mathrm{background}(t)\left(1 + \sum_i b_i X_i(t)\mathrm{modifier}i(t)\right)$$ $$ \mathrm{background}(t) = \exp\left(a_0 + \sum_j a_j C_j(t)\right) $$ $$ \mathrm{modifier}_i(t) = \exp\left(\sum{k_i} a_{k_i} A_{k_i}(t)\right) $$
where $t$ is time, $h(t)$ is the hazard function, $X_i$ are exposure
variables, $C_j$ are baseline covariates and $A_{k_i}$ are effect
modifer variables. In Rerr
, we model time dependent variables by
constructing person-time tables.
Currently Rerr
only allow one of the $X_i$'s to be measured with
error. This document will walk you through the package from
installation to analysis of a simple survival dataset, simulated by a
function provided by this package.
The package can be downloaded from Github at
here. After unzipping the
downloaded file, assuming the full path to the folder of the package
is path_to_foler
, type the following command in R console to
install the package (replace path_to_folder
with the real path):
install.packages(path_to_folder, repos=NULL, type="source")
Here we give an example, by simulating a simple survival study. The underlying hazard function is chosen as a ERR model that depends on two exposures, as well as age and sex. In this study, the MCDS is given by a matrix, where each column corresponds to one dose replication. The survival data is generated from one dose replication from the MCDS, while we use the remaining dose replications for model fitting and confidence interval correction.
The ERR model is specified using a formula
object in R. The
simulation example provided in this document is based on a
hypothetical ERR model, formulated as follows (ommiting the paramters)
$$ h(t) = \exp\left(1 + log^2(age/40) + sex\right)\left(1 + x_1 \exp(sex) + x_2\right) $$
where $x_1, x_2$ are some time-dependent or time-independent exposure
variables. The corresponding formula
for model specification is
case ~ I(log(age/60)^2) + sex + ERR( U(x1) | sex) + ERR(x2) + offset(pt)
As can be seen,
x1
, x2
are enclosed by
ERR()
;|
;U()
;There are two more variables introduced in the model specification,
case
and offset(pt)
. offset(pt)
specify that in the returned
data.frame
(from simulateERR
using this model), pt
represents
the amount of effective person-time in each period (due to early
termination or censoring), and case
is the outcome indicator
variable. These two variables are standard variables used in Poisson
table used to analyze survival data. For details, refer to Holford
1980, Laird and Olivier 1981.
Now we simulate a cohort consists of 5,000 people. For simplicity we assume that each person is followed for 10 years.
library(Rerr) n <- 5000 # size of the cohort n.year <- 10 # years of follow-up n.rep <- 400 # number of dose realizations to generate id <- rep(1:n, each = n.year) # id for py table start.age <- sample(20:50, n, TRUE) # age of enrollment of each person age <- unlist(lapply(start.age, function(x) x + 1:n.year - 1)) # age for py table sex <- rep(rbinom(n, 1, 0.5), each = n.year) # sex for py table time <- rep(1:n.year - 1, n) # time variable for py table censor.time <- rep(n.year, n * n.year) # censor time for py table x1 <- rchisq(n * n.year, 1) # exposure 1 for py table x2 <- rchisq(n * n.year, 1) # exposure 2 for py table
To generate a Monte Carlo dosimetry system (MCDS), we use the SUMA
model proposed by Stram and Kopecky 2003, with only multiplicative
errors. We assume the variance is 0.2 for the shared error and 0.5
for the unshared error. Here we generated 400 dose replications, as
defined by n.rep
.
sm.var <- 0.2 # variance of shared multiplicative error sm.log.se <- log(1 + sm.var) # standard error on log scale sm.log.mu <- - 1/2 * sm.var # mean on log scale um.var <- 0.5 # variance of unshared multiplicative error um.log.se <- log(1 + um.var) # standard error on log scale um.log.mu <- - 1/2 * um.var # mean on log scale x1.reps <- do.call(cbind, lapply(1:n.rep, function(i) { x1 * exp(rnorm(1, sm.log.mu, sm.log.se)) * exp(rnorm(n * n.year, um.log.mu, um.log.se))}))
We use the first dose replication for x1
as the true dose for
the simulation of the survival data.
data <- data.frame(sex = sex, # data frame used for simulation age = age, x1 = x1.reps[, 1], x2 = x2, id = id, time = time, censor.time = censor.time) data[1:10, ]
Shown above is the structure of the data.frame
object constructed
for the survival data simulation.
We use the function simulateERR
to generate the outcome data using
the specified ERR model. We also specify the parameters for the
model, following the order of the variables in the formula
object
for the model.
model <- case ~ I(log(age/40)^2) + sex + ERR(U(x1) | sex) + ERR(x2) + offset(py) theta <- c(-5, log(2), 1, 2, 0.5, 1) data.new <- simulateERR(model, data, theta, id = id, time = time, censor.time = censor.time) number.case <- sum(data.new$case) number.case data.new[1:10,]
We can see that there are r number.case
cases. There is a index
column in the generated data.frame
object, data.new
. This column
indexes the rows of the original data
that are kept. The reason of
having such column is, whenever an event occurs, the simulated
exposure in the 'future' should not exist (this column is only useful
for simulations). This column is necessary for aligning the dose
replications to the generated data.frame
.
We fit the model using the average dose in the remaining dose
replications, and then calculate the confidence intervals of the
parameters. We use three functions, parseERR
, fitERR
, and
inferERR
.
parseERR
parses the data and the model into a ERR object, which
contains essential information for model fitting and inference;fitERR
use the parsed ERR object to fit the ERR model to get the
MLE;inferERR
takes the parsed ERR object and the MLE to calculate the
confidence intervals of the parameters.rep.index <- 1:n.rep rep.index <- rep.index[-1] x1.mean <- apply(x1.reps[, rep.index], 1, mean) data.new$x1 <- x1.mean[data.new$index] data.err <- parseERR(model, data.new) xm <- fitERR(data.err) xm$conv xm$theta theta.CI <- inferERR(data.err, xm$theta, x1.reps[data.new$index, rep.index], naive = TRUE) theta.CI
This concludes the simple simulation. As we can see here, for x1
the
naive confidence interval is narrower compared to the corrected one,
since it does not account for the variance introduced by shared error
component in x1
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.