We simulate for time-to-event data assuming constant hazards and then investigate whether we can estimate the underlying parameters. Note that the binary variable $X$ is essentially a coin toss and we have used a large variance for the normally distributed $U$.
library('knitr') read_chunk('../q14.R') opts_chunk$set(cache=FALSE)
You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest")
for each package you require.
The assumed causal diagram is reproduced below:
\usetikzlibrary{arrows,decorations.pathmorphing,backgrounds,positioning,fit,petri,matrix} \begin{tikzpicture}[->,bend angle=20,semithick,>=stealth'] \matrix [matrix of nodes,row sep=10mm, column sep=15mm] { |(X)| $X$ & |(C)| $C$ \\ & |(T)| $T$ & |(Y)| $(Y,\Delta)$ \\ |(U)| $U$ \\ }; \begin{scope}[every node/.style={auto}] \draw (X) to node[anchor=south] {1} (T); \draw (U) to node[anchor=south] {1} (T); \draw (T) to node[anchor=north] {} (Y); \draw (C) to node[anchor=north] {} (Y); \end{scope} \end{tikzpicture}
For constant hazards, we can fit (i) Poisson regression, (ii) Cox regression and (iii) flexible parametric survival models. With both covariates, these models are expected to estimate the parameters for $X$ and $U$, with values close to 1:
If we fit a time-varying hazard ratio for $X$, we see that the hazard ratio looks reasonably constant.
We now model by excluding the variable $U$. This variable could be excluded when it is not measured or perhaps when the variable is not considered to be a confounding variable -- from the causal diagram, the two variables $X$ and $U$ are not correlated and are only connected through the time variable $T$.
We clearly see that the estimate for $X$ is different in the models without $U$. If we now allow the the hazard ratio to vary by time:
We see that the hazard ratio is clearly time-varying, starting at an initial value at time = 0 of close to exp(1), but then declining rapidly. As discussed in class, this is an example of unmeasured heterogeneity, which is a normal random effect or, in this context, a log-normal frailty. Initially, there is no selection in the at-risk population, and the estimated marginal hazard ratio from the model without $U$ is similar to the conditional hazrd ratio from the model that adjusts for $U$. However, at later times, the at-risk population has been selected for those with a smaller frailty (because they were less likely to have had the event), and for a log-normal frailty the marginal hazard ratio is attenuated towards 1.
Let us stress that $U$ is not a confounder. The issue is that fitting proportional hazards models with unmodelled heterogeneity for variables that are not confounders can lead to time-varying hazard ratios -- and marginal estimates that do not have a causal interpretation.
We now simulate for rarer outcomes by changing the censoring distribution.
For rarer outcomes, the marginal estimates that do not model for $U$ are closer to the conditional estimates that have modelled for $U$.
We now simulate for less heterogeneity by changing the reducing the standard deviation for the random effect $U$ from 3 to 1.
The marginal estimates are now much closer to the conditional estimates. For little or no heterogeneity, the marginal and conditional estimates would be very similar.
We re-run the baseline simulations:
The accelerated failure time models have the form $S(t|x,u)=S_0(t/\exp( \beta_1 x + \beta_2 u))$, where the function $S_0(t)$ is expressed in terms of splines. For an exponential distribution, the hazard ratio is equal to the inverse of the ratio of the means --- so we expect values of $-1$ for $x$ and $u$.
We see that the model with both $x$ and $u$ estimates values are very close to $-1$ --- which is as expected. Importantly, the model with only $x$ also estimates a value close to $-1$, which shows that the acclerated failure time models are robust to omitted covariates. The standard error for $x$ was smaller in the model with both $x$ and $u$ compared with that for the model with only $x$.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.