Biostatistics III in R

Exercise 14. Non-collapsibility of proportional hazards models


We simulate for time-to-event data assuming constant hazards and then investigate whether we can estimate the underlying parameters. Note that the binary variable $X$ is essentially a coin toss and we have used a large variance for the normally distributed $U$.

library('knitr')
read_chunk('../q14.R')
opts_chunk$set(cache=FALSE)

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.


The assumed causal diagram is reproduced below:

    \usetikzlibrary{arrows,decorations.pathmorphing,backgrounds,positioning,fit,petri,matrix}
    \begin{tikzpicture}[->,bend angle=20,semithick,>=stealth']
      \matrix [matrix of nodes,row sep=10mm, column sep=15mm]
      {
        |(X)| $X$ & |(C)| $C$ \\
        & |(T)| $T$ & |(Y)| $(Y,\Delta)$ \\
        |(U)| $U$ \\
      };
      \begin{scope}[every node/.style={auto}]
        \draw (X) to node[anchor=south] {1} (T);
        \draw (U) to node[anchor=south] {1} (T);
        \draw (T) to node[anchor=north] {} (Y);
        \draw (C) to node[anchor=north] {} (Y);
      \end{scope}
    \end{tikzpicture}

(a) Fitting models with both $X$ and $U$ ##

For constant hazards, we can fit (i) Poisson regression, (ii) Cox regression and (iii) flexible parametric survival models. With both covariates, these models are expected to estimate the parameters for $X$ and $U$, with values close to 1:


If we fit a time-varying hazard ratio for $X$, we see that the hazard ratio looks reasonably constant.


(b) Fitting models with only $X$ ##

We now model by excluding the variable $U$. This variable could be excluded when it is not measured or perhaps when the variable is not considered to be a confounding variable -- from the causal diagram, the two variables $X$ and $U$ are not correlated and are only connected through the time variable $T$.



We clearly see that the estimate for $X$ is different in the models without $U$. If we now allow the the hazard ratio to vary by time:


We see that the hazard ratio is clearly time-varying, starting at an initial value at time = 0 of close to exp(1), but then declining rapidly. As discussed in class, this is an example of unmeasured heterogeneity, which is a normal random effect or, in this context, a log-normal frailty. Initially, there is no selection in the at-risk population, and the estimated marginal hazard ratio from the model without $U$ is similar to the conditional hazrd ratio from the model that adjusts for $U$. However, at later times, the at-risk population has been selected for those with a smaller frailty (because they were less likely to have had the event), and for a log-normal frailty the marginal hazard ratio is attenuated towards 1.

Let us stress that $U$ is not a confounder. The issue is that fitting proportional hazards models with unmodelled heterogeneity for variables that are not confounders can lead to time-varying hazard ratios -- and marginal estimates that do not have a causal interpretation.

(c) Rarer outcomes ##

We now simulate for rarer outcomes by changing the censoring distribution.




For rarer outcomes, the marginal estimates that do not model for $U$ are closer to the conditional estimates that have modelled for $U$.

(d) Less heterogeneity ##

We now simulate for less heterogeneity by changing the reducing the standard deviation for the random effect $U$ from 3 to 1.




The marginal estimates are now much closer to the conditional estimates. For little or no heterogeneity, the marginal and conditional estimates would be very similar.

(e) Accelerated failure time models

We re-run the baseline simulations:


The accelerated failure time models have the form $S(t|x,u)=S_0(t/\exp( \beta_1 x + \beta_2 u))$, where the function $S_0(t)$ is expressed in terms of splines. For an exponential distribution, the hazard ratio is equal to the inverse of the ratio of the means --- so we expect values of $-1$ for $x$ and $u$.


We see that the model with both $x$ and $u$ estimates values are very close to $-1$ --- which is as expected. Importantly, the model with only $x$ also estimates a value close to $-1$, which shows that the acclerated failure time models are robust to omitted covariates. The standard error for $x$ was smaller in the model with both $x$ and $u$ compared with that for the model with only $x$.



Try the biostat3 package in your browser

Any scripts or data that you put into this service are public.

biostat3 documentation built on Oct. 29, 2024, 5:07 p.m.