knitr::opts_chunk$set(
  tidy.opts  = list(width.cutoff = 80),
  echo       = TRUE,
  message    = FALSE,
  fig.align  = "center",
  crop       = TRUE,
  cache      = TRUE,
  fig.width  = 7,
  fig.height = 3.5)

Introduction

We illustrate application of PAMMs in the analysis of the effect of HIV exposure on the time to staphylococcus aureaus infection in children, with possible recurrences. The children are anonymized and random ids were generated. The following packages are used in this analysis:

library(mgcv)
library(pammtools)
library(dplyr) #data wrangling
library(ggplot2) #visualization
theme_set(theme_bw())

Time until staphylococcus aureaus infection in children

The data set (staph), is contained within the pammtools package and includes 374 observations from 137 children (from the Drakenstein child health study) with a maximum of 6 recurrences. The staph data is in longitudinal format reflecting the recurrences for children over different rows. Here

data("staph")
head(staph)

The data for the first two children are

staph %>% filter(id %in% c("1", "2"))

Transformation to PED format

In order to apply PAMMs to recurrent events, we first transform the data to the piece-wise exponential data (PED) format (see here for details). We use the as_ped() function in pammtools. For the data transformation in the recurrent events context, we first need to decide whether analysis should be performed on the calendar- or gap-time scale. This is controlled by the timescale argument (see below). For these data, we use the gap-time.

The individual inputs for the data transformation are given as follows:

ped <- as_ped(
  formula     = Surv(t.start,t.stop,event) ~ hiv,
  id          = "id",
  data        = staph,
  transition  = "enum",
  timescale   = "gap",
  max_time    = 366)

The resulting data for the first two infants is indicated below (we show the first and last observation of each infant for each event number they were at risk):

ped %>%
  filter(id %in% c("1", "2")) %>%
  group_by(id, enum) %>%
  slice(1, n())

Baseline model {#id}

Model specification

We first model the baseline hazards over time. Biologically, the infection incidence in gap time may be different for the first event compared with the recurrences. However, estimation of the baseline hazard for each of the event numbers is not useful/feasible since only a few subjects experienced 3 or more events. Therefore, we will create a new variable to indicate whether the event a child is at risk for is the first event or a recurrence. Note, however, that we do this after PED data transformation and use the full information to create the PED data. We only use enum_strata for stratification when estimating the baseline hazard.

ped <- ped %>%
  mutate(enum_strata = as.factor(ifelse(enum > 1,"recurrent","first")))

Using this data, we fit the following model

$$ \lambda(t,k|z_i) = \exp(\beta_{0k} + f_{0,k}(t) + z_i), $$

where $z_i\sim N(0, \sigma_z^2)$ are Gaussian, child specific random effects that account for child-specific frailty and $\beta_{0k}$ and $f_{0k}$ are the constant (intercept) and non-linear parts of the log-baseline hazard of the $k$-th event. Given the data transformation, this is equivalent to fitting a stratified PAM (see here for details), where stratification is done w.r.t. to enum_strata.

Fitting the model

Details on the model specification in R are given below. Note that we use the pamm function to fit the model, which is a wrapper to mgcv::gam or mgcv::bam, but direct modeling of the data using any suitable package/function would also be possible. The other arguments are passed to the respective fitting function.

In the formula argument,

pamm0 <- pamm(
  formula  = ped_status ~ enum_strata + s(tend, by = enum_strata) + s(id, bs = "re"),
  data     = ped,
  engine   = "bam",
  method   = "fREML",
  discrete = TRUE)

As usual for mgcv objects, the resulting model summary output shown below is separated into two parts, one for the "parametric coefficients" and one for the "smooth terms". In case of smooth terms, the estimated degrees of freedom (edf) gives us an idea of how "wiggly" the respective smooth functions are, and the p-values test whether overall these are different from a flat line [@wood_p-values_2013]. For the random effects terms, we report their estimated variances (gam.vcomp) and p-values (summary output). From the output, we see that the random effects are statistically insignificant $(p=0.249)$.

summary(pamm0)

The standard deviation of the random effects is also relatively small and can be extracted using the gam.vcomp command:

gam.vcomp(pamm0)

The results indicate that random effects are not required in this model. We therefore omit the random effects in the following.

pam0 <- update(pamm0, .~. - s(id, bs = "re"))
summary(pam0)

Estimates over time and visualization

To obtain and visualize any quantity based on the fitted model, it is easiest to create a suitable data set and calculate the quantity of interest. pammtools provides the respective convenience functions.

Below we illustrate the calculation and visualization of the baseline hazard over time, where we use

Once the data is created and augmented with the predicted hazard, we can use standard visualization function to create the graphics:

newdata <- ped %>%
  make_newdata(
    tend = unique(tend),
    enum_strata = unique(enum_strata)) %>%
  add_hazard(pamm0, type = "response")

ggplot(newdata, aes(x = tend/(365.25/12), y = hazard*365.25)) +
  geom_line() +
  geom_ribbon(aes(ymin  = ci_lower*365.25, ymax = ci_upper*365.25), alpha = .3) +
  ylab(expression(hat(h)(t))) + xlab("Time (months)") +
  scale_x_continuous(limits = c(0, 12.5),breaks=seq(0,12,2),expand=c(0,0)) +
  facet_wrap(~enum_strata)

We can see, that the hazard for the first event is high in the beginning and declines over time, as most children will have had the first infection after one year of life. Consequently, the hazard for recurrence is low in the beginning and increases over time before it flattens out after about two months.

Calculation of survival probabilities works equivalently, except that we use the add_surv_prob function instead of add_hazard. Note that calculating the survival probability involves calculating a cumulative hazard. Thus, when there is a grouping structure in the data (here the different strata), we have to group by this variable before we calculate cumulative hazards or survival probabilities. Note also, that we use geom_surv in order to force the function to start at a survival probability of 1.

newdata <-newdata %>%
  group_by(enum_strata) %>%
  add_surv_prob(pamm0)

ggplot(newdata, aes(x = tend/(365.25/12), y = surv_prob)) +
  geom_surv() +
  geom_ribbon(aes(ymin  = surv_lower, ymax = surv_upper), alpha = .3) +
  ylab(expression(hat(S)(t))) + xlab("Time (months)") +
  scale_x_continuous(limits = c(0, 12.1)) +
  facet_wrap(~enum_strata)

Modeling the effects of HIV assuming proportional hazards

The HIV exposure variable indicates whether children were HIV exposed and uninfected (HEU) by being born to HIV positive mothers or HIV uninfected (HU). We will fit this model in the PAMM framework to evaluate the effect of HIV exposure while assuming proportional hazards. This means that the effects of HIV shift the log-hazard by some constant over time. We fit two models. In the first model, we assume that the hazard ratio is the same for first and recurrent infections. In the second model, we allow different hazard ratios for first and recurrent infections, but we still assume both these hazard ratios are proportional over time.

Model 1

pam1 <- pamm(
  formula  = ped_status ~ enum_strata + s(tend, by = enum_strata) + hiv,
  data     = ped,
  engine   = "bam",
  method   = "fREML",
  discrete = TRUE)

summary(pam1)

We can see that estimated effect of HIV exposure in terms of the hazard ratio is $HR=\exp(0.2681)=1.31\;(p=0.062)$.

Model 2

pam2 <- pamm(
  formula  = ped_status ~ enum_strata + s(tend, by = enum_strata) + hiv:enum_strata,
  data     = ped,
  engine   = "bam",
  method   = "fREML",
  discrete = TRUE)

summary(pam2)

We can see that estimated effect of HIV exposure in terms of the hazard ratio is $HR=\exp(-0.0201)=0.98\;(p=0.925)$ for the first infection and $HR=\exp(0.5427)=1.72\;(p=0.006)$ for recurrent infections.

References



adibender/pammtools documentation built on June 12, 2025, 11:41 p.m.