dsurvbin: Create data frame with binary response variables for discrete...
In trobinj/trtools: Miscellaneous Tools for Teaching Statistics

dsurvbin

R Documentation

Create data frame with binary response variables for discrete survival analysis (experimental).

Description

This function takes an existing data frame with a discrete time variable and converts the time variable into a set of binary response variables, so that the discrete hazard function can be modeled with a model for binary responses. It can also be used to code binary responses for a sequential (continuation ratio) regression model.

Usage

dsurvbin(
  data,
  y,
  event,
  unit.name = "unit",
  time.name = "t",
  resp.name = "y",
  open = FALSE,
  reverse = FALSE,
  long = TRUE
)

Arguments

`data`	The data frame containing the time variable.
`y`	Name of the time variable in `data`.
`event`	Indicator variable for observed (i.e., not censored) events where `event = 1` if the event was observed at `y` and `event = 0` if the event had not yet occurred by time `y`. If missing then it is assumed that no observations are censored.
`unit.name`	Variable name for observational units.
`time.name`	Variable name prefix for the time point of each binary response.
`resp.name`	Variable name for the binary response variables.
`open`	Logical for whether the maximum observed time point (`k`) should be considered as corresponding to an interval where the right endpoint is infinity so that `P(T = k|T \ge k) = 1`. It is assumed in this case that all observations of `Y = k` are effectively censored at `Y = k - 1`. This requires one fewer binary response variable. Default is FALSE.
`reverse`	Reverse the binary indicator so that `P(Y_t = 1) = P(T > t|T \ge t)`. Default is FALSE.
`long`	Logical for whether the data should be output in long form (one binary response per row). Default is TRUE.

Details

Assuming survival time is integer-valued as T = 1,2,\dots,k, the probability of a given response can be modeled as P(T = 1) = \lambda(1) and

P(T = t) = \lambda(t)(1 - \lambda(t-1))(1 - \lambda(t-2))\dots(1 - \lambda(1))

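for t > 1, where \lambda(t) = P(T = t|T \ge t) is the hazard function. If we define a set of binary response variables as Y_j = 1 if T = j and Y_j = 0 if T > j, with each Y_j defined only while the unit is still at risk (j \le T) so that P(Y_j = 1) = \lambda(j), then P(T = 1) = P(Y_1 = 1) and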

P(T = t) = P(Y_t = 1)P(Y_{t-1} = 0)P(Y_{t-2} = 0)\dots P(Y_1 = 0)

for t > 1. If T is right-censored at t, meaning that it is only known that T > t, then

P(T > t) = (1 - \lambda(t))(1 - \lambda(t-1))\dots (1 - \lambda(1)) = P(Y_t = 0)P(Y_{t-1} = 0) \dots P(Y_1 = 0).
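
The relationship between the hazard function and the distribution of T can be checked numerically. The following is a minimal base R sketch; the hazard values are hypothetical and chosen only for illustration, with the last hazard set to 1 so that T cannot exceed 4.

```r
# Hypothetical hazard values lambda(1), ..., lambda(4) (not from the package).
lambda <- c(0.2, 0.3, 0.5, 1.0)

# P(T > t) = (1 - lambda(t)) (1 - lambda(t-1)) ... (1 - lambda(1))
surv <- cumprod(1 - lambda)

# P(T = t) = lambda(t) * P(T > t - 1), where P(T > 0) = 1
pmf <- lambda * c(1, head(surv, -1))

pmf       # probabilities P(T = 1), ..., P(T = 4)
sum(pmf)  # the probabilities sum to one
```

Here `surv[t]` is the probability contributed by an observation censored at t, and `pmf[t]` is the probability contributed by an event observed at t.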

Because the likelihood contribution of each observation of T has the same form as the likelihood for a set of independent binary responses (one for each time point at which the unit is at risk), discrete survival time can be modeled by applying logistic regression, or another model for independent binary responses, to the binary variables in order to model the hazard function.
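
As an illustration of this equivalence, the sketch below expands a small made-up data set by hand and fits a logistic regression with one intercept per time point. The helper `expand_person_period` is hypothetical and is not the implementation used by `dsurvbin`; it only follows the coding described above, where a censored unit contributes zeros at every time point up to its censoring time.

```r
# Hypothetical helper (not part of trtools): one row per unit and time
# point at risk, with y = 1 only where the event was observed.
expand_person_period <- function(time, event) {
  rows <- lapply(seq_along(time), function(i) {
    t <- seq_len(time[i])
    data.frame(unit = i, t = t,
               y = as.integer(t == time[i] & event[i] == 1))
  })
  do.call(rbind, rows)
}

# Made-up data: event = 0 marks a right-censored time.
time  <- c(1, 2, 2, 3, 3, 3, 1, 2)
event <- c(1, 1, 0, 1, 1, 0, 0, 1)
pp <- expand_person_period(time, event)

# Logistic regression with a separate intercept for each time point;
# the fitted probabilities estimate the discrete hazard lambda(t).
fit <- glm(y ~ 0 + factor(t), family = binomial, data = pp)
hazard_glm <- unname(plogis(coef(fit)))

# Life-table estimate: events at t divided by units at risk at t.
hazard_lt <- as.numeric(tapply(pp$y, pp$t, mean))
```

With no covariates the two sets of hazard estimates agree up to numerical tolerance. Adding covariates to the model formula then lets the hazard depend on them.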

A related case is the continuation ratio or sequential regression model for ordinal response variables. There T represents an ordinal response (not necessarily time), and typically one models P(T > t|T \ge t) and assumes that a response will necessarily be in the last category if it is not in any previous category, meaning that P(T = k|T \ge k) = 1 if k is the highest "value" of T. This is effectively equivalent to a discrete survival model for the probability that the event does not occur at time t given that it has not yet occurred, where all observations of Y = k are censored at Y = k - 1. This coding can be obtained by setting open = TRUE and reverse = TRUE (see the example below).
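
The continuation ratio coding can be sketched by hand in base R. The helper `cr_expand` below is hypothetical and reflects one reading of the description above (responses in the last category k contribute only up to k - 1, and the indicator is 1 when T > t); it has not been checked against the output of `dsurvbin`.

```r
# Hypothetical helper (not part of trtools): binary responses for a
# continuation ratio model with y = 1 if resp > t, for t = 1, ..., k - 1.
cr_expand <- function(resp, k = max(resp)) {
  rows <- lapply(seq_along(resp), function(i) {
    t <- seq_len(min(resp[i], k - 1))  # category k needs no response of its own
    data.frame(unit = i, t = t, y = as.integer(resp[i] > t))
  })
  do.call(rbind, rows)
}

# Made-up ordinal responses with k = 3 categories.
resp <- c(1, 2, 3, 3, 2, 1, 3)
cr <- cr_expand(resp)

# Sample estimates of P(T > t | T >= t) for t = 1, 2.
cont_ratio <- as.numeric(tapply(cr$y, cr$t, mean))
```

In this example five of the seven responses exceed category 1, and three of those five exceed category 2, so the estimates are 5/7 and 3/5.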

Examples

# setup for discrete survival model where the first five times are right-censored
d <- data.frame(time = rep(1:5, 2), x = rnorm(10), status = rep(0:1, each = 5))
dsurvbin(d, "time", "status")
# setup for a continuation ratio (sequential regression) model
d <- data.frame(time = rep(1:5, 2), x = rnorm(10))
dsurvbin(d, "time", open = TRUE, reverse = TRUE)

trobinj/trtools documentation built on Jan. 3, 2025, 4:14 a.m.