
dsurvbin    R Documentation

Create data frame with binary response variables for discrete survival analysis (experimental).

Description

This function takes an existing data frame with a discrete time variable and converts the time variable into a set of binary response variables for modeling discrete survival time, using the binary variables to model the discrete hazard function. It can also be used to code binary responses for a sequential (continuation ratio) regression model.

Usage

dsurvbin(
  data,
  y,
  event,
  unit.name = "unit",
  time.name = "t",
  resp.name = "y",
  open = FALSE,
  reverse = FALSE,
  long = TRUE
)

Arguments

data

The data frame containing the time variable.

y

Name of the time variable in data.

event

Name of the indicator variable in data for observed (i.e., not censored) events, where event = 1 if the event was observed at y and event = 0 if the event had not yet occurred by time y (right-censored). If missing, it is assumed that no observations are censored.

unit.name

Variable name for observational units.

time.name

Variable name prefix for the time point of each binary response.

resp.name

Variable name for the binary response variables.

open

Logical for whether the maximum observed time point (k) should be treated as corresponding to an interval whose right endpoint is infinity, so that P(T = k|T \ge k) = 1. In this case all observations with T = k are effectively treated as censored at k - 1, so one fewer binary response variable is needed. Default is FALSE.

reverse

Logical for whether to reverse the binary indicator so that P(Y_t = 1) = P(T > t|T \ge t). Default is FALSE.

long

Logical for whether the data should be returned in long form (one binary response per row). Default is TRUE.

Details

Assuming survival time is integer-valued as T = 1,2,\dots,k, the probability of a given response can be modeled as P(T = 1) = \lambda(1) and

P(T = t) = \lambda(t)(1 - \lambda(t-1))(1 - \lambda(t-2))\dots(1 - \lambda(1))

for t > 1, where \lambda(t) = P(T = t|T \ge t) is the hazard function. If we define a set of binary response variables Y_1, Y_2, \dots, Y_t such that Y_j = 1 if j = t and Y_j = 0 if j < t, then P(T = 1) = P(Y_1 = 1) and

P(T = t) = P(Y_t = 1)P(Y_{t-1} = 0)P(Y_{t-2} = 0)\dots P(Y_1 = 0)

for t > 1, where P(Y_j = 1) = \lambda(j) is the hazard at time j. If T is censored at t, meaning that it is only known that T > t, then setting Y_j = 0 for j = 1, \dots, t gives

P(T > t) = (1 - \lambda(t))(1 - \lambda(t-1))\dots (1 - \lambda(1)) = P(Y_t = 0)P(Y_{t-1} = 0) \dots P(Y_1 = 0).
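As a small numerical check of these formulas, the following base R sketch computes P(T = t) and P(T > t) from a set of hazard values. The vector lambda is made up purely for illustration, and its last hazard is set to 1 so that the probabilities sum to one.

```r
# Illustrative hazards lambda(t) = P(T = t | T >= t) for t = 1, ..., 4 (made-up values);
# the last hazard is 1, so every unit has the event by t = 4.
lambda <- c(0.2, 0.3, 0.5, 1)
k <- length(lambda)

# P(T = t) = lambda(t) * (1 - lambda(t-1)) * ... * (1 - lambda(1));
# each factor (1 - lambda(j)) plays the role of P(Y_j = 0).
pmf <- sapply(seq_len(k), function(t) lambda[t] * prod(1 - lambda[seq_len(t - 1)]))

# P(T > t) = (1 - lambda(t)) * ... * (1 - lambda(1)): the contribution of a unit censored at t
surv <- sapply(seq_len(k), function(t) prod(1 - lambda[seq_len(t)]))

print(pmf)       # 0.20 0.24 0.28 0.28
print(sum(pmf))  # 1
print(surv)      # 0.80 0.56 0.28 0.00
```

The identity P(T = t) + P(T > t) = P(T > t - 1) also holds term by term, which is another way to see that the censored and uncensored contributions are consistent.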

Because the likelihood contribution of each observation is equivalent to that of a product of independent binary responses (t of them for an observation with T = t or censored at t), discrete survival time can be modeled as a set of binary response variables, using logistic regression or another model for independent binary responses to model the hazard function.
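The expansion can be sketched in base R. The helper to_person_period() is hypothetical and is not the trtools implementation of dsurvbin, whose column layout may differ. It only illustrates that a logistic regression fitted with glm() to the expanded binary responses estimates the hazard function. With one parameter per time point and no covariates, the fitted hazards should equal the number of events divided by the number at risk at each time point.

```r
# Hypothetical helper (not part of trtools): expand one row per unit into one row
# per time point at which the unit is at risk, with a binary response y.
to_person_period <- function(time, event = rep(1, length(time))) {
  unit <- rep(seq_along(time), times = time)     # unit i repeated time[i] times
  period <- sequence(time)                       # 1, ..., time[i] within each unit
  last <- period == rep(time, times = time)      # TRUE at each unit's final time point
  # y = 1 only at the final time point of an observed (uncensored) event
  y <- as.integer(last & rep(event, times = time) == 1)
  data.frame(unit = unit, t = period, y = y)
}

# Made-up data: event = 0 marks a right-censored time
time  <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4)
event <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0)
pp <- to_person_period(time, event)

# One logit parameter per time point, so plogis(coef) gives the hazard at each t
fit <- glm(y ~ factor(t) - 1, family = binomial, data = pp)
hazard <- unname(plogis(coef(fit)))

# Nonparametric estimate of the hazard: events / number at risk at each t
at_risk <- sapply(1:4, function(s) sum(time >= s))
events  <- sapply(1:4, function(s) sum(time == s & event == 1))
print(cbind(t = 1:4, hazard, empirical = events / at_risk))
```

Covariates such as x would be repeated across each unit's rows and added to the glm() formula, giving a discrete-time logistic hazard model.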

A related case is the continuation ratio or sequential regression model for ordinal response variables. There T represents an ordinal response (not necessarily time), and typically one models P(T > t|T \ge t) and assumes that a response must be in the last category if it is not in any of the previous categories, meaning that P(T = k|T \ge k) = 1 if k is the highest "value" of T. This is effectively equivalent to a discrete survival model for the probability that the event does not occur at time t given that it has not yet occurred, in which all observations with T = k are censored at k - 1. This can be achieved by using open = TRUE and reverse = TRUE (see the example below).
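A comparable sketch of the continuation ratio coding follows. It assumes the behavior described for open = TRUE and reverse = TRUE. An observation in category T = t < k contributes binary responses at steps 1, ..., t, with y = 1 before t and y = 0 at t. An observation in the highest category k contributes k - 1 responses, all equal to 1. The helper to_continuation_ratio() is hypothetical and is not part of trtools.

```r
# Hypothetical helper (not part of trtools): continuation ratio coding for an
# ordinal response with categories 1, ..., k, mimicking open = TRUE and reverse = TRUE.
to_continuation_ratio <- function(resp, k = max(resp)) {
  n_rows <- pmin(resp, k - 1)                  # category k needs only k - 1 binary responses
  unit <- rep(seq_along(resp), times = n_rows)
  step <- sequence(n_rows)                     # step j = 1, ..., min(resp, k - 1)
  # Reversed indicator: y = 1 means the response lies beyond category j (T > j given T >= j)
  y <- as.integer(rep(resp, times = n_rows) > step)
  data.frame(unit = unit, t = step, y = y)
}

cr <- to_continuation_ratio(c(1, 2, 3, 3))
print(cr)
```

Fitting a binary regression such as glm(y ~ factor(t) + x, family = binomial) to data coded this way, with a covariate x repeated across each unit's rows, would then model P(T > t|T \ge t).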

Examples

library(trtools)  # package providing dsurvbin()
set.seed(1)       # make the rnorm() draws reproducible
# setup for discrete survival model where the first five observations (status = 0) are right-censored
d <- data.frame(time = rep(1:5, 2), x = rnorm(10), status = rep(0:1, each = 5))
dsurvbin(d, "time", "status")
# setup for a continuation ratio (sequential regression) model
d <- data.frame(time = rep(1:5, 2), x = rnorm(10))
dsurvbin(d, "time", open = TRUE, reverse = TRUE)

trobinj/trtools documentation built on Jan. 28, 2024, 3:20 a.m.