| dsurvbin | R Documentation |
This function takes an existing data frame with a discrete time variable and converts the time variable into a set of binary response variables, which can then be used to model the discrete hazard function of a discrete survival time. It can also be used to code binary responses for a sequential (continuation ratio) regression model.
dsurvbin(
data,
y,
event,
unit.name = "unit",
time.name = "t",
resp.name = "y",
open = FALSE,
reverse = FALSE,
long = TRUE
)
data |
The data frame containing the time variable. |
y |
Name of the time variable in |
event |
Indicator variable for observed (i.e., not censored) events where |
unit.name |
Variable name for observational units. |
time.name |
Variable name prefix for the time point of each binary response. |
resp.name |
Variable name for the binary response variables. |
open |
Logical for whether the maximum observed time point ( |
reverse |
Reverse the binary indicator so that |
long |
Logical for whether the data should be output in long form (one binary response per row). Default is TRUE. |
Assuming survival time is integer-valued as T = 1,2,\dots,k, the probability of a given response can be modeled as P(T = 1) = \lambda(1) and
P(T = t) = \lambda(t)(1 - \lambda(t-1))(1 - \lambda(t-2))\dots(1 - \lambda(1))
for t > 1, where \lambda(t) = P(T = t|T \ge t) is the hazard function. If we define a set of binary response variables as Y_j = 1 if T = j and Y_j = 0 if T > j (no response is defined for j > T), then P(T = 1) = P(Y_1 = 1) and
P(T = t) = P(Y_t = 1)P(Y_{t-1} = 0)P(Y_{t-2} = 0)\dots P(Y_1 = 0)
for t > 1. If T is censored at t, meaning that it is only known that T > t, then
P(T > t) = (1 - \lambda(t))(1 - \lambda(t-1))\dots (1 - \lambda(1)) = P(Y_t = 0)P(Y_{t-1} = 0) \dots P(Y_1 = 0).
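As a small worked example of these products, assume hazards \lambda(1) = 0.2, \lambda(2) = 0.25, and \lambda(3) = 0.5. These values are made up purely for illustration and do not come from any data set.

```latex
P(T = 1) = \lambda(1) = 0.2
P(T = 2) = \lambda(2)(1 - \lambda(1)) = 0.25 \times 0.8 = 0.2
P(T = 3) = \lambda(3)(1 - \lambda(2))(1 - \lambda(1)) = 0.5 \times 0.75 \times 0.8 = 0.3
P(T > 3) = (1 - \lambda(3))(1 - \lambda(2))(1 - \lambda(1)) = 0.5 \times 0.75 \times 0.8 = 0.3
```

The four probabilities sum to one, as they should: an observed event at time 3 and a response censored at time 3 differ only in the factor for the last binary response.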
Because the likelihood contribution of an observation of T has the same form as the likelihood of a sequence of independent binary responses (one for each time point at which the unit is at risk), discrete survival time can be modeled using logistic regression or other models for independent binary responses, with the binary response model playing the role of the hazard function.
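The equivalence described above can be checked by hand. The following is a minimal sketch in base R that does not call dsurvbin: the helper expand_person_period, the toy data, and the column names unit, t, and y are all assumptions made for illustration and may differ from what the function actually returns. It expands each unit into one binary response per time point at risk and then fits a saturated logistic regression, whose fitted probabilities should match the life-table hazard estimates (events at t divided by the number at risk at t).

```r
# Hypothetical helper (not part of the package): expand one row per unit
# into one row per unit and time point at which the unit is at risk.
expand_person_period <- function(time, event) {
  rows <- lapply(seq_along(time), function(i) {
    y <- rep(0L, time[i])
    # An observed event contributes Y_t = 1 at the final time point;
    # a censored unit contributes zeros only.
    if (event[i] == 1) y[time[i]] <- 1L
    data.frame(unit = i, t = seq_len(time[i]), y = y)
  })
  do.call(rbind, rows)
}

# Toy data (made up): five units, the last two right-censored.
time  <- c(1, 2, 3, 3, 2)
event <- c(1, 1, 1, 0, 0)
pp <- expand_person_period(time, event)

# Saturated logistic model: one hazard parameter per time point.
fit <- glm(y ~ factor(t) - 1, family = binomial, data = pp)
hazard_glm <- unname(plogis(coef(fit)))

# Life-table estimate: events at t divided by the number at risk at t.
hazard_lt <- sapply(1:3, function(t) {
  sum(time == t & event == 1) / sum(time >= t)
})

cbind(hazard_glm, hazard_lt)
```

In practice the saturated term factor(t) would be combined with covariates (or replaced by a smoother function of t), but the saturated fit makes the correspondence with the hazard function easy to see.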
A related case is the continuation ratio or sequential regression model for ordinal response variables. There T represents an ordinal response (not necessarily time), and typically one models P(T > t|T \ge t) and assumes that a response will necessarily be in the last category if it is not in any of the previous categories, meaning that P(T = k|T \ge k) = 1 if k is the highest "value" of T. This is effectively equivalent to a discrete survival model for the probability that the event does not occur at time t given that it has not yet occurred, in which all observations of T = k are treated as censored at T = k - 1. This coding can be obtained by setting open = TRUE and reverse = TRUE (see the example below).
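The continuation ratio coding described above can also be written out by hand. This is a sketch in base R under the assumptions stated in the text (the top category is treated as censored one step earlier, and the indicator is reversed so that 1 means "continued past category t"). The helper cr_code, the toy responses, and the column names are hypothetical and are not guaranteed to match the output of dsurvbin.

```r
# Hypothetical helper (not part of the package): continuation-ratio coding
# for an ordinal response taking values 1..k. Row t of a unit records
# whether the response went beyond category t, given that it reached t.
cr_code <- function(resp, k = max(resp)) {
  rows <- lapply(seq_along(resp), function(i) {
    # Responses in the top category k are treated as censored at k - 1,
    # so no binary response is created for category k itself.
    last <- min(resp[i], k - 1)
    t_seq <- seq_len(last)
    data.frame(unit = i, t = t_seq,
               y = as.integer(resp[i] > t_seq))  # 1 = continued past t
  })
  do.call(rbind, rows)
}

# Toy ordinal responses (made up) with k = 3 categories.
resp <- c(1, 2, 3, 3, 2, 1, 3)
cr <- cr_code(resp)

# Empirical continuation ratios P(T > t | T >= t) for t = 1, 2.
cr_ratio <- tapply(cr$y, cr$t, mean)
cr_ratio
```

A binary regression of y on t and any covariates in this long-form data then estimates P(T > t|T \ge t) directly.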
# setup for discrete survival model where the first five times are right-censored
d <- data.frame(time = rep(1:5, 2), x = rnorm(10), status = rep(0:1, each = 5))
dsurvbin(d, "time", "status")
# setup for a continuation ratio (sequential regression) model
d <- data.frame(time = rep(1:5, 2), x = rnorm(10))
dsurvbin(d, "time", open = TRUE, reverse = TRUE)