dsurvbin | R Documentation |
This function takes an existing data frame with a discrete time variable and converts the time variable into a set of binary response variables, which can then be used to model the discrete hazard function in a discrete survival time model. It can also be used to code binary responses for a sequential (continuation ratio) regression model.
dsurvbin(
data,
y,
event,
unit.name = "unit",
time.name = "t",
resp.name = "y",
open = FALSE,
reverse = FALSE,
long = TRUE
)
data |
The data frame containing the time variable. |
y |
Name of the time variable in data. |
event |
Indicator variable for observed (i.e., not censored) events where |
unit.name |
Variable name for observational units. |
time.name |
Variable name prefix for the time point of each binary response. |
resp.name |
Variable name for the binary response variables. |
open |
Logical for whether the maximum observed time point ( |
reverse |
Reverse the binary indicator so that |
long |
Should the data be output in long form (one binary response per row)? Default is TRUE. |
Assuming survival time is integer-valued as $T = 1, 2, \dots, k$, the probability of a given response can be modeled as $P(T = 1) = \lambda(1)$ and
$$P(T = t) = \lambda(t)(1 - \lambda(t-1))(1 - \lambda(t-2)) \dots (1 - \lambda(1))$$
for $t > 1$, where $\lambda(t) = P(T = t \mid T \ge t)$ is the hazard function. If we define a set of binary response variables, for each time point $j$ up to the observed time, as $Y_j = 1$ if $T = j$ and $Y_j = 0$ if $T > j$, so that $P(Y_j = 1) = \lambda(j)$, then $P(T = 1) = P(Y_1 = 1)$ and
$$P(T = t) = P(Y_t = 1)P(Y_{t-1} = 0)P(Y_{t-2} = 0) \dots P(Y_1 = 0)$$
for $t > 1$. If $T$ is censored at $t$, meaning that it is only known that $T > t$, then
$$P(T > t) = (1 - \lambda(t))(1 - \lambda(t-1)) \dots (1 - \lambda(1)) = P(Y_t = 0)P(Y_{t-1} = 0) \dots P(Y_1 = 0).$$
Because the likelihood function for $T$ is equivalent to that of a product of $t$ independent binary responses, discrete survival time can be modeled as a set of binary response variables, using logistic regression or other models for independent binary responses to model the hazard function.
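The equivalence described above can be checked numerically. The following is a minimal sketch in base R that does not call dsurvbin. The hazard values and the helper name lik_binary are made up for illustration.

```r
# Assumed hazard values for k = 4 time points (illustrative only)
lambda <- c(0.2, 0.3, 0.5, 0.4)
k <- length(lambda)

# P(T = t) = lambda(t) * prod_{j < t} (1 - lambda(j))
surv_before <- c(1, cumprod(1 - lambda)[-k])  # P(T >= t)
p_event <- lambda * surv_before

# P(T > t) = prod_{j <= t} (1 - lambda(j)): contribution of a time censored at t
p_censored <- cumprod(1 - lambda)

# The same quantities from the binary coding: Y_1, ..., Y_{t-1} = 0 and
# Y_t = 1 for an observed event or Y_t = 0 for a censored time.
# lik_binary is a hypothetical helper, not part of the package.
lik_binary <- function(t, event) {
  y <- c(rep(0, t - 1), event)
  h <- lambda[seq_len(t)]
  prod(h^y * (1 - h)^(1 - y))  # product of independent Bernoulli terms
}

all.equal(sapply(1:k, lik_binary, event = 1), p_event)
all.equal(sapply(1:k, lik_binary, event = 0), p_censored)
```

Both comparisons should agree, since each term of the Bernoulli product is one factor of the survival-time probability.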
A related case is the continuation ratio or sequential regression model for ordinal response variables. There $T$ represents an ordinal response (not necessarily time), and typically one models $P(T > t \mid T \ge t)$ and assumes that a response will necessarily be in the last category if it is not in any of the previous categories, meaning that $P(T = k \mid T \ge k) = 1$ if $k$ is the highest "value" of $T$. This is effectively equivalent to a discrete survival model for the probability that the event does not occur at time $t$ given that it has not yet occurred, where all observations of $T = k$ are censored at $T = k - 1$. This can be achieved by using open = TRUE and reverse = TRUE (see the example below).
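To make the continuation ratio coding concrete, here is a hand-rolled sketch in base R of one reading of the description above. The helper cr_expand and its column names are hypothetical. It is not the package's implementation, and the layout returned by dsurvbin may differ.

```r
# Hypothetical helper: code an ordinal response with k categories as
# binary indicators of "continuing" past category t, given T >= t.
cr_expand <- function(resp, k = max(resp)) {
  rows <- lapply(seq_along(resp), function(i) {
    # No indicator is needed for the top category, since P(T = k | T >= k) = 1
    t <- seq_len(min(resp[i], k - 1))
    # Reversed coding: y = 1 if the response lies beyond category t
    data.frame(unit = i, t = t, y = as.integer(resp[i] > t))
  })
  do.call(rbind, rows)
}

# Three units with responses in categories 1, 2, and 3 (k = 3)
cr <- cr_expand(c(1, 2, 3), k = 3)
cr
```

Under this coding a unit in the top category contributes k - 1 responses that all equal 1. That matches treating it as censored at k - 1 in the survival formulation.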
# setup for discrete survival model where the first five times are right-censored
d <- data.frame(time = rep(1:5, 2), x = rnorm(10), status = rep(0:1, each = 5))
dsurvbin(d, "time", "status")
# setup for a continuation ratio (sequential regression) model
d <- data.frame(time = rep(1:5, 2), x = rnorm(10))
dsurvbin(d, "time", open = TRUE, reverse = TRUE)
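Once the data are in long form, the hazard can be estimated with any model for independent binary responses. The sketch below builds the long form by hand in base R rather than relying on the exact output of dsurvbin, which is not shown here. The simulated data and the column names unit, t, and y are assumptions for illustration.

```r
set.seed(1)
n <- 200
latent <- rgeom(n, prob = 0.3) + 1       # geometric times, constant hazard 0.3
d <- data.frame(
  time = pmin(latent, 5),                # follow-up ends at t = 5
  status = as.integer(latent <= 5)       # 1 = event observed, 0 = right-censored
)

# One row per unit and time point at risk (hand-rolled, hypothetical layout)
long <- do.call(rbind, lapply(seq_len(n), function(i) {
  t <- seq_len(d$time[i])
  y <- as.integer(t == d$time[i] & d$status[i] == 1)
  data.frame(unit = i, t = t, y = y)
}))

# Intercept-only logistic regression: a single constant hazard
fit <- glm(y ~ 1, family = binomial, data = long)
hazard_hat <- plogis(coef(fit)[[1]])     # estimated hazard on the probability scale
hazard_hat
```

Replacing the formula with something like y ~ factor(t) would allow the hazard to vary by time point, and covariates can be added in the usual way.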