dataLongTimeDep: Data Long Time Dependent Covariates

Description

Transforms data in semi-long format to long format for discrete survival modelling of a single event type with right censoring. Covariates may vary over time.

Usage

dataLongTimeDep(
  dataSemiLong,
  timeColumn,
  eventColumn,
  idColumn,
  timeAsFactor = FALSE
)

Arguments

dataSemiLong

Original data in semi-long format (class "data.frame"). Descriptions of data formats are available in discSurv-package.

timeColumn

Character giving the column name of the observed times (class "character"). It is required that the observed times are discrete (class "integer").

eventColumn

Column name of the event indicator (class "character"). It is required that this is a binary variable with 1=="event" and 0=="censored".

idColumn

Column name of the identification numbers of persons (class "character").

timeAsFactor

Should the time intervals be coded as factor (class "logical")? Default is FALSE. With the default setting, the discrete time intervals are treated as quantitative (class "numeric").
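
A minimal base-R sketch of what this choice means once the long data are used in a regression. With numeric coding, time enters the model matrix as one linear term. With factor coding, every interval except the reference level gets its own dummy column, i.e. a separate baseline effect per interval. The toy data frame `toyLong` is hypothetical and only mimics a timeInt column; discSurv itself is not called.

```r
# Hypothetical long-format time index with intervals 1 to 4
toyLong <- data.frame(timeInt = c(1L, 2L, 3L, 4L, 1L, 2L))

# Numeric coding (like timeAsFactor = FALSE): time is one quantitative covariate
numericDesign <- model.matrix(~ timeInt, data = toyLong)

# Factor coding (like timeAsFactor = TRUE): one dummy per interval,
# with the reference level 1 absorbed into the intercept
factorDesign <- model.matrix(~ timeInt,
                             data = transform(toyLong, timeInt = factor(timeInt)))

colnames(numericDesign)
colnames(factorDesign)
```

The factor coding is more flexible but estimates one parameter per interval, which can be unstable when late intervals contain few observations.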

Details

There may be intervals in which no new covariate information is observed (e.g. values are observed in intervals one and three, but not in interval two). In this case the values of the last observation are assumed to stay constant until a new measurement is taken (last observation carried forward).
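
The carry-forward rule can be sketched in base R as below. `expandLOCF` is a hypothetical helper written for illustration, not the discSurv implementation. It assumes integer measurement times and a 0/1 event indicator, lets each person's intervals run from the first to the last observed time, repeats each measurement row until the next measurement, and sets y = 1 only in the last interval of a person whose last row records an event. It also adds the obj, timeInt and y columns described under Value.

```r
# Hypothetical sketch of last-observation-carried-forward expansion;
# NOT the discSurv code. Assumes integer times and a 0/1 event column.
expandLOCF <- function(d, timeColumn, eventColumn, idColumn) {
  byId <- split(d, d[[idColumn]])
  pieces <- lapply(seq_along(byId), function(k) {
    p <- byId[[k]]
    p <- p[order(p[[timeColumn]]), , drop = FALSE]
    times <- p[[timeColumn]]
    # Row i is repeated until the next measurement time; the last row once
    reps <- c(diff(times), 1)
    long <- p[rep(seq_len(nrow(p)), times = reps), , drop = FALSE]
    long$timeInt <- seq(min(times), max(times))
    # The event can only happen in the person's final interval
    long$y <- c(rep(0, nrow(long) - 1), p[[eventColumn]][nrow(p)])
    long$obj <- k
    long
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Person 1 is measured in intervals 1 and 3 (interval 2 is missing)
semi <- data.frame(id = c(1, 1, 2, 2), time = c(1L, 3L, 1L, 2L),
                   x = c(10, 30, 5, 7), status = c(0, 1, 0, 0))
long <- expandLOCF(semi, "time", "status", "id")
long[, c("obj", "timeInt", "x", "y")]
```

In interval 2, person 1 keeps x = 10 from the measurement in interval 1, which is exactly the assumption stated above.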

In contrast to continuous-time survival analysis (see e.g. Surv), the start-stop notation is not used here: in discrete-time survival analysis only the stop time is relevant. The start time is not needed, because all discrete intervals have to be included in the long data format to ensure consistent estimation. It is assumed that the supplied data set "dataSemiLong" contains all repeated measurements of each cluster (e.g. person) in semi-long format. For further information see the example "Start-stop notation".
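
A minimal sketch, assuming start-stop data like survival::heart in which each row covers the period (start, stop]. The hypothetical helper `toStopOnly` checks that each person's rows are contiguous (every start equals the previous stop, so no measurement is missing) and then drops the start column, keeping only the stop time that the semi-long format needs. The helper and the toy data are illustrative assumptions, not part of discSurv.

```r
# Hypothetical helper: verify contiguous (start, stop] rows per person,
# then drop the start time, which is not needed in discrete time
toStopOnly <- function(d, startColumn, stopColumn, idColumn) {
  byId <- split(d, d[[idColumn]])
  contiguous <- vapply(byId, function(p) {
    p <- p[order(p[[stopColumn]]), , drop = FALSE]
    # Each row must start where the previous row stopped
    all(p[[startColumn]][-1] == p[[stopColumn]][-nrow(p)])
  }, logical(1))
  if (!all(contiguous)) {
    stop("gaps between measurements for id(s): ",
         paste(names(byId)[!contiguous], collapse = ", "))
  }
  d[[startColumn]] <- NULL
  d
}

toy <- data.frame(id = c(1, 1, 2), start = c(0, 2, 0), stop = c(2, 5, 3),
                  event = c(0, 1, 0), transplant = c(0, 1, 0))
stopOnly <- toStopOnly(toy, "start", "stop", "id")
names(stopOnly)
```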

Value

Original data in long format with three additional columns:

  • obj Index of persons (class "integer")

  • timeInt Index of time intervals (class "numeric" by default, "factor" if timeAsFactor = TRUE)

  • y Response in long format as binary vector: 1 == "event happens in period timeInt", 0 otherwise
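
Because y is a binary indicator per person and interval, a discrete hazard model can be fitted to long data as a binary regression. The sketch below assumes a logit link and uses base R `glm()` on simulated long data; the covariate x, the sample size and the hazard coefficients are hypothetical. Each simulated person contributes one row per interval until the event occurs or the person is censored after interval 5, mirroring the obj, timeInt and y columns.

```r
set.seed(42)
nPersons <- 300
maxInt <- 5   # hypothetical censoring after interval 5
rows <- list()
for (i in seq_len(nPersons)) {
  x <- rnorm(1)  # hypothetical time-constant covariate
  for (interval in seq_len(maxInt)) {
    # Discrete hazard: P(event in interval | no event before)
    event <- rbinom(1, 1, plogis(-2 + 0.3 * interval + 0.5 * x))
    rows[[length(rows) + 1]] <- data.frame(obj = i, timeInt = interval,
                                           x = x, y = event)
    if (event == 1) break  # no further rows after the event
  }
}
long <- do.call(rbind, rows)

# Logistic discrete hazard model on the long data
fit <- glm(y ~ timeInt + x, family = binomial(link = "logit"), data = long)
coef(fit)
```

Here timeInt enters linearly; replacing it with factor(timeInt) would give one baseline effect per interval.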

Note

Arguments to this function have to be specified in the required formats; other objects are not supported. A common mistake, for example, is passing tibbles: their class is not exactly "data.frame" and they subset differently, so they should be converted with as.data.frame() first.
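
One way to enforce these requirements before calling dataLongTimeDep is a small pre-check. The helper `checkSemiLong` below is hypothetical and not part of discSurv: it coerces the input with as.data.frame(), which drops tibble classes, checks that the times are whole numbers and the event indicator is 0/1, and stores the times as integers. The tibble is only imitated by setting its classes, so the tibble package is not needed.

```r
# Hypothetical input check for the required formats; not part of discSurv
checkSemiLong <- function(d, timeColumn, eventColumn) {
  # Drops extra classes such as "tbl_df" and "tbl"
  d <- as.data.frame(d)
  times <- d[[timeColumn]]
  stopifnot(
    is.numeric(times), all(times == round(times)),  # discrete times
    all(d[[eventColumn]] %in% c(0, 1))                # binary event indicator
  )
  d[[timeColumn]] <- as.integer(times)
  d
}

# Object carrying tibble-like classes, imitated without the tibble package
tblLike <- structure(data.frame(id = 1:2, time = c(1, 3), status = c(0, 1)),
                     class = c("tbl_df", "tbl", "data.frame"))
clean <- checkSemiLong(tblLike, "time", "status")
class(clean)
```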

Author(s)

Thomas Welchowski t.welchowski@psychologie.uzh.ch

See Also

contToDisc, dataLong, dataLongCompRisks

Examples


# Example Primary Biliary Cirrhosis data
library(discSurv)
library(survival)
dataSet1 <- pbcseq

# Only death is the event of interest: transplant (1) is treated as
# censored (0) and death (2) is recoded as event (1)
dataSet1$status[dataSet1$status == 1] <- 0
dataSet1$status[dataSet1$status == 2] <- 1
table(dataSet1$status)

# Convert days to 30-day months; adding 1 puts day 0 into interval 1
dataSet1$day <- ceiling(dataSet1$day / 30) + 1
names(dataSet1)[names(dataSet1) == "day"] <- "month"

# Convert to long format for time varying effects
pbcseqLong <- dataLongTimeDep(dataSemiLong = dataSet1, timeColumn = "month",
                              eventColumn = "status", idColumn = "id")
pbcseqLong[pbcseqLong$obj == 1, ]

#####################
# Start-stop notation

library(survival)
?survival::heart

# Assume that time was measured on a discrete scale.
# Discrete interval lengths are assumed to vary.
intervalLimits <- quantile(heart$stop, probs = seq(0.1, 1, by=0.1))
intervalLimits[length(intervalLimits)] <- intervalLimits[length(intervalLimits)] + 1
heart_disc <- contToDisc(dataShort = heart, timeColumn = "stop",
                         intervalLimits = intervalLimits, equi = FALSE)
table(heart_disc$timeDisc)

# Conversion to long format
heart_disc_long <- dataLongTimeDep(dataSemiLong = heart_disc, timeColumn = "timeDisc",
                                   eventColumn = "event", idColumn = "id")
head(heart_disc_long)


discSurv documentation built on April 29, 2026, 9:07 a.m.