dataLongCompRisksTimeDep: Data Long Time Dependent Covariates Transformation For...

dataLongCompRisksTimeDep    R Documentation

Data Long Time Dependent Covariates Transformation For Competing Risks

Description

Transforms data in semi-long format to long format for discrete survival modelling with competing risks and right censoring. Covariates may vary over time.

Usage

dataLongCompRisksTimeDep(
  dataSemiLong,
  timeColumn,
  eventColumns,
  eventColumnsAsFactor = FALSE,
  idColumn,
  timeAsFactor = FALSE,
  responseAsFactor = FALSE
)

Arguments

dataSemiLong

Original data in semi-long format (class "data.frame"). Descriptions of data formats are available in discSurv-package.

timeColumn

Character giving the column name of the observed times (class "character"). The observed times themselves must be discrete (class "integer").

eventColumns

Character vector giving the column names of the event indicators, excluding any censoring column (class "character"). All events must be binary encoded. If the sum of all event indicators in a row is zero, the observation is interpreted as censored. Alternatively, the column name of a single factor representing the competing events can be given. In this case the argument eventColumnsAsFactor must be set to TRUE, and the first level is assumed to represent censoring.
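The two encodings described above can be compared on a small data set. The following is a minimal sketch in base R, not code from the package; the data frame toy and its column names are made up for illustration, and it assumes that at most one indicator equals 1 per row.

```r
# Hypothetical toy data: two competing events, binary encoded
toy <- data.frame(
  id     = 1:4,
  event1 = c(1, 0, 0, 0),
  event2 = c(0, 1, 0, 1)
)

# A row whose indicators sum to zero is read as a censored observation
censored <- (toy$event1 + toy$event2) == 0

# Equivalent factor encoding: 0 = censored, 1 = event1, 2 = event2.
# The first level stands for censoring, as eventColumnsAsFactor = TRUE expects.
code <- toy$event1 * 1 + toy$event2 * 2
toy$status <- factor(code, levels = 0:2,
                     labels = c("censored", "event1", "event2"))

toy
```

With the indicator encoding one would pass eventColumns = c("event1", "event2"); with the factor encoding, eventColumns = "status" together with eventColumnsAsFactor = TRUE.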

eventColumnsAsFactor

Should the argument eventColumns be interpreted as the column name of a factor variable (class "logical")? Default is FALSE.

idColumn

Column name of the person identification numbers (class "character").

timeAsFactor

Should the time intervals be coded as factor (class "logical")? Default is FALSE. In the default settings the discrete time intervals are treated as quantitative (class "numeric").

responseAsFactor

Should the response columns be given as factor (class "logical")? Default is FALSE.

Details

There may be intervals in which no new covariate information is observed (e.g. values are observed in intervals one and three, but not in interval two). In this case the values from the last observation are assumed to stay constant until a new measurement is taken.
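The carry-forward rule can be illustrated with a few lines of base R. This is only a sketch of the idea under simplified assumptions (a single person, measurement times sorted in increasing order); it is not the package's implementation, and the data frame semiLong with its covariate bili is hypothetical.

```r
# Hypothetical semi-long data: one person, measured in intervals 1 and 3
semiLong <- data.frame(id = c(1, 1), timeInt = c(1L, 3L), bili = c(1.2, 2.0))

# Each measurement stays valid until the next one is taken;
# the last measurement covers exactly one interval
reps <- c(diff(semiLong$timeInt), 1L)
rowIndex <- rep(seq_len(nrow(semiLong)), times = reps)

# Replicate the rows and renumber the intervals consecutively
long <- semiLong[rowIndex, ]
long$timeInt <- seq(from = semiLong$timeInt[1], length.out = nrow(long))
rownames(long) <- NULL

long
```

In the result, interval two has no measurement of its own and inherits the covariate value observed in interval one.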

In contrast to continuous survival analysis (see e.g. Surv), start-stop notation is not used here. In discrete time survival analysis only the stop time is relevant. The start time does not matter, because all discrete intervals need to be included in the long data set to ensure consistent estimation. It is assumed that the supplied data set dataSemiLong contains all repeated measurements of each cluster (e.g. person) in semi-long format. For further information see the example "Start-stop notation".

Value

Original data set in long format with additional columns

  • obj Gives identification number of objects (row index in short format) (integer)

  • timeInt Gives the index of the discrete time interval (numeric, or factor if timeAsFactor=TRUE)

  • responses Columns, one per event plus one for censoring (count of events + 1 in total)

    • e0 No event (observation censored in specific interval)

    • e1 Indicator of first event, 1 if event takes place and 0 otherwise

    • ... ...

    • ek Indicator of the k-th (last) event, 1 if event takes place and 0 otherwise

    If argument responseAsFactor=TRUE, then responses will be coded as factor in one column.
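To make the response layout concrete, the following base R sketch builds the indicator columns by hand for a single hypothetical person. It reflects one reading of the description above (e0 equals 1 in every interval without an event) and is not output produced by the package; the variable names and the factor levels of the one-column version are assumptions.

```r
# Hypothetical person: observed for 3 intervals, event of type 2 in the last one
nIntervals <- 3
nEvents    <- 2
eventType  <- 2   # 0 would mean the person is censored

# One row per interval, one column per response category e0, e1, ..., ek
responses <- matrix(0, nrow = nIntervals, ncol = nEvents + 1,
                    dimnames = list(NULL, paste0("e", 0:nEvents)))
responses[, "e0"] <- 1                               # no event in the interval
responses[nIntervals, "e0"] <- 0                     # clear the last interval
responses[nIntervals, paste0("e", eventType)] <- 1   # mark what happened there

# One-column version, analogous to responseAsFactor = TRUE
responseFactor <- factor(
  colnames(responses)[max.col(responses, ties.method = "first")],
  levels = colnames(responses)
)

responses
responseFactor
```

Each row contains exactly one 1, so the indicator columns and the single factor column carry the same information.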

Note

Arguments to this function have to be specified in the required formats. Other objects are not supported. A common mistake is to pass a tibble, which is not a plain object of class "data.frame".

Author(s)

Thomas Welchowski t.welchowski@psychologie.uzh.ch

References

\insertRef{fahrmeirDiscSurv}{discSurv}

\insertRef{thompsonTreatment}{discSurv}

See Also

contToDisc, dataLong, dataLongCompRisks

Examples


# Example Primary Biliary Cirrhosis data
library(survival)
pbcseq_example <- pbcseq

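# Convert days to months (month 1 is the first interval)
pbcseq_example$day <- ceiling(pbcseq_example$day / 30) + 1
# Rename by column name instead of relying on the column position
names(pbcseq_example)[names(pbcseq_example) == "day"] <- "month"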
pbcseq_example$status <- factor(pbcseq_example$status)

# Convert to long format with time-dependent covariates
pbcseq_exampleLong <- dataLongCompRisksTimeDep(dataSemiLong = pbcseq_example,
                                               timeColumn = "month",
                                               eventColumns = "status",
                                               eventColumnsAsFactor = TRUE,
                                               idColumn = "id",
                                               timeAsFactor = TRUE)
head(pbcseq_exampleLong)

#####################
# Start-stop notation

library(survival)
?pbcseq

# Choose subset of patients
subsetID <- unique(pbcseq$id)[1:100]
pbcseq_mod <- pbcseq[pbcseq$id %in% subsetID, ]

# Convert to start-stop notation
pbcseq_mod_split <- split(pbcseq_mod, pbcseq_mod$id)
pbcseq_mod_split <- lapply(seq_along(pbcseq_mod_split), function(i) {
  visits <- pbcseq_mod_split[[i]]
  # Start time is the day of the previous visit (0 for the first visit)
  cbind(visits,
        start_time = c(0, visits[-nrow(visits), "day"]),
        stop_time = visits[, "day"])
})
pbcseq_mod <- do.call(rbind, pbcseq_mod_split)

# Discretize stop time into intervals bounded by its deciles
intervalDef <- c(quantile(pbcseq_mod$stop_time, probs = seq(0.1, 0.9, by=0.1)), Inf)
names(pbcseq_mod)
pbcseq_mod <- contToDisc(dataShort = pbcseq_mod, timeColumn = "stop_time", 
                         intervalLimits = intervalDef, equi = FALSE)
pbcseq_mod$status <- factor(pbcseq_mod$status)

# Conversion to data long format
pbcseq_mod_long <- dataLongCompRisksTimeDep(dataSemiLong = pbcseq_mod,
                                            timeColumn = "timeDisc",
                                            eventColumns = "status",
                                            idColumn = "id",
                                            eventColumnsAsFactor = TRUE,
                                            responseAsFactor = TRUE,
                                            timeAsFactor = TRUE)
head(pbcseq_mod_long)


discSurv documentation built on April 29, 2026, 9:07 a.m.