View source: R/DiscSurvDataTransform.R
dataLongMultiSpell | R Documentation |
Transform data from short format into long format for discrete multi spell survival analysis and right censoring.
dataLongMultiSpell( dataSemiLong, timeColumn, eventColumn, idColumn, timeAsFactor = FALSE, spellAsFactor = FALSE )
dataSemiLong |
Original data in semi-long format ("class data.frame"). |
timeColumn |
Character giving the column name of the observed times. It is required that the observed times are discrete ("character vector"). |
eventColumn |
Column name of the event status ("character vector"). The events can take multiple values on a discrete scale (0, 1, 2, ...) and repetition of events is allowed (integer vector or class factor). It is assumed that the number zero corresponds to censoring and all numbers > 0 represent the observed states between transitions. |
idColumn |
Name of the column containing the person identification numbers ("character vector"). |
timeAsFactor |
Should the time intervals be coded as factor ("logical vector")? Default is FALSE. In the default settings the discrete time intervals are treated as quantitative ("numeric vector"). |
spellAsFactor |
Should the spells be coded as factor ("logical vector")? Default is not to use factor. If the argument is false, the column is coded as numeric. |
If the data has continuous survival times, the response may be transformed to discrete intervals using the function contToDisc. The discrete time variable needs to be strictly increasing for each person, because otherwise the order of the events is not distinguishable. Here is an example data structure in short format prior to augmentation with three possible states:
idColumn = 1, 1, ... , 1, 2, 2, ... , n
timeColumn = t_ID1_1 < t_ID1_2 < ... < t_ID1_k, t_ID2_1 < t_ID2_2 < ... < t_ID2_k, ...
eventColumn = 0, 1, ... , 2, 1, 0, ... , 0
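The strictly increasing requirement can be checked before the data is expanded. The following is a minimal base R sketch; the helper name isStrictlyIncreasingById and the toy data are assumptions made for illustration and are not part of discSurv.

```r
# Hypothetical helper (not part of discSurv): returns TRUE for each person
# whose discrete times are strictly increasing in row order.
isStrictlyIncreasingById <- function(data, timeColumn, idColumn) {
  tapply(data[[timeColumn]], data[[idColumn]],
         function(tt) all(diff(tt) > 0))
}

# Toy data: person 2 repeats time 3, which violates the requirement
toyData <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  time  = c(0, 2, 5, 0, 3, 3),
  state = c(1, 2, 0, 1, 2, 1)
)

check <- isStrictlyIncreasingById(toyData, timeColumn = "time", idColumn = "id")
print(check)
```

Persons flagged FALSE would need their time column corrected or their rows reordered before the expansion.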
The starting state of each individual is assumed to be given with the time interval equal to zero. For example, in an illness-death model with three states ("healthy", "illness", "death"), if an individual was healthy at the beginning of the study, this has to be encoded with the discrete time interval set to zero and the event state "healthy".
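As a small illustration of this convention, the sketch below encodes two illness-death histories and checks that every person starts at time zero. The numeric state codes (1 = healthy, 2 = illness, 3 = death) and the data values are assumptions chosen for this example only.

```r
# Assumed state coding for illustration: 0 = censored,
# 1 = "healthy", 2 = "illness", 3 = "death"
illnessDeath <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  time  = c(0, 4, 7, 0, 5),
  state = c(1, 2, 3, 1, 0)
)
# Person 1: healthy at study start, illness at interval 4, death at interval 7
# Person 2: healthy at study start, censored at interval 5

# First observed time per person; each must equal zero
firstTime <- tapply(illnessDeath$time, illnessDeath$id, min)
print(firstTime)
allStartAtZero <- all(firstTime == 0)
```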
Original data.frame with the following additional columns:
obj Index of persons as integer vector
timeInt Index of time intervals (factor or integer vector)
spell The spell gives the actual state of each individual within a given discrete interval.
e0 Response transition in long format as binary vector. Column e0 represents censoring. If e0 is coded one in the last observed time interval timeInt of a person, then this observation was censored.
e1 Response in long format as binary vector. The column e1 represents the transition to the first event state.
eX Response in long format as binary vector. The column eX represents the transition to the last event state out of the set of possible states "1, 2, 3, ..., X".
... Expanded columns of original data set.
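The censoring rule for e0 can be applied directly to data in this layout. The sketch below uses a small hand-written data frame that only mimics the column names obj, timeInt, e0 and e1 described above. It is not actual output of dataLongMultiSpell, and the values in rows other than the last one per person are placeholders.

```r
# Hand-written stand-in for long-format data (values are illustrative only)
longToy <- data.frame(
  obj     = c(1, 1, 1, 2, 2),
  timeInt = c(1, 2, 3, 1, 2),
  e0      = c(1, 1, 1, 1, 0),
  e1      = c(0, 0, 0, 0, 1)
)

# Row index of the last observed interval for each person
# (assumes timeInt is numeric, i.e. timeAsFactor = FALSE)
lastRow <- tapply(seq_len(nrow(longToy)), longToy$obj, function(idx) {
  idx[which.max(longToy$timeInt[idx])]
})

# A person counts as censored if e0 equals one in that last interval
isCensored <- longToy$e0[lastRow] == 1
names(isCensored) <- names(lastRow)
print(isCensored)
```

In this toy data, person 1 is flagged as censored and person 2 is not, because person 2 ends with a transition recorded in e1.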
Thomas Welchowski welchow@imbie.meb.uni-bonn.de
contToDisc, dataLongTimeDep, dataLongCompRisks
library(discSurv)

################################
# Example with unemployment data
data(unempMultiSpell)

# Select subsample of first 250 persons
unempSub <- unempMultiSpell[unempMultiSpell$id %in% 1:250, ]

# Expansion from semi-long to long format
unempLong <- dataLongMultiSpell(dataSemiLong = unempSub, timeColumn = "year",
                                eventColumn = "spell", idColumn = "id",
                                spellAsFactor = TRUE, timeAsFactor = FALSE)
head(unempLong, 25)

# Fit discrete multi-state regression model
library(VGAM)
model <- vgam(cbind(e0, e1, e2, e3, e4) ~ 0 + s(timeInt) + age:spell,
              data = unempLong, family = multinomial(refLevel = "e0"))

############################
# Example with artificial data

# Seed specification
set.seed(-2578)

# Construction of data set
# Censoring and three possible states (0, 1, 2, 3)
# Discrete time intervals (1, 2, ... , 10)
# Noninfluential variable x ~ N(0, 1)
datFrame <- data.frame(
  ID = c(rep(1, 6), rep(2, 4), rep(3, 3), rep(4, 2), rep(5, 4),
         rep(6, 5), rep(7, 7), rep(8, 8)),
  time = c(c(0, 2, 5, 6, 8, 10),
           c(0, 1, 6, 7),
           c(0, 9, 10),
           c(0, 6),
           c(0, 2, 3, 4),
           c(0, 3, 4, 7, 9),
           c(0, 2, 3, 5, 7, 8, 10),
           c(0, 1, 3, 4, 6, 7, 8, 9)),
  state = c(c(2, 1, 3, 2, 1, 0),
            c(3, 1, 2, 2),
            c(2, 2, 1),
            c(1, 2),
            c(3, 2, 2, 0),
            c(1, 3, 2, 1, 3),
            c(1, 1, 2, 3, 2, 1, 3),
            c(3, 2, 3, 2, 1, 1, 2, 3)),
  x = rnorm(n = 6 + 4 + 3 + 2 + 4 + 5 + 7 + 8)
)

# Transformation to long format
datFrameLong <- dataLongMultiSpell(dataSemiLong = datFrame, timeColumn = "time",
                                   eventColumn = "state", idColumn = "ID",
                                   spellAsFactor = TRUE)
head(datFrameLong, 25)

# Fit multinomial model to the expanded data
library(VGAM)
cRm <- vglm(cbind(e0, e1, e2, e3) ~ 0 + timeInt + x:spell,
            data = datFrameLong, family = "multinomial")
summary(cRm)