View source: R/DiscSurvDataTransform.R
dataLongSubDist (R Documentation)
Generates the augmented data matrix and the weights required for discrete subdistribution hazard modeling with right censoring.
dataLongSubDist(
  dataShort,
  timeColumn,
  eventColumns,
  eventColumnsAsFactor = FALSE,
  eventFocus,
  timeAsFactor = FALSE,
  aggTimeFormat = FALSE,
  lastTheoInt = NULL
)
dataShort
Original data in short format ("class data.frame").

timeColumn
Character specifying the column name of the observed event times ("character vector"). It is required that the observed times are discrete ("integer vector").

eventColumns
Character vector specifying the column names of the event indicators, excluding censoring ("character vector"). A 0-1 coding is required for all event columns; rows whose event columns all equal zero are treated as censored (see the sketch after this argument list).

eventColumnsAsFactor
Should the argument eventColumns be interpreted as the column name of a single factor variable encoding the event types ("logical vector")? Default is FALSE.

eventFocus
Column name of the event of interest (type 1 event) ("character vector").

timeAsFactor
Logical indicating whether time should be coded as a factor in the augmented data matrix ("logical vector"). If FALSE, a numeric coding is used.

aggTimeFormat
Instead of the usual long format, should every observation be expanded to all theoretical time intervals ("logical vector")? Default is the standard long format. In the case of nonlinear risk score models, the time effect has to be integrated out before these can be applied to the C-index.

lastTheoInt
Gives the number of the last theoretical interval ("integer vector"). Only used if aggTimeFormat == TRUE.
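As a supplement to the examples further below, the following sketch shows how these arguments fit together on a small made-up data set. The data frame toyShort, its column names, and the choice lastTheoInt = 3 are purely illustrative and not part of the package documentation; with such a tiny sample the estimated weights are not meaningful, the sketch only illustrates the interface.

# Minimal sketch with hypothetical data (requires the discSurv package)
library(discSurv)

# Short-format competing risks data: discrete times and 0-1 event indicators.
# A row with event1 = 0 and event2 = 0 is treated as censored.
toyShort <- data.frame(
  time   = c(1, 2, 2, 3, 1, 3),
  event1 = c(1, 0, 0, 1, 0, 0),
  event2 = c(0, 1, 0, 0, 0, 1),
  x      = c(0.3, -1.2, 0.5, 0.9, -0.4, 1.1)
)

# Standard long format with subdistribution weights; event1 is the event of interest
toyLong <- dataLongSubDist(dataShort = toyShort, timeColumn = "time",
                           eventColumns = c("event1", "event2"),
                           eventFocus = "event1")

# Expanded format: every observation gets all theoretical intervals
# (here assumed to be 3), e.g. for evaluating nonlinear risk scores
toyLongAgg <- dataLongSubDist(dataShort = toyShort, timeColumn = "time",
                              eventColumns = c("event1", "event2"),
                              eventFocus = "event1",
                              aggTimeFormat = TRUE, lastTheoInt = 3)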
This function sets up the augmented data matrix and the weights that are needed for weighted maximum likelihood (ML) estimation of the discrete subdistribution model proposed by Berger et al. (2018). The model is a discrete-time extension of the original subdistribution model proposed by Fine and Gray (1999).
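For orientation, the quantity targeted by the weighted binary regression is the discrete subdistribution hazard of the type 1 event. Stated here only as a sketch of the notation (the precise definitions are given in Berger et al. (2018)), it can be written as

\lambda_1(t \mid x) = P\big(T = t, \epsilon = 1 \mid \{T \ge t\} \cup \{T \le t-1 \wedge \epsilon \ne 1\}, x\big), \quad t = 1, 2, \ldots,

and the logistic fits in the examples below correspond to a model of the form \mathrm{logit}\, \lambda_1(t \mid x) = \gamma_{0t} + x^{\top}\gamma, estimated with the subdistribution weights accounting for censoring.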
Data frame with additional column "subDistWeights". The latter column contains the weights that are needed for fitting a weighted binary regression model, as described in Berger et al. (2018). The weights are calculated by a life table estimator for the censoring event.
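As a quick check, the returned long-format data can be inspected directly. The columns obj, timeInt, y, and subDistWeights used here also appear in the fitted models of the examples below; toyLong refers to the illustrative object from the sketch above.

# Inspect the augmented data matrix and the estimated weights (illustrative)
head(toyLong[, c("obj", "timeInt", "y", "subDistWeights")])

# The weights enter a weighted binary regression via the 'weights' argument of glm()
summary(toyLong$subDistWeights)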
Thomas Welchowski <welchow@imbie.meb.uni-bonn.de>
Berger M, Schmid M, Welchowski T, Schmitz-Valckenberg S, Beyersmann J (2018). "Subdistribution Hazard Models for Competing Risks in Discrete Time." Biostatistics.
Fine JP, Gray RJ (1999). "A Proportional Hazards Model for the Subdistribution of a Competing Risk." Journal of the American Statistical Association, 94(446), 496-509.
See also: dataLong
################################
# Example with unemployment data
library(discSurv)   # provides dataLongSubDist(), estMargProb() and estSurv()
library(Ecdat)
data(UnempDur)

# Generate subsample, reduce number of intervals to k = 5
SubUnempDur <- UnempDur[1:500, ]
SubUnempDur$time <- as.numeric(cut(SubUnempDur$spell, c(0, 4, 8, 16, 28)))

# Convert competing risks data to long format
# The event of interest is re-employment at full job
SubUnempDurLong <- dataLongSubDist(dataShort = SubUnempDur, timeColumn = "time",
                                   eventColumns = c("censor1", "censor2", "censor3"),
                                   eventFocus = "censor1")
head(SubUnempDurLong)

# Fit discrete subdistribution hazard model with logistic link function
logisticSubDistr <- glm(y ~ timeInt + ui + age + logwage,
                        family = binomial(), data = SubUnempDurLong,
                        weights = SubUnempDurLong$subDistWeights)
summary(logisticSubDistr)

########################################
# Simulation
# Discrete subdistribution hazards model

# Simulate covariates as multivariate normal distribution
library(mvnfast)
set.seed(1980)
X <- mvnfast::rmvn(n = 1000, mu = rep(0, 4), sigma = diag(4))

# Specification of two discrete cause specific hazards with four intervals
# Event 1
theoInterval <- 4
betaCoef_event1 <- seq(-1, 1, length.out = 5)[-3]
timeInt_event1 <- seq(0.1, -0.1, length.out = theoInterval - 1)
linPred_event1 <- c(X %*% betaCoef_event1)

# Event 2
betaCoef_event2 <- seq(-0.5, 0.5, length.out = 5)[-3]
timeInt_event2 <- seq(-0.1, 0.1, length.out = theoInterval - 1)
linPred_event2 <- c(X %*% betaCoef_event2)

# Discrete cause specific hazards in last theoretical interval
theoHaz_event1 <- 0.5
theoHaz_event2 <- 0.5

# Derive discrete all cause hazard
haz_event1_X <- cbind(sapply(1:length(timeInt_event1), function(x)
  exp(linPred_event1 + timeInt_event1[x]) /
    (1 + exp(linPred_event1 + timeInt_event1[x]) + exp(linPred_event2 + timeInt_event2[x]))),
  theoHaz_event1)
haz_event2_X <- cbind(sapply(1:length(timeInt_event2), function(x)
  exp(linPred_event2 + timeInt_event2[x]) /
    (1 + exp(linPred_event1 + timeInt_event1[x]) + exp(linPred_event2 + timeInt_event2[x]))),
  theoHaz_event2)
allCauseHaz_X <- haz_event1_X + haz_event2_X

# Derive discrete cumulative incidence function of event 1 given covariates
p_T_event1_X <- haz_event1_X * cbind(1, (1 - allCauseHaz_X)[, -dim(allCauseHaz_X)[2]])
cumInc_event1_X <- t(sapply(1:dim(p_T_event1_X)[1], function(x) cumsum(p_T_event1_X[x, ])))

# Calculate all cause probability P(T=t | X)
pT_X <- t(sapply(1:dim(allCauseHaz_X)[1], function(i) estMargProb(allCauseHaz_X[i, ])))

# Calculate event probability given time interval P(R=r | T=t, X)
pR_T_X_event1 <- haz_event1_X / (haz_event1_X + haz_event2_X)

# Simulate discrete survival times
survT <- sapply(1:dim(pT_X)[1], function(i)
  sample(x = 1:(length(timeInt_event1) + 1), size = 1, prob = pT_X[i, ]))
censT <- sample(x = 1:(length(timeInt_event1) + 1), size = dim(pT_X)[1],
                prob = rep(1 / (length(timeInt_event1) + 1), (length(timeInt_event1) + 1)),
                replace = TRUE)

# Calculate observed times
obsT <- ifelse(survT <= censT, survT, censT)
obsEvent <- rep(0, length(obsT))
obsEvent <- sapply(1:length(obsT), function(i)
  if(survT[i] <= censT[i]){
    return(sample(x = c(1, 2), size = 1,
                  prob = c(pR_T_X_event1[i, obsT[i]], 1 - pR_T_X_event1[i, obsT[i]])))
  } else{
    return(0)
  })

# Recode last interval to censored
lastInterval <- obsT == theoInterval
obsT[lastInterval] <- theoInterval - 1
obsEvent[lastInterval] <- 0
obsT <- factor(obsT)
obsEvent <- factor(obsEvent)

# Data preparation
datShort <- data.frame(event = factor(obsEvent), time = obsT, X)

# Conversion to long data format
datLongSub <- dataLongSubDist(dataShort = datShort, timeColumn = "time",
                              eventColumns = "event", eventFocus = 1,
                              eventColumnsAsFactor = TRUE)

# Estimate discrete subdistribution hazard model
estSubModel <- glm(formula = y ~ timeInt + X1 + X2 + X3 + X4, data = datLongSub,
                   family = binomial(link = "logit"),
                   weights = datLongSub$subDistWeights)

# Predict cumulative incidence function of first event
predSubHaz1 <- predict(estSubModel, newdata = datLongSub[datLongSub$obj == 2, ],
                       type = "response")
mean(((1 - estSurv(predSubHaz1)) - cumInc_event1_X[2, 1:3])^2)